J. Scott Gardner, senior analyst at The Linley Group/Microprocessor
Report, agreed with Bier. “Unlike video codecs, with well-defined algorithms that
designers can bake into hardware, the algorithms for embedded vision are
virtually unbounded and constantly evolving,” he said. He called
embedded vision a “perfect application” that can “take advantage of the
inherent data-level parallelism in algorithms.” However, it isn’t
enough to just have a lot of pixel-computation units, he added. “The
memory system and bus architecture have to be designed to efficiently
deliver pixel data at rates approaching a billion pixels per second.”
Asked which capabilities designers look for when optimizing processors for embedded vision applications, Bier rattled off several: applying multiple kinds of architectural parallelism to take advantage of the parallel nature of pixel processing; supporting shorter and longer data types (e.g., 8-, 16- and 32-bit) so that more operations can be performed in parallel and memory bandwidth can be conserved when less precision is needed, while more precision remains available when it is required; offering very high memory bandwidth to move the massive amounts of required data into and out of the processor efficiently; and providing specialized instructions to efficiently implement key operations found in these algorithms.
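The trade-off between data width and parallelism in Bier's list comes down to simple arithmetic. The sketch below is a generic illustration, assuming a 512-bit vector register (the vector width quoted for the IVP later in this article); the function name is hypothetical, and the numbers say nothing about how any particular core actually packs its lanes.

```python
def lanes_per_vector(vector_bits: int, element_bits: int) -> int:
    """Return how many elements of a given width fit in one vector register."""
    if element_bits <= 0 or vector_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the vector width")
    return vector_bits // element_bits


# Assumption: a 512-bit vector, used here purely for illustration.
VECTOR_BITS = 512

for element_bits in (8, 16, 32):
    lanes = lanes_per_vector(VECTOR_BITS, element_bits)
    print(f"{element_bits:>2}-bit elements: {lanes} per {VECTOR_BITS}-bit vector")
```

Halving the element width doubles the number of elements carried per vector and halves the memory traffic per element, which is the effect Bier describes.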
Indeed, Tensilica’s IVP meets many such demands. IVP is based on a four-way Flexible Length Instruction eXtension (FLIX) architecture. FLIX is Tensilica’s version of VLIW, which delivers high parallelism while keeping code compact. The IVP features a 32-way vector single instruction, multiple data (SIMD) datapath and a balanced 9-stage pipeline.
The architecture includes a direct memory access (DMA) transfer engine with up to 10 GBytes per second of throughput and local memory throughput of 1024 bits per cycle (sixty-four 16-bit pixels/cycle) to keep pace with resolution and frame rate requirements. The IVP also features many imaging-specific operations to accelerate 8-, 16- and 32-bit pixel data types and video operation patterns, according to Tensilica.
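As a rough sanity check on those figures, the sketch below converts the quoted bandwidths into pixel rates. It assumes 16-bit pixels and decimal gigabytes; the 1920x1080, 60-frames-per-second stream is a hypothetical workload chosen for illustration and does not come from Tensilica.

```python
PIXEL_BITS = 16                 # assumption: 16-bit pixels
DMA_BYTES_PER_SEC = 10e9        # "up to 10 GBytes per second" (decimal GB assumed)
LOCAL_BITS_PER_CYCLE = 1024     # quoted local memory throughput


def pixels_per_cycle(bits_per_cycle: int, pixel_bits: int) -> int:
    """Pixels delivered per clock cycle at a given memory width."""
    return bits_per_cycle // pixel_bits


def dma_pixels_per_sec(bytes_per_sec: float, pixel_bits: int) -> float:
    """Peak pixels per second the DMA engine could move."""
    return bytes_per_sec * 8 / pixel_bits


def stream_pixels_per_sec(width: int, height: int, fps: int) -> int:
    """Pixel rate of a single uncompressed video stream."""
    return width * height * fps


local = pixels_per_cycle(LOCAL_BITS_PER_CYCLE, PIXEL_BITS)
dma = dma_pixels_per_sec(DMA_BYTES_PER_SEC, PIXEL_BITS)
stream = stream_pixels_per_sec(1920, 1080, 60)  # hypothetical workload

print(f"local memory: {local} pixels/cycle")
print(f"DMA peak:     {dma:.2e} pixels/s")
print(f"1080p60:      {stream:.2e} pixels/s ({dma / stream:.1f}x headroom)")
```

Peak figures like these are upper bounds. Vision algorithms typically read each pixel many times, so the usable headroom is smaller than the ratio suggests.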
Tensilica vs. CEVA
To be clear, Tensilica is not the first company to develop a processor core concentrated on imaging and embedded vision. “CEVA deserves credit for blazing the trail in the nascent markets for embedded vision,” Gardner said. CEVA’s MM3101, announced in January of 2012, has many similarities to Tensilica’s IVP, which also uses VLIW in combination with SIMD.
That said, Gardner observed, “With the entry of Tensilica into the embedded vision market, CEVA will need to refresh its MM3000 platform.”
The MM3101 offers less raw computational performance and less memory bandwidth than Tensilica’s IVP. The IVP supports 32-way SIMD (512-bit vectors) and can process thirty-two 16-bit pixels in parallel, compared with sixteen 16-bit pixels per cycle when both of the MM3101’s 128-bit vector-processing units are in use, Gardner explained. While CEVA’s MM3101 has a single 256-bit vector load/store unit, the Tensilica IVP supports up to two 512-bit memory references per cycle, allowing up to four times the memory bandwidth.
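Gardner's comparison can be reproduced from the widths he cites. The sketch below is only a per-cycle tally under those quoted numbers; the `VectorCore` class and its field names are hypothetical, and clock frequency, instruction mix and real workloads are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorCore:
    """Per-cycle vector widths as quoted in the article (illustrative model)."""
    name: str
    simd_bits_per_cycle: int   # total vector compute width per cycle
    mem_bits_per_cycle: int    # total vector load/store width per cycle

    def pixels_per_cycle(self, pixel_bits: int = 16) -> int:
        return self.simd_bits_per_cycle // pixel_bits


# Figures taken from the article: one 512-bit vector and two 512-bit memory
# references for the IVP; two 128-bit vector units and one 256-bit
# load/store unit for the MM3101.
ivp = VectorCore("Tensilica IVP", 512, 2 * 512)
mm3101 = VectorCore("CEVA MM3101", 2 * 128, 256)

compute_ratio = ivp.pixels_per_cycle() / mm3101.pixels_per_cycle()
memory_ratio = ivp.mem_bits_per_cycle / mm3101.mem_bits_per_cycle

for core in (ivp, mm3101):
    print(f"{core.name}: {core.pixels_per_cycle()} 16-bit pixels/cycle, "
          f"{core.mem_bits_per_cycle} memory bits/cycle")
print(f"compute ratio: {compute_ratio:.0f}x, memory ratio: {memory_ratio:.0f}x")
```

On these numbers the IVP has twice the per-cycle pixel throughput and four times the per-cycle memory width, matching the "up to four times" figure above.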
Actually, there is a third player in the mix: CogniVue, with its Image Cognition Processing (ICP) technology. CogniVue has been designing vision processing IP for low-power embedded and mobile applications for several years and launched the ICP architecture in 2010, making the IP available in its own SoC. In 2012 CogniVue licensed its ICP APEX processing core to Freescale (http://media.freescale.com/phoenix.zhtml?c=196520&p=irol-newsArticle&ID=1734693&highlight=) for vision processing in automotive safety applications. CogniVue is now a technology licensing company squarely focused on embedded vision processing, and in fact was one of the founding members of the Embedded Vision Alliance in 2011. The comments in the article about the complexities of vision processing are quite accurate, and the ICP APEX processing core is specifically architected to "hide" memory access latency while optimizing the vision processing pipeline. Check us out at www.cognivue.com.
The article completely left out CogniVue Corporation, which was a founding member of the EVA before CEVA and Tensilica announced their vision cores. CogniVue's APEX image cognition processor (ICP) technology addresses what this article identifies as key: an efficient processor architecture for vision processing is not just about massively parallel pixel processing, but about creating vision-friendly data structures, minimizing data movement and achieving an efficiently pipelined implementation of very complex vision algorithms.
This is a processor core tailored to the imaging pipeline.
Just as Tensilica used its software profiling tools to create an optimized CPU for its HiFi family of audio DSPs, Tensilica engineers, this time around, "created a specialized DSP with an instruction set that reduces the cycle count of the key embedded vision algorithms," according to Linley Group's Gardner.