J. Scott Gardner, senior analyst at The Linley Group/Microprocessor
Report, agreed with Bier. “Unlike video codecs, with well-defined algorithms that
designers can bake into hardware, the algorithms for embedded vision are
virtually unbounded and constantly evolving,” he said. Gardner called
embedded vision a “perfect application” that can “take advantage of the
inherent data-level parallelism in algorithms.” However, it isn’t
enough to just have a lot of pixel-computation units, he added. “The
memory system and bus architecture have to be designed to efficiently
deliver pixel data at rates approaching a billion pixels per second.”
Asked which capabilities designers look for when optimizing processors for embedded vision applications, Bier listed several: the ability to apply multiple kinds of architectural parallelism, taking advantage of the parallel nature of pixel processing; support for both shorter and longer data types (e.g., 8-, 16- and 32-bit), so that more operations can be performed in parallel and memory bandwidth conserved when less precision is needed, while more precision remains available when it is required; very high memory bandwidth, to move the massive amounts of required data into and out of the processor efficiently; and specialized instructions that efficiently implement key operations found in these algorithms.
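The trade-off Bier describes between precision and parallelism comes down to simple arithmetic. The sketch below is illustrative only: it assumes a generic fixed-width vector register (512 bits is chosen to match the vector width cited later in this article) and a hypothetical helper named `lanes`. It does not describe how any particular core handles each data type.

```python
def lanes(vector_bits: int, element_bits: int) -> int:
    """Return how many elements of the given width fit in one vector register."""
    if element_bits <= 0 or vector_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the vector width")
    return vector_bits // element_bits


# Assumption for illustration: a 512-bit vector register.
VECTOR_BITS = 512

if __name__ == "__main__":
    for element_bits in (8, 16, 32):
        count = lanes(VECTOR_BITS, element_bits)
        print(f"{element_bits:2d}-bit elements: {count} operations per vector instruction")
```

Halving the element width doubles the number of elements processed per instruction and halves the bytes moved per element, which is why narrower types save both compute cycles and memory bandwidth.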
Indeed, Tensilica’s IVP meets many such demands. IVP is based on a four-way Flexible Length Instruction eXtension (FLIX) architecture, Tensilica’s version of VLIW, which delivers high parallelism intermixed with code-compact instructions. The IVP features a 32-way vector single instruction, multiple data (SIMD) datapath and a balanced 9-stage pipeline.
The architecture includes a direct memory access (DMA) transfer engine with up to 10 GBytes per second of throughput and local memory throughput of 1024 bits per cycle (sixty-four 16-bit pixels/cycle) to keep pace with resolution and frame rate requirements. The IVP also features many imaging-specific operations to accelerate 8-, 16- and 32-bit pixel data types and video operation patterns, according to Tensilica.
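To put these throughput figures in context, the sketch below estimates the bandwidth needed to stream video once through the processor and compares it with the quoted DMA ceiling. The 1080p, 60-frames-per-second stream is an assumption chosen for illustration, as are the function names; "10 GBytes" is read as 10 × 10^9 bytes. Real vision pipelines typically read and write each frame several times, so actual demand would be a multiple of the single-pass figure.

```python
def stream_bytes_per_second(width: int, height: int, fps: int, bits_per_pixel: int) -> int:
    """Bytes per second needed to move each pixel of a video stream exactly once."""
    return width * height * fps * bits_per_pixel // 8


def pixels_per_cycle(bits_per_cycle: int, bits_per_pixel: int) -> int:
    """Pixels delivered per clock cycle for a given local-memory width."""
    return bits_per_cycle // bits_per_pixel


# Figures quoted in the article.
DMA_BYTES_PER_SECOND = 10 * 10**9   # "up to 10 GBytes per second" (decimal GB assumed)
LOCAL_MEMORY_BITS_PER_CYCLE = 1024

if __name__ == "__main__":
    # Assumed workload: 1920x1080 at 60 frames/s with 16-bit pixels.
    needed = stream_bytes_per_second(1920, 1080, 60, 16)
    share = needed / DMA_BYTES_PER_SECOND
    print(f"Single pass over the stream: {needed / 1e6:.1f} MB/s")
    print(f"Share of the quoted DMA ceiling: {share:.1%}")
    print(f"16-bit pixels per cycle from local memory: "
          f"{pixels_per_cycle(LOCAL_MEMORY_BITS_PER_CYCLE, 16)}")
```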
Tensilica vs. CEVA
To be clear, Tensilica is not the first company to develop a processor core focused on imaging and embedded vision. “CEVA deserves credit for blazing the trail in the nascent markets for embedded vision,” Gardner said. CEVA’s MM3101, announced in January 2012, has many similarities to Tensilica’s IVP; both combine VLIW with SIMD.
That said, Gardner observed, “With the entry of Tensilica into the embedded vision market, CEVA will need to refresh its MM3000 platform.”
The MM3101 offers less raw computational performance and less memory bandwidth than Tensilica’s IVP. The IVP supports 32-way SIMD (512-bit vectors) and can process thirty-two 16-bit pixels in parallel, compared with sixteen 16-bit pixels per cycle when using both of the 128-bit vector-processing units in the MM3101, Gardner explained. While CEVA’s MM3101 has a single 256-bit vector load/store unit, the Tensilica IVP supports up to two 512-bit memory references per cycle, allowing up to four times the memory bandwidth.
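Gardner's comparison can be checked with a few lines of arithmetic. The sketch below uses only the widths quoted above; the `VectorCore` class and its field names are hypothetical, and modeling the IVP as a single 512-bit vector unit is a simplification made for this comparison. Clock frequency is deliberately left out, since the article does not give one, so the results are per-cycle figures only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorCore:
    """Per-cycle vector and memory widths for one core (illustrative model)."""
    name: str
    vector_unit_bits: int     # width of one vector-processing unit
    vector_units: int         # units usable in the same cycle
    mem_ref_bits: int         # width of one vector memory reference
    mem_refs_per_cycle: int   # memory references issued per cycle

    def pixels_per_cycle(self, pixel_bits: int = 16) -> int:
        return self.vector_unit_bits * self.vector_units // pixel_bits

    def memory_bits_per_cycle(self) -> int:
        return self.mem_ref_bits * self.mem_refs_per_cycle


# Widths as quoted in the article.
IVP = VectorCore("Tensilica IVP", 512, 1, 512, 2)
MM3101 = VectorCore("CEVA MM3101", 128, 2, 256, 1)

if __name__ == "__main__":
    for core in (IVP, MM3101):
        print(f"{core.name}: {core.pixels_per_cycle()} 16-bit pixels/cycle, "
              f"{core.memory_bits_per_cycle()} memory bits/cycle")
    ratio = IVP.memory_bits_per_cycle() / MM3101.memory_bits_per_cycle()
    print(f"Peak memory bandwidth ratio per cycle: {ratio:.0f}x")
```

These are peak numbers at equal clock speed; sustained throughput would depend on clock rate, memory system and how well a given algorithm fills the vector lanes.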