Design Article
Multi-threaded design tackles SoC performance bottlenecks: Part 1
Elchanan Rushinek, Mobileye Vision Technologies, and Pete Del Vecchio, MIPS Technologies
12/22/2006 1:20 AM EST
In preparing the next-generation automotive electronics driver-assistance systems, researchers at Mobileye Vision Technologies spent a considerable amount of time analyzing the system performance of their existing EyeQ1™ to identify bottlenecks. The EyeQ system takes data from a camera and looks for elements within the video image to generate lane-departure warnings, forward-collision warnings, and vision/radar fusion. The system has been adopted by several top-tier suppliers and car manufacturers in Europe, the U.S., and Japan for 2007-model-year vehicles.
The next iteration driver assistance/video analysis/navigation system from Mobileye incorporates the same capabilities as the EyeQ1 for active safety along with greater speed and real-time performance, and more functionality, such as pedestrian detection. The EyeQ2 will debut in late 2008 models. But the jump in capability required a significant performance boost, proven reliability, and guaranteed real-time response.
The company selected the high-performance MIPS32® 34Kf core from MIPS Technologies for its new system-on-chip (SoC) design. The use of this multi-threaded processor in the EyeQ2 design was key to realizing a massive performance increase over the previous single-threaded processor design.
When they started looking toward the next design iteration, Mobileye engineers found the RISC processor employed in the EyeQ1 design (see figure below) was achieving an instruction utilization of just 0.3 instructions per cycle, compared with a potential maximum of 1 instruction per cycle. The cause of this limited performance, common to many such bandwidth-hungry applications, was the inefficiency of the interface between the controlling CPU and the eight Vision Computer Engine (VCE) blocks in the system providing real-time data across the internal SoC bus.
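To put the utilization figure in perspective, the arithmetic can be sketched in a few lines of Python. The function names are illustrative only; the sole inputs taken from the text are the measured 0.3 instructions per cycle and the peak of 1 instruction per cycle.

```python
def cycles_per_instruction(ipc):
    """Average number of cycles spent per completed instruction."""
    return 1.0 / ipc


def unused_issue_fraction(ipc, peak_ipc=1.0):
    """Fraction of potential issue slots in which no instruction completes."""
    return 1.0 - ipc / peak_ipc


measured_ipc = 0.3  # figure quoted for the EyeQ1 RISC processor

print(round(cycles_per_instruction(measured_ipc), 2))  # 3.33
print(round(unused_issue_fraction(measured_ipc), 2))   # 0.7
```

In other words, at 0.3 instructions per cycle the processor spends roughly 3.3 cycles per instruction, and about 70 percent of its issue slots go unused.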

In a traditional single-issue RISC CPU core, the pipeline stalls when there is a cache miss and has to wait until the new instructions are retrieved from memory over the bus. In a real-time system, there is generally a large amount of data flowing across that bus.
This situation is bad enough for the central CPU, but when the processor is also feeding data and instructions to the dedicated VCEs, it creates a fundamental bottleneck, which either slows down the whole system or means blocks of data are not processed, potentially causing errors.
But seemingly straightforward solutions do not always work. For instance, boosting the CPU clock speed just increases the frequency of processor stalls, and adding processors impairs real-time system bandwidth through additional internal bus contention and design complexity.
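The clock-speed point can be illustrated with a rough model. This is a sketch under a strong simplifying assumption that is not from the article: all of the stall time comes from memory and bus latency that stays fixed in wall-clock terms, so only the one-cycle execute portion of each instruction shrinks when the clock rises. The function names and the 2x multiplier are hypothetical.

```python
def speedup_from_clock(base_ipc, clock_multiplier):
    """Throughput gain when only the CPU clock is scaled.

    Assumption: stall time per instruction is fixed in wall-clock terms,
    so it does not shrink when the CPU clock gets faster.
    Time is measured in units of one base-clock cycle.
    """
    base_time = 1.0 / base_ipc            # cycles per instruction today
    stall_time = base_time - 1.0          # portion spent waiting on the bus
    new_time = 1.0 / clock_multiplier + stall_time
    return base_time / new_time


def ipc_after_clock(base_ipc, clock_multiplier):
    """Instructions per (faster) cycle after scaling the clock."""
    return speedup_from_clock(base_ipc, clock_multiplier) * base_ipc / clock_multiplier


print(round(speedup_from_clock(0.3, 2.0), 2))  # 1.18
print(round(ipc_after_clock(0.3, 2.0), 2))     # 0.18
```

Under these assumptions, doubling the clock of a core that starts at 0.3 instructions per cycle buys only about 18 percent more throughput, while utilization drops to roughly 0.18 instructions per cycle: the core simply stalls for more of its cycles.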
These issues pose major architectural challenges for the core CPU designers and require careful attention to how the CPU interacts with the SoC bus; otherwise, system latency may be unpredictable, a significant problem for real-time system developers. A predictable, controllable latency can be compensated for as part of the system design; an unpredictable latency cannot, regardless of the performance of the core CPU.
After careful evaluation, Mobileye moved to the multi-threading MIPS32 34Kf processor core to tackle both the CPU efficiency and the bus contention bottlenecks in its new SoC design.
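Why multi-threading can help with CPU efficiency may be seen in a deliberately crude probability sketch. It assumes, hypothetically, that each hardware thread is ready to issue in any given cycle with the same probability as the single-threaded core (0.3) and that threads stall independently of one another. Neither assumption comes from the article, real threads share the same bus and caches, and the thread counts below are illustrative rather than a description of the 34Kf.

```python
def issue_slot_utilization(per_thread_ready, num_threads):
    """Probability that at least one thread can issue in a given cycle.

    Assumptions (hypothetical): every thread is ready with the same
    probability, and threads stall independently of each other.
    """
    all_stalled = (1.0 - per_thread_ready) ** num_threads
    return 1.0 - all_stalled


for threads in (1, 2, 4):
    print(threads, round(issue_slot_utilization(0.3, threads), 2))
# 1 0.3
# 2 0.51
# 4 0.76
```

Even this optimistic model shows the principle: while one thread waits on a cache miss, another can use the issue slot, so the same pipeline completes more instructions per cycle without a faster clock.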



