Part 1 of this feature discussed processor speed, efficiency, and latency issues, and the advantages of multithreading in developing Mobileye Vision Technologies' next-generation, vision-based driver-assistance system, the EyeQ2™.
Multi-threading drives bus utilization efficiency
Multi-threading using the MIPS32® 34Kf core directly tackles the efficiency and latency problems that limit processor utilization (see figure below).
When a hardware thread stalls, other threads can take advantage of the stalled cycles. Unlike software multiplexing between tasks, the single-cycle context switch (see sidebar, next page) means there is zero processor overhead involved in performing the switch. The 34K core's multithreaded implementation also allocates processor cycles to threads and sets relative thread priorities with an optional Quality of Service (QoS) manager block.
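The benefit of filling stalled cycles can be seen in a back-of-the-envelope model. The sketch below is a toy single-issue pipeline simulation, not a model of the actual 34K hardware; the stall length, instruction count, and memory-access frequency are assumptions chosen purely for illustration.

```python
# Toy model of stall-hiding on a single-issue, hardware-multithreaded
# core. All parameters below are illustrative assumptions, not 34K data.

STALL = 9           # cycles lost per memory access (assumed)
INSTRUCTIONS = 100  # instructions per thread (assumed)
MEM_EVERY = 5       # every 5th instruction accesses memory (assumed)

def utilization(num_threads):
    """Fraction of cycles doing useful work when ready threads can
    issue during another thread's stalls (zero-cycle switch assumed)."""
    work = INSTRUCTIONS                           # useful cycles per thread
    stall = (INSTRUCTIONS // MEM_EVERY) * STALL   # stall cycles per thread
    total_work = num_threads * work
    # Total run time is bounded below by one thread's critical path
    # (its own work plus stalls) and by the sum of all useful work.
    cycles = max(total_work, work + stall)
    return total_work / cycles

for n in (1, 2, 4):
    print(f"{n} thread(s): {utilization(n):.0%} utilization")
```

With these assumed numbers, one thread wastes most cycles stalled, while four threads keep the pipeline fully busy; the point is the trend, not the exact figures.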
EyeQ2 SoC design is a multithreaded architecture.
This QoS capability enables two essential prioritization mechanisms that influence how the processor operates across the bus. The first is the ability to prioritize one thread over another, so that a thread can be set up to run as soon as its data arrives from the bus. The second is to guarantee a specific thread a specified percentage of processor cycles over time, ensuring that overall system bus bandwidth is effectively allocated to service that thread and the computational block it serves.
This use of the QoS mechanism reduces the incidence of processor stalls still further, since the hardware threads can be "tuned" to minimize latencies across the bus and maximize overall bus and processor utilization. In particular, critical real-time data can be fed across the bus directly into the thread processing a specific task, and used immediately, rather than having to wait for other processes to finish.
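The second QoS mechanism, guaranteeing a thread a share of processor cycles, can be approximated in software with a credit-based weighted scheduler. This is only a rough analogy for intuition, not the actual MIPS 34K hardware mechanism; the thread names and weights are invented for the example.

```python
# Software analogy of QoS cycle allocation (NOT the actual MIPS 34K
# hardware): a credit-based weighted policy picks which ready thread
# issues each cycle, so a thread with weight 0.5 gets ~50% of slots.

def issue_schedule(weights, cycles):
    """Return one thread id per cycle, approximating the requested
    cycle shares via an accumulated-credit counter per thread."""
    credit = {t: 0.0 for t in weights}
    schedule = []
    for _ in range(cycles):
        for t in weights:
            credit[t] += weights[t]        # earn credit each cycle
        chosen = max(credit, key=credit.get)  # most-owed thread issues
        credit[chosen] -= 1.0              # pay for the issued slot
        schedule.append(chosen)
    return schedule

# Hypothetical threads and shares, for illustration only
sched = issue_schedule({"vision": 0.5, "control": 0.3, "io": 0.2}, 100)
print({t: sched.count(t) for t in ("vision", "control", "io")})
```

Over 100 cycles the issue counts track the requested 50/30/20 split to within a slot or two, which mirrors the article's point: a critical thread can be guaranteed its share of cycles, and with it the bus bandwidth that feeds it.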
This more efficient use of the system bus leads to a dramatic increase in processor efficiency. It also makes the instruction streams feeding the computational blocks more deterministic, with predictable, more controllable latencies.
With a core running four hardware threads and the QoS manager tuning and prioritizing critical threads, processor utilization rose further, from 60% to close to 90%, or 0.9 instructions per cycle, in Mobileye's EyeQ2 second-generation, vision-based driver-assistance system (see Part 1 of this feature).
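The 90%-equals-0.9-IPC equivalence holds because the core is single-issue, so utilization maps directly to instructions per cycle. A quick check of what the article's two figures imply for throughput:

```python
# On a single-issue core, utilization equals IPC (instructions per
# cycle). Figures below are the article's: 60% single-threaded vs
# ~90% with four hardware threads plus QoS tuning.
ipc_single, ipc_multi = 0.60, 0.90
speedup = ipc_multi / ipc_single
print(f"Throughput gain: {speedup:.2f}x")  # → 1.50x
```

That is a 50% increase in delivered instructions per second on the same silicon and clock, purely from keeping the pipeline fed.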
It would be difficult, if not impossible, to achieve this level of performance gain without the ability to measure real-time performance accurately within the system. Fortunately, Mobileye engineers had accurate, correlated real-time information on what was happening both in the processor and on the bus.