The one-processor system model that has dominated electronic system design since 1971 is now thoroughly obsolete. Today's SOC designers readily use multiple processors to meet the design goals of complex systems, and they use the terms "control plane" and "data plane" to describe the roles these on-chip processors play. The terms appeared during the Internet and networking boom. At first, they referred to the design of multiple-board networking systems, but they have since become universal and now describe many other systems, such as audio- and video-encoding/decoding designs, that must both handle high-speed data and execute complex control algorithms. In such systems, processor I/O data rates are as important as computational performance.
The main processor bus is the sole data highway into and out of most processor cores. Because processors must interact with other bus masters, including other processors and DMA controllers, and because many SOC architectures employ bus hierarchies, main processor buses feature sophisticated transaction protocols and arbitration mechanisms to handle this design complexity. These protocols and arbitration mechanisms usually require multi-cycle bus transactions that can slow system performance.
For example, the main bus on Tensilica's Xtensa LX2 processor, called the PIF, uses read transactions that take at least six cycles and write transactions that take at least one cycle; the actual count depends on the speed of the target device. These transaction timings allow us to calculate the minimum number of cycles needed to perform a simple flow-through computation: load two numbers from memory, add them, and store the result back into memory. The assembly code to perform this computation might look like this:
L32I reg_A, Addr_A, 0 ; Load the first operand
L32I reg_B, Addr_B, 0 ; Load the second operand
ADD reg_C, reg_A, reg_B ; Add the two operands
S32I reg_C, Addr_C, 0 ; Store the result
To simplify this code, assume that memory pointers to values A, B, and C are already initialized in registers Addr_A, Addr_B, and Addr_C. Otherwise, additional instructions, and therefore additional cycles, are needed to load those pointers.
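The cycle arithmetic for this load-add-store sequence can be sketched as a small model. This is a minimal illustration, not a Tensilica tool: the six-cycle read and one-cycle write minimums come from the text, while the one-cycle ADD, the absence of overlap between bus transactions, and the function name min_flow_through_cycles are assumptions made here for illustration.

```python
# Hypothetical lower-bound cycle model for the load/add/store sequence on the PIF.
# Assumptions (not from the source): ADD takes one cycle, PIF transactions do not
# overlap, and no cycles are spent initializing the Addr_A/Addr_B/Addr_C pointers.

PIF_READ_MIN_CYCLES = 6   # minimum PIF read transaction (stated in the text)
PIF_WRITE_MIN_CYCLES = 1  # minimum PIF write transaction (stated in the text)
ADD_CYCLES = 1            # assumed single-cycle ALU operation


def min_flow_through_cycles(loads=2, stores=1, alu_ops=1):
    """Return a lower bound on cycles for a load/compute/store sequence over the PIF."""
    return (loads * PIF_READ_MIN_CYCLES
            + alu_ops * ADD_CYCLES
            + stores * PIF_WRITE_MIN_CYCLES)


if __name__ == "__main__":
    # Two loads, one add, one store: 2*6 + 1 + 1
    print("Minimum cycles:", min_flow_through_cycles())
```

Under these assumptions the two loads dominate: 12 of the 14 minimum cycles are spent waiting on PIF reads, and slower target devices only raise that floor.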