When choosing an architecture for the BRCM 5000 CPU, the design team examined several options, says Dr. Ramesh Senthinathan, a senior director of engineering in the Broadband Communication Group. "Increasing CPU performance through complex out-of-order techniques produces an exponential rise in die area and power with relatively little increase in performance," he explains. "Multithreading turns out to be a more efficient way to achieve higher performance."
Multithreading helps fill the empty cycles that occur when the CPU must access the second-level cache for data. In this case, the CPU simply executes instructions from the second thread until the first thread receives its data. Multithreading also allows the BRCM 5000 to emulate the dual-CPU structure of its predecessor, the BRCM 4380. Because the two threads appear to software as separate CPUs, a single BRCM 5000 core can run two operating systems.
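The cycle-filling effect can be illustrated with a toy simulation. The sketch below is not a model of the BRCM 5000 pipeline. It assumes a single-issue core, a fixed miss penalty (the hypothetical `miss_latency` parameter), and threads written as strings of `C` (compute) and `M` (a load that must go to the second-level cache) operations. The first thread has priority, and the second thread issues only while the first is stalled.

```python
def simulate(threads, miss_latency=4):
    """Toy single-issue core; returns (total_cycles, instructions_issued).

    threads: list of strings made of 'C' (1-cycle compute) and 'M'
             (a load that stalls its own thread for miss_latency cycles).
    All names and the latency value are illustrative assumptions.
    """
    pcs = [0] * len(threads)        # next operation index per thread
    ready_at = [0] * len(threads)   # first cycle each thread may issue again
    cycle = 0
    issued = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        # Lower-numbered threads have priority; one issue per cycle.
        for i, t in enumerate(threads):
            if pcs[i] < len(t) and ready_at[i] <= cycle:
                op = t[pcs[i]]
                pcs[i] += 1
                issued += 1
                if op == "M":
                    # This thread waits for its data; others may still issue.
                    ready_at[i] = cycle + 1 + miss_latency
                break
        cycle += 1
    # Account for a miss that is still outstanding after the last issue.
    return max(cycle, max(ready_at)), issued


if __name__ == "__main__":
    print(simulate(["CMCC"]))          # one thread alone
    print(simulate(["CMCC", "CMCC"]))  # two threads sharing the core
```

With these made-up workloads, one thread alone takes 8 cycles, so running the two back to back would take 16. Interleaved, both finish in 10 cycles because the second thread issues during part of the first thread's stall.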
Depending on the number of cache misses it encounters, a single-issue CPU can fill 60 to 75 percent of its execution slots on many software applications. This situation leaves relatively few slots for the second thread, limiting the performance gain of multithreading. A CPU that can issue two instructions at a time, however, will typically fill about 50 percent of its execution slots, leaving plenty of room for the second thread. According to Senthinathan, this dual-issue, dual-thread design is a "sweet spot" for multithreading, which is why Broadcom chose this approach for the BRCM 5000.
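The slot arithmetic behind this argument can be worked through with a rough upper bound. The sketch below assumes, purely for illustration, that the second thread demands the same fraction of execution slots as the first and that the two threads never conflict over resources. The function name and the model are assumptions, not Broadcom's analysis.

```python
def max_second_thread_gain(utilization):
    """Idealized upper bound on throughput gain from adding a second thread.

    utilization: fraction of execution slots one thread fills (0 < u <= 1).
    Assumes the second thread wants the same fraction of slots and that
    the threads never compete for the same resource in a cycle.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    # The two threads together cannot fill more than 100% of the slots.
    combined = min(1.0, 2.0 * utilization)
    return combined / utilization - 1.0


if __name__ == "__main__":
    cases = [
        ("single-issue, 75% full", 0.75),
        ("single-issue, 60% full", 0.60),
        ("dual-issue, 50% full", 0.50),
    ]
    for label, u in cases:
        print(f"{label}: at most {max_second_thread_gain(u):.0%} more throughput")
```

Under these assumptions, the second thread can add at most about 33 to 67 percent on the single-issue CPU, but up to 100 percent on the dual-issue CPU. Real gains would be lower, since the threads do compete for caches and execution units.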
To achieve the 1.3GHz clock speed, Broadcom used a combination of custom logic and synthesized logic. For example, the clock tree is hand-designed to minimize clock skew. Critical speed paths use custom domino circuitry. Floor planning is also important, so the major circuit blocks are placed early in the design process to minimize wire delays. Broadcom's Central Engineering team provided custom circuits such as high-speed SRAM and register files to achieve the high frequency.
Chips using the BRCM 5000 include a technology that Broadcom calls Adaptive Voltage Scaling (AVS). Each chip contains test circuits that determine whether its transistors fall near the fast-fast or the slow-slow process corner. These test circuits combine analog and digital functions to get a precise reading of the transistor characteristics.
For a chip with fast, leaky transistors, the supply voltage is internally lowered, reducing both leakage and transistor speed; the fast transistors can still achieve the rated clock speed at the lower voltage. Conversely, the voltage is increased for chips with slow transistors, boosting their performance. Thus, AVS reduces the rated worst-case power, which occurs only in fast-fast chips, while improving speed yield.
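The voltage policy described above can be sketched as a simple mapping from a process-speed reading to a supply voltage. Everything numeric here is hypothetical: the article does not give Broadcom's voltages, the form of the test-circuit reading, or the control method. The sketch assumes a normalized reading from -1.0 (slow-slow) to +1.0 (fast-fast), a nominal supply of 1.0 V, a maximum adjustment of 0.1 V, and a linear relationship between them.

```python
def select_supply_voltage(speed_reading, nominal_v=1.0, max_adjust_v=0.1):
    """Pick a supply voltage from a normalized process-speed reading.

    speed_reading: -1.0 for a slow-slow chip, +1.0 for a fast-fast chip
                   (hypothetical scale; the real sensor output is not
                   described in the article).
    nominal_v, max_adjust_v: illustrative values, not Broadcom's.
    """
    # Clamp out-of-range readings so the supply stays within its limits.
    reading = max(-1.0, min(1.0, speed_reading))
    # Fast, leaky silicon gets a lower voltage; slow silicon gets a higher one.
    return nominal_v - reading * max_adjust_v


if __name__ == "__main__":
    for name, reading in [("fast-fast", 1.0), ("typical", 0.0), ("slow-slow", -1.0)]:
        print(f"{name}: {select_supply_voltage(reading):.2f} V")
```

A linear mapping is the simplest choice for illustration. An actual implementation might instead use calibrated lookup tables or adjust the voltage in a feedback loop.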