Design Article
Designing low-power multiprocessor chips
Shinya Fujimoto
2/12/2007 9:00 AM EST
As designs migrated to 0.13-micron process technology and beyond, however, the approach of adding more gates and raising clock frequencies no longer yielded as much performance gain, given the amount of power the chips now dissipated.
In addition, the market's preference in many cases has shifted from raw performance improvements to greater power efficiency. That change has caused designers to move away from using more gates and running the system at higher frequencies; instead, they're looking for alternatives that achieve optimal cost and performance under tight power budgets.
Parallel computing
To meet the goals of higher performance and reduced power consumption, designers have begun to integrate multiple identical cores into chips. The most notable example of this trend, advanced by Intel, is to integrate multiple CPU cores on a single die, as opposed to trying to increase the frequency of a single CPU by adding more pipeline stages.
A similar approach has been taken in the custom chips used in the latest generation of videogame consoles, such as the Sony PlayStation 3 and Microsoft Xbox 360. In both systems, the main processor integrates multiple cores of the same type on a single chip.
This symmetrical multicore approach is effective for systems that require flexible computing capabilities, such as PCs. It is also an appropriate approach for the new videogame consoles, which are attempting to become "the PC in the living room."
This type of architecture, however, is not suitable for products and applications that need to perform dedicated tasks with the lowest power consumption and optimal cost/performance.
Heterogeneous multiprocessor systems, by contrast, integrate a number of unique processors or cores into a system. Each processor supports a specific task and performs it more efficiently, with fewer gates, than a generic CPU core. One implementation of such a system might include a fast video processor tasked with decoding compressed video, a graphics processor dedicated to rendering images for the user interface, a sound processor to process audio and add sound effects, and a CPU to handle low-level control such as running a real-time operating system.
All of these tasks could be performed with multiple CPUs in a symmetrical multiprocessor system or even by using a single CPU running at a high frequency. But for a system that is required to operate with very low power consumption and at low cost, developing a chip with the smallest die size that can operate at the slowest possible frequency becomes paramount.
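The link between operating frequency and power can be made concrete with the standard first-order estimate of CMOS switching power, P = a * C * V^2 * f. The sketch below is illustrative only: the activity factor, capacitance, supply voltages and clock rates are hypothetical values chosen to show the scaling, not figures from any chip discussed here, and leakage power is ignored.

```python
def dynamic_power_watts(activity, capacitance_farads, vdd_volts, freq_hz):
    """First-order CMOS switching-power estimate: P = a * C * V^2 * f."""
    return activity * capacitance_farads * vdd_volts ** 2 * freq_hz


# Hypothetical operating points (assumptions, not measured data):
# a general-purpose core clocked fast at a higher supply voltage...
fast = dynamic_power_watts(0.2, 1e-9, 1.2, 600e6)
# ...versus a dedicated block clocked slowly at a lower supply voltage.
slow = dynamic_power_watts(0.2, 1e-9, 1.0, 150e6)

ratio = fast / slow
print(f"fast: {fast:.4f} W, slow: {slow:.4f} W, ratio: {ratio:.2f}x")
```

Because power scales linearly with frequency and with the square of supply voltage, a block that can meet its deadline at a lower clock, and therefore often at a lower voltage, saves more than the frequency reduction alone would suggest.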
An example implementation of a heterogeneous multiprocessing architecture is the Zevio 1020 multimedia application processor from LSI Logic Corp. The Zevio 1020 integrates a single ARM9 processor and a DSP that runs at 150 MHz, a 3-D graphics processor, a 3-D sound processor and a 2-D display processor for direct output to LCD panels and televisions.
Granted, the raw performance of this chip does not match that of the latest videogame consoles, but it enables performance similar to that of second-generation videogame consoles for products that retail for under $100 while keeping the power consumption to less than 300 milliwatts.
That price point and power efficiency would have been difficult to achieve if LSI had chosen to implement the design using generic CPUs for each of the targeted tasks.
There are inherent challenges in designing a well-balanced multiprocessor system, however. One challenge is to reduce the bottleneck when accessing the external memory. To achieve the highest performance for a given power budget, it is essential that a memory controller provide very efficient arbitration as well as high data throughput. At the same time, to bring down system costs, the memory controller must be optimized to work with 16-bit-wide memories.
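To illustrate what arbitration means when several processors share one external memory, here is a minimal sketch of a round-robin arbiter. It is not a description of any particular memory controller: the `RoundRobinArbiter` class, the master names and the addresses are all hypothetical, and round-robin is used only because it is the simplest fair policy. A real controller would likely also weigh priorities, bursts and open memory pages.

```python
from collections import deque


class RoundRobinArbiter:
    """Hypothetical arbiter: grants one pending memory request per call,
    rotating fairly among the masters that share the memory."""

    def __init__(self, masters):
        self.order = list(masters)
        self.queues = {m: deque() for m in self.order}
        self.next_index = 0  # master to consider first on the next grant

    def request(self, master, address):
        """Queue a memory access on behalf of one master."""
        self.queues[master].append(address)

    def grant(self):
        """Return (master, address) for the next request, or None if idle."""
        n = len(self.order)
        for offset in range(n):
            idx = (self.next_index + offset) % n
            master = self.order[idx]
            if self.queues[master]:
                # Start the next search just past the master served now.
                self.next_index = (idx + 1) % n
                return master, self.queues[master].popleft()
        return None


arbiter = RoundRobinArbiter(["cpu", "video", "graphics", "sound"])
arbiter.request("video", 0x1000)
arbiter.request("video", 0x1004)
arbiter.request("cpu", 0x2000)
grants = [arbiter.grant() for _ in range(4)]
print(grants)
```

In this run the CPU request is served between the two video requests rather than after them, which is the property that keeps one busy processor from starving the others.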

