Just 10 years ago, the challenge chip designers faced was to design logic blocks with as few gates as possible to fit all the functions into a target die size. Today, advancements in semiconductor process technologies let designers easily pack millions of gates and complex mixed-signal components into a single chip. Nonetheless, chip designers still face the challenge of reducing gate count and implementing efficient architectures, not only to reach a target size but also to reduce total system power consumption.
From cell phones to portable media players, more products in the market are battery-powered, and low power consumption obviously leads to longer battery life. In the consumer electronics market, even AC-powered devices can benefit from reduced power consumption. Lower power draw leads to lower system cost by enabling the use of less-expensive chip packaging and the reduction or elimination of heat-dissipation components such as fans and heat sinks.
But the demand for higher performance and more features has only served to increase total system power consumption.
The challenge is to achieve higher integration and performance within the system power budget. To reach that goal, chip designers are moving away from single-processor architectures and toward architectures that distribute tasks among multiple processors or cores. The cores can be either symmetric or heterogeneous, depending on the application.
For systems that require the lowest power consumption and best cost/performance, chip designers prefer to use heterogeneous multicore architectures, with task-dedicated processors that operate concurrently.
Care and forethought are required when designing multiprocessor chips; otherwise, a design can easily run into data-bandwidth bottlenecks that cap system performance. Designers must architect the chip to handle data transactions efficiently and to minimize stall cycles as multiple masters access the external memory.
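One way to see why this matters is a back-of-envelope bandwidth budget: sum the demand of all bus masters and compare it against the usable external-memory bandwidth after derating for arbitration and refresh overhead. The sketch below illustrates the arithmetic only; every figure (the DDR rate, the 70% efficiency derating, the per-master demands) is a hypothetical assumption, not data from any particular chip.

```python
# Back-of-envelope check: does the aggregate bandwidth demanded by
# multiple bus masters fit within the external-memory budget?
# All numbers are hypothetical, chosen only to illustrate the method.

# Peak bandwidth of the external memory (e.g., a 16-bit interface at
# 800 MT/s gives 1600 MB/s), derated for refresh, bank conflicts,
# and arbitration overhead (assumed 70% efficiency).
ddr_peak_mb_s = 1600
efficiency = 0.70
usable_mb_s = ddr_peak_mb_s * efficiency

# Hypothetical per-master demands (MB/s) when all run concurrently.
masters = {
    "video_decoder": 400,
    "graphics": 300,
    "cpu": 250,
    "peripheral_dma": 100,
}

demand = sum(masters.values())
headroom = usable_mb_s - demand
print(f"usable: {usable_mb_s:.0f} MB/s, "
      f"demand: {demand} MB/s, headroom: {headroom:.0f} MB/s")
# Negative headroom means masters will stall waiting on memory,
# regardless of how fast the individual processors run.
```

With these assumed numbers the design squeaks by with about 70 MB/s of headroom; a worst-case traffic pattern or a lower arbitration efficiency would push it into stalls, which is exactly the trap the text warns about.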
Most embedded systems today still use a generic single-processor architecture in which one central processing unit is charged with handling the multiple tasks required in the application. For example, the CPU may be required to decode multimedia audio/video content, generate graphics for the user interface and manage peripheral devices. All of those tasks need to be done simultaneously on top of the general computing functions, such as running the operating system.
To meet the ever-increasing need for more processing muscle, system-on-chip architects and designers have traditionally relied on Moore's law for faster transistors to increase system frequency and consequently obtain higher performance. The faster transistors allowed designers to build complex branch-prediction circuitry and add pipeline stages as needed to keep pace with the performance requirements. But this approach has created a trend of designing bigger and more-power-hungry processors with every new generation of process technology.
In the past, when people were designing with 0.18-micron technologies, it was acceptable to increase system performance by adding functions and running the processor at a higher frequency, since those techniques yielded a sufficient return in exchange for the slight increase in power consumption.