Marking its 50th anniversary, the 2003 International Solid State Circuits Conference (ISSCC) in San Francisco has chosen a fitting theme: "Power-aware systems". As circuit dimensions shrink and designers push circuits to ever higher performance, it is getting more difficult to use the resulting integrated circuits in small mobile devices, where power consumption is the critical criterion.
While ISSCC technical presentations are always about performance, this year the performance quest is tempered by power challenges as more circuit blocks get integrated into systems-on-chip. As such, a good portion of the conference's technical focus is on the design of SoCs, the ultimate integration level in solid-state performance/power tradeoffs.
Fittingly, the plenary presentation by Gordon Moore, a founding father of the IC industry, offered a perspective on the past, present and future of integrated circuits. By any measure, the semiconductor industry has experienced fantastic growth over the last 50 years, passing $200 billion in annual revenue last year, and has become the foundation of a trillion dollar electronics industry. Moore attributes this unprecedented growth to the combination of a unique technology and an extremely elastic market.
Over this brief history, many parameters relating to the industry have changed exponentially with time: chip complexity, chip performance, feature size, and the number of transistors produced each year. Moore explored some of these trends, and looked at the current status of silicon technology and at the challenges going forward, especially as we enter the power-conscious mobile era.
In fact, in his keynote address, Professor Takayasu Sakurai of the University of Tokyo asserted that increasing power will remain one of the main obstacles to Moore's Law. "Unless VLSI power is lowered by orders of magnitude, we cannot enjoy the progress that scaling offers," he noted. In keeping with the power theme of the conference, Sakurai covered the techniques designers currently possess, and those they should add to their low-power armory, to cope with ever-increasing leakage loss as well as dynamic power.
Sakurai's talk presented novel techniques spanning the system, software, circuit, and device levels, including interconnect and I/O issues. In other words, SoC engineering will not succeed unless the basic hurdles of leakage loss and dynamic power are overcome. "The biggest challenge that SoC designers must resolve in the future is the fact that transistors for digital and memory circuits will be more and more leaky as technology generations advance," he said.
Sakurai urged the audience to examine cooperative approaches between software development and circuit design, and circuit design and technology process.
The first factor that needs to be confronted is that the power source for power-aware electronics needs to change drastically. While Li ion batteries are widely used for high-power portable systems, Sakurai advocates the use of a direct methanol cell, which is based on a chemical reaction of methanol and air and is capable of extending the stored energy to more than 1 kWh from the Li ion's 100 Wh.
On a system level, Sakurai places his confidence in a concept that he terms the "power auction," in which the proper assignment of power budgets among home appliances can be applied to an SoC. In the home appliance arena, each connected appliance requests its required power from a central manager, and a "power auction" takes place at the center. A total power reduction of 20 percent is possible this way, and the concept can be translated to SoC designs, a la the ChipOS power-management platform, first reported by Hitachi researchers at the 2001 ISSCC. In that paper, the researchers described a kernel called ChipOS, which is implemented in the chip and manages the chip's power resources among the many blocks of the SoC.
Sporting a hierarchical model similar to that of a PC operating system, ChipOS optimizes and schedules the power state of each block so that the total chip power consumption stays below the desired maximum value (Pmax). In a four-block conceptual example, the researchers showed that a chip using ChipOS consumed less than 400 mW (with Pmax being 400 mW). In contrast, a conventional chip could consume as much as 850 mW, because all blocks may be working simultaneously.
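The budgeting idea can be illustrated with a minimal Python sketch. This is not Hitachi's ChipOS algorithm, which the article does not describe in detail; it is a simple greedy scheduler under assumed conditions. The block names, priorities, and per-block power figures are hypothetical, chosen only so that the four active figures add up to the 850 mW worst case cited above.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str          # hypothetical block name
    priority: int      # hypothetical: higher value is served first
    active_mw: float   # assumed power when the block runs at full rate
    idle_mw: float     # assumed power when the block is parked


def schedule(blocks, pmax_mw):
    """Greedy budget assignment.

    Every block starts idle; blocks are then promoted to active in
    priority order for as long as the chip total stays within pmax_mw.
    """
    total = sum(b.idle_mw for b in blocks)
    states = {b.name: "idle" for b in blocks}
    for b in sorted(blocks, key=lambda blk: blk.priority, reverse=True):
        extra = b.active_mw - b.idle_mw
        if total + extra <= pmax_mw:
            states[b.name] = "active"
            total += extra
    return states, total


# Illustrative four-block chip (all figures are assumptions).
blocks = [
    Block("cpu", 4, 300.0, 20.0),
    Block("dsp", 3, 250.0, 15.0),
    Block("video", 2, 200.0, 10.0),
    Block("io", 1, 100.0, 5.0),
]

worst_case_mw = sum(b.active_mw for b in blocks)   # all blocks active at once
states, total_mw = schedule(blocks, pmax_mw=400.0)

print(f"unmanaged worst case: {worst_case_mw} mW")
print(f"managed total: {total_mw} mW, states: {states}")
```

A real power manager would also have to handle latency when waking blocks, changing priorities, and intermediate voltage or frequency states; the sketch only shows how a central arbiter keeps the total under Pmax.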
Future power needs
A number of ISSCC technical sessions and special-topic evening sessions address the power question of SoCs. For instance, some sessions looked into the next-generation design challenges such as how far integration can go in 3G cell phones and new circuits based on emerging technologies. In addition, a session offered some of the best circuit-design-related papers from the Design Automation Conference (DAC) held last June.
And, power issues were also discussed in a short course on system-on-chip design; in a workshop on microprocessor design in the power-constrained era; and in a workshop on Gigahertz Radio Front Ends, dubbed GIRAFEs.
Bruno Murari, an executive at STMicroelectronics, said in his plenary presentation that as the industry moves to SoC and, in the future, to SiP (System in Package) techniques, micro-machined technologies will be included. For that, "particular attention needs to be given to the problem of power-consumption reduction in portable devices," he warned.
In their technical presentation, Intel researchers (Hillsboro, Oregon) reported using sleep transistors and body bias to control active leakage in a 32-bit integer execution core implemented in a 100-nm dual CMOS technology. In the scheme, a PMOS sleep transistor degrades performance by 4 percent but reduces leakage by a factor of 20, which is further improved with body bias. Time constants for leakage convergence range from 30ns to 300ns, allowing power savings of 9 to 44 percent for idle periods greater than 100 clock cycles, according to the Intel research team.
Elsewhere, a collaboration between researchers at Hitachi and the University of California at Berkeley yielded a shared n-well, dual-supply-voltage 64-bit ALU module fabricated in a 0.18µm, 1.8V five-level metal CMOS technology that operates at 1.16 GHz and occupies a 9 mm2 die. For a target delay increase of 2.8 percent, the researchers were able to obtain energy savings of 25.3 percent using dual supplies. An 8.3 percent delay increase saved 33.3 percent in energy.
Meanwhile, researchers at Hitachi's Central Research Laboratory (Tokyo, Japan) have come up with a technique for adaptive-universal control of clock frequency, supply voltage, and body bias, which optimizes the performance-to-power ratio of chip multiprocessors. The technique is based on a compound built-in self-test and self-instructed look-up table scheme for an autonomous and decentralized system. The researchers report that when applied to a 32-bit ALU, power is reduced by a factor of 70.
A presentation by Mitsubishi Electric (Hyogo, Japan) detailed a low-power microcontroller designed in 0.10µm body-tied SOI CMOS technology by reusing existing design resources developed in 0.18µm bulk CMOS. Only two new masks are needed for this work. The performance has been evaluated by simulation, which indicates operation at 400 MHz with 183 mW dissipation at 0.8V, a five-times improvement in the power-delay product.
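As a rough way to read these figures, power divided by clock frequency gives the energy spent per clock cycle. The short Python sketch below applies that to the simulated 183 mW at 400 MHz. Treating one clock period as the "delay" is an assumption made here for illustration; the article does not say how the presenters defined their power-delay product.

```python
def energy_per_cycle_joules(power_watts, clock_hz):
    """Average energy consumed in one clock period (power x period)."""
    return power_watts / clock_hz


# Figures quoted for the simulated SOI microcontroller.
POWER_W = 183e-3    # 183 mW
CLOCK_HZ = 400e6    # 400 MHz

energy_j = energy_per_cycle_joules(POWER_W, CLOCK_HZ)
print(f"{energy_j * 1e9:.4f} nJ per clock cycle")
```

Under that assumption the result is a little under half a nanojoule per cycle.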
In one of the more interesting developments, researchers from Starc (Semiconductor Technology Academic Research Center, Yokohama, Japan) discussed recent work on a 32-bit adder in a 0.13µm CMOS process that consumes 9 µW at 50MHz and 0.3V and operates at 500MHz at 0.6V. In this scheme, the power of the SoC can be reduced to 1/4 that of standard CMOS by gating the forward body bias in the IP blocks.
The Starc team's self-adjusted forward body bias technique reduces the worst-case delay due to process, voltage, and temperature variations in several ways. The forward bias technique relaxes the short channel effect, while the minimized threshold voltage reduces the delay dependence on voltage at low supply voltages, and the temperature dependence of the forward bias relaxes the temperature dependence of the circuit delay at low supply voltages. The 32-bit adder was designed to verify the effect of the technique.
The team discussed the implementation of this technique on an SoC that consisted of an MPEG2 circuit, a CPU, a motion estimation circuit, a digital filter circuit, a discrete cosine transformation circuit and a variable length decoder. The leakage power was reduced to about 20 percent of the total power.
Prepping for MEMS
And, in an example of how MEMS technology is beginning to impact SoC designs, researchers from STMicroelectronics (Crolles, France) and CEA-Leti (Grenoble, France) combined forces to report on a MEMS switch that is driven by a 0.25µm BiCMOS IC and achieves 0.4dB insertion loss and 54dB isolation at 2GHz. The 400 x 50µm MEMS device is built on top of the wafers, thus enabling an SoC design.
Researchers at Pixim (Mountain View, Calif.) offered a report on a CMOS imaging SoC that includes an embedded frame buffer and operates at 100MHz. The programmable chip produces color video at up to 500 frames/sec with over 100dB dynamic range using multi-capture techniques. The sensor dissipates 600mW, including the I/O circuit.
Meanwhile, a research team at KAIST (Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea) has come up with a 10.8 x 6.0 mm2 prototype chip implemented with a star-connected on-chip network. The chip consists of a PLL, 1kbyte SRAM, two 2x2 crossbar switches, up/down samplers, and synchronizers. The on-chip network contains 81k transistors, dissipates 264mW at 2.3V and 800MHz, and provides 1.6 Gbytes/sec per port and 12.8 Gbytes/sec aggregated bandwidth, supporting plesiochronous communication without global synchronization.
The team claimed this is the first successful implementation of an 800-MHz star-connected on-chip network supporting plesiochronous communication among internal IP blocks. Previously reported work in this area dealt only with architectural aspects, without any chip implementation.
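The quoted bandwidth and power figures for the on-chip network can be cross-checked with a few lines of Python. This is a back-of-the-envelope sketch, not something taken from the KAIST paper: the port count and link width are inferred from the ratios, and the energy-per-byte figure assumes the network runs at full aggregate bandwidth while dissipating the quoted 264mW.

```python
# Figures quoted for the star-connected on-chip network.
PORT_BYTES_PER_S = 1_600_000_000         # 1.6 Gbytes/sec per port
AGGREGATE_BYTES_PER_S = 12_800_000_000   # 12.8 Gbytes/sec aggregate
CLOCK_HZ = 800_000_000                   # 800 MHz
POWER_W = 0.264                          # 264 mW

# Inferred values (assumptions, not stated in the article).
implied_ports = AGGREGATE_BYTES_PER_S / PORT_BYTES_PER_S
bytes_per_cycle = PORT_BYTES_PER_S / CLOCK_HZ

# Energy per byte moved, assuming full aggregate utilization.
picojoules_per_byte = POWER_W / AGGREGATE_BYTES_PER_S * 1e12

print(f"implied ports: {implied_ports:g}")
print(f"bytes per cycle per port: {bytes_per_cycle:g}")
print(f"energy per byte: {picojoules_per_byte:.1f} pJ")
```

The ratios are consistent with eight ports, each moving two bytes per 800MHz clock cycle.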
Matsushita Electric (Moriguchi, Japan) presented a mixed-signal SoC for DVD applications designed in 0.13µm six-level metal CMOS. Up to now, conventional DVD systems have needed several chips. Here, one DSP, two 32-bit RISC CPUs, three dedicated processing units, a PRML read channel with an analog front end (AFE) and several other subsystems are integrated on the same die. The AFE includes a 5th order Gm-C filter. The SoC contains 24M transistors in a 64mm2 die and consumes a low 1.5W at 40Msamples/sec operation, which corresponds to the mode for a 1.5x DVD playback system.
And, a single-chip MPEG2 audio/video encoder and decoder designed for consumer digital recording systems was reported by Philips Semiconductor researchers from Eindhoven, The Netherlands and Caen, France. The chip includes a CPU core with peripherals, a PCI/XIO bus interface, data streaming units to data sources and sinks, graphics engines, video display units and video DACs. It contains 32M transistors in a 102mm2 area and is fabricated in a 0.18µm six-level metal CMOS process.
The SoC integrates all of the recording and playback functions and is based on Philips' Nexperia architecture for easy transfer of the IP blocks to derivative SoCs. The team designed the chip to be static, so that the clocks of non-active blocks can be slowed or shut down to reduce power consumption. The architecture consists of a CPU, several functional units and a memory subsystem made of a Central Data Unit (CDU) and an SDRAM memory interface. Internally, the CDU arbitrates between accesses by the functional units and the data transferred to and from the SDRAM.
Aiming at portable 2D/3D graphics and MPEG4 applications, researchers at KAIST (Daejeon, Korea) and Hynix Semiconductor (Icheon, Korea) have developed a 121mm2 graphics LSI. While not considered a full SoC, the chip contains a RISC processor with MAC, a 3D rendering engine and 29Mbit DRAM, and is built in a 0.16µm pure DRAM technology. Programmable clocking allows the chip to operate in several power modes for various applications. In lower cost mode, power consumption is under 210mW, delivering 264M texture mapped pixels per second.
Engineers at Sanyo Electric (Gifu, Japan) gave a presentation on a one-chip image processor for next-generation digital cameras and broadband PDA multimedia mobile phones. It is capable of processing JPEG2000 data at 30 frames/sec with a 27MHz operating frequency. The chip is fabricated in 0.25µm CMOS and contains 8.5M transistors in a 103mm2 area. And, a research team from NEC (Kawasaki, Japan) reported on a 51.2GOPS fully programmable and scalable video recognition processor that is based on a linear connection of 128 4-way VLIW processing elements and an asynchronous data mapping mechanism.
The chip can execute detection in under 33msec/frame for complex weather, robust road area/lane marking, and vehicle movement. It contains 21.4M transistors in a 121mm2 area and is fabricated in a 0.18µm seven-level metal MOS process.
Packing in more functions was the goal behind the design of a new chip from STMicroelectronics (Agrate Brianza, Italy). It consists of a 1GOPS dynamically reconfigurable processing unit with embedded flash memory and an SRAM-based FPGA for image/voice processing/recognition applications. Code, data and FPGA bitstreams are stored in the embedded flash memory and are independently accessible through three content-specific, 64-bit I/O ports that exhibit a peak read rate of 1.2Gbytes/sec. The SoC is implemented in a 0.18µm six-level metal CMOS flash technology and occupies an area of just 70mm2.
The system performance of the chip is being evaluated for an image processing application for facial recognition as well as a speech processing application. More than 20 specific instructions were designed as C/assembly-callable functions, automatically translated to RTL, then synthesized and mapped to the embedded FPGA. Compared with a 32-bit RISC with basic DSP extensions, the same processor enhanced with application-specific instructions showed measured speed-ups ranging from 1.8 to 10.6 times on the most demanding tasks, with an overall improvement of 8.5 times. Energy efficiency lies somewhere between that of a conventional ASIP/DSP and a dedicated configurable hardware implementation, in the range of several MOPS/mW at 1.8V operation.
For the wireless networking arena, several presenters discussed dual-mode chips that adhere to both 802.11b and Bluetooth specifications with decent power consumption figures. For example, Broadcom Corp.'s (El Segundo, Calif.) designers developed a dual-mode CMOS 2.4GHz transceiver for 802.11b/Bluetooth applications. The chip consumes 65mA in RX and 78mA in TX from a 3V supply.
The receiver achieves a typical sensitivity of -88dBm at 11Mbit/sec for the 802.11b mode, and -83dBm for the Bluetooth mode. The receiver minimum IIP3 is -8dBm, and the transmitter delivers a nominal output power of 0dBm, with a power control range of 20dB in 2dB steps. The dual-mode transceiver is fabricated in a 0.35µm CMOS process and integrates all the receive and transmit building blocks, such as the LNA, VCO, and frequency synthesizer.
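To put these radio figures on a common footing, the supply currents can be converted to power and the dBm levels to absolute power. The Python sketch below does only that arithmetic; the helper names are illustrative, and the results are derived from the quoted numbers rather than reported by the presenters.

```python
def supply_power_mw(current_ma, supply_v):
    """DC power drawn from the supply: mA x V gives mW."""
    return current_ma * supply_v


def dbm_to_mw(level_dbm):
    """Convert a power level in dBm (dB relative to 1 mW) to milliwatts."""
    return 10 ** (level_dbm / 10)


# Broadcom dual-mode transceiver figures quoted above.
rx_mw = supply_power_mw(65, 3.0)   # receive mode
tx_mw = supply_power_mw(78, 3.0)   # transmit mode

tx_output_mw = dbm_to_mw(0)        # nominal transmitter output, 0dBm
sensitivity_mw = dbm_to_mw(-88)    # 802.11b receive sensitivity, -88dBm

print(f"RX supply power: {rx_mw} mW, TX supply power: {tx_mw} mW")
print(f"TX output: {tx_output_mw} mW")
print(f"802.11b sensitivity: {sensitivity_mw * 1e9:.2f} pW")
```

The gap between the transmit-mode supply power and the 1mW nominal output is a reminder that most of the current goes to the synthesizer, mixers, and baseband circuitry rather than to the radiated signal.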
For their part, engineers at Wireless Interface Technologies (San Diego, Calif.) have developed a 2.4GHz dual-mode RF transceiver IC that implements transmit and receive functions for both Bluetooth, with -80dBm sensitivity, and 802.11b Wireless LAN, with -88dBm sensitivity, in a single chip. This was done without doubling the required silicon area. Implemented in a 0.18µm CMOS process, the circuit operates at 1.8V, and the die size is just 16mm2, including pads.
A research team from KAIST (Daejeon, Korea) offered specifics on a 2.4GHz radio for IEEE 802.15.4 WPANs using 0.18µm CMOS technology. The SoC consumes 21mW and 30mW from a 1.8V supply in RX and TX mode, respectively. The receiver utilizes a low-IF architecture with a polyphase filter and a transistor linearization technique. A ROM-based DSSS GMSK signal is directly up-converted using I/Q mixing. The silicon area is 8.75mm2.
And, engineers at Integrant Technologies (Kyeongki-do, Korea) discussed their direct-conversion satellite tuner-demodulator SoC, realized using 0.18µm CMOS technology. The IC down-converts a 950-2150 MHz satellite broadcasting signal to baseband and demodulates the signal to an MPEG data stream. The IC conforms to both DVB-S and DSS standards. Experimental results show a 9dBm IIP3 while consuming 230mA from a 1.8V supply.