BERKELEY, Calif. - An idea that could revolutionize system-on-chip design is quietly taking root in Berkeley, Calif., as researchers develop a "chip-in-a-day" methodology for DSP-like functions. Leveraging precharacterized, high-level macros and a highly parallel architecture, the approach aims to convert a data flow diagram into a GDSII layout file in 24 hours.
Far from posing trade-offs, the methodology devised by the Berkeley Wireless Research Center (BWRC) claims to be two to three orders of magnitude more efficient in power and area than architectures based on software processors. Algorithms are directly mapped into hardware that derives its parallelism not from multiple CPUs, but from hundreds or thousands of distributed arithmetic units.
Given that the typical ASIC design cycle is nine to 12 months, doing a chip in a day would be a radical departure. "The ability of systems houses to implement new functionality has been severely limited, and they're backing off with vastly inferior architectures," said Bob Brodersen, professor of electrical engineering and computer science at the University of California, Berkeley, and BWRC's scientific director. The center's methodology, he said, could result in "much faster transitioning of really high-performance algorithms into the real world."
Brodersen acknowledged that his group hasn't yet designed a real system-on-chip (SoC) in a day, although the researchers did design a small, 12-bit finite-impulse-response filter test chip within a few hours. But he thinks that a 24-hour chip-design cycle is very possible, given the development of high-level estimation tools and the promise of a unified EDA database. BWRC has the advantage of looking at a narrow, but significant, application niche: data flow-intensive chips with small amounts of control.
The methodology uses Simulink from The MathWorks to draw a high-level data flow diagram. Control flow is described with Simulink's Stateflow diagrams. Data path macros are implemented directly using a tool such as Synopsys Inc.'s Module Compiler, while the control logic is translated to VHDL and synthesized. Commercial place and route tools then finish the job.
Based on precharacterized hardware components, the BWRC approach presents an alternative to programmable and reconfigurable SoC architectures. But it's nonetheless a type of platform-based design, said Richard Newton, dean of engineering at UC Berkeley and director of the Gigascale Silicon Research Center. "This work is an essential complement to the more highly programmable, platform-based approaches we have been pursuing" in the gigascale project, Newton said. "In many cases, if we can implement a chip efficiently using the chip-in-a-day platform-based approach, it will be a better match."
Bhusan Gupta, research manager at STMicroelectronics' Berkeley R&D lab, called the BWRC work "very promising." STMicro is supporting the research with funding and will fabricate the silicon coming from the chip-in-a-day flow, he said.
"This kind of flow may not be able to build a huge chip right now, but maybe it will prod EDA companies to think about what they're doing," Gupta said. If the methodology produces good silicon, he said, STMicro will adopt it in its R&D work and will work with EDA vendors to make the flow viable.
Intel Corp. is also intrigued by the concept, said Leslie Rusch, senior staff researcher at Intel and BWRC liaison for the company. "We're not likely to use it at any time that's currently envisioned, but we think it has a lot of promise," she said.
One EDA vendor that's keenly interested is Cadence Design Systems Inc., which has assigned a researcher, Shauki Alisad, to help with the BWRC work. "It's great. It's a very radical sort of thinking," said Ted Vucurevich, corporate vice president of research at Cadence. Vucurevich said he believes the concept is practical, assuming most of the synthesis work has already been done and that the actual chip design consists of integrating predesigned, high-level functions.
There are skeptics too, among them Gary Smith, chief EDA analyst at Gartner Dataquest. "Anyone who thinks you can develop a production product in a day doesn't have a good understanding of the design process," he said. "Any of these C/C++ or algorithm-to-silicon approaches will be a rough-cut implementation."
The BWRC project has similarities to ALU-array technology announced by Elixent Ltd. (Bristol, England), a company spun off from Hewlett-Packard Laboratories. Both groups are seeking to map DSP and data flow algorithms to distributed ALU architectures. The goal is higher performance than is possible with software running on a uniprocessor, but without the design effort required for hardwired logic or the area inefficiency of FPGA logic. And both groups are coming up with similar numbers for performance density.
Elixent expects to license its reconfigurable technology to system-on-chip builders for use in fully diffused ASIC-style designs.
BWRC's Brodersen believes that a full production implementation of a complex chip is feasible in one day, but he's not talking about conventional ASICs or processor-based system-chips. The chip-in-a-day methodology, in fact, is an outgrowth of BWRC research that compared the efficiency of direct hardware mapping to software processors.
By directly mapping algorithms into parallel arithmetic units, BWRC claims to achieve computational efficiencies of 100 to 1,000 million operations per second (Mops) per milliwatt, with densities of 100 to 1,000 Mops/mm². This, the team claims, is two to three orders of magnitude more efficient than software processors. Brodersen said the researchers were "shocked" when they quantified the numbers.
The type of parallelism advocated by BWRC is not a von Neumann architecture with multiple CPUs. "There is no CPU in this thing," Brodersen said. "There is nothing 'central' to it. It's fully distributed, with lots of different computational units - adders, registers, multipliers - just what's needed to do the algorithm. So it's a direct mapping between the algorithm and the architecture."
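To make the idea of direct mapping concrete, here is a minimal Python sketch, not BWRC's tooling, that models a small FIR filter the way such an architecture might lay it out: one multiplier and one register per tap, with every product formed in the same clock cycle. The class name `DirectMappedFir` and the helper `sequential_cycles` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DirectMappedFir:
    """Toy model of a FIR filter with dedicated hardware for every tap."""
    coeffs: List[int]
    delay_line: List[int] = field(default_factory=list)

    def __post_init__(self):
        # One register per tap holds the most recent input samples.
        self.delay_line = [0] * len(self.coeffs)

    @property
    def multipliers(self) -> int:
        # Direct mapping: one multiplier per coefficient, none shared.
        return len(self.coeffs)

    @property
    def adders(self) -> int:
        # Summing N products takes N - 1 two-input adders.
        return max(len(self.coeffs) - 1, 0)

    def clock(self, sample: int) -> int:
        """Advance one clock cycle: shift in a sample, produce one output."""
        self.delay_line = [sample] + self.delay_line[:-1]
        # In hardware these products would all be computed simultaneously.
        products = [c * x for c, x in zip(self.coeffs, self.delay_line)]
        return sum(products)


def sequential_cycles(num_taps: int) -> int:
    """Cycles per output if a single shared multiply-accumulate unit is reused."""
    return num_taps


fir = DirectMappedFir(coeffs=[1, 2, 3])
impulse_response = [fir.clock(s) for s in [1, 0, 0, 0]]
```

In this model the three-tap filter uses three multipliers and two adders and emits one output per clock, whereas a single shared multiply-accumulate unit would need three cycles per output. The sketch trades area for throughput and leaves out everything a real layout would involve.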
Having found an efficient architecture for data flow-intensive algorithms, the next question was how to efficiently design such a chip. That meant leaving behind a design flow that starts with a sequential language like C, and then attempting to rediscover parallelism during synthesis. The group chose Simulink because it captures algorithmic parallelism from the very beginning, and because it's tightly integrated with the popular Matlab tool.
BWRC advocates an automated design flow based on a single high-level description, overseen by a single design team. With today's ASIC flow, Brodersen noted, one team typically does the system-level design and hands it off to another team for register-transfer-level (RTL) coding. Yet another team does the physical design. "The person at the top level needs to know the implications of the algorithmic choices they're making," said Brodersen. "If you don't put the decisions up at that level, you have to spread it over three or four design teams, and each time you move from one team to another you have to re-verify the design, because they're all using different design descriptions."
What's needed to make BWRC's unified flow work are tools that can automatically generate physical information from a system-level, rather than an RTL, description. Those tools aren't available commercially yet, so BWRC is working to develop high-level power and area estimators, Brodersen said.
But perhaps the most crucial need is a common database that multiple tools can access. "Right now we've got to do all these translations," Brodersen said. "We're using tools from everybody, which drives us nuts." Brodersen hopes that the new OpenAccess Community initiative will help out, and he said BWRC is looking at using the Cadence Genesis database, which OpenAccess has endorsed as an industry standard.
Unified design description
The chip-in-a-day flow starts with a single design description that incorporates four types of decisions. Functional decisions specify the input and output behavior of each block; signal decisions specify the physical signals; circuit decisions specify the transistors to implement in each block; and floor plan decisions identify physical locations.
The design flow assumes that each functional block corresponds to a hard macro that has been extracted and characterized for power, area and delay. Simulink supports primitives such as adders and multipliers, and BWRC has added larger blocks, such as Viterbi decoders and fast Fourier transforms. To specify function, users parameterize the blocks and wire them together in Simulink. Signals are identified using Simulink's Fixed-Point Blockset. Circuit decisions are encapsulated in parameterized macro generators. A separate floor-planning tool is used to specify physical locations. BWRC is developing a tool for automatic clock tree insertion.
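As a rough illustration of how precharacterized macros could feed a high-level estimate, the Python sketch below sums per-block area and power over a list of instantiated blocks and adds delays along one path. The block names and every number in `LIBRARY` are hypothetical placeholders, not BWRC characterization data, and the `estimate` function is an assumption about how such a tool might work rather than a description of the center's estimators.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MacroSpec:
    """Characterization record for one hard macro."""
    area_mm2: float
    power_mw: float
    delay_ns: float


# Hypothetical values for illustration only; real figures would come
# from extracting and characterizing each macro's layout.
LIBRARY: Dict[str, MacroSpec] = {
    "adder12": MacroSpec(area_mm2=0.002, power_mw=0.010, delay_ns=1.5),
    "mult12": MacroSpec(area_mm2=0.020, power_mw=0.100, delay_ns=4.0),
    "reg12": MacroSpec(area_mm2=0.001, power_mw=0.005, delay_ns=0.5),
}


def estimate(instances: Dict[str, int], critical_path: List[str]) -> Dict[str, float]:
    """Sum area and power over all instances; sum delay along one path."""
    area = sum(LIBRARY[name].area_mm2 * count for name, count in instances.items())
    power = sum(LIBRARY[name].power_mw * count for name, count in instances.items())
    delay = sum(LIBRARY[name].delay_ns for name in critical_path)
    return {"area_mm2": area, "power_mw": power, "delay_ns": delay}


# A four-tap filter: four multipliers, three adders, four registers.
result = estimate(
    instances={"mult12": 4, "adder12": 3, "reg12": 4},
    critical_path=["reg12", "mult12", "adder12", "adder12"],
)
```

Simple summation like this ignores wiring, clock distribution and block placement, which is one reason the flow also needs a separate floor-planning step.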
Data path macros may be implemented with Module Compiler or tiled layout generators. The control portions are translated to VHDL, using a BWRC-developed tool, and synthesized with Synopsys' Design Compiler. The team then uses Cadence layout tools and Mentor Graphics Corp.'s Calibre verification suite to complete the physical design.
One key, Brodersen said, is avoiding RTL verification as much as possible. If the precharacterized blocks are adequately verified, he said, chip-level verification can be handled with a bit-true simulation in Simulink.
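A bit-true simulation models the exact integer arithmetic the hardware would perform, so its outputs can be compared bit for bit against the silicon. The Python sketch below shows the general idea for a 12-bit fixed-point FIR filter. It is not Simulink code, and the word length, the 10 fractional bits and the truncating shift are assumptions made for the example.

```python
from typing import List


def quantize(value: float, word_bits: int = 12, frac_bits: int = 10) -> int:
    """Round to a signed fixed-point integer and saturate to the word length."""
    scaled = round(value * (1 << frac_bits))
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def fir_bit_true(samples: List[int], coeffs: List[int], frac_bits: int = 10) -> List[int]:
    """Integer-only FIR: accumulate full-precision products, then shift right."""
    history = [0] * len(coeffs)
    outputs = []
    for sample in samples:
        history = [sample] + history[:-1]
        acc = sum(c * x for c, x in zip(coeffs, history))
        # The arithmetic right shift drops the extra fractional bits (truncation).
        outputs.append(acc >> frac_bits)
    return outputs


coeffs = [quantize(c) for c in (0.25, 0.5, 0.25)]
samples = [quantize(s) for s in (1.0, 0.0, 0.0)]
outputs = fir_bit_true(samples, coeffs)
```

Because every operation here is on integers, the same inputs always give the same bits, which is what would let a block-level model stand in for RTL simulation once the underlying macros have been verified.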
Brodersen said it will probably take six more months for BWRC to hone the chip-in-a-day flow, but after that, commercial adoption could follow "pretty quickly." He noted that the research is up for funding from the multi-university Marco organization.
Brodersen conceded that the automated BWRC flow won't yield the performance of a full-custom design flow. "I'm sure there would be a factor-of-two or -three difference in speed, but remember that you're gaining a factor of 100 when you go to this type of architecture," he said.
A detailed interview with Brodersen is available at www.EEdesign.com. The "publications" section of the BWRC Web site lists a 2001 Custom Integrated Circuits Conference paper detailing the chip-in-a-day methodology.