San Jose, Calif. -- Startup Ambric Inc. will head to the Hot Chips conference at Stanford today to show a novel approach to designing multicore processors that it says could outperform today's best DSPs or FPGAs by a factor of 10 or more. The company has developed on-chip communications hardware and a programming model that someday could be applied to many markets but that initially will be used to create a more standard device for video processing.
Ambric (Beaverton, Ore.) is part of a wave of startups leveraging improvements in process technology to deliver promising architectures for multicore design.
"For years, we have had an FPGA model of a sea of gates that you could connect any way you wanted, but we need building blocks made of something bigger to really make the most efficient use of silicon," said Linley Gwennap, principal of The Linley Group (Mountain View, Calif.).
Ambric steps up by delivering a "bric" that includes eight streaming RISC processors and 13 kbytes of distributed RAM in and around the cores. The startup is using 45 of those brics to create a 130-nanometer prototype chip, called Kestrel, that at 333 MHz is claimed to outperform a Texas Instruments C641x DSP or Xilinx Virtex 4 by more than a factor of 10.
Kestrel will pack 360 simplified, 32-bit RISC processor cores and a total of 4.6 Mbits of on-chip RAM. It is aimed at video processing and will have a maximum power consumption of 10 watts. If Ambric can gain a beachhead in video, it hopes to roll out products based on the same underlying technology for other markets.
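The chip-level totals follow from the per-bric figures quoted above. The short Python sketch below redoes that arithmetic; the function name and the choice of binary units (1 kbyte = 1,024 bytes, 1 Mbit = 1,048,576 bits) are assumptions for illustration, since the article does not say which convention Ambric uses.

```python
# Figures quoted in the article.
BRICS = 45
CORES_PER_BRIC = 8
RAM_KBYTES_PER_BRIC = 13


def kestrel_totals(kbyte=1024, mbit=1024 * 1024):
    """Return (total cores, total on-chip RAM in Mbits).

    The unit sizes are assumptions; pass kbyte=1000, mbit=1_000_000
    to redo the calculation with decimal units.
    """
    cores = BRICS * CORES_PER_BRIC
    ram_bits = BRICS * RAM_KBYTES_PER_BRIC * kbyte * 8
    return cores, ram_bits / mbit


cores, ram_mbits = kestrel_totals()
print(cores, round(ram_mbits, 2))
```

Under either unit convention the result lands near the 4.6 Mbits quoted for Kestrel, and the core count works out to exactly 360.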
"We're not coming out of the chute competing head-on against FPGAs and ASICs," said Jay Eisenlohr, senior vice president of marketing and business development and a co-founder of Ambric.
"It's relatively easy to come up with radical new architectures that look promising," said Gwennap. "The common problem all these startups face is delivering an environment and tools that are easy for their customers to use."
Ambric appears to be adequately focused on that challenge.
"We think you need to define the right software programming model first and then develop the circuits that support it, and that's just what we did," said Mike Butts, an Ambric fellow and vice president of architecture.
Under the programming model, developers use a subset of Java to create a set of tightly defined objects in a fixed hierarchy, along with a messaging scheme to let them communicate. The Java subset removes virtual machines, garbage collection and floating-point support and adds class libraries that deal with Ambric's unique register hardware.
Developers must make sure objects have no undefined dependencies in order to enable the asynchronous, parallel execution style of the Ambric chip. The resulting software is mapped via Ambric tools to its chip hardware.
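The style described here, self-contained objects that share no state and interact only through messages, can be sketched in ordinary Python. This is not Ambric's Java subset or its tools: the function names, the `DONE` marker and the use of threads and bounded queues are all assumptions chosen to illustrate the idea.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker, not part of Ambric's model


def source(out_ch, values):
    """Emit each value on the output channel, then signal end of stream."""
    for v in values:
        out_ch.put(v)  # blocks while the channel is full
    out_ch.put(DONE)


def scale(in_ch, out_ch, factor):
    """Multiply each incoming integer by `factor`; touches no shared state."""
    while (v := in_ch.get()) is not DONE:  # blocks while the channel is empty
        out_ch.put(v * factor)
    out_ch.put(DONE)


def run_pipeline(values, factor=3):
    """Wire source -> scale -> collector with small bounded channels."""
    a = queue.Queue(maxsize=2)
    b = queue.Queue(maxsize=2)
    workers = [
        threading.Thread(target=source, args=(a, values)),
        threading.Thread(target=scale, args=(a, b, factor)),
    ]
    for w in workers:
        w.start()
    results = []
    while (v := b.get()) is not DONE:
        results.append(v)
    for w in workers:
        w.join()
    return results


print(run_pipeline([1, 2, 3]))
```

Because each stage depends only on what arrives on its input channel, the stages can run in parallel at their own pace, which is the property the article says developers must preserve.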
The secret sauce of the processor is Ambric's proprietary design for registers that communicate asynchronously and automatically. The register scheme essentially eliminates "central, global state machines to synchronize processes. [Such state machines are] hard to design, hard to validate and hard to scale," said Butts.
The approach also eliminates the need for validation of timing closure. As a result, "whole categories of bugs are no longer possible," Butts said.
Ambric creates on-chip communications "channels" based on groups of registers linked by separate data and control lines. The control lines automatically inform a register when it can pass data to its neighbor, eliminating the need for clock-based synchronization. Data flows through word-wide channels to registers that can buffer up to two words.
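A small simulation can make the flow-control idea concrete. The sketch below models a chain of registers that each hold at most two words and pass a word onward only when the neighbor signals it has room. The class and function names, the discrete `step` loop and the stall schedule are assumptions made for illustration; they are not a description of Ambric's actual circuits.

```python
from collections import deque


class ChannelRegister:
    """One channel stage holding at most `capacity` words (two, per the article)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.words = deque()

    def can_accept(self):
        # Stands in for the control line that tells the upstream neighbor to send.
        return len(self.words) < self.capacity

    def has_data(self):
        # Stands in for the control line that tells the downstream neighbor data is ready.
        return len(self.words) > 0


def step(stages, source, sink, sink_ready):
    """Give every stage one chance to transfer a word, working back from the output."""
    if sink_ready and stages[-1].has_data():
        sink.append(stages[-1].words.popleft())
    for i in range(len(stages) - 2, -1, -1):
        if stages[i].has_data() and stages[i + 1].can_accept():
            stages[i + 1].words.append(stages[i].words.popleft())
    if source and stages[0].can_accept():
        stages[0].words.append(source.popleft())


def run(words, n_stages=3, stall_steps=(2, 3, 4)):
    """Push `words` through the chain while the consumer stalls on some steps."""
    source, sink = deque(words), []
    stages = [ChannelRegister() for _ in range(n_stages)]
    t = 0
    while len(sink) < len(words):
        step(stages, source, sink, sink_ready=(t not in stall_steps))
        t += 1
    return sink, t


delivered, steps = run(list(range(10)))
print(delivered, steps)
```

When the consumer stalls, words back up through the two-word buffers and the producer simply waits; no word is lost or reordered, and no central controller coordinates the stages.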
The startup uses its unique register scheme to build its processors and brics and the chip-level interconnect. A three-tier hierarchy of channels interconnects elements on a bric, on neighboring brics and on distant brics.
The resulting Kestrel chip, expected to be announced in the fall, should require just a third as much software code as competing FPGAs and DSPs while delivering 10 to 50 times the performance at a similar price, according to Ambric. The architecture could also be scaled to offer performance similar to that of competitors, but at a far lower cost, the company said.
The initial chip is expected to handle 60 billion multiply-accumulate operations per second. It will have a 425-Gbit/s bisection bandwidth based on its use of multiple, distributed cores and memory banks that appear to software developers as a sea of CPUs and memory.
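These headline figures can be related to the 360-core, 333-MHz design with some back-of-the-envelope arithmetic. The Python sketch below assumes that every core contributes equally to the multiply-accumulate total and that each channel moves one 32-bit word per clock cycle. Neither assumption is stated by Ambric, and the function names are hypothetical.

```python
def per_core_macs_per_cycle(total_macs_per_s=60e9, cores=360, clock_hz=333e6):
    """Average multiply-accumulates per core per clock cycle.

    Assumes all cores contribute equally, which the article does not state.
    """
    return total_macs_per_s / (cores * clock_hz)


def words_per_cycle_across_bisection(bw_bits_per_s=425e9, clock_hz=333e6, word_bits=32):
    """32-bit words crossing the chip's midline per clock cycle.

    Assumes channels are clocked at the core frequency, one word per cycle.
    """
    return bw_bits_per_s / (clock_hz * word_bits)


print(round(per_core_macs_per_cycle(), 2))
print(round(words_per_cycle_across_bisection(), 1))
```

Under those assumptions, the quoted throughput averages roughly one multiply-accumulate every two cycles per core, and the bisection bandwidth corresponds to about 40 word-wide transfers per cycle across the middle of the chip.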
Butts claims Ambric's architecture breaks with traditional von Neumann-style processing to create a communications-centric scheme more efficient for today's applications, such as media streaming and high-speed inspection of data packets in networking.
Ambric was co-founded a little more than three years ago by Jay Eisenlohr and Anthony Mark Jones, who previously had worked together at graphics chip designer Rendition. "We saw how problematic the whole RTL process is, and Mark is a super-smart guy who had some novel ideas about how to handle it," Eisenlohr said.
The duo snagged $10.4 million in funding, led by ComVentures (Palo Alto, Calif.), and set about crafting an approach to leapfrog the multicore world with on-chip parallel processing.
On-chip networks have become "one of the hottest topics in interconnects right now, but . . . the real bottleneck is getting things in and out of these [multicore] chips," said Fabrizio Petrini, a senior researcher at the Pacific Northwest National Laboratory (Richland, Wash.) and co-chairman of the Hot Interconnects conference, also being held this week at Stanford. "They [Ambric] may be optimizing the wrong thing. There is not a big technical problem building high-performance on-chip networks. The major bottleneck is getting to memory."
Ambric's Butts said the highly distributed nature of processing and memory resources inside the Ambric chip means it does not have a huge hunger for off-chip memory.
"Memory access is not an issue for most apps we are looking at, including video," he said.