SAN JOSE, Calif. - Reconfigurable processors may still be off the beaten path for most hardware engineers, but there's evidence they are making their way into large, massively parallel network processors with cores numbering in the hundreds. IBM Microelectronics is working with a foundry customer to combine 174 Tensilica processors on one device for a network processor that should tape out by September. And ARC Cores has engaged with an undisclosed customer that is building a network processor that packs 260 cores and should be ready this year.
Such designs show that, at least for communications, processors are becoming mere building blocks rather than ends in themselves.
But while some laud the ability to build a customized processor, reconfigurable processors remain a hard sell to the vast majority of embedded hardware engineers, who are either unfamiliar with reconfigurable designs or uncertain about abandoning the traditional processor development model. Foremost on their minds are questions about the trade-offs between adding gates and optimizing code, and about verification and test.
Both the progress charted by reconfigurable processors and the challenges that confront them were spotlighted here this week at the IP/SoC conference.
"For some, it's scary because they've never done it before," said Jim Turley, senior vice president of technology at reconfigurable-processor vendor ARC Cores. "For others, it's completely liberating."
A contingent of engineers at IBM Microelectronics has been preaching the virtues of reconfigurable processors for some time, said hardware engineer Riyon Harding, who works at IBM in Burlington, Vt.
Harding is working on the network processor that incorporates 174 Tensilica devices. IBM is working to get each processor down to 92,000 gates so that the processors can all fit on one 18-mm x 18-mm piece of silicon based on IBM's CU-11 process technology, its most advanced process, she said.
In building a parallel processor to move packets, "Optimizing the op code in the instruction set has to be key in getting an edge over a competitor," she said.
The ARC-based design uses 256 processors to route packets and four cores to manage the on-chip traffic. The customer found that "the way to higher bandwidth is not higher clock speed that burns energy but to do it all in parallel and broadside the data by throwing in processors and having the processors do a little of the work," Turley said.
Reconfigurable-processor vendors claim they're making headway in networking and consumer electronics, because most general-purpose processors are designed for compute-intensive applications. In fact, Turley, who claims half of his customers are in networking, said there's no plan to add a floating-point unit to the architecture, because there's little demand for it from customers. He's even been surprised to learn that some are going so far as to strip out all math functions. "They don't even do add and subtract; never mind floating-point," he said.
Designers building communications-based processors are more interested in features like bit manipulation, which on a general-purpose processor takes extensive code to accomplish. Bit manipulation is becoming a requirement for building quality-of-service networks, where an individual bit could mean the difference between sending a packet to the chief executive and sending it to the mail room, observers said.
Richard Musacchio, principal engineer for consulting firm Paradigm Works (Andover, Mass.), said the ability to write special instructions for bit manipulation is one of the most attractive features of reconfigurable processors. "In an 8-bit register, for example, if you want to read 1 bit to see if it got sent, you normally have to do some shifting or masking," Musacchio said. "With this, you could create an instruction and do it in one cycle, as opposed to 10 cycles."
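The shift-and-mask sequence Musacchio describes can be sketched in a few lines. This is only an illustration of the software approach on a general-purpose processor, not code from either vendor; the function name `bit_is_set`, the sample register value and the choice of bit 5 as a "sent" flag are all hypothetical.

```python
def bit_is_set(register: int, bit: int) -> bool:
    """Return True if bit `bit` (0 = least significant) of an 8-bit value is 1."""
    if not 0 <= register <= 0xFF:
        raise ValueError("register must fit in 8 bits")
    if not 0 <= bit <= 7:
        raise ValueError("bit index must be in the range 0..7")
    shifted = register >> bit   # shift: move the bit of interest down to position 0
    return (shifted & 1) == 1   # mask: discard every other bit


# Hypothetical status register in which bit 5 flags "packet sent".
SENT_BIT = 5
status = 0b0010_0000
print(bit_is_set(status, SENT_BIT))  # prints True
```

A custom instruction of the kind Musacchio has in mind would fold the shift, mask and compare into a single operation; the cycle counts he quotes are his own estimate and are not modeled here.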
Musacchio was part of a small but vocal group of engineers at the IP/SoC conference who traipsed between the tutorials held by ARC and Tensilica, peppering presenters with questions on the trade-offs of moving from a general-purpose CPU to a reconfigurable processor.
Vendors have made strides convincing some top-tier semiconductor manufacturers and other OEMs to test the waters. But the market for embedded systems is deep and wide, and it's taking time to win over the cautious and skeptical. "We grew up with the mentality that changing the processor was out of the question," said ARC's Turley.
Instead of treating the processor as a black box and assigning an army of software engineers to optimize the code in assembly, reconfigurable-processor vendors are asking engineers to add new instructions and build up the hardware. "Our solution is to throw more gates at the problem for much better performance," said Leo Petropoulos, director of customer support for Tensilica Inc. (Santa Clara, Calif.).
Fiddling with caches, for example, is one well-known way to eke out better performance, though many hardware engineers would find the idea of messing with caches frightening. "I personally would be pretty wary of a configurable cache," IBM's Harding said.
But cache configuration has been turned into a pushbutton exercise in reconfigurable processors, and vendors are encouraging engineers to try matching the memory to computational bandwidth. "When you look at extending the processor's horsepower, it isn't going to do you any good if you don't put down some traction," Petropoulos asserted during a processor configuration demo.
And the results appear to be favorable. After running simulation tests on an optimized Tensilica core, the EDN Embedded Microprocessor Benchmark Consortium (EEMBC) lab found that the Tensilica Extensa core outperformed Texas Instruments' C6203 on a telecommunications benchmark. The test results were obtained by extrapolating a 1-MHz core to run at 200 MHz, with the bus speed running at half the core speed. The TI processor was tested at 300 MHz. The new instructions and architectural changes made to the Tensilica core were screened by the organization's lab to "make sure they were legitimate," said EEMBC president Markus Levy.
Levy said he had been skeptical about reconfigurable processors but has since changed his mind. "If ultimate performance is what you're after, then this is the way to go," he said.
But that doesn't tell the whole story, according to TI. For one, the Tensilica numbers are based on simulation. Further, using a consolidated performance benchmark (an idea resisted within EEMBC itself before its eventual adoption) may give greater weight to some performance numbers over others. And when you examine the metrics individually, the C6203 still outperforms Tensilica's core on some functions, such as fast Fourier transforms, said Andrew Soukup, DSP strategic marketing manager at TI (Houston).
What's more, a customer can buy a DSP off the shelf, and the optimized code can be downloaded from TI's Web site. TI continues to profile new applications that can use optimization, such as 8-bit multiply-accumulate or Galois field multiplication, used in Reed-Solomon error correction. The company is also doing more in C compilers, which at their best provide 100 percent performance for certain functions and 40 percent worst-case.
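For readers unfamiliar with the Galois field multiplication mentioned above, a minimal software sketch follows. It assumes the field GF(2^8) with the reducing polynomial 0x11D, a choice common in Reed-Solomon implementations but not one the article specifies; the function name `gf256_mul` is hypothetical, and this is not TI's optimized routine.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-XOR.

    `poly` is the reducing polynomial; 0x11D is an assumed, commonly used value.
    """
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must be bytes")
    result = 0
    while b:
        if b & 1:            # low bit of b set: add (XOR) the current multiple of a
            result ^= a
        a <<= 1              # multiply a by x
        if a & 0x100:        # degree reached 8: reduce modulo the polynomial
            a ^= poly
        b >>= 1              # move on to the next bit of b
    return result


print(hex(gf256_mul(0x02, 0x80)))  # prints 0x1d
```

Each pass through the loop costs several ordinary instructions on a general-purpose core, which is why this operation is a candidate for either hand-optimized library code or a dedicated instruction.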
"How many people want to design their own processor, versus taking one that's available and writing the C code to use it to do what you want?" Soukup said.
But ARC and Tensilica say they are not standing still when it comes to C compiler support. The process of adding instructions is already based on C intrinsics, and both say they have R&D projects under way to improve the compilers. "The general direction we're going is to tell the compiler what the instruction is and have the compiler recognize it," said James Hakewill, ARC's chief architect.
Though building a reconfigurable processor is straightforward, it's not as simple as buying an off-the-shelf part. "With TI, you can go to them and buy a part and have it tomorrow," Levy said.
But for those willing to put in the extra effort to home in on the "hot spots" of a processor and create new instructions that translate into extra gates, the exercise could be worth it, Levy said. Companies like TI stress building coprocessors for certain functions, but from a performance standpoint, adding instruction extensions is superior to building in a coprocessor, which is a decoupled piece of logic. Levy said coprocessors also need separate debugger tools, though Soukup said TI's debuggers are already part of its Code Composer tools.
When it comes to device frequency, standard processors still have an edge over configurable processors. At 0.18-micron, for example, Tensilica's core runs at 320 MHz typical, 200 MHz worst-case. TI's C6x device peaks at 800 MHz and is manufactured on an optimized process technology. So while the clock-to-clock efficiency is not as good running a standard part, it could balance out in the end if the clock speed is higher than for the reconfigurable part, Levy said.
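Levy's point can be put in rough numbers. The sketch below assumes a first-order model in which throughput is simply clock rate times work done per clock, ignoring memory, bus speed and process differences; the function name is hypothetical, and the 800-MHz figure is the peak quoted for the C6x family, not the 300-MHz C6203 used in the benchmark.

```python
def break_even_speedup(standard_mhz: float, configurable_mhz: float) -> float:
    """Per-clock advantage the configurable core needs to match the standard part.

    Assumes throughput = clock rate * work per clock (a deliberate simplification).
    """
    if standard_mhz <= 0 or configurable_mhz <= 0:
        raise ValueError("clock rates must be positive")
    return standard_mhz / configurable_mhz


# Clock figures quoted in the article for the 0.18-micron comparison.
print(break_even_speedup(800, 320))  # typical case: prints 2.5
print(break_even_speedup(800, 200))  # worst case: prints 4.0
```

Under these assumptions, the configurable core would have to do 2.5 to 4 times as much work per clock to draw level with the fastest standard part.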
Another concern is cycle-accurate timing. Musacchio said he would like to get the source code so he could run his own regression suites. Turley said ARC gives its customers source code.
Engineers like Musacchio think reconfigurable processors merit investigation, but they're not quite ready to commit. "It looks good on paper," Musacchio said. "But at this point we're still comparing apples to oranges. It's hard for me to see right now if this is going anywhere or not."