DENVER A new breed of communications processor player is arising, bearing scant resemblance to the packet-parsing startups that made "network processor" a household term before being swallowed whole by the semiconductor giants. Chameleon Systems Inc., Improv Systems Inc. and Malleable Technologies Inc. are among the new companies taking cues from the DSP, FPGA and VLIW communities as they turn out architectures tuned for 3G cellular phones and packetized voice.
This week, Chameleon (Sunnyvale, Calif.) will introduce its Reconfigurable Communication Processor, based on tiles of replicated 32-bit datapath units that can be extended to support 50 full-duplex channels of simultaneous cdma2000 chip-rate processing. At the same time, Improv (Beverly, Mass.) will extend its Jazz PSA development environment for very long instruction word (VLIW) processors to a specific family of devices for voice-over-Internet Protocol, offering both a voice packetizer chip and an echo canceller chip.
Chameleon and Improv will face competition from reconfigurable-processor companies whose designs rely more on hard-coded or firm-coded blocks aimed at vertical markets, notably Malleable Technologies (in voice-over-IP applications) and MorphICs Inc. and Quicksilver Inc. (in wireless apps).
Even the packet-processing crew expects the newcomers to have a revolutionary impact on the industry. At the recent Network Processor Summit in Las Vegas, Mike Hathway, chief technology officer of Lucent Microelectronics' Agere network processor group, said the new architectures may largely replace programmable DSPs rather than packet parsers. Nonetheless, he said, they represent integration opportunities in basestations and central office switches.
Will Strauss, president and principal analyst at Forward Concepts Inc. (Tempe, Ariz.), thinks all of the startups could find design wins in multichannel gateways and basestations, where they could replace FPGAs that take on offloaded signal-processing functions, such as turbo coding, that the primary DSP does not handle. But the newcomers could face the same viability problems that confronted both the packet-parsing startups and an earlier generation of DSP specialists.
"There is the same hurdle that must be crossed that ZSP ran into," Strauss said. "The unit volumes of processors in some applications they target are so high that a large company might not be willing to entrust the business to a startup, unless it had some very impressive partners or backing."
Steep scaling
Chameleon president and chief executive Chuck Fox said one potential customer, a telecom equipment supplier for wireless networks, has studied how Chameleon's architecture compares with general-purpose DSPs. The TMS320C6X DSP family, from Texas Instruments Inc., scales slightly faster than Moore's Law. But the customer found that, for wireless channel processing, the datapath blocks on the Chameleon processor scale far faster, providing "a growth path much steeper than Moore's Law, which is what the customer needed," Fox said.
Chameleon licenses the ARC processor, which it uses as an executive controller overseeing the 128-bit internal RoadRunner split-transaction bus. The heart of the Chameleon system, however, is an extensible 32-bit programmable fabric that uses RoadRunner to communicate with the ARC core, as well as with external devices on the PCI bus.
In the first Chameleon chip to hit the market, the CS2112, the fabric is made up of four "slices," each of which has three "tiles." Inside each tile are a control logic unit (a programmable logic array that controls the tile's registers); four separate local store memories, each 32 bits wide by 128 entries deep; two 16 x 24-bit multipliers; and seven 32-bit datapath units, each similar in function to an arithmetic logic unit. One CS2112 chip thus has 84 datapath units, 24 multipliers and 48 local store memories, with an aggregate memory of 24 kbytes.
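The chip-level totals follow from multiplying the per-tile resources by the 12 tiles (four slices of three tiles each). The short Python check below uses only the figures given above; treating a kbyte as 1,024 bytes is an assumption, since the article does not define the unit.

```python
# Per-tile CS2112 resources as described in the article.
SLICES = 4
TILES_PER_SLICE = 3
DPUS_PER_TILE = 7
MULTIPLIERS_PER_TILE = 2
LSMS_PER_TILE = 4
LSM_WIDTH_BITS = 32
LSM_DEPTH_ENTRIES = 128


def cs2112_totals() -> dict:
    """Scale per-tile resources up to whole-chip totals."""
    tiles = SLICES * TILES_PER_SLICE
    lsms = tiles * LSMS_PER_TILE
    # 32 bits x 128 entries = 4,096 bits = 512 bytes per local store memory.
    lsm_bytes = LSM_WIDTH_BITS * LSM_DEPTH_ENTRIES // 8
    return {
        "tiles": tiles,
        "datapath_units": tiles * DPUS_PER_TILE,
        "multipliers": tiles * MULTIPLIERS_PER_TILE,
        "local_store_memories": lsms,
        # Assumes 1 kbyte = 1,024 bytes.
        "local_store_kbytes": lsms * lsm_bytes / 1024,
    }


if __name__ == "__main__":
    for name, value in cs2112_totals().items():
        print(f"{name}: {value}")
```

Running it reproduces the article's figures: 84 datapath units, 24 multipliers, 48 local store memories and 24 kbytes of local storage.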
Bruce Kleinman, senior director of marketing at Chameleon, said the strength of such distributed processing shows in how the chip handles digital signal processing algorithms. The CS2112 can compute a 1,024-point fast Fourier transform in 10 microseconds and run a 48-tap symmetric FIR filter at 125 Msamples/second. As baseband processing in 3G cellular markets expands to take in some IF filtering functions, Kleinman said, even multichannel DSPs begin to fall short.
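A symmetric FIR filter has coefficients that mirror each other (h[k] = h[N-1-k]), and hardware commonly exploits this by adding the two samples that share a coefficient before multiplying, which halves the multiplies per output. The Python sketch below illustrates that general technique for an even tap count and checks it against a direct-form filter; it is not a description of how the CS2112 actually maps a filter onto its datapath units, and the function names are hypothetical.

```python
from collections.abc import Sequence


def symmetric_fir(x: Sequence[float], half_taps: Sequence[float]) -> list[float]:
    """Even-length symmetric FIR: the full coefficient set is half_taps
    followed by half_taps reversed, so h[k] == h[N - 1 - k].

    Samples that share a coefficient are pre-added, so each output costs
    N/2 multiplies instead of N. Only outputs with a full window are returned.
    """
    n_taps = 2 * len(half_taps)
    y = []
    for n in range(n_taps - 1, len(x)):
        acc = 0.0
        for k, h in enumerate(half_taps):
            # x[n - k] and x[n - (N - 1 - k)] are weighted by the same coefficient.
            acc += h * (x[n - k] + x[n - (n_taps - 1 - k)])
        y.append(acc)
    return y


def direct_fir(x: Sequence[float], h: Sequence[float]) -> list[float]:
    """Reference direct-form FIR (N multiplies per output) for comparison."""
    n_taps = len(h)
    return [
        sum(h[k] * x[n - k] for k in range(n_taps))
        for n in range(n_taps - 1, len(x))
    ]


if __name__ == "__main__":
    half = [float(k + 1) for k in range(24)]   # arbitrary example taps
    full = half + half[::-1]                   # 48 symmetric taps
    signal = [float((3 * i) % 7) for i in range(100)]
    assert symmetric_fir(signal, half) == direct_fir(signal, full)
    print("multiplies per output:", len(half), "vs", len(full))
```

For a 48-tap filter the pre-add form needs 24 multiplies per output sample rather than 48.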
"DSPs may be fine for implementing vocoders, but if you're taking data right off the antenna, you need gigabytes of processing power, and that means a non-traditional processor architecture," Kleinman said.
Chameleon also must distinguish itself from VLIW architectures, particularly since Improv has spent close to two years showing telecom OEMs how well a shared-memory, multiple-instruction multiple-data (MIMD) VLIW architecture maps to parallel channelization. Kleinman said the Chameleon datapath units operate not in VLIW or heavily pipelined fashion, but at the level of individual instructions or bit streams.
Reconfigurable processors
Each tile has a background configuration plane and an active configuration plane. The architecture's instructions are dynamically programmable: a new configuration loads into the background plane while the active plane runs, and the two planes then swap in a single clock cycle. Chameleon calls the capability eConfigurable, since the processor can be reconfigured in a handful of nanoseconds rather than the tens to hundreds of nanoseconds required in first-generation reconfigurable architectures.
That means multistep problems are handled in an all-or-nothing fashion in the Chameleon design. In a cdma2000 chip-rate application, for example, tasks such as pseudorandom number generation and rake finger searches are not parceled out to separate pipelined subprocessors. Instead, an entire tile is dedicated first to pseudorandom number generation, then to demodulation, then to finger searches and finally to access searches, with the tile reassigned to each new task in a single clock cycle.
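Swapping between a background and an active configuration plane is a form of double buffering. The toy Python model below walks one tile through the four cdma2000 tasks named above, staging each next configuration in the background plane before the swap. All names (`Tile`, `load_background`, `swap`, `run_schedule`) are hypothetical, the stage functions are arbitrary placeholders rather than real cdma2000 kernels, and the model captures control flow only, not timing.

```python
from collections.abc import Callable
from dataclasses import dataclass
from typing import Optional

# Placeholder: a "configuration" is modeled as a function the tile applies.
Config = Callable[[int], int]


@dataclass
class Tile:
    """Toy tile with an active and a background configuration plane."""
    active: Optional[Config] = None
    background: Optional[Config] = None
    swaps: int = 0

    def load_background(self, config: Config) -> None:
        # Writes only the background plane; the active plane is untouched.
        self.background = config

    def swap(self) -> None:
        # Exchanges the two planes in one step (modeling a single-cycle swap).
        self.active, self.background = self.background, self.active
        self.swaps += 1

    def run(self, value: int) -> int:
        if self.active is None:
            raise RuntimeError("no active configuration loaded")
        return self.active(value)


def run_schedule(stages: list[tuple[str, Config]], value: int) -> tuple[int, list[str], int]:
    """Dedicate one tile to each stage in turn. Before each stage runs, the
    next stage's configuration is staged in the background plane; in hardware
    that load would overlap execution, here it simply happens first."""
    tile = Tile()
    trace: list[str] = []
    tile.load_background(stages[0][1])
    tile.swap()
    for i, (name, _) in enumerate(stages):
        has_next = i + 1 < len(stages)
        if has_next:
            tile.load_background(stages[i + 1][1])
        value = tile.run(value)
        trace.append(name)
        if has_next:
            tile.swap()
    return value, trace, tile.swaps


if __name__ == "__main__":
    # Arbitrary placeholder arithmetic standing in for each task.
    stages = [
        ("pseudorandom number generation", lambda v: v + 1),
        ("demodulation", lambda v: v * 2),
        ("finger search", lambda v: v - 3),
        ("access search", lambda v: v * 10),
    ]
    print(run_schedule(stages, 5))
```

The point of the design is that loading the next configuration never disturbs the one currently executing, so the only visible cost of changing tasks is the swap itself.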
Applications are developed in a mixed environment: C routines are written and compiled for the ARC executive controller, while Verilog source code is written and synthesized for the reconfigurable fabric. The resulting fabric bit stream is passed to the ARC linker in the final object-code step and then to Chameleon's proprietary execution engine for final simulation.
The resulting reductions in power dissipation and board size can be profound. In one case, a 512-channel baseband modem requires 15 reconfigurable processors to do work that would otherwise take 75 general-purpose DSPs.
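Worked through per device, the modem example implies about 34 channels per reconfigurable processor versus about 6.8 per DSP, a fivefold cut in device count. The small Python helper below (a hypothetical name, using only the article's three numbers) shows the arithmetic; it says nothing directly about power, since per-device dissipation is not given.

```python
def consolidation(channels: int, reconfigurable: int, dsps: int) -> dict[str, float]:
    """Per-device channel loads and the device-count ratio for one design."""
    return {
        "channels_per_reconfigurable": channels / reconfigurable,
        "channels_per_dsp": channels / dsps,
        "device_ratio": dsps / reconfigurable,
    }


if __name__ == "__main__":
    for key, value in consolidation(512, 15, 75).items():
        print(f"{key}: {value:.1f}")
```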
Improv, of course, is not sitting still for the arrival of new communication contenders. It's offering the Jazz 16-726 voice processor, for handling 16 channels of G.726 vocoding, as well as the Jazz 16-168 echo processor, for dealing with echo cancellation in voice-processing environments.
Because such semiconductor players as Philips and STMicroelectronics are licensing the Jazz design environment to come up with communications-specific processors, Improv has to be careful as it fields its own optimized standard products for telecom OEMs. Carey Ussery, chief executive of Improv, said the company will likely target different levels of customers.
In the central office and voice gateway, OEMs are looking at larger, more-complex platforms in which port density doubles every few months. There, Improv is likely to work directly with OEMs, Ussery said.
In the voice gateway market, Ussery said, the need for flexibility in VoIP systems makes a retargetable VLIW architecture particularly appropriate. OEMs have several voice codecs to choose from, and there's little agreement on gateway protocols; systems will likely use a mix of H.323, Session Initiation Protocol and Megaco for many quarters, if not years, to come.
In that market, Improv faces Malleable (San Jose, Calif.), which has roots in MicroUnity Design Inc. The Malleable Embedded Communications Accelerator (Meca) rolled out in mid-April. Malleable separates ALU-like blocks for DSP functions from reconfigurable blocks, which are integer logic elements similar to the cores found in CPLD programmable logic.
Malleable chief executive Curtis Abbott said hardwired blocks for specific communication functions play a bigger role in Meca than in competitors' offerings. Dedicated cores are provided, for example, for a time-division-multiplexed time-slot interchanger, an ATM segmenter/reassembler and an Internet Protocol packet prioritizer.