At one extreme, where transforms can be expressed as vector arithmetic with a minimum of control complexity, high-end chips almost become arrays of multiply-accumulate (MAC) units. Telairity Semiconductor Inc. (Santa Clara, Calif.), for example, last week presented the Telairity-1 chip, intended for H.264 encoding. It has a total of 220 sixteen-bit vector function units and 30 scalar function units on the die. Cradle Technologies Inc. (Sunnyvale, Calif.) described the CT3616, with 16 DSP cores and eight general-purpose cores on the die.
As control complexity increases and the relative importance of simple computational loops wanes, the chips begin to look more like conventional CPUs with DSP extensions. Tensilica Inc. (Santa Clara), for example, demonstrated that a wide range of audio-processing applications could be handled very economically by adding specialized signal-processing instructions (in this case, about 300 different operations) to the vanilla Xtensa RISC core.
If the parallelism exists in the algorithm rather than the data, the resulting chip will often take on a pipelined appearance. One such example was a novel processor from Intel Corp. intended for efficiently handling the huge streams of very small packets ("Milliflows," in Intel's parlance) associated with voice-over-Internet Protocol and similar applications. Intel's Magpie architecture aims to aggregate thousands of these small packets, route them into a DSP farm for processing and then deliver them in a useful manner.
To do this, the chip employs three extended MIPS cores (one each for packet ingress, traffic management and packet egress) along with two unenhanced MIPS cores, one for managing the network stack and one for scheduling. In this way, functions that are logically separable occur on different processors; incoming packets see, in effect, a pipeline of functional units.
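The idea of packets passing through a pipeline of functional units can be illustrated with a minimal software sketch. This is not Intel's design: the packet layout, the batch size, the function names and the sort-by-flow policy are all assumptions made for illustration. Python generators also run the stages one after another, whereas on the chip each stage would run on its own core.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical packet layout: (flow_id, payload). Not taken from the Magpie design.
Packet = Tuple[int, bytes]


def ingress(packets: Iterable[Packet], batch_size: int) -> Iterator[List[Packet]]:
    """Stage 1: aggregate many small packets into batches."""
    batch: List[Packet] = []
    for pkt in packets:
        batch.append(pkt)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush a final, partially filled batch
        yield batch


def traffic_management(batches: Iterable[List[Packet]]) -> Iterator[List[Packet]]:
    """Stage 2: order each batch by flow id (a stand-in for a real scheduling policy)."""
    for batch in batches:
        yield sorted(batch, key=lambda pkt: pkt[0])


def egress(batches: Iterable[List[Packet]]) -> Iterator[Packet]:
    """Stage 3: hand the packets onward one at a time."""
    for batch in batches:
        for pkt in batch:
            yield pkt


def run_pipeline(packets: Iterable[Packet], batch_size: int = 4) -> List[Packet]:
    """Chain the three stages so each packet passes through them in order."""
    return list(egress(traffic_management(ingress(packets, batch_size))))


if __name__ == "__main__":
    demo = [(3, b"a"), (1, b"b"), (2, b"c"), (1, b"d"), (2, b"e")]
    for flow_id, payload in run_pipeline(demo):
        print(flow_id, payload)
```

Because each stage only consumes the output of the one before it, the stages are logically separable, which is the property that lets a chip assign them to different processors.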