PALO ALTO, California -- Intel Corp. has detailed an experimental 90-nanometer custom processor that handles TCP/IP ingress processing at measured data rates up to 9.64 Gbits/second.
Presented last Tuesday (August 19, 2003) at the Hot Chips Symposium, the 460,000-transistor chip combines a fairly conventional processing data path with the unconventional use of content-addressable memories (CAMs) and control flow. It illustrates the potential for fully programmable protocol processing in the 90-nm era but also shows the lengths to which designers must go to seize the technology's benefits.
The processor, internally known as TIPP, is part of a much larger effort to study TCP/IP offload processing using software-programmable means. Intel's Microprocessor Research Labs in Hillsboro, Ore., is directing the effort. The program is not limited to specialized processors but is also looking at the problems of protecting operating system and application processing from TCP/IP processing on conventional processors.
Approaching 10-Gbit/s wire speed required a purpose-built TCP/IP processor. The Intel device is a tiny, 2.2 x 3.5-mm die fabricated in one of its more advanced 90-nm logic processes. The processor is intended to handle ingress processing, eventually as one core in a two- to three-times larger system-on-chip (SoC) that will handle the full TCP/IP termination problem.
Intel researcher Jianping Xu outlined the key issues in 10-Gbit/s wire-speed processing. The fundamental problem is time: To keep up with 10-Gbit Ethernet, the processor must cope with a packet every 67 nanoseconds. That puts the task beyond the reach of general-purpose computing. But Xu argued that the ability of a software-based solution to respond to changes in protocols made a processor-based approach worth trying.
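The 67-ns figure can be reproduced with simple arithmetic. The sketch below assumes the worst case of back-to-back minimum-size Ethernet frames (64 bytes, plus the standard 8 bytes of preamble and start delimiter and a 12-byte interframe gap). The article does not say how Xu derived the number, so that assumption and the function and constant names are illustrative only.

```python
# Assumed worst case: minimum-size Ethernet frames arriving back to back.
LINE_RATE_BPS = 10e9          # 10-Gbit Ethernet line rate
MIN_FRAME_BYTES = 64          # minimum Ethernet frame size
PREAMBLE_BYTES = 8            # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12     # mandatory idle time between frames


def packet_interval_ns(frame_bytes, line_rate_bps=LINE_RATE_BPS):
    """Return the time between frame arrivals, in nanoseconds."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return wire_bits / line_rate_bps * 1e9


interval_ns = packet_interval_ns(MIN_FRAME_BYTES)
print(f"{interval_ns:.1f} ns per packet")
```

Under these assumptions each frame occupies 672 bit times on the wire, which works out to roughly 67 ns. Larger frames relax the per-packet budget considerably.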
Speed aside, TCP processing is known to be messy, requiring demultiplexing of protocol information, packet filtering and protocol processing. Not the least of the problems is that TCP packets are not guaranteed to arrive in proper order, so packets must be reordered on the fly.
The Intel team devised a specialized architecture comprising a pipelined execution unit with tightly coupled scratch pad RAM, two CAMs, input and output buffers, and a dedicated block for transmission control. A local-instruction ROM drives the processor, using a wide, 112-bit control word that is apparently stored fully decoded in the ROM. The architecture handles packet reordering by using the CAMs to access packets by sequence number, rather than by sorting.
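The idea of fetching packets by sequence number instead of sorting them can be illustrated in software. The sketch below is not Intel's design: it uses a Python dictionary as a stand-in for a CAM lookup, the class and method names are hypothetical, and it ignores sequence-number wraparound and overlapping segments.

```python
class SequenceIndexedBuffer:
    """Hypothetical reorder buffer keyed by TCP sequence number."""

    def __init__(self, next_seq=0):
        self.next_seq = next_seq   # sequence number expected next
        self.pending = {}          # seq -> payload; dict stands in for a CAM

    def receive(self, seq, payload):
        """Store a segment, then return all bytes now deliverable in order."""
        # Keep only segments that are not stale and not already buffered.
        if seq >= self.next_seq and seq not in self.pending:
            self.pending[seq] = payload
        delivered = []
        # Direct lookup by expected sequence number: no sorting required.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            delivered.append(chunk)
            self.next_seq += len(chunk)
        return b"".join(delivered)


buf = SequenceIndexedBuffer(next_seq=0)
early = buf.receive(5, b"world")   # arrives out of order, held back
late = buf.receive(0, b"hello")    # fills the gap, releases both segments
```

Each arrival costs one insertion and a few exact-match lookups, which is the property that makes a hardware CAM attractive for this job.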
In a separate paper delivered by Greg Regnier and others from Intel at the Hot Interconnects conference, also held at Stanford last week, researchers described the use of dual-processor Xeon systems in TCP/IP processing and the offloading of protocol processing. The paper compared the efficiency of a symmetric-multiprocessing approach, in which the Linux TCP/IP stack was handled as just another kernel task, with an asymmetric approach in which one Xeon processor was dedicated to TCP/IP and the other served as a pure Linux application processor. The comparison showed an advantage to offloading protocol processing, even onto conventional hardware.
But the implementation of the TIPP processor, rather than its architecture, was particularly instructive. Intel used many of the advanced design techniques it has discussed over the past year or two. And many of those techniques are aimed directly at known problems with 90-nm processes: power and leakage current.
Power consumption is a serious issue with the chip. Operating at the top of its curve, at 1.72 volts, the device must dissipate about 6.4 watts from a die of less than 8 mm². Looking into the details of power dissipation makes the scale of the problem more apparent: at the voltage center, around 1.2 V, parts of the execution unit and instruction ROM must dissipate more than 250 W/cm².
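For a sense of scale, the whole-die average can be worked out from the figures reported in this article: 6.4 watts at 1.72 volts and the 2.2 x 3.5-mm die size given earlier. The sketch below does only that arithmetic, and the function and constant names are illustrative.

```python
DIE_WIDTH_MM = 2.2     # die dimensions as reported in the article
DIE_HEIGHT_MM = 3.5
PEAK_POWER_W = 6.4     # dissipation at 1.72 V, as reported


def average_power_density_w_per_cm2(power_w, width_mm, height_mm):
    """Return whole-die average power density in W/cm^2."""
    area_cm2 = (width_mm * height_mm) / 100.0   # 100 mm^2 per cm^2
    return power_w / area_cm2


avg_density = average_power_density_w_per_cm2(
    PEAK_POWER_W, DIE_WIDTH_MM, DIE_HEIGHT_MM
)
print(f"{avg_density:.0f} W/cm^2 averaged over the die")
```

The result is a little over 80 W/cm². The hot spots in the execution unit and instruction ROM are quoted at a lower voltage, so the two numbers are not directly comparable, but even so the local figure is roughly three times the whole-die average at full voltage.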
A substantial component of power consumption in 90-nm processes is leakage current. Leakage also makes some structures more challenging to design. In its campaign against the problem, the Intel team employed a dual-threshold-voltage process and adaptive body bias, a technique the company has discussed previously. The part also uses a novel register file design that can tolerate relatively high leakage levels.
Much of the design was synthesized, according to Xu, but the key high-speed paths (the execution core and instruction ROM) were custom designs. The approach also makes use of semidynamic flip-flops. Less-critical parts of the design run at lower frequencies.
The ability to use synthesis for much of the chip, focus both logic and circuit design resources on a few critical blocks and still achieve high throughput may be a key new skill for SoC design teams. But it remains to be seen how well the design style can be supported when the tools come from commercial EDA vendors.