As ICs approach system-level integration, chip design becomes system design. One growing sign of this trend is the pursuit of system bandwidth. When a chip was just one functional block in a larger system, chip designers could address bandwidth questions almost entirely in terms of maximum clock frequency. But today one chip may contain all the important data and control flows in a system, and chip designers are being introduced to the system architect's pastime of system-level bandwidth management.
In the case of system-level FPGAs, that means designers have to look beyond the speed of the I/O pins. They must understand where all that data is going, what will happen to it and how it will get out again. A case in point is Altera's recently announced Apex-II family, aimed at data path applications in network routing. By permitting concurrent operation of several 1-Gbit/s LVDS channels, even more 622-Mbit/s pseudo-LVDS channels and conventional I/O, the chips are capable of swallowing and disgorging pretty substantial amounts of data. The problem lies in what happens after the data gets through the transceivers and onto the die.
The obvious answer is that such fast serial data gets deserialized to slow down the necessary clock rate. In the case of the 1-Gbit/s channels, this requires dedicated hardware: the Apex's programmable interconnect and logic cells are not fast enough to reliably implement the necessary 1-GHz shift register. The chip can, remarkably, implement the serializer/deserializer circuits for the 622-Mbit/s channels in programmable logic, and may be able to reach 840 MHz.
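To make the rate reduction concrete, here is a minimal behavioral sketch of a deserializer in Python. It is a software model only, not Altera's circuit; the 8-bit word width and the function names are assumptions chosen for illustration, since the actual deserialization factor is not stated here.

```python
def deserialize(bits, width=8):
    """Group a serial bit stream into parallel words.

    Models a shift register that emits one word every `width` bit times.
    The first bit received becomes the most significant bit of the word.
    `width` is an assumed deserialization factor, not a device figure.
    """
    mask = (1 << width) - 1
    words = []
    shift_reg = 0
    count = 0
    for bit in bits:
        # Shift the new bit in at the least significant end.
        shift_reg = ((shift_reg << 1) | (bit & 1)) & mask
        count += 1
        if count == width:
            words.append(shift_reg)
            count = 0
    # Any trailing bits that do not fill a whole word are dropped.
    return words


def parallel_clock_mhz(serial_mbit_s, width=8):
    """Clock rate needed on the parallel side of the deserializer."""
    return serial_mbit_s / width
```

With these assumptions, a 1-Gbit/s channel deserialized eight ways would need `parallel_clock_mhz(1000, 8)`, or 125 MHz, on the parallel side. Only the shift register itself has to run at the full serial rate.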
But even after this substantial rate reduction, you still have multiple data streams roaring about at over 100 MHz. And since the Apex-II devices currently do not include on-chip CPUs, some of these data streams are going to have to leave the chip for external memory and come back again. So the family requires not only fast differential I/O, but also zero-turnaround, dual- and quad-rate memory bus interface capability.
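A rough budget shows why the memory interface matters. The Python sketch below is illustrative only: the stream count, bus width and bus clock are hypothetical numbers, not Apex-II specifications, and it treats dual- and quad-rate buses simply as two or four transfers per clock, which is a simplification.

```python
def bus_bandwidth_mbit_s(clock_mhz, bus_width_bits, transfers_per_clock):
    """Peak memory bus bandwidth in Mbit/s.

    transfers_per_clock: 1 for single rate, 2 for dual rate, 4 for quad rate
    (a simplified view of how such buses move data).
    """
    return clock_mhz * bus_width_bits * transfers_per_clock


def required_memory_traffic_mbit_s(num_streams, stream_mbit_s):
    """Traffic if every stream is written to external memory and read back.

    Each bit crosses the memory bus twice: once out, once back in.
    """
    return 2 * num_streams * stream_mbit_s


# Hypothetical example: four 1-Gbit/s streams, 36-bit bus clocked at 133 MHz.
needed = required_memory_traffic_mbit_s(num_streams=4, stream_mbit_s=1000)
single = bus_bandwidth_mbit_s(clock_mhz=133, bus_width_bits=36, transfers_per_clock=1)
dual = bus_bandwidth_mbit_s(clock_mhz=133, bus_width_bits=36, transfers_per_clock=2)

print(f"needed {needed} Mbit/s, single rate {single}, dual rate {dual}")
```

Under these made-up figures the single-rate bus falls short of the round-trip traffic while the dual-rate bus covers it, before any allowance for bus turnaround or refresh overhead. That is the kind of arithmetic that pushes designers toward zero-turnaround and multiple-rate interfaces.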
Much more difficult to analyze is the need for interconnect bandwidth within the logic/memory fabric of the device itself. Some field extracting, sorting and queuing of ATM cells or Internet Protocol frames will have to be done in hardware. Data flows will have to be moved among the deserializers, memory interfaces and serializers. The interconnect will have to be rich enough, and power dissipation controlled enough, to make this possible.
We are likely to see FPGAs becoming much more application-specific as such considerations stretch the limits of programmable-logic technology.