If papers at the recent IEEE ASIC conference are any indication, there is a significant split between what industry is doing and what the research community is pondering when it comes to on-chip interconnect architectures. While IP providers are still offering things that look, at least at the logical level, like buses, some block-interconnect schemes under investigation at universities have left the bus far behind.
The case in point is three papers presented in a session on on-chip interconnect. In one, IBM detailed the latest version of its processor local bus (PLB), an important part of the overall CoreConnect architecture that binds IBM's intellectual property to the PowerPC. Although IBM is clear about the need to move to switched architectures farther down the road, PLB version 5 is very much a traditional microprocessor local bus. What has changed is that, with the emergence of on-chip multiprocessing, the PLB has come to resemble the local buses in multiprocessor servers. It has sprouted features such as hardware support for the MESI cache-coherency protocol, which would be unnecessary in an SoC with a single CPU.
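For readers who have not met MESI, here is a minimal sketch of the textbook protocol for one cache line in one cache, written in Python purely for illustration. It is not IBM's PLB implementation; the bus-transaction names (BusRd, BusRdX, BusUpgr) are the usual textbook labels, not PLB signal names, and the functions are hypothetical. It shows why a shared bus must carry coherency traffic: every cache snoops the transactions of the others and changes state accordingly.

```python
from enum import Enum

class State(Enum):
    M = "Modified"   # dirty, only copy
    E = "Exclusive"  # clean, only copy
    S = "Shared"     # clean, other caches may hold copies
    I = "Invalid"

def on_local_read(state, others_have_copy):
    """Return (new_state, bus_transaction) when this CPU reads the line."""
    if state is State.I:
        # Read miss: issue a bus read; snoop responses decide Exclusive vs Shared.
        return (State.S if others_have_copy else State.E), "BusRd"
    return state, None  # M, E and S all hit without bus traffic

def on_local_write(state):
    """Return (new_state, bus_transaction) when this CPU writes the line."""
    if state in (State.M, State.E):
        return State.M, None  # E upgrades silently: no other copies exist
    if state is State.S:
        return State.M, "BusUpgr"  # tell other sharers to invalidate
    return State.M, "BusRdX"  # Invalid: read-for-ownership

def on_snoop(state, transaction):
    """Return (new_state, must_supply_data) when another cache's transaction is seen."""
    if transaction == "BusRd":
        if state is State.M:
            return State.S, True  # dirty data must be supplied or written back
        if state is State.E:
            return State.S, False
        return state, False
    if transaction in ("BusRdX", "BusUpgr"):
        return State.I, state is State.M
    raise ValueError(f"unknown transaction: {transaction}")
```

In a single-CPU SoC none of the snoop logic would ever fire, which is the article's point: the coherency hardware only pays for itself once several processors share the bus.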
But researchers in academia are, as one might expect, thinking further ahead. A paper from the University of Turku (Finland) proposed a segmented bus in which independent segments meet at bidirectional tri-state interconnections. The segments are partitioned so that the largest data flows stay local, with only occasional traffic requesting a bridge from one segment to another. Those requests go to a global arbitration block, which, interestingly, is handcrafted in self-timed handshaking logic. That choice solves a number of the thorny timing problems of arbitrating between segments that are not synchronized to each other.
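The grant decision such a global arbiter has to make can be sketched in software. This is a minimal model under stated assumptions, not the Turku design: the segments are assumed to form a linear chain, a cross-segment transfer is assumed to occupy every segment between source and destination, and requests are granted in fixed priority order. The names Request and arbitrate are hypothetical, and the real block is asynchronous hardware, which this sketch does not model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    master: str
    src_segment: int
    dst_segment: int

def segments_spanned(req):
    """Segments a transfer occupies, assuming segments form a linear chain."""
    lo, hi = sorted((req.src_segment, req.dst_segment))
    return set(range(lo, hi + 1))

def arbitrate(requests):
    """Grant cross-segment requests in list (priority) order, skipping any
    whose path overlaps a segment already granted in this round."""
    busy = set()
    granted = []
    for req in requests:
        if req.src_segment == req.dst_segment:
            continue  # local traffic stays on its segment; no global grant needed
        span = segments_spanned(req)
        if span & busy:
            continue  # conflict: the request must retry in a later round
        busy |= span
        granted.append(req)
    return granted
```

With requests dma 0→1, cpu 1→2 and dsp 2→3, the dma and dsp transfers proceed concurrently while cpu waits, because it needs segment 1, which dma already holds. Concurrent transfers on disjoint segments are what a segmented bus buys over a single shared one.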
Yet another approach, this one from the University of Paderborn (Germany), is topologically similar, though it differs significantly in detail. The authors proposed an SoC made up of clusters of functional blocks. Each cluster would have a local synchronous bus. The buses would terminate in switch boxes arranged in whatever topology best suits the system architecture. Anything from a ring to a butterfly network might serve, depending on the data flows.
The interesting notion here is that arbitration is distributed. Each switch box is packet-oriented, with packet buffers, a forwarding lookup table and significant processing power of its own to execute routing algorithms. All that is a lot of overhead, at least until you get to 80-nm processes. At that point, what sounds like overkill starts looking like a solution.
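To make the switch-box idea concrete, here is a minimal sketch of one box's forwarding lookup table and output buffers, assuming the boxes form a bidirectional ring (one of the topologies the article mentions) and that packets are routed the shorter way around. The paper's actual routing algorithms are not described here; the port names, the dict-based packet format and the SwitchBox class are all hypothetical.

```python
from collections import deque

def build_ring_table(my_id, n_boxes):
    """Forwarding table for one switch box on a bidirectional ring:
    destination cluster id -> output port ('local', 'cw' or 'ccw').
    The shorter direction wins; ties go clockwise."""
    table = {}
    for dst in range(n_boxes):
        if dst == my_id:
            table[dst] = "local"  # deliver onto this cluster's own bus
        else:
            cw_hops = (dst - my_id) % n_boxes
            ccw_hops = (my_id - dst) % n_boxes
            table[dst] = "cw" if cw_hops <= ccw_hops else "ccw"
    return table

class SwitchBox:
    """Hypothetical model: one lookup table plus a packet buffer per output port."""

    def __init__(self, my_id, n_boxes):
        self.my_id = my_id
        self.table = build_ring_table(my_id, n_boxes)
        self.out_queues = {port: deque() for port in ("local", "cw", "ccw")}

    def accept(self, packet):
        """Look up the packet's destination and buffer it on the chosen port."""
        port = self.table[packet["dst"]]
        self.out_queues[port].append(packet)
        return port
```

Each box decides locally, from its own table, where a packet goes next; no central arbiter sees the whole chip. A butterfly or other topology would need only a different table-building step, which is one reason the per-box processing power the authors budget for is not wasted.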