The Globally Asynchronous, Locally Synchronous (GALS) technique can be used to connect multiple IP cores on a single deep-submicron digital IC (e.g., ASIC, SoC, FPGA).
Everyone in the electronics world knows about Moore's Law. Since its inception, this simple empirical relationship has captured how CMOS chips become faster and denser as fabrication moves to ever-smaller process nodes. The physical mechanism driving this law has been quite easy to understand.
Editor's Note: This article first appeared on All Programmable Planet (APP), which was a thriving community website devoted to all things programmable. Sadly, APP is no longer with us, but many friendships were forged there that will last for years to come.
We can consider digital integrated circuits (e.g., ASICs, SoCs, and FPGAs) as being composed of two different kinds of building blocks. On the one hand, we have the logic and storage elements, which are built up from transistors and perform the actual information processing. On the other hand, we have the routing resources, which are mostly composed of metal wires (and vias) and which carry the digital information around.
For decades, transistors dominated the costs associated with integrated circuits. We are not talking only about pure "monetary price" -- cost here also covers delay, power consumption, and layout area (silicon real estate). As fabrication processes moved to deeper process nodes, transistors became faster and smaller, boosting performance in every dimension. In the case of a complex design, for example, simply halving the size of every individual transistor in an otherwise unchanged gate-level implementation would roughly double the speed while reducing silicon area and power consumption.
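The "halve the transistor, double the speed" arithmetic above is classic constant-field (Dennard) scaling. The following back-of-the-envelope sketch is purely illustrative -- it assumes the idealized model in which every dimension and voltage shrink by the same linear factor, which real processes only approximated:

```python
# Idealized constant-field (Dennard) scaling sketch.
# Assumption: all dimensions and the supply voltage scale by the same
# linear factor k; real fabrication processes deviate from this ideal.

def dennard_scale(k):
    """Return the scaling factors implied by a linear shrink of factor k
    (e.g., k=0.5 halves every feature size)."""
    return {
        "gate_delay": k,                 # delay shrinks ~linearly -> clock speeds up by ~1/k
        "area": k ** 2,                  # each transistor occupies k^2 of its former area
        "power_per_transistor": k ** 2,  # P ~ C*V^2*f, with C ~ k, V^2 ~ k^2, f ~ 1/k
        "power_density": 1.0,            # power per unit of silicon area stays constant
    }

s = dennard_scale(0.5)
print(f"Clock speedup: {1 / s['gate_delay']:.1f}x")      # ~2x faster
print(f"Area per transistor: {s['area']:.2f}x")          # a quarter of the area
```

Note the last entry: as long as the ideal held, power density stayed flat, which is precisely why shrinking was "free" performance.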
As long as the asymmetry between logic and routing allowed us to neglect signal propagation effects, the situation fit well with Moore's predicted trend. But this idyllic story began to change when the physical properties of routing (e.g., the complex impedance characteristics of nanometer-width wires) started to dominate those of the transistors themselves. Over time, routing resources progressively became the dominant cost driver, with the understanding once again that "cost" in this context embraces propagation delay, dissipated power, and occupied area.
In this situation, irrespective of how fast you can make your transistors toggle, if the wires carrying the signals set a lower bound on the delay, then there is a clock frequency that simply cannot be exceeded reliably using conventional techniques. Of course, this is not only due to delays in the datapath, but also to delays in the clock distribution network. Designers now have to work very hard to control effects like jitter and skew in the clock tree.
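The bound can be made concrete: the clock period must cover the critical path, which is the sum of the logic delay and the wire delay along that path. The numbers below are illustrative assumptions, not data for any real process; they simply show how the wire term caps the achievable frequency once it dominates:

```python
# Crude illustration of a wire-delay-bounded clock.
# The delay figures are assumed for illustration only.

def max_clock_hz(logic_delay_s, wire_delay_s):
    """The clock period is bounded below by the critical path:
    logic delay plus interconnect delay."""
    return 1.0 / (logic_delay_s + wire_delay_s)

logic = 100e-12   # 100 ps of gate delay on the critical path (assumed)
wire  = 150e-12   # 150 ps of wire delay on the same path (assumed)

print(f"Max clock: {max_clock_hz(logic, wire) / 1e9:.1f} GHz")
# Even doubling transistor speed only nibbles at the total:
print(f"With 2x faster logic: {max_clock_hz(logic / 2, wire) / 1e9:.1f} GHz")
```

With these assumed numbers, halving the logic delay lifts the ceiling from 4.0 GHz to only 5.0 GHz -- the wire term, which does not improve with faster transistors, sets the limit.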
To help us visualize this situation, in which a physical speed limit in devices fabricated using CMOS technology appears to have been reached, Dr. Colin Gillespie from the University of Newcastle has kindly shared with us here at EETimes the data he obtained while studying the performance evolution of Intel CPUs. From Gillespie's analysis, we can see how the exponential growth in CPU clock frequency finally collapsed around 2004. An asymptotic limit to clock speed seems to have been reached, and this limit is closely related to the maximum speed with which signals are able to travel across the die, as illustrated below:
The point is that "there is plenty of room at the bottom": transistors keep shrinking from one process node to the next, allowing more of them to be packed into the same area of silicon. For general-purpose CPUs, however, this no longer translates into a faster clock -- you cannot build a faster-clocked core simply by making it smaller, so increasing the complexity of a single IP core appears to have reached its technological limit. What you can do is create multiple instances of the placed-and-routed design, each with its own clock distribution network, on the same die, thereby multiplying the total available computing power as illustrated:
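The replication argument can be reduced to simple arithmetic: with the per-core clock pinned at its ceiling, aggregate throughput grows with the number of replicated cores rather than with frequency. A minimal sketch, with purely illustrative numbers:

```python
# Illustrative sketch: once per-core clock frequency plateaus,
# aggregate throughput scales with core count, not clock speed.
# All figures below are assumptions for illustration only.

def aggregate_ops_per_sec(cores, clock_hz, ops_per_cycle):
    """Peak throughput of `cores` identical replicated cores."""
    return cores * clock_hz * ops_per_cycle

single = aggregate_ops_per_sec(1, 3e9, 1)   # one core at an assumed 3 GHz
quad   = aggregate_ops_per_sec(4, 3e9, 1)   # four replicas at the same clock

print(f"Throughput gain from 4 cores at a fixed clock: {quad / single:.0f}x")
```

This is peak throughput only, of course -- actually realizing the 4x gain requires workloads that parallelize across the replicated cores.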
This paradigm change has led to the birth of increasingly complex SoC (system-on-chip) devices that integrate multiple CPUs, graphics functions, communications cores, and even programmable logic in the same package.