@David: "True, the transistors you can fit in one layer are getting limited, but we're still putting more transistors on a chip by putting in more layers"
Yes, this is true. But it also makes the clock distribution problems worse!
In state-of-the-art digital ICs, as much as 50% of the power consumption is drawn by the clock distribution circuitry, which has to ensure that the clock signal is "skew-free" across the whole layout. This is because we need to include lots of "H" forks and signal buffers to get a clean signal.
When we move into an extra third dimension, this problem gets worse, as illustrated in the next image, shared from the Ecole Polytechnique Federale de Lausanne (link):
Interesting - did you design all the ICs as well as the PCB?
I'm coming at this as an ex-digital IC designer (65nm was considered sexy when I stopped). I was designing standard components, which customers could take and use as they wished. From that point of view, once signals left the IC, all bets were off: not only did I not have any control over the customer's PCB layout, I didn't know what other ICs they might be using.
Jack wrote: The chips on a PCB will probably all be synchronous designs, but they certainly won't all be sharing the same clock tree!
Actually, they may be sharing a single PLL-based clock generator that derives all the clocks from a single reference. In that case they are sharing the same clock tree, with the trunks implemented on the PCB and the branches inside different ICs. For example, one of my designs has a 33.333 MHz master clock with matched clock lines so that the clock arrives at the various ICs and I/O modules at the same time. Very clean, plenty of timing margin.
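To put a rough number on the matched-clock-line example above, here is a minimal Python sketch that expresses trace-length mismatch as a fraction of the clock period. The function name and the 7 ps/mm propagation delay are assumptions (a ballpark figure for FR-4 style boards, not something stated in the comment); only the 33.333 MHz clock comes from the text.

```python
def skew_fraction(length_mismatch_mm, clock_hz, delay_ps_per_mm=7.0):
    """Return board-level clock skew as a fraction of one clock period.

    delay_ps_per_mm is an assumed ballpark propagation delay; check the
    actual stack-up of your board before trusting the result.
    """
    skew_seconds = length_mismatch_mm * delay_ps_per_mm * 1e-12
    return skew_seconds * clock_hz  # skew / period, since period = 1 / clock_hz


# Hypothetical case: 10 mm of mismatch between two clock traces at 33.333 MHz.
fraction = skew_fraction(10.0, 33.333e6)
print(f"skew is {fraction:.4%} of the clock period")
```

Under these assumptions even a centimetre of mismatch costs only a tiny slice of the 30 ns period, which fits the "plenty of timing margin" remark.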
@Betajet: "Aren't you going to have metastability problems with multiple clock islands?"
Metastability is a real issue in all asynchronous logic designs, just as it is in synchronous designs dealing with asynchronous external inputs.
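For anyone who wants to estimate how serious the metastability risk is at a clock-domain boundary, the commonly used synchronizer MTBF model can be sketched in Python as below. This is only an illustration: the function name is made up, and the parameter values in the usage lines are hypothetical, not taken from any real process.

```python
import math


def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """Mean time between synchronizer failures, in seconds.

    Standard model: MTBF = exp(t_resolve / tau) / (t_window * f_clk * f_data)
      t_resolve: time allowed for the flip-flop to settle (s)
      tau:       metastability resolution time constant of the flip-flop (s)
      t_window:  width of the vulnerable sampling window (s)
      f_clk:     sampling clock frequency (Hz)
      f_data:    rate of asynchronous input transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)


# Hypothetical numbers, for illustration only.
one_stage = synchronizer_mtbf(1e-9, 50e-12, 100e-12, 500e6, 50e6)
two_stage = synchronizer_mtbf(2e-9, 50e-12, 100e-12, 500e6, 50e6)
print(f"MTBF improves by a factor of {two_stage / one_stage:.3g}")
```

The exponential term is the point: every extra bit of settling time (for example, a second synchronizer flip-flop) improves the MTBF enormously, which is why crossing between islands is manageable as long as it is designed for.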
About the synchronous islands, let me clarify that having such design blocks doesn't always mean you are using a conventional periodic clock. A synchronous island implies that you have a local clock distribution network that acts as an "isochronic fork". That is, by limiting the clock network to a local boundary, you can ensure that a clock/trigger signal inserted at the clock network input will reach the local registers/flip-flops with a controlled skew inside the synchronous island. This way, you can use conventional synchronous EDA tools to design, synthesize and lay out the digital logic inside the island.
But the point is that the "clock" you are injecting into such a local synchronous island may be a locally generated signal, which this way stays aligned with the asynchronous handshaking control circuitry. This is what is called a "pausable clock", and it offers an advantage in power consumption, as it only triggers when the synchronous logic block has actual work to do.
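The power argument can be illustrated with a toy behavioural model in Python. It is a rough sketch and not a circuit description: the names (`count_clock_edges`, `cycles_per_item`) are invented for the example, and it simply counts clock edges for an island that needs a fixed number of cycles per request, comparing a free-running clock with one that is only started by the handshake.

```python
def count_clock_edges(requests, pausable, cycles_per_item=3):
    """Count clock edges seen by a synchronous island over a run of time slots.

    requests: one entry per time slot; None means no handshake request arrived.
    pausable: if True, the local clock only runs while a request is served.
    """
    edges = 0
    for request in requests:
        if request is not None:
            # Handshake request: the local clock runs long enough to process it.
            edges += cycles_per_item
        elif not pausable:
            # A free-running clock keeps toggling even though nothing happens.
            edges += cycles_per_item
    return edges


traffic = ["a", None, None, "b"]  # bursty traffic, idle half of the time
print(count_clock_edges(traffic, pausable=True))   # 6
print(count_clock_edges(traffic, pausable=False))  # 12
```

With the island idle half of the time, the paused clock toggles half as often, and clock toggling is what burns the dynamic power in the clock network.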
This GALS idea seems to be just using on a chip an approach similar to the one traditionally used on a PCB. The chips on a PCB will probably all be synchronous designs, but they certainly won't all be sharing the same clock tree!
Asynchronous design is interesting at various scales, but I also find wave pipelining interesting (which is sort of related to timing-dependent asynchronous design). It seems neat that one can avoid a pipeline latch by timing more pulse-like signals (waves) such that they do not overlap. Sadly, wave pipelining is a victim of fast clocks, process variation, and other modern factors.
Most likely 3D chips will buy us 1-2 or maybe 3 generations of Moore's law (both in density and partially in price). After that, prices won't go down (according to Zvi Or-Bach from MonolithIC 3D), and it would be hard to add more layers due to thermal limits.
@betajet I must admit that occurred to me as well. Back in the day I used to deal with strings of modems used to poll terminals. You HAD to have one clock through the whole system or you'd get errors. If anything puts data out faster than something else can accept it, you'll start losing it. You can use buffers to take up phase differences and slight clock slippage but eventually the buffer will over- or under-flow.
If your buffer is big enough and you're only sending bursts of data you can get away with differences, but we're talking big amounts of data here I think?
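The "eventually" in the buffer argument above can be worked out with simple arithmetic. Here is a minimal Python sketch assuming a constant frequency mismatch between sender and receiver and continuous traffic; the function name and the example numbers (9600 bit/s, 100 ppm, 64-bit buffer) are hypothetical.

```python
def seconds_until_buffer_fails(buffer_bits, tx_hz, rx_hz, start_fill=0.5):
    """Time until an elastic buffer overflows or underflows.

    Assumes a constant rate mismatch and continuous data, with the buffer
    initially filled to start_fill (0.5 means half full).
    """
    drift = tx_hz - rx_hz  # net bits per second piling up in the buffer
    if drift == 0:
        return float("inf")  # perfectly matched clocks never overflow
    start = buffer_bits * start_fill
    # Writer faster: fill the remaining space. Reader faster: drain what is there.
    headroom = buffer_bits - start if drift > 0 else start
    return headroom / abs(drift)


# Hypothetical link: 9600 bit/s, sender 100 ppm fast, 64-bit buffer half full.
print(seconds_until_buffer_fails(64, 9600 * (1 + 100e-6), 9600))  # about 33 s
```

So with continuous data a small buffer only buys seconds; it survives indefinitely only if the traffic is bursty enough for the buffer to recentre between bursts.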
tpfj wrote: "Moore himself wrote only about the density of components (or transistors)".
Based on that, and the impending proliferation of 3D ICs, are we not extending Moore's law vertically, so to speak? True, the transistors you can fit in one layer are getting limited, but we're still putting more transistors on a chip by putting in more layers.
Max - you're an expert on this, what do you think?