@Betajet: "The FPGAs I'm familiar with won't let you do this, because the global clocks don't have clock enables"
I should clarify that I'm not trying to endorse the general use of async logic on COTS FPGAs -- but I did it in the past ;-). The point is that FPGAs are a really affordable platform for learning asynchronous techniques, just as they are for learning conventional synchronous logic.
I truly believe that async methodologies are going to be increasingly used in the future as a way of dealing with the issues associated with deep submicron nodes and new nanoscale technologies.
Now, let me point to a paper I wrote some years ago about the efficient implementation of asynchronous logic on COTS FPGAs. In it, I present some interesting experimental results on the good old Xilinx SpartanIII and VirtexIV. Perhaps the most interesting are the speeds reached for both data communication [Mega Data Items per second] and pausable clock generation [MHz] -- I needed to perform some live demos in order to convince some researchers these were true:
@jackOfManyTrades: "This GALS idea seems to be just using a similar approach on a chip that is traditionally used on a PCB"
This is a very clever intuition. In some ways, today's ICs resemble a big system that has been collapsed inside the chip package. For this reason, terms such as "System-on-Chip", "System-in-Package" or "Network-on-Chip" are common topics in state-of-the-art VLSI design -- and things promise to get even more interesting in the future ;-)
Jack wrote: Interesting - did you design all the ICs as well as the PCB? ...once signals left the IC, all bets were off.
I designed the contents of the FPGAs. Actually, as an IC designer you do have some control over the PCB since you provide a "data sheet" that specifies the external timing of the chip: combinational delays, clock to output delays, and setup/hold constraints. As the chip designer, you promise the PCB designer that if the timing constraints are met the IC will function properly -- no bets about it.
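To make the data-sheet idea concrete, here is a minimal Python sketch of the kind of check a PCB designer might run on a chip-to-chip path. The function names and all the numbers are hypothetical, and it assumes both chips see the same clock edge with zero skew, which a real board would have to account for.

```python
def setup_slack(period_ns, clk_to_out_max_ns, trace_delay_max_ns, setup_ns):
    """Time left over before the receiving chip's setup window closes.

    Data launched on one clock edge must arrive setup_ns before the next edge.
    A negative result means the setup constraint is violated.
    """
    return period_ns - (clk_to_out_max_ns + trace_delay_max_ns + setup_ns)


def hold_slack(clk_to_out_min_ns, trace_delay_min_ns, hold_ns):
    """Margin by which new data arrives after the hold window has ended.

    The fastest possible path must still be slower than the hold requirement.
    A negative result means the hold constraint is violated.
    """
    return clk_to_out_min_ns + trace_delay_min_ns - hold_ns


# Hypothetical data-sheet values for a 50 MHz (20 ns) board clock.
print(setup_slack(20.0, 8.0, 2.0, 3.0))  # 7.0
print(hold_slack(1.5, 0.5, 1.0))         # 1.0
```

If both slacks are non-negative, the board meets the promises in the data sheet for that path, at least under the zero-skew assumption above.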
@Betajet: "I've always called those "gated clocks", and I used them decades ago with TTL"
A pausable clock is a different concept. As you point out, clock gating implies that you have a clock signal that is always running and waiting to be injected into the clock distribution network.
A pausable clock is based on locally generated clock bursts. That is, there is specific digital circuitry in charge of generating "trigger" signals only when required. In this way, you don't have a "real clock", but a kind of "pulse" generator -- I understand that the "pausable clock" term may lead to some confusion, but I didn't invent it ;-)
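The distinction can be sketched with a toy Python model that simply counts edges. This is only a behavioral illustration with hypothetical function names, not a model of any real clock generator circuit: in the gated case the source keeps toggling whether or not anyone uses it, while in the pausable case no pulse exists unless something asked for it.

```python
def gated_clock(work_pending):
    """Free-running source clock with a gate in front of the logic.

    Returns (edges produced by the source, edges delivered to the logic).
    """
    source_edges = len(work_pending)               # oscillator ticks every slot
    delivered = sum(1 for w in work_pending if w)  # gate passes only enabled ticks
    return source_edges, delivered


def pausable_clock(work_pending):
    """Local generator that fires a pulse only when triggered.

    Returns (pulses generated, pulses delivered to the logic).
    """
    pulses = sum(1 for w in work_pending if w)     # no request, no pulse at all
    return pulses, pulses


work = [1, 0, 0, 1, 1, 0]    # hypothetical activity, one entry per time slot
print(gated_clock(work))     # (6, 3)
print(pausable_clock(work))  # (3, 3)
```

Both schemes deliver the same three edges to the logic; the difference is that the gated version still has a source running for all six slots.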
@Paul: "Sadly, wave pipelining is a victim of fast clocks, process variation, and other modern factors."
These issues are common to every design in which you must make deterministic timing assumptions. This is one of the reasons why a well-known approach to asynchronous design is considered a promising alternative for the future: delay-insensitive logic.
These circuits are "correct-by-design", as you don't need to make any extra assumptions about process variation, timing, etc. In addition, these circuits automatically adapt their performance to environmental variations, such as temperature drift or a fluctuating power supply voltage.
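One common way to get this behaviour is dual-rail encoding with completion detection. The comment above does not say which encoding is meant, so treat that choice as an assumption. The Python sketch below (all names hypothetical) models a receiver that waits until every bit has arrived, in whatever order the wires happen to deliver them, and only then reads the word.

```python
import random

SPACER = (0, 0)  # "no data yet" on a rail pair


def encode(bit):
    """Dual-rail encode one bit as (true_rail, false_rail)."""
    return (1, 0) if bit else (0, 1)


def is_complete(word):
    """Completion detection: every rail pair carries a valid codeword."""
    return all(pair in ((1, 0), (0, 1)) for pair in word)


def transmit(bits, rng):
    """Deliver the bits in a random order, standing in for unknown wire delays.

    Returns (decoded bits, number of arrivals the receiver waited for).
    """
    wires = [SPACER] * len(bits)
    order = list(range(len(bits)))
    rng.shuffle(order)                 # arrival order is arbitrary
    arrivals = 0
    for i in order:
        assert not is_complete(wires)  # receiver is still waiting here
        wires[i] = encode(bits[i])
        arrivals += 1
    assert is_complete(wires)          # only now is the word read
    return [t for t, _ in wires], arrivals


decoded, arrivals = transmit([1, 0, 1, 1], random.Random(0))
print(decoded, arrivals)  # [1, 0, 1, 1] 4
```

The decoded word is the same for any arrival order, which is the point: correctness does not depend on how long each wire takes, only the completion time does.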
Plus, delay-insensitive design can be used on flexible substrates. An interesting example is the 8-bit MCU that Seiko developed some years ago:
Garcia-Lasheras wrote: This is what is called a "pausable clock", and it offers an advantage in power consumption, as the clock only triggers when the synchronous logic block has actual work to do.
I've always called those "gated clocks", and I used them decades ago with TTL. It was easy to do in TTL, because you could make a clock tree from NAND gates instead of inverters and the NAND gates provided a good place to put in a clock enable. You could also use the output enable of a tri-state driver as a clock gate. You can do this as a fully synchronous design: when clocks do toggle, they do so at the same time.
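As a rough illustration of the NAND clock tree, here is a small Python sketch. The waveforms and names are made up, and the enable is assumed to change only while the clock is low so that no runt pulse is produced. Two branches share one source clock; each NAND both inverts the clock and acts as the enable.

```python
def nand(a, b):
    """Two-input NAND gate on 0/1 values."""
    return 0 if (a and b) else 1


def gated_branch(clk_wave, enable_wave):
    """One clock-tree branch: a NAND inverts the clock and gates it with enable."""
    return [nand(c, e) for c, e in zip(clk_wave, enable_wave)]


def edges(wave):
    """Indices at which the waveform changes value."""
    return [i for i in range(1, len(wave)) if wave[i] != wave[i - 1]]


clk  = [0, 1, 0, 1, 0, 1, 0, 1]
en_a = [1, 1, 1, 1, 1, 1, 1, 1]  # branch A always enabled
en_b = [1, 1, 1, 1, 0, 0, 0, 0]  # branch B disabled while clk is low

branch_a = gated_branch(clk, en_a)
branch_b = gated_branch(clk, en_b)
print(branch_a)  # [1, 0, 1, 0, 1, 0, 1, 0]
print(branch_b)  # [1, 0, 1, 0, 1, 1, 1, 1]
print(set(edges(branch_b)) <= set(edges(branch_a)))  # True
```

The last line checks the "fully synchronous" property in this idealized model: every edge on the gated branch lines up with an edge on the always-enabled branch.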
The FPGAs I'm familiar with won't let you do this, because the global clocks don't have clock enables [that are visible to the designer]. So if you want to gate clocks, you have to do it through logic that adds skew and I suspect screws up your hold timing so that it's impractical.
Update: See Brian_D's comment below -- Xilinx does have this capability.