Thanks for this timely article. Despite the many problems folks believe they or others will have with complex asynchronous design, our group has been successfully developing and producing asynchronous multicomputer chips for years. Our latest places 144 computers on a 21.3 square millimeter die, packaged in a 1x1cm QFN88, using a garden-variety 180nm process. Actually, Mr. Laurence's point was *not* that the cores need necessarily have low performance; each of the 144 computers on our GA144 chip is capable of 666 MIPS, for example. For further information on the subject of energy-efficient high-performance asynchronous multicomputer chips that have actually been built and actually work, I invite you to visit our website: http://www.greenarraychips.com/documents
Agreed, paradigm shifts do need new tools. All three (performance, cost and power) are indeed achievable, though, through the use of truly asynchronous self-clocking design. Check out the microprocessor arrays at: http://www.greenarraychips.com/home/products/index.html
I enjoyed the article. But the major thrust of the argument presented is that complex and slow logic cones are advantageous over sequences of simple and fast logic cones. This may be valid, but it is not necessarily an argument for clockless/self-clocked design style. The same approach can be applied using synchronous design style, with most of the same claimed benefits.
The biggest argument against the use of clockless logic in today's world is the lack of good tool chain support. CMOS is the dominant technology, and CMOS is subject to extremely high jitter. Without solid tool support for guaranteeing minimum delays, or at least for guaranteeing the delay relationships that decide race outcomes, self-clocked techniques are not reliable. This leaves fully delay-insensitive techniques, which increase wiring density and, worse, increase dynamic power consumption.
In the past few years logic synthesis tools have finally evolved to be physically aware. But until they begin to embrace something more than the bounded delay timing model, with P&R tools honoring the constraints they annotate to the net list, practical application of asynchronous design styles will continue to be difficult and limited.
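To illustrate the wiring and power cost of delay-insensitive techniques mentioned above, here is a minimal sketch of one common delay-insensitive code, four-phase dual-rail encoding. It is only an illustration under stated assumptions: the function names are hypothetical, and a single acknowledge wire per channel is assumed. Each bit needs two wires, and every bit makes two transitions per transfer regardless of the data value.

```python
# Dual-rail sketch: each bit is a (true_rail, false_rail) pair.
# (0, 0) = spacer (no data), (1, 0) = logic 1, (0, 1) = logic 0, (1, 1) = illegal.

def encode_dual_rail(value, width):
    """Return a list of (true_rail, false_rail) pairs, least significant bit first."""
    return [((value >> i) & 1, 1 - ((value >> i) & 1)) for i in range(width)]

def spacer(width):
    """All rails low: the return-to-zero state between two data words."""
    return [(0, 0)] * width

def is_complete(rails):
    """Completion detection: every pair has exactly one rail high."""
    return all(t != f for t, f in rails)

def decode_dual_rail(rails):
    """Recover the integer once the codeword is complete."""
    if not is_complete(rails):
        raise ValueError("codeword not yet valid")
    return sum(t << i for i, (t, _f) in enumerate(rails))

def wire_count(width):
    """Two data wires per bit plus one acknowledge wire (assumed)."""
    return 2 * width + 1

def data_transitions_per_transfer(width):
    """One rail per bit rises for the data phase and falls for the spacer phase."""
    return 2 * width
```

For an 8-bit channel this gives 17 wires and 16 data-wire transitions per transfer, whatever the data. A single-rail synchronous bus of the same width has 8 data wires and toggles only the bits that change.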
(antiquus: see http://www.achronix.com/)
Given the current dependence on RTL methodology in EDA, I'd say asynchronous design is still a specialist job. A friend at NXP said they had to give up on it because of testing overhead and unexpected issues in practice.
A first step is to move to asynchronous descriptions (above RTL). SystemC Transaction-Level-Modeling is a step in that direction, but has the problem that it (SystemC) is rooted in old cycle-based/shared-memory simulation methodology.
My own efforts to add asynchronous support to SystemVerilog got derailed, so I decided to do my own HDL (an extended C++) to solve the problem - http://parallel.cc - it turns out to be good for parallel programming too :-)
IMO we'll see GALS (globally asynchronous, locally synchronous) design before a wholesale move to asynchronous design, but because of the characteristics of "high sigma" silicon the move is inevitable.
I too go back to the TTL days and understand the physics of the circuits, but there were also things called delay lines (which replaced single-shots) that were used to generate known pulse widths or delays. After reading the article I looked into handshaking with request/ready type functions. Everything looked okay until I thought about interfacing with a RAM, which has an access time unrelated to the handshaking. I am stumped about how to interface with memory.
Any info would help, thank you. Karl
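One commonly described answer to the memory question above is a bundled-data interface: the request is passed through a matched delay (in effect a delay line) at least as long as the RAM's worst-case access time, and the delayed request becomes the acknowledge. The sketch below only models the timing. The class name, the 10 ns access time and the 1 ns margin are hypothetical, and this is not necessarily the scheme the article's author has in mind.

```python
from dataclasses import dataclass, field

@dataclass
class BundledRam:
    """Timing model of a RAM behind a request/acknowledge handshake."""
    access_ns: float                 # worst-case access time (assumed figure)
    margin_ns: float = 1.0           # safety margin added to the matched delay
    cells: dict = field(default_factory=dict)

    @property
    def matched_delay_ns(self):
        # The delay line must never be shorter than the slowest access.
        return self.access_ns + self.margin_ns

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr, req_time_ns):
        """Return (data, data_valid_time, ack_time) for a request raised at req_time_ns."""
        data = self.cells.get(addr, 0)
        data_valid_time = req_time_ns + self.access_ns
        ack_time = req_time_ns + self.matched_delay_ns
        return data, data_valid_time, ack_time


ram = BundledRam(access_ns=10.0)
ram.write(3, 42)
data, valid_at, ack_at = ram.read(3, req_time_ns=100.0)
# The requester samples the data only when the acknowledge arrives,
# so correctness depends on ack_at never preceding valid_at.
assert ack_at >= valid_at
```

The weakness is the one raised earlier in this thread: the scheme works only if the tools can guarantee that the matched delay really exceeds the memory access time across process, voltage and temperature.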
Thank you for reintroducing this concept. It brings back memories of my first engineering job where the numbers 7400, 7404 and 7474 had tangible meaning (and later with the adjectives H, LS and HC). Some time later in my career I moved to an FPGA-based design house, and had to relearn digital logic design in the synchronous world.
I must say, however, that your list omits the greatest hurdle to a rebirth of asynchronous methods: the technology of the Xilinx, Altera, and Actel triad, who have been entrenched in synchronous logic for 20 years. The few times I tried to discuss asynchronous issues with the reps, I was always guided away with the admonition that the LUTs are not guaranteed to be glitch-free. That is, assuming that the rep even understood the question, I was talked down from the asynchronous cliff. The registers and clock trees are readily understood by the youngest apprentice, but the logic element itself remains a black box. Even broaching the topic carries too high a risk for some of my most experienced colleagues, with reasons like "I don't think they characterize it like that". I was only the project leader, what did I know?
The efficient configuration of the and/or decision elements will prove to be the crux of any upcoming revolution.
Regarding test, as far as I understand, asynchronous design still has to use storage elements like flops and latches. These could also be used for test purposes. In test mode they could function in the well-known synchronous mode.
My question is: how is the problem of unwanted glitches on the clock input of the storage elements solved?
This is actually the reason why synchronous design is used today.
This seems very exciting, but a question pops into my mind almost immediately. What about test?
After manufacture the chip needs to be screened for defects. In the synchronous approaches this is almost straightforward: all those pipeline registers serve as extra controllability/observability points, and a big logic cloud gets broken into smaller ones that are easier to test. A single, really big logic cloud scares me a bit... test-wise, I mean. Is functional test the only approach possible? If so, how do the higher test cost and lower coverage (probably more faulty chips reaching the end customer) affect the total cost equation? Is this already taken into account in the above article?