Design Article
The efficient implementation of asynchronous logic in COTS FPGAs
Clive Maxfield
1/4/2013 4:01 PM EST
Now, here's an interesting one... but before we start, I'd first like to "set the scene," because it may be that some of our younger members aren’t 100% aware of what we mean by "synchronous" versus "asynchronous" when it comes to logic circuit design.
The way in which I visualize this is as shown in the diagrams below. Let's start with a synchronous circuit in which we have "chunks" of combinatorial logic separated by blocks of registers. The reason this is referred to as a synchronous circuit is that everything is synchronized to a common clock.

Synchronous circuit
One big advantage of the synchronous approach is that it's well understood, and the vast majority of our design and verification tools are tailored to this way of doing things. The main disadvantage is that we have to design for a worst-case scenario. In its simplest form, this means that we calculate the maximum delays (due to worst-case temperature, voltage, and process variations) and then set the maximum clock frequency accordingly. The problem here is that when the device isn’t working under worst-case conditions, we end up "leaving performance on the table."
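To put a rough number on "leaving performance on the table," here is a small back-of-the-envelope sketch. The two delay figures are invented purely for illustration (they don't come from any real device), and the calculation ignores details such as register setup time and clock skew.

```python
# Hypothetical delays (in ns) for a single register-to-register path.
# These numbers are assumptions chosen for illustration only.
WORST_CASE_DELAY_NS = 10.0  # slow process corner, low voltage, high temperature
TYPICAL_DELAY_NS = 6.0      # nominal conditions


def max_clock_mhz(delay_ns: float) -> float:
    """Highest clock frequency (MHz) whose period still covers the given delay."""
    return 1000.0 / delay_ns


f_signoff = max_clock_mhz(WORST_CASE_DELAY_NS)  # the clock we have to design for
f_typical = max_clock_mhz(TYPICAL_DELAY_NS)     # what the logic could manage on a typical day
unused_fraction = 1.0 - f_signoff / f_typical   # share of the typical speed we never use

print(f"Sign-off clock:       {f_signoff:.1f} MHz")
print(f"Typical capability:   {f_typical:.1f} MHz")
print(f"Left on the table:    {unused_fraction:.0%}")
```

With these made-up figures, the clock has to be set for the 10 ns path even though the logic would usually settle in 6 ns.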
The alternative is to create an asynchronous, or self-timed, circuit as illustrated in the image below. The idea here is that we get rid of all of the registers and there is no clock. Instead, each "chunk" of combinatorial logic uses control signals to communicate ("handshake") with its adjacent counterparts to inform the "upstream" logic when it is ready to accept new data and to inform the "downstream" logic when it has new data to pass on.

Asynchronous circuit
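The handshaking idea described above can be mimicked in a few lines of software. What follows is only a behavioral sketch under my own assumptions: the `Stage` class, the `ready`/`valid` signal names, and the `run_pipeline` helper are hypothetical, and a sequential Python loop says nothing about real gate delays. It simply shows data moving forward only when the upstream chunk has something to offer and the downstream chunk is free to take it, with no clock involved.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stage:
    """One self-timed 'chunk' of logic with room for a single result."""
    func: Callable[[int], int]
    slot: Optional[int] = None  # None means "empty"

    @property
    def ready(self) -> bool:
        """Handshake signal to the upstream chunk: I can accept new data."""
        return self.slot is None

    @property
    def valid(self) -> bool:
        """Handshake signal to the downstream chunk: I have data to pass on."""
        return self.slot is not None


def run_pipeline(stages: List[Stage], inputs: List[int]) -> List[int]:
    """Push every input through the chain using only ready/valid handshakes."""
    pending = list(inputs)
    outputs: List[int] = []
    while pending or any(s.valid for s in stages):
        # The consumer at the far end is assumed to always accept data.
        last = stages[-1]
        if last.valid:
            outputs.append(last.slot)
            last.slot = None
        # Work from the downstream end so that freed slots can be refilled.
        for i in range(len(stages) - 1, 0, -1):
            up, down = stages[i - 1], stages[i]
            if up.valid and down.ready:
                down.slot = down.func(up.slot)
                up.slot = None
        # The producer hands over a new item only when the first chunk is ready.
        if pending and stages[0].ready:
            stages[0].slot = stages[0].func(pending.pop(0))
    return outputs


chain = [Stage(lambda x: x + 1), Stage(lambda x: x * 2), Stage(lambda x: x - 3)]
print(run_pipeline(chain, [1, 2, 3]))
```

If there is no input data, the loop body never runs, which is the software analogue of the circuit sitting idle.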
There are several potential advantages associated with this asynchronous approach, not the least of which is that we no longer have a clock, which can be a routing nightmare and consume an inordinate amount of power in a synchronous implementation. In addition to making things easier to route and consuming less area and power (we no longer have any registers), the asynchronous circuit will always run at its maximum possible speed. Also, if there isn’t any data, then the asynchronous circuit will happily sit there waiting, unlike its synchronous counterpart, which (unless we do something about it) will keep on clocking away burning power, even if it's not doing anything useful.
The big downside to asynchronous design is that there are few (if any) design and verification tools targeted at this domain.
And why am I waffling on about this here? Well, I just received a rather interesting email as follows:
Hi Max, this is Javier D. Garcia-Lasheras (or just Javi, it's much easier!!!). The purpose of this email is to introduce you to an open source project that I think you may be interested in. The project is related to the efficient implementation of asynchronous logic circuits on COTS FPGA devices.
In 2001, I started researching the implementation of asynchronous circuits in custom IC designs. The huge costs associated with ASIC design forced me to start prototyping the systems on commercially available FPGAs... and this proved to be a great deal!!
The FF + LUT-based asynchronous circuitry we designed (strongly inspired by Ivan Sutherland's micropipeline concept) performed so well in terms of speed, power consumption, EMI, and logic efficiency that soon after (Q4 2005) different institutions and venture capitalists became interested in launching a start-up focused on asynchronous IP cores and system designs for FPGAs: the AsyncArt project.
Unfortunately, in 2008 the economic crisis arrived in Spain, my country (sorry for my "fast" English), and the project was aborted when its funding was interrupted. Other asynchronous logic start-ups of the same period suffered a similar fate (even Silistix, started by Steve Furber & supported by Intel, went under).
But in recent months, I've seen some interesting moves involving the asynchronous-logic companies that managed to survive. These moves suggest to me that asynchronous logic is going to be (quietly) relevant in the near future.
* First: Fulcrum Microsystems, which was using asynchronous logic to build its high-performance network switch chips, was acquired by Intel.
* Second: Achronix, the asynchronous FPGA company, became the first company to use Intel's foundries to build its own devices. Soon after, we discovered that Achronix is going to license its designs as IP cores for custom ICs... it's clear who is going to be Achronix's first client!!!
I suppose that Intel is going to use (programmable & fixed) asynchronous logic in processors aimed at server / data center appliances. In fact, Sun Microsystems was using this kind of asynchronous design in at least two SPARC generations before its acquisition by Oracle.
More than this, I strongly believe that asynchronous logic is going to be mainstream in a few years for two reasons: the long-predicted (and now almost evident) end of Moore's Law, and the rising process variability in nanoscale digital electronics (a.k.a. unpredictable logic delays). The problem is that the learning curve and the lack of dedicated EDA tools make asynchronous logic a very hard discipline to learn.
In order to encourage the widespread adoption of asynchronous logic design, I've started an open source project in which I'm releasing for public use all the knowledge generated by the AsyncArt research & (now defunct) company. This project is hosted in the Open Hardware Repository (supported by CERN), and you can take a look at its contents via the following links:
* Official web page: www.asyncart.com
* Project Repository: www.ohwr.org/projects/asyncart
Nowadays, I work full-time as an embedded system engineer in a private company (and I recently became the father of my first daughter), so I cannot dedicate as much time as I would like to the AsyncArt project. For this reason, the AsyncArt project needs to reach some level of visibility in order to build up an autonomous open source community.
If you believe that the project is worthwhile, your help could give it a big boost. In order to bring the AsyncArt initiative to as many potential qualified users and collaborators as possible, a press note in one of the publications you write for (or even a blog entry) could really make a difference. I've been following you in the media for a long time, and I know that I'm only one of a huge community of "Max the Magnificent" addicted (& technically qualified!!) readers.
Thank you very much for your time. Best wishes, Javi
Personally, I have a deep interest in the concept of asynchronous design, so I intend to look into this in more depth. In the meantime, I can certainly help to spread the word … hence this column (grin)!
helgerud
1/5/2013 10:27 AM EST
Hi
Interesting article.
I have designed a few self-timed logic circuits, and as long as the loop gains are sufficient they work fine. As far as I know, it is the only way to completely eliminate the probability of metastability.
However, I have never built self-timed logic in an FPGA, because the LUT is implemented as a small RAM. This could generate unpredictable spikes on the output when more than one address bit changes at the same time. How do you avoid this?
Rgds.
Per Helgerud
Garcia-Lasheras
1/5/2013 3:40 PM EST
Hi Per,
This is a very good question.
When working with fully asynchronous designs (the delay-insensitive approach), latches/keepers are used instead of conventional clocked flip-flops.
In this kind of circuit, dimensioning the loop gains by controlling CMOS transistor parameters is critical to minimizing the probability of reaching a metastable state. When working with COTS devices, the option of customizing transistors simply disappears, so, as you note, there are limitations on the set of asynchronous methodologies that can be implemented in an optimal (and safe!!) way.
As stated in the article, the design methodology used in the AsyncArt project is mostly inspired by Sutherland's micropipeline. This kind of circuit relies on the bundled-data approach, in which the datapath is implemented with conventional digital logic and only the data flow control is built with delay-insensitive asynchronous logic.
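For readers who haven't met the micropipeline, its control path is classically built from Muller C-elements using two-phase (transition) signaling. The sketch below is a zero-delay software model of that textbook arrangement, not the AsyncArt circuitry itself. The function names and the three-stage chain are assumptions made for illustration. Each stage's C-element takes the request from the previous stage and the inverted acknowledge from the next stage, and its next-state equation is a three-input majority function.

```python
from typing import List


def c_element(a: bool, b: bool, prev: bool) -> bool:
    """Muller C-element: follow the inputs when they agree, otherwise hold."""
    return (a and b) or (prev and (a or b))


def settle(state: List[bool], req_in: bool, ack_out: bool) -> List[bool]:
    """Re-evaluate the chain of C-elements until no output changes.

    state[i] is both the request to stage i+1 and the acknowledge to stage i-1.
    """
    n = len(state)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            req = req_in if i == 0 else state[i - 1]
            ack = ack_out if i == n - 1 else state[i + 1]
            new = c_element(req, not ack, state[i])
            if new != state[i]:
                state[i] = new
                changed = True
    return state


control = [False, False, False]
# First event: a transition on req_in ripples through the empty pipeline.
print(settle(control, req_in=True, ack_out=False))
# Second event: it advances only as far as the stage still awaiting an acknowledge.
print(settle(control, req_in=False, ack_out=False))
# The consumer acknowledges the first event, so the last stage can fire again.
print(settle(control, req_in=False, ack_out=True))
```

In a bundled-data design, each transition on `state[i]` would be the event that captures data into the storage of stage `i`; the model deliberately leaves out the matched delays that must accompany the datapath.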
There are plenty of flip-flop resources in any FPGA, so we use them intensively in our designs, not only for storing datapath values but also for holding the asynchronous dataflow control states.
In this way, LUT-based asynchronous logic is in charge of generating perfectly coordinated clock shots (or bursts) that feed the clock inputs of different flip-flop domains whenever the associated datapath segment needs to perform a task.
In order to verify the correct behaviour of this FF + LUT-based design approach, intensive stress tests were conducted on several FPGA devices. In these tests, the devices were left running at maximum speed for more than a week, and no failures were detected.
It's interesting to note that not only RAM LUT-based devices were tested (Xilinx's Spartan/Virtex & Altera's Cyclone): flash LUT-based devices (Microsemi's, formerly Actel's, ProASIC/Fusion) performed correctly too.
Best regards,
Javi