United Business Media EE Times



Moving Data across Asynchronous Clock Boundaries

Reduce data validity and timing problems without reducing data rates through careful design at the interfaces.

By Peter Alfke


Digital designers prefer synchronous single-clock systems that are robust, easy to design, and easy to simulate. However, cases exist where multiple clocks with unrelated frequencies are unavoidable. Telecom and datacom are full of situations where data from one clock domain must be handed over to another clock domain. Such designs, though requiring special care and a thorough understanding of timing issues, can ultimately yield a reliable product.

With two incoherent clocks, however, it isn't a question of whether the worst-case timing relationship might possibly occur. The situation is guaranteed to occur — repetitively. Clearly, the ability to move data, problem free, across asynchronous boundaries becomes crucial in these cases. Using proven design techniques reduces problematic arbitration circuits to a minimum. In addition, the circuits designed will operate reliably.

The rule of law

Although designers favor synchronous circuits, the existence of multiple clocks in a system requires the observation of several key rules to ensure successful designs across the clock boundaries. One such rule: never synchronize an asynchronous input in more than one flip-flop in parallel. Because an asynchronous input frequently changes during the set-up-time window of the synchronizing flip-flops, and because no two flip-flops are exactly alike (even when residing on the same chip), sooner or later a clock edge will come along where one flip-flop interprets the input as a 1 while the other interprets it as a 0. This inconsistency spells trouble, and the design must be changed to use a single synchronizing flip-flop.

Figure 1 - Parallel Passing
With a flag and a handshake, parallel data can successfully pass across clock boundaries.

However, even a single synchronizing flip-flop can exhibit strange behavior. There is a very small but finite probability that the asynchronous input change will occur in an extremely narrow, sub-picosecond part of the set-up-time window when the master latch in the flip-flop receives just enough energy to go into the exactly balanced state. The flip-flop thus becomes metastable.

Modern flip-flops rarely go metastable, but when they do, they usually recover fairly quickly. The smart designer can work around this potential problem by allowing for a few extra nanoseconds of flip-flop settling time. It really doesn't matter which way the flip-flop settles, since the timing of the incoming data was ambiguous. The problem with metastability isn't the digital uncertainty of the resulting output, but rather the timing uncertainty of the output change.

Holding hands

Other conditions can pose design challenges as well. For instance, what strategy best addresses a situation where parallel data must pass across a clock domain boundary? The traditional method is to generate a flag and to use a handshake sequence (see Figure 1).

When the transmitter has parallel data ready for transfer, it creates a rising edge on the READY line, which in turn sets flag F telling the receiver that data is available. The receiver scans F continuously and, after finding it high, accepts the stable parallel data, and then creates a rising edge of ACK which sets flip-flop A. This resets F, which in turn resets A. This particular design makes no assumptions about any phase or frequency relationship between the transmit and receive clocks. Such generality dictates a design using a benign controlled race condition between the two flip-flops. A reasonable loop delay can conveniently be inserted between F and the reset of A. In a less generic design, this delay might be implemented as one period of either the transmit or receive clock.
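The flag discipline of this handshake can be sketched in a few lines of Python. This is only a behavioral illustration, not the circuit of Figure 1: it assumes integer clock periods in arbitrary time units, it collapses ACK and flip-flop A into an immediate clearing of F, and it ignores metastability entirely. The function name and parameters are hypothetical.

```python
def simulate_handshake(words, tx_period, rx_period, max_time=10_000):
    """Transfer `words` across two unrelated clocks using a single flag F."""
    pending = list(words)   # data the transmitter still has to send
    received = []           # data the receiver has accepted
    flag_f = False          # flag F: True means "data on the bus is valid"
    bus = None              # the parallel data lines

    # Merge the rising edges of both clocks into one time-ordered list.
    events = sorted(
        [(t, "tx") for t in range(0, max_time, tx_period)]
        + [(t, "rx") for t in range(0, max_time, rx_period)]
    )

    for _, side in events:
        if side == "tx":
            # The transmitter may change the data only while F is low.
            if not flag_f and pending:
                bus = pending.pop(0)
                flag_f = True       # rising edge on READY sets F
        else:
            # The receiver accepts the data only while F is high.
            if flag_f:
                received.append(bus)
                flag_f = False      # ACK (via flip-flop A) resets F
        if not pending and not flag_f:
            break
    return received


# The transfer stays correct whichever clock is faster.
fast_rx = simulate_handshake([10, 20, 30, 40], tx_period=7, rx_period=3)
fast_tx = simulate_handshake([10, 20, 30, 40], tx_period=3, rx_period=7)
```

In this model every word arrives exactly once and in order for any pair of periods, but each transfer costs at least one edge of each clock, which is why the method is safe but slow.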

Metastability
Whenever a clocked flip-flop or latch synchronizes an asynchronous input, a small probability exists that the output will exhibit an unpredictable delay. This delay occurs when the asynchronous transition not only violates the set-up time specification, but actually occurs within the tiny timing window where the master latch is being disabled and, therefore, locks up the input data. Under these circumstances, the flip-flop can enter a symmetrically balanced transitory metastable state. While the slightest deviation from perfect balance will cause the output to revert to one of its stable states, the delay in doing so depends not only on the gain-bandwidth product of the master-latch feedback loop, but also on the original perfection of the balance and the noise level in the circuit. The delay can, therefore, only be described in statistical terms.

A failure occurs when the additional delay exceeds the available slack in the connection to the next flip-flop input synchronized by the same clock.

The mean time between failures (MTBF) is inversely proportional to the frequency of asynchronous input changes, to the clock frequency, and to a constant K1 that describes the width of the metastability catching window.

MTBF is an exponential function of the acceptable extra delay t multiplied by a constant K2 that describes the gain-bandwidth product of the master latch. In recent years, K2 has increased significantly with IC process improvements, from 2.6/ns for the 1990 1.5-µm technology to 19.4/ns for the 1995 0.5-µm technology. More recent designs are still being evaluated. Everything else being equal, this difference in the exponent has increased MTBF by a factor of e^(19.4 - 2.6) = e^16.8 (about 20 million) for an extra delay t of 1 ns.

Or, for the same MTBF, the needed extra time t has decreased by a factor of seven. Metastability can still be a problem for very high-speed asynchronous interfaces, but it has lost its impact on systems that can afford a few extra nanoseconds of settling delay.
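The two proportionalities above combine into MTBF = e^(K2 * t) / (K1 * f_clock * f_data). The sketch below evaluates that relationship for the two K2 values quoted in the text. The K1 value and the two frequencies are placeholders chosen purely for illustration (they are not taken from the article), and they cancel out of both ratios that are computed.

```python
import math


def mtbf_seconds(t_ns, k2_per_ns, k1_s, f_clock_hz, f_data_hz):
    """MTBF = exp(K2 * t) / (K1 * f_clock * f_data)."""
    return math.exp(k2_per_ns * t_ns) / (k1_s * f_clock_hz * f_data_hz)


K2_1990 = 2.6    # per ns, 1.5-µm technology (from the text)
K2_1995 = 19.4   # per ns, 0.5-µm technology (from the text)

# Assumed placeholder values, for illustration only.
K1_ASSUMED = 1e-10   # seconds
F_CLOCK = 100e6      # Hz
F_DATA = 1e6         # Hz

# MTBF improvement for the same extra settling delay of 1 ns.
improvement = (
    mtbf_seconds(1.0, K2_1995, K1_ASSUMED, F_CLOCK, F_DATA)
    / mtbf_seconds(1.0, K2_1990, K1_ASSUMED, F_CLOCK, F_DATA)
)  # equals e^16.8, roughly 2e7

# For the same MTBF, K2 * t must stay constant, so t shrinks by K2 ratio.
time_ratio = K2_1995 / K2_1990  # roughly 7.5
```

Because the improvement factor is e^(16.8 * t), it grows rapidly with t: each additional nanosecond of settling time multiplies it again by about 20 million.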

This traditional handshake requires both sides to poll the flag F. The transmitter must change parallel data only when F is low, and the receiver must accept data only when F is high. This requirement results in a safe but slow data transfer. However, speedier ways to transfer data across an asynchronous clock boundary exist.

If the receiving clock is much faster than the incoming data changes, then it's sufficient to double-buffer the asynchronous word with the receiving clock and to check for identity of data in the two registers (see Figure 2).

If the two registers are identical, no asynchronous change occurred either before or during the receive clock period, and both registers contain the same valid data. The identity comparator can also be used as a transition detector: it goes inactive whenever the asynchronous data has changed.
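A behavioral sketch of the double-buffer check follows. It assumes the two registers are connected in series, so that after each receive clock edge they hold the two most recent samples; Figure 2 may arrange them differently. The class and method names are hypothetical.

```python
class DoubleBufferSync:
    """Two registers clocked by the receiver, plus an identity comparator."""

    def __init__(self):
        self.reg1 = None  # most recent sample
        self.reg2 = None  # sample from the previous receive clock edge

    def clock(self, async_word):
        """One receive clock edge. Returns (valid, data)."""
        self.reg2 = self.reg1
        self.reg1 = async_word
        valid = self.reg2 is not None and self.reg1 == self.reg2
        return valid, self.reg2


sync = DoubleBufferSync()
# The third sample (6) stands in for a word captured mid-transition.
results = [sync.clock(word) for word in [5, 5, 6, 7, 7]]
valid_flags = [valid for valid, _ in results]
```

The comparator output drops on every edge where the two samples differ, which is the transition-detector behavior described above.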

If the asynchronous data is a binary counter, then this double-buffered circuit can even cope with counter changes that are as fast as the reading-clock period. Modify the identity comparator to accept not only identity, but also a difference of +1. Like the circuit described before, this circuit rejects the erratic code that might be captured during the counter transition, but allows the reading circuit to fall behind by one counter clock period. Alternatively, changing the comparator window can increase this tolerance.
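The modified comparator for counter data can be sketched as a modular difference check. The function name and the `window` parameter are hypothetical; `window=1` corresponds to accepting identity or a difference of +1, and a larger value models the wider comparator window mentioned above.

```python
def counter_samples_agree(older, newer, width, window=1):
    """Accept two counter samples if newer - older is within 0..window.

    The subtraction is done modulo 2**width so that the counter
    rolling over (for example 15 -> 0 at width 4) is still accepted.
    """
    mask = (1 << width) - 1
    diff = (newer - older) & mask
    return diff <= window


same_value = counter_samples_agree(7, 7, width=4)
one_ahead = counter_samples_agree(7, 8, width=4)
rollover = counter_samples_agree(15, 0, width=4)
# 7 -> 8 flips all four bits; 15 is one possible erratic capture.
erratic = counter_samples_agree(7, 15, width=4)
```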

Feeding FIFO

When the receiving clock has to read asynchronous data that might occasionally change faster than the read clock period, then an asynchronous first-in-first-out (FIFO) memory must be inserted as an elastic buffer. Such a FIFO consists of a dual-port RAM with independent write- and read-address counters and data ports. Dual-port RAMs and even complete FIFOs are readily available as dedicated ICs or as components inside an FPGA (for example, Xilinx's Virtex). FPGA dual-port RAMs vary from 16 bits deep — implemented in look-up-table logic — to 256 and up to 4096 deep — implemented in on-chip BlockRAM. Inputs and outputs can be clocked well above 100 MHz.

A true dual-port memory allows independent operation of each port. The write side uses a continuously running write clock and writes data by activating WRITE ENABLE. The read side uses a continuously running read clock and reads data by activating READ ENABLE. In order to avoid decoding glitches, it's advisable to utilize Gray-coded addressing for both ports. In a FIFO, the addressing code sequence is irrelevant, provided both ports use the same sequence. A Gray code, where only one bit changes on any particular transition, is ideal for crossing the clock domain boundary.
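The single-bit-change property is easy to check with the standard reflected binary Gray code. The sketch below uses the usual conversion `n ^ (n >> 1)`; the helper names are hypothetical, and the article does not say which Gray sequence the design actually uses.

```python
def binary_to_gray(n):
    """Standard reflected binary Gray code of n."""
    return n ^ (n >> 1)


def gray_sequence(bits):
    """All Gray codes of the given width, in counting order."""
    return [binary_to_gray(i) for i in range(1 << bits)]


def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


seq = gray_sequence(4)
# Every step, including the wrap from the last code back to the first,
# changes exactly one bit.
gray_steps = [bits_changed(seq[i], seq[(i + 1) % len(seq)])
              for i in range(len(seq))]
# A plain binary counter, by contrast, flips all four bits at rollover.
binary_rollover = bits_changed(15, 0)
```

With only one bit in flight at any time, a register clocked from the other domain captures either the old address or the new one, never an unrelated value.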

Figure 2 - Double Duty
Double-buffer the asynchronous signal with the receiving clock if that clock outpaces the incoming data.

Running on empty

In a true dual-port memory, each of the two ports operates synchronously in its own clock domain. The two domains need to communicate with each other only in the extreme cases of FULL and EMPTY; only these two flags require special attention. More precisely, it's only the trailing edge of each of these signals that proves difficult to control because the leading edge is a synchronous signal.

FULL goes active as a result of a write operation. This leading edge is thus synchronous with the write port, the only port that utilizes this flag. EMPTY goes active as a result of a read operation. This leading edge is thus synchronous with the read port — the only port that utilizes this flag. Only the trailing edges of these two flags must bridge across the clock domains. Luckily, even a fast system can tolerate some extra synchronization delay on the trailing edges of FULL and EMPTY, which only slows down the restart of operations after an extreme situation.

Normally, the two extreme situations of FULL and EMPTY are indicated by the same condition: equality of write and read addresses. A simple way to distinguish between the two uses a latch that is set or reset by comparing the two most significant address bits of both counters. Visualizing the address count sequencing as circular, the two most-significant bits (MSBs) in binary as well as in Gray code designate the address quadrant of each counter. The four bits are decoded in two look-up tables to determine the quadrant distance between the two counters from the 16 different combinations of the two MSBs of both counters:

  • Four codes describe the situation where the write counter is in the quadrant immediately behind the read counter. This is decoded as "possibly going FULL" and it sets the DIRECTION latch.
  • Another four codes describe the situation where the write counter is in the quadrant immediately ahead of the read counter. This is decoded as "possibly going EMPTY", and it resets the DIRECTION latch.
  • Four other codes indicate that read and write are in the same quadrant and another four codes show them in opposite quadrants. These eight codes provide no useful information to the DIRECTION latch, and are therefore ignored.
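The quadrant decoding in the list above can be modeled behaviorally as follows. This is a sketch under stated assumptions rather than the actual look-up-table logic: it uses plain binary pointers (the two Gray-coded MSBs identify the same four quadrants once decoded), it updates the latch immediately after each pointer change, and the class and method names are hypothetical.

```python
class FifoFlags:
    """Read/write pointers plus a DIRECTION latch driven by quadrants."""

    def __init__(self, addr_bits=4):
        self.addr_bits = addr_bits
        self.depth = 1 << addr_bits
        self.wptr = 0
        self.rptr = 0
        self.direction = False  # reset state: "possibly going EMPTY"

    def _quadrant(self, ptr):
        # The two most-significant address bits select one of 4 quadrants.
        return ptr >> (self.addr_bits - 2)

    def _update_direction(self):
        wq = self._quadrant(self.wptr)
        rq = self._quadrant(self.rptr)
        if (rq - wq) % 4 == 1:
            self.direction = True    # write just behind read: maybe FULL
        elif (wq - rq) % 4 == 1:
            self.direction = False   # write just ahead of read: maybe EMPTY
        # Same or opposite quadrant: the latch holds its state.

    @property
    def full(self):
        return self.wptr == self.rptr and self.direction

    @property
    def empty(self):
        return self.wptr == self.rptr and not self.direction

    def write(self):
        if self.full:
            return False
        self.wptr = (self.wptr + 1) % self.depth
        self._update_direction()
        return True

    def read(self):
        if self.empty:
            return False
        self.rptr = (self.rptr + 1) % self.depth
        self._update_direction()
        return True


fifo = FifoFlags(addr_bits=4)
accepted_writes = sum(fifo.write() for _ in range(20))  # only 16 fit
full_after_fill = fifo.full
accepted_reads = sum(fifo.read() for _ in range(20))    # only 16 stored
empty_after_drain = fifo.empty
```

Note that the latch changes state a full quadrant before the addresses can become equal, so it is already stable when the comparator output has to be interpreted.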

Who's on first?

Figure 3 - Full time work
Stretching the FULL signal so that it cannot go inactive during the write-clock low time prevents the trailing edge of FULL from occurring during the write-clock set-up time.

The DIRECTION latch is thus established well before the actual FULL or EMPTY condition can occur. The output of the DIRECTION latch is thus available to convert the address identity comparator output into either the FULL or EMPTY output signal. As mentioned before, the leading edge of these signals is inherently synchronous with the clock domain that utilizes that edge. The trailing edge of FULL, which is initiated by the read clock, must be prevented from happening during the set-up time of the write clock. The easiest method is to stretch the FULL signal such that it can't go inactive during the low time of the write clock, assuming a rising-edge clock (see Figure 3).

The possibility remains for metastable confusion, if the FULL condition ends during the extremely small timing window when the latch is about to latch up — right after the falling edge of the write clock. In most cases, the metastable output will have settled well before the next rising edge of write clock. If the user is still concerned about this low-probability risk, FULL can be stretched by a complete write clock cycle, which reduces the likelihood of metastable error effectively to zero.

An alternate design incorporating a flip-flop generates a trailing edge of FULL that is synchronous with the write clock (see Figure 3). The EMPTY signal is, of course, symmetrical and must be stretched or synchronized in the equivalent way.

These designs assume free-running read and write clocks, with operations activated by the respective enable signals. An active EMPTY-STRETCHED output can be terminated only by a high level on the read clock. If the read clock isn't free-running, the active EMPTY-STRETCHED output stops the external decision-making logic from driving the read clock high, so EMPTY-STRETCHED stays active even after data has been written into the FIFO, and the design locks up. FULL-STRETCHED would behave similarly without a free-running write clock. Free-running clocks, qualified by their respective enable signals, avoid these problems (see Figure 4).

Small FIFOs can be implemented in the 16-bit SelectRAMs, but for deeper FIFOs, the Virtex BlockRAMs provide a much more efficient alternative. A typical 256-deep, n-times 16-bit wide FIFO needs only n+1 BlockRAMs plus three logic blocks (CLBs). The n BlockRAMs operate as dual-ported 256 x 16 RAMs with independent write and read ports, each with its own clock and clock enable signal. The additional BlockRAM is used as a dual-ported ROM look-up table for the sequence of Gray-coded addresses. It thus operates as both a write and read address counter with registered outputs that address the data BlockRAMs directly. Generating the Gray-coded addresses in ROM is faster and simpler than doing it in conventional logic.
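One way to picture the ROM-based address counter is as a table in which each location holds the Gray code that follows its own address, so that feeding the registered output back to the address input steps through the sequence. The sketch below builds such a table for a 256-deep FIFO. The table layout, the use of the standard reflected Gray code, and all names are assumptions made for illustration; the article does not give the actual ROM contents.

```python
DEPTH = 256  # FIFO depth; assumed to be a power of two


def binary_to_gray(n):
    """Standard reflected binary Gray code of n."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Inverse of binary_to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def build_gray_counter_rom(depth=DEPTH):
    """rom[current_gray_address] = next_gray_address."""
    rom = [0] * depth
    for gray in range(depth):
        next_binary = (gray_to_binary(gray) + 1) % depth
        rom[gray] = binary_to_gray(next_binary)
    return rom


rom = build_gray_counter_rom()

# Walk the counter once around: feed each output back in as the address.
address = 0
visited = []
for _ in range(DEPTH):
    visited.append(address)
    address = rom[address]
```

A dual-ported ROM can serve both counters at once because the write side and the read side each look up their own next address independently through their own port.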

Figure 4 - Cool running
An efficient implementation uses the BlockRAMs in the Virtex architecture.

Pipeline strategy

All that is left then is to implement the DIRECTION latch, the address identity comparator, and the stretch circuits mentioned above. This can be done in three CLBs. The FIFO can operate with both asynchronous ports clocked at well above 100 MHz. Since each side of the dual-port ROM used as an address counter has another eight outputs available, these can be used as look-ahead addresses — making it possible to decode FULL and EMPTY one clock period in advance and to pipeline them. This strategy allows for operation close to the inherent 200 MHz BlockRAM cycle frequency.

We have presented various methods to transfer data across an asynchronous clock boundary. Modern ICs contain abundant flip-flops and can afford to double-buffer the data or to implement FIFOs, where the receiver rarely if ever interrupts the transmitter. Seemingly problematic arbitration circuits can be reduced to a minimum, and can be designed to operate reliably.


Peter Alfke is director of applications engineering at Xilinx, as well as Distinguished Engineer. Previously he was with AMD, Zilog, Fairchild, Litton Industries, and LM Ericsson. While at Fairchild he initiated the design of a set of calculator building blocks, the origin of the HP35 pocket calculator. In 1970, Alfke invented the first FIFO integrated circuit, the Fairchild 3341.

To voice an opinion on this or any other article in Integrated System Design, please e-mail your comments to mikem@isdmag.com.


Copyright © 2000 Integrated System Design Magazine
