ASIC designs are presenting challenges that put increasing stress on engineering design teams. Large ASICs now suffer from problems that are either new or that used to be easy to overcome. For example, power "droop" across the chip, radiated electromagnetic interference and noise problems are caused by large numbers of simultaneously switching transistors. Timing-closure issues are also a concern. Functional design is becoming a very small component of the overall design cycle, with clock insertion, balancing, skew management and the like taking up larger and larger portions of it. Contributing to the timing-closure problem is uncertain timing. With the increasing inaccuracies present in today's modeling environments, where transistors can no longer be characterized by statistical behaviors but vary widely because of small differences in their doping, simulation approaches become crude estimates of the expected behavior. This forces designers to choose designs that are overly conservative in order to guarantee functionality.
Another area of concern is designs that involve multiple clock domains or clock domain crossing. Today, more designs require the cautious designer to deal with situations where fully synchronous design practices are no longer feasible. The ratios or combinations of required operating rates for pieces of the chip do not easily scale to a common master clock reference rate. Instead, designers are forced to provide bridges and FIFOs between the different clock domain regions. Meanwhile, with design complexities increasing, it is becoming vital for design teams to reuse as many components as possible. When multiple pieces of intellectual property are combined, designers are forced into a spiral of compromises to merge conflicting clock rates into one design.
The root cause of these problems is the design's inherent requirement for a clock signal to pace the execution. Power droop occurs when all of the registers in the circuit are suddenly simultaneously activated by a clock edge, causing a major power transient. Timing-closure issues all revolve around getting the real-world clock signals to behave as if they were operating in a mythical, ideal case. Uncertain timing is a problem of the clock once again: The clock edge has to arrive at the right gates at the right time, or the function behaves incorrectly.
So, why not consider eliminating the clock? Clockless system design uses the same kinds of constructs as clocked systems: components that transform the data, components that act as registers to store intermediate results, and elements that synchronize the transfer of information from one register to the next. The way these functions are accomplished distinguishes the different types of clockless design approaches and provides different benefits and challenges.
Delay line matching
The clockless logic most familiar to clocked designers is probably "bundled data." This approach retains the same combinational logic as a clocked design uses. The clock is replaced with a delay line matched to the combinational logic; the delay line is used to latch the results into the next register. This approach is an efficient design method suitable for high-speed, low-area circuit design. The Amulet, a clockless 32-bit ARM processor from the University of Manchester, England, is an example of a design implemented using this approach.
The main challenges of using this approach involve the matching of the delay lines: The quality of the resulting design (measured in speed) depends on the ability to achieve very close timing between the delay line and the circuits it is modeling. This forces the designer to work closely with the physical implementation to achieve the required timing closure; in fact, this is potentially a worse timing-closure problem than high-speed clocked design. If one is too aggressive with a delay line, the result could be a chip that will not function, and there is no clock to slow down to make the circuit operate. The only way to fix the circuit is to respin the design with a more conservative delay line. If done correctly, the approach provides several of the benefits shared with other clockless approaches.
Bad experiences with bundled-data-style approaches form the basis for the legends of clockless design being extremely hard, painful to get right and subject to failure over temperature, process or voltage variations.
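The delay-matching risk described above can be illustrated with a small check. This is only a sketch under stated assumptions: the corner names, the delay figures and the helper names (Corner, stage_is_safe, failing_corners) are hypothetical and are not taken from any real design or tool. It shows the condition a bundled-data stage has to meet at every operating corner: the matched delay line must be at least as slow as the combinational logic it models, plus any setup margin.

```python
from dataclasses import dataclass


@dataclass
class Corner:
    """One process/voltage/temperature corner (hypothetical figures)."""
    name: str
    logic_delay_ns: float   # worst-case combinational delay at this corner
    delay_line_ns: float    # matched delay-line delay at the same corner


def stage_is_safe(corner, setup_ns=0.0):
    """The latch strobe must arrive after the data has settled."""
    return corner.delay_line_ns >= corner.logic_delay_ns + setup_ns


def failing_corners(corners, setup_ns=0.0):
    """Names of the corners where the delay line is too aggressive."""
    return [c.name for c in corners if not stage_is_safe(c, setup_ns)]


# Illustrative numbers only: the delay line tracks the logic well at two
# corners, but speeds up more than the logic does at the third.
corners = [
    Corner("typical", 2.00, 2.10),
    Corner("slow-hot", 2.90, 2.95),
    Corner("fast-cold", 1.40, 1.35),
]

print(failing_corners(corners, setup_ns=0.02))  # ['fast-cold']
```

A single failing corner is enough to make the stage latch wrong data, and with no clock to slow down, the only remedy in this model is a larger delay_line_ns, which costs speed at every other corner.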
The other major style of clockless design is quasi-delay-insensitive approaches. QDI is used by Theseus Logic, Fulcrum Microsystems Inc. (Calabasas Hills, Calif.) and Caltech. These approaches do not rely upon critical timing (e.g., the clock, or a matched delay line) to get a correct result. Instead, they use a more structured approach to provide synchronization between the registers. The circuits themselves "decide" when they are done calculating, and when they are ready for new information.
Each register interacts and communicates with its immediate upstream and downstream neighbors, and they "negotiate" an operating rate that is as fast as the delays in the system will allow. This "request/acknowledge" handshaking used by the registers allows the circuits to be designed for functionality without any regard to timing (this is why the circuits are "delay-insensitive": the functionality does not depend on time).
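The effect of this handshaking can be sketched with a small behavioral model. It is an illustration only, not a gate-level handshake circuit and not any vendor's implementation; the function run_pipeline, the stage functions and the delay values are all hypothetical. Each stage starts on a token only when its upstream neighbor has data ready (the request) and its downstream neighbor has taken the previous result (the acknowledge).

```python
def run_pipeline(stages, inputs):
    """Model a linear handshaking pipeline.

    stages: list of (function, delay) pairs, one per register stage.
    Returns (outputs, time_at_which_the_last_result_appears).
    """
    n = len(stages)
    prev_start = [0.0] * n  # when each stage started the previous token
    prev_done = [0.0] * n   # when each stage finished the previous token
    outputs = []
    for value in inputs:
        start = [0.0] * n
        done = [0.0] * n
        ready = 0.0  # the source always has the next token ready
        for i, (fn, delay) in enumerate(stages):
            # Acknowledge: stage i is free once the next stage has started
            # on its previous result (the sink takes results immediately).
            free = prev_start[i + 1] if i + 1 < n else prev_done[i]
            # Request: stage i also needs its upstream data to be ready.
            start[i] = max(ready, free)
            done[i] = start[i] + delay
            value = fn(value)
            ready = done[i]
        outputs.append(value)
        prev_start, prev_done = start, done
    return outputs, prev_done[-1]


def make_stages(delays):
    """Same three functions every time; only the delays change."""
    functions = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    return list(zip(functions, delays))


fast_out, fast_time = run_pipeline(make_stages([1.0, 1.0, 1.0]), [1, 2, 3, 4])
slow_out, slow_time = run_pipeline(make_stages([1.0, 5.0, 1.0]), [1, 2, 3, 4])

print(fast_out, slow_out)      # [1, 3, 5, 7] [1, 3, 5, 7]
print(slow_time > fast_time)   # True
```

In this model, slowing one stage changes when results appear but not what they are, which is the sense in which the functionality does not depend on time.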
Two drawbacks are most commonly cited with QDI-style approaches. The first is that area costs are higher because of the more structured communications protocol. This is real: QDI circuits tend to be somewhat larger than straightforward, regular clocked designs. However, area savings can be realized by taking advantage of the clockless design capability and choosing new architectures made possible by the removal of the clock. In general, though, simple designs will be larger than their clocked cousins.
A second drawback to QDI often cited is that the signaling protocols appear to consume more power (e.g., each circuit cycles through a "refresh/NULL" cycle to a "calculation/DATA" cycle). In reality, the QDI approaches result in power savings greater than or equal to that achievable by fine-grained clock gating. Circuits that are not needed for the current calculation are quiescent, without complicated clock-gating efforts. This generally more than overcomes the apparent power penalty of the signaling protocol.
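The NULL/DATA cycle can be made concrete with a toy model. The article does not spell out the signal encoding, so this sketch assumes the dual-rail scheme commonly described for this style of logic: two wires per bit, both low for NULL and exactly one high for DATA. It also assumes threshold gates with hysteresis and builds an AND function from them in a simple minterm style. The class names are hypothetical, and this is not presented as how any commercial gate library is built.

```python
NULL = (0, 0)    # both rails low: no data present
DATA0 = (1, 0)   # rail 0 high: logic 0
DATA1 = (0, 1)   # rail 1 high: logic 1


class ThresholdGate:
    """m-of-n gate with hysteresis: sets when at least m inputs are high,
    clears only when every input is low, and otherwise holds its output."""

    def __init__(self, m):
        self.m = m
        self.out = 0

    def update(self, *inputs):
        if sum(inputs) >= self.m:
            self.out = 1
        elif sum(inputs) == 0:
            self.out = 0
        return self.out


class DualRailAnd:
    """Dual-rail AND: one 2-of-2 gate per input combination, with the three
    combinations that produce logic 0 merged by a 1-of-3 gate."""

    def __init__(self):
        self.m00 = ThresholdGate(2)
        self.m01 = ThresholdGate(2)
        self.m10 = ThresholdGate(2)
        self.m11 = ThresholdGate(2)
        self.or0 = ThresholdGate(1)

    def update(self, a, b):
        a0, a1 = a
        b0, b1 = b
        z0 = self.or0.update(self.m00.update(a0, b0),
                             self.m01.update(a0, b1),
                             self.m10.update(a1, b0))
        z1 = self.m11.update(a1, b1)
        return (z0, z1)


gate = DualRailAnd()
print(gate.update(NULL, NULL))    # (0, 0): NULL, nothing to compute
print(gate.update(DATA1, NULL))   # (0, 0): still NULL, one input missing
print(gate.update(DATA1, DATA1))  # (0, 1): DATA1, both inputs have arrived
print(gate.update(NULL, DATA1))   # (0, 1): result held until both are NULL
print(gate.update(NULL, NULL))    # (0, 0): back to NULL, ready for new data
```

The output becomes DATA only after both inputs are DATA and returns to NULL only after both inputs are NULL, so the output itself signals when the calculation is complete. A gate whose inputs stay at NULL never switches at all, which is the quiescent behavior described above.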
In return for these challenges, the designer is left with a circuit design that is delay-insensitive, power-aware and reusable. With no assumptions about timing required for functionality, the designer's functionality is isolated from the physical implementation.
More secure smart cards
Also, clockless behavior means the system is not subject to simultaneous switching throughout the entire design, easing the power grid design on the chip. This also reduces the noise/EMI problems, and can provide enhanced security for smart-card (or similar) applications. In addition, delay-insensitivity means that there are no timing-closure problems: clock tree balancing and skew problems are eliminated. Individual portions of the design can be worked upon to achieve increased speed, without worrying that the changes will break functionality on a global basis.
A further consideration at smaller geometries is that uncertain timing and modeling for 0.13-micron (and smaller) processes can no longer break functionality. If the timing models are inaccurate, speed will be affected, but not function. Because QDI systems easily and naturally operate at independent frequencies, portions of the design need to synchronize only when they communicate with one another.
QDI systems can also be easily used to bridge different synchronous clock domains to one another, the so-called GALS (globally asynchronous, locally synchronous) systems. Design reuse or improvement of clockless systems is trivial. The rate-independent nature of the designs means that blocks can be incrementally replaced or upgraded, and the surrounding blocks will automatically synchronize with the new element. There is no clock tree to resynthesize, no adjustment required to the block to be able to operate at the system's master frequency, etc.
Clockless design approaches aren't as difficult to use as they once were. While not supported as seamlessly as traditional synchronous systems, there are emerging sets of design tools supporting clockless designers. Although many approaches involve new languages or programming styles, not all are unfamiliar to clocked designers.
The clockless design approach we use at Theseus is called Null Convention Logic and is based on VHDL for RTL design entry and simulation.
Designers write code similar to what they are familiar with in clocked designs; a commercial synthesis tool, such as Design Compiler from Synopsys Inc. (Mountain View, Calif.) or Merlin from FTL Systems Inc. (Rochester, Minn.), is used to synthesize the gate-level circuits, which can then be placed and routed. Some of the current limitations associated with this approach are being addressed through enhancements to the VHDL language, which the IEEE VHDL steering committee is considering for adoption as part of the VHDL-2004 specification process.
The same approach has been used by Honeywell International (Morristown, N.J.) and by Medtronic Inc. (Minneapolis) in embedded medical applications. Theseus designers have used Null Convention Logic to develop more than 20 first-pass ASICs, including the HCL08GP32 microcontroller, a clockless, low-power version of Motorola Inc.'s forthcoming 8-bit HCS08 microcontroller core.
Ryan Jorgenson is vice president of engineering at Theseus Logic Inc. (Orlando, Fla.).