United Business Media EE Times



Designing an ATM Chip

Using embedded array technology, engineers at Fujitsu Microelectronics were able to enhance the design of an existing ATM chip.

by Rajesh Varshani and Barry Marsh


With asynchronous transfer mode (ATM) standards and technology evolving at exceptional rates, designers at Fujitsu Microelectronics (Manchester, England) wanted to enhance an existing ATM design and implement it by the fastest means possible. Embedded arrays have long offered flexibility coupled with fast turnaround, but could they be used for the latest version of a complex ATM switching device?

Among the many critical issues involved in this choice were speed, device complexity, power consumption, and the ability to transfer previously developed intellectual property into the embedded-array technology. Critical implementation issues included the need to floorplan the device's layout so that pinouts support easy matrix expansion of the ATM switch architecture.

The ATM switching device in this project is the most recent version of Fujitsu's Self-Routing Switch Element IC, the SRE-L. The company chose the 0.5-µm embedded-array technology of its ASIC group to implement the device. The key reasons for choosing this technology included a significant reduction in power consumption and higher density, thus leading to a more cost-effective implementation. Fujitsu had implemented the original SRE as a 0.8-µm standard-cell device.

Deciding on an embedded array
Once the business decision was made that it would be extremely valuable to get the SRE-L to market sooner than a standard-cell implementation would allow, Fujitsu designers had several technical issues to evaluate before committing to an embedded-array approach. One issue was speed, but the device does not have to operate as fast as its 155-Mbps speed might suggest. Although ATM utilizes a serial data flow on the network side, inside the switching hub all data paths are 8-bit parallel, permitting the SRE-L to run fast enough to deliver a maximum bandwidth of 300 Mbps per port.
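
As a rough check on why the parallel data paths relax the clock requirement, the clock rate per port is simply the bit rate divided by the path width. The sketch below uses only the figures quoted above; the function name is illustrative, and any framing or internal overhead in the real device is not modeled.

```python
def byte_clock_mhz(bit_rate_mbps: float, path_width_bits: int = 8) -> float:
    """Clock rate (MHz) needed to carry bit_rate_mbps over a parallel path."""
    return bit_rate_mbps / path_width_bits

line_rate = byte_clock_mhz(155)  # nominal ATM line rate from the text
peak_rate = byte_clock_mhz(300)  # maximum per-port bandwidth from the text

print(f"155 Mbps over 8 bits: {line_rate:.3f} MHz")
print(f"300 Mbps over 8 bits: {peak_rate:.3f} MHz")
```

Under these assumptions the internal logic clocks in the tens of megahertz rather than at the serial line rate.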

More of a concern was the complexity of the SRE-L. The fact that the design has clock trees fanning out to approximately 3,600 flip-flops gives an indication of the design's complexity. Just as crucial were the blocks of RAM needed for buffering. Fortunately, Fujitsu embedded arrays were available that could supply both the necessary gate counts and the embedded RAM capacity.

The design's complexity raised questions about power consumption--probably the most critical aspect of the implementation decision. Customer requirements in conjunction with packaging options establish the device's maximum allowed power consumption. The resulting power specification made it imperative to use a 3.3-V core, which the CE51 technology provides.

Designers generally think of deep submicron technology as a way to boost a device's speed. In the case of the SRE-L, however, the ability of small geometries to reduce power consumption proved more valuable. The complexity and power constraints narrowed the designers' consideration to Fujitsu's 0.5-µm CE51 embedded array family. At 1.2 µW per MHz and as many as 753,768 available gates, these devices could stay within the power budget while delivering the speed, gate count, and RAM capacity the design required.

A further consideration was whether the intellectual property in the original SRE could be reconfigured for implementation in the CE51 family's on-chip resources. Since the original SRE was designed in HDL format, it was a simple task to retarget the design to CE51 using logic synthesis.

Figure 1. The main constraint on the SRE-L floorplan was the need to include large embedded RAM blocks for use as buffer memory. Accommodating both the RAM and the matrix-expansion I/Os led to the simplified floorplan shown here.
Floorplanning around embedded RAM
The design process for the SRE-L had to overcome several hurdles. The first step, floorplanning, had to be done carefully because of the four embedded single-port RAMs used to implement the SRE-L cell buffers.

One ATM cell consists of 53 bytes; together with its 3-byte routing tag it occupies 56 bytes, or seven 64-bit words. The RAM size of 1,024 words by 64 bits therefore allows a capacity of 146 cells. The RAMs' size, aspect ratio, and pin positions dictate the device's floorplan. The floorplan was also affected by the need for a pin assignment that permits simple interconnect of several SREs in matrix fashion on a PCB. Figure 1 shows the resulting floorplan. As the floorplan shows, the SRE-L includes boundary-scan test circuitry compliant with IEEE 1149.1 (JTAG).
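
The buffer-capacity figure can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the 3-byte routing tag is stored alongside each 53-byte cell; the constant and function names are illustrative. The result is seven words per cell whether or not the tag is counted, because 53 bytes alone already spill into a seventh 64-bit word.

```python
CELL_BYTES = 53         # ATM cell size from the text
ROUTING_TAG_BYTES = 3   # routing tag size from the text (assumed stored with the cell)
WORD_BITS = 64          # RAM word width from the text
RAM_WORDS = 1024        # RAM depth from the text

def words_per_cell(cell_bytes: int, tag_bytes: int, word_bits: int) -> int:
    """Number of RAM words one cell occupies, rounded up to whole words."""
    total_bits = (cell_bytes + tag_bytes) * 8
    return -(-total_bits // word_bits)  # ceiling division

def cells_per_ram(ram_words: int, per_cell: int) -> int:
    """Whole cells that fit in the RAM; leftover words are unused."""
    return ram_words // per_cell

per_cell = words_per_cell(CELL_BYTES, ROUTING_TAG_BYTES, WORD_BITS)
capacity = cells_per_ram(RAM_WORDS, per_cell)
print(per_cell, capacity)
```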

Embedded designs such as this one inherently contain some element of risk because the RAM positions are fixed before the final netlist is released from the designer. Typically, a customized master-slice is designed and then held in a pre-metalized condition. In parallel with wafer fabrication, designers release the final netlist, which is laid out. When the final verification process completes successfully, the layout data set is sent to the factory where the final masks can be etched on the array. The ASIC engineer, together with the designer, must position the RAM with absolute certainty; otherwise, there is a risk of having unusable master-slice wafers.

Figure 2. The clock-tree structure provided by the CE51 embedded-array technology simplifies clock insertion for a design as complex as the SRE-L. Two global drivers were used to handle one clock tree that had a total of 2,300 flip-flops attached.

Clock insertion
As mentioned earlier, the SRE-L design includes approximately 3,600 flip-flops. Of these, 2,300 connect to a single clock tree, so the construction of the clock trees was crucial for optimum device performance.

The tool environment that supports the CE51 embedded array family includes a program that inserts clock trees into a design's gate-level netlist. The technology makes use of a two-layer clock-tree structure, comprising global and local clock trees. These trees are driven by global and local buffers, respectively (see Figure 2). The flip-flops driven by a local buffer are referred to as a cluster.

The largest local buffer can drive a cluster of as many as 160 flip-flops, and a global buffer can drive 10 locals. It follows that a global buffer drives a maximum of 1,600 flip-flops. Thus two global buffers are driven at the clock start point to handle the 2,300 commonly clocked flip-flops in the SRE-L. The clock insertion tool requires the designer to either specify all the clock start points in the netlist or insert a special dummy cell into the netlist to notify the tool of the clock start point. The designer specified the clock start points in this design. When creating the clock tree, the insertion tool has the ability to delete any inverters or true buffers that the synthesis tool inserts.
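
The buffer counts follow directly from the two fan-out limits. The sketch below is an illustration of that arithmetic only, not of the insertion tool itself; it gives a lower bound that assumes every cluster is packed to the limit, whereas real clustering follows the design hierarchy and may use more local buffers.

```python
import math

MAX_FFS_PER_LOCAL = 160      # largest local buffer fan-out, from the text
MAX_LOCALS_PER_GLOBAL = 10   # global buffer fan-out, from the text

def clock_tree_buffers(flip_flops: int) -> tuple[int, int]:
    """Minimum (local, global) buffer counts for a commonly clocked group."""
    locals_needed = math.ceil(flip_flops / MAX_FFS_PER_LOCAL)
    globals_needed = math.ceil(locals_needed / MAX_LOCALS_PER_GLOBAL)
    return locals_needed, globals_needed

locals_needed, globals_needed = clock_tree_buffers(2300)
print(locals_needed, globals_needed)
```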

Upon execution of the program for the 2,300-flip-flop clock tree, the tool generated the clusters based upon the netlist's logical hierarchy. Because this design's logical hierarchy was not identical to the physical hierarchy, as is often the case, designers changed the clustering as needed. The adjustments to the clustering were then fed back into the insertion tool so it could generate the final clock trees and produce the clock-inserted netlist.

Fujitsu's layout tool is designed to guarantee local buffer clock skew of less than 150 ps. To help verify timing, designers obtained an estimated standard delay format (SDF) file to verify the clock trees and perform pre-layout simulation and timing verification. When the errors found in these simulations were corrected in the netlist, the design went into layout.

The ATM switch
In examining how the SRE-L was designed, it is useful to consider the function of the device, its characteristics, and the nature of the upgrades from the previous version. The SRE-L is referred to as "self routing" because it implements routing functions based on a 3-byte routing tag appended to each 53-byte ATM cell.

One of the challenges of designing an ATM switch is that a stream of ATM cells can include general-purpose data files, sampled sound, and real-time video signals. The timing needs of the cells containing the general-purpose data are not critical. However, the voice and video information represents isochronous (timing sensitive) data that the ATM network must deliver in a timely manner.

Another consideration is the "shape" of the data coming from sources on the ATM network. The asynchronous nature of ATM means that one network node effectively sees the network's entire bandwidth, in contrast to a mechanism such as Ethernet, in which nodes contend with each other for bandwidth.

Any workstation on an ATM network could theoretically transmit huge files at 155 Mbps, and the ATM switch would be able to handle the stream--unless another source attached to that switch simultaneously transmitted a large file at a high rate to the same destination. If this simultaneous transmission occurs, the danger arises of overwhelming the switch's buffers, and cells would have to be discarded, leaving the sending and receiving parties to recover from the loss. If the lost cells were part of real-time audio or video data streams, the parties involved could not simply retransmit the data.

Experience with first-generation ATM switches has led to improved flow-control methods, especially for time-critical cells. The SRE-L includes these and several other enhancements over Fujitsu's original SRE, introduced in mid-1993, yet most of the original functionality has been retained in the new version.

Both SRE versions implement 155-Mbps switch fabrics in a 4-by-4-cell structure. That is, each chip provides the functionality to switch four ATM inputs to four outputs. To enable this switch to handle more inputs and outputs, each switch element can provide a cascade queue for column interconnects in a simple matrix arrangement (see figure). In this configuration, cells are available to all the switches in a row by being regenerated at the row expansion outputs--a requirement that was to make the floorplan for the SRE-L difficult.

Another challenge to floorplanning was the need for large output buffers in the SRE-L. These buffers help limit the effects of output contention, which occurs when two or more cells need to be routed to the same output simultaneously. To avoid negative effects from this contention, the switch must incorporate buffers for regulating the traffic.

The larger the output buffers, the better the chip can handle output contention. Each output buffer in the SRE-L can hold as many as 146 cells, compared to 75 cells in the original SRE. Users can divide the buffer capacity into a 121-cell low-priority queue and a 25-cell high-priority queue. This arrangement helps ensure that the network delivers isochronous data within a maximum time window.
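
A simple software model makes the two-queue arrangement concrete. This is a sketch under stated assumptions: the class and method names are hypothetical, and the strict-priority service order (high-priority queue always drained first) is an assumption, since the article does not describe how the device schedules between the queues.

```python
from collections import deque

class OutputBuffer:
    """Model of one output buffer split into high- and low-priority queues."""

    def __init__(self, high_capacity: int = 25, low_capacity: int = 121):
        self.high = deque()
        self.low = deque()
        self.high_capacity = high_capacity
        self.low_capacity = low_capacity

    def enqueue(self, cell, high_priority: bool) -> bool:
        """Store a cell; return False if its queue is full and the cell is dropped."""
        if high_priority:
            queue, capacity = self.high, self.high_capacity
        else:
            queue, capacity = self.low, self.low_capacity
        if len(queue) >= capacity:
            return False
        queue.append(cell)
        return True

    def dequeue(self):
        """Return the next cell to transmit, or None if both queues are empty."""
        if self.high:
            return self.high.popleft()  # assumed: time-critical cells go first
        if self.low:
            return self.low.popleft()
        return None

buf = OutputBuffer()
buf.enqueue("data-cell", high_priority=False)
buf.enqueue("video-cell", high_priority=True)
first = buf.dequeue()  # the high-priority cell leaves first
```

Because a burst of ordinary data can fill only the low-priority queue, the wait for a high-priority cell in this model is bounded by the depth of the high-priority queue.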

Other functions added to the SRE-L include selective cell discard based on a cell's CLP bit. The CLP bit determines whether the switch can discard a cell when the output buffer fill-level exceeds a selectable threshold. A vertical serial flow control (VSFC) feature provides flow control information for each output buffer, enabling cell-by-cell flow control. The switch also provides a selectable per-virtual-channel (VC) explicit forward congestion indication (EFCI) function. These and other new functions were relatively easy to add to the existing SRE design, compared to the difficult task of floorplanning a matrix-type architecture around the blocks of RAM that supplied the storage capacity for the chip's buffers.
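
The selective-discard rule can be sketched as a single decision function. This is an illustration only: the function name and the threshold value of 100 cells are hypothetical, and the real device's behavior at the exact threshold boundary is not stated in the article. In ATM, a CLP bit of 1 marks a cell as eligible for discard.

```python
def accept_cell(clp: int, fill_level: int, threshold: int, capacity: int = 146) -> bool:
    """Decide whether an arriving cell is stored in the output buffer."""
    if fill_level >= capacity:
        return False  # buffer full: every cell is lost regardless of CLP
    if clp == 1 and fill_level > threshold:
        return False  # discard-eligible cell arriving above the threshold
    return True

# Hypothetical threshold of 100 cells in a 146-cell buffer.
print(accept_cell(clp=1, fill_level=120, threshold=100))  # discard-eligible, above threshold
print(accept_cell(clp=0, fill_level=120, threshold=100))  # protected cell, still room
```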

ATM multi-chip switch architecture

The embedded array layout
The first step of validation is to expand the netlist and create a layout database. The resulting array is brought up onto the workstation and layout begins. In the SRE-L layout, an engineer placed the embedded RAMs first, then placed the power grid around the RAM.

The global and local clock buffers were manually placed in optimal positions to minimize skew across the global network. An engineer did the global-to-local clock routing, followed by tentative placement of the flip-flops on flip-flop bars around the local clock buffers. The bars were placed within a 2-mm distance of the local buffer, which guarantees a local clock skew of less than 150 ps. This low level of skew in turn guarantees "shift-register" operation between flip-flops without any timing problems.

With the main elements of the design in place, an engineer ran a cell placement program based on a simulated-annealing algorithm. This program placed the glue logic as well as performed the final flip-flop placement. With the flip-flops positioned, the local-buffer-to-flip-flop routing could be done.
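
For readers unfamiliar with simulated-annealing placement, the following is a generic textbook-style sketch, not Fujitsu's placement program. It assumes a toy problem: cells on a grid, two-pin nets, total Manhattan wirelength as the cost, pairwise swaps as moves, and a geometric cooling schedule. All names and parameter values are illustrative.

```python
import math
import random

def wirelength(placement, nets):
    """Total Manhattan length of two-pin nets for a {cell: (x, y)} placement."""
    total = 0
    for a, b in nets:
        (ax, ay), (bx, by) = placement[a], placement[b]
        total += abs(ax - bx) + abs(ay - by)
    return total

def anneal(cells, nets, grid_width, steps=5000, t_start=5.0, t_end=0.01, seed=0):
    """Return (best placement, best cost) found by swapping cell positions."""
    rng = random.Random(seed)
    # Initial placement: cells in row-major order on the grid.
    placement = {c: (i % grid_width, i // grid_width) for i, c in enumerate(cells)}
    cost = wirelength(placement, nets)
    best, best_cost = dict(placement), cost
    for step in range(steps):
        # Geometric cooling from t_start toward t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        a, b = rng.sample(cells, 2)
        placement[a], placement[b] = placement[b], placement[a]
        new_cost = wirelength(placement, nets)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:
            placement[a], placement[b] = placement[b], placement[a]  # undo swap
    return best, best_cost

# Toy example: 16 cells connected in a chain, started from a scrambled order.
names = [f"g{i}" for i in range(16)]
nets = [(names[i], names[i + 1]) for i in range(15)]
scrambled = names[:]
random.Random(1).shuffle(scrambled)
initial_cost = wirelength({c: (i % 4, i // 4) for i, c in enumerate(scrambled)}, nets)
best, best_cost = anneal(scrambled, nets, grid_width=4)
print(initial_cost, best_cost)
```

Accepting some cost-increasing swaps while the temperature is high is what lets the search escape poor local arrangements before it settles.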

At this point, engineers had the layout tool generate a Manhattan-based SDF file with accurate clock-network timing. This intermediate SDF file generally proves to be extremely useful, as it is accurate to within 5 percent of the final routed values and allows designers to commence evaluation without waiting for the full chip routing. In the case of the SRE-L, timing problems were found and corrected. In the meantime, an engineer ran the full routing of the device and generated post-layout SDF information.

Static timing verification
The timing verifier program in the Fujitsu tool environment was used to check for timing problems in the SRE-L. The timing verifier allows users to specify the input stimulus, process variation, and delay-magnification values--multiplying factors to account for process, temperature and voltage-supply variations. The tool also allows users to hold specific parts of the design static. This capability is essential in designs such as the SRE-L because it allows designers to suppress any timing violations that might occur due to signal combinations that are impossible in actual operation (false paths).

The static timing verification process requires at least four runs to find all errors: setup error detection at min and max delay magnification (DMAG) and hold-time error detection at min and max. For the SRE-L, each of these four main runs was further subdivided into runs that eliminated false paths, for a total of 16 separate verification runs. This approach found all of the potential timing problems in the SRE-L.
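
The run matrix can be enumerated explicitly. In this sketch the four main runs come straight from the text; the four false-path configuration labels are hypothetical placeholders, chosen on the assumption that the 16 runs were divided evenly among the four main runs.

```python
from itertools import product

checks = ("setup", "hold")      # error types, from the text
dmag_corners = ("min", "max")   # delay-magnification corners, from the text
# Hypothetical labels: the article gives only the total of 16 runs.
false_path_configs = ("cfg_a", "cfg_b", "cfg_c", "cfg_d")

main_runs = list(product(checks, dmag_corners))
all_runs = list(product(checks, dmag_corners, false_path_configs))

for check, corner, config in all_runs:
    print(f"{check:5s} DMAG={corner} false-path set={config}")
```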

With the timing errors corrected, the design was released to fabrication. A working device was obtained on the first pass. By using an ASIC methodology and reusing existing ATM intellectual property, Fujitsu quickly delivered an enhanced, cost-effective solution to market. This design serves as an example of an ASIC methodology fulfilling what has traditionally been considered a standard-cell (ASSP) application.

Rajesh Varshani is an ASIC design engineer at Fujitsu Microelectronics Ltd. (Manchester, UK).

Barry Marsh is the director of communications products at Fujitsu Microelectronics Inc. (San Jose, CA).

To voice an opinion on this or any Integrated System Design article, please e-mail your message to michael@asic.com.


integrated system design  August 1996



Copyright © 1996 Integrated System Design Magazine

All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.