ASIC Technology

Designing an ATM SAR Controller

A unique design method achieved a 400K gate ATM SAR using customizable standard blocks.

by Alan Gibbons


The ATM SAR (segmentation and reassembly) controller is a highly integrated system-on-a-chip that implements the ATM and AAL layers of the Asynchronous Transfer Mode (ATM) User-Network Interface (UNI). In an ATM network, it performs all segmentation and reassembly functions and a majority of the Convergence Sublayer (CS) tasks for multiple concurrent frames covering the AAL1, AAL3/4, and AAL5 protocol layers.
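As a rough illustration of what segmentation and reassembly means for AAL5, the sketch below pads a frame, appends the standard 8-byte AAL5 trailer, and cuts the result into 48-byte cell payloads. It is not taken from the SAR design: the function names are hypothetical, cell headers are omitted, and the CRC-32 field is left as a zero placeholder rather than computed.

```python
import struct

CELL_PAYLOAD = 48  # payload bytes carried by each 53-byte ATM cell
TRAILER_LEN = 8    # AAL5 trailer: UU (1) + CPI (1) + Length (2) + CRC-32 (4)


def aal5_segment(frame):
    """Split a frame into 48-byte cell payloads (hypothetical helper)."""
    # Pad so that frame + pad + trailer is a whole number of cell payloads.
    pad_len = -(len(frame) + TRAILER_LEN) % CELL_PAYLOAD
    # ASSUMPTION: the CRC-32 is a zero placeholder; a real SAR computes it.
    trailer = struct.pack(">BBHI", 0, 0, len(frame), 0)
    pdu = frame + bytes(pad_len) + trailer
    return [pdu[i:i + CELL_PAYLOAD] for i in range(0, len(pdu), CELL_PAYLOAD)]


def aal5_reassemble(cells):
    """Rebuild the original frame from its cell payloads."""
    pdu = b"".join(cells)
    _uu, _cpi, length, _crc = struct.unpack(">BBHI", pdu[-TRAILER_LEN:])
    return pdu[:length]  # the Length field tells us where the padding starts
```

Under these assumptions a 100-byte frame needs 36 bytes of padding and occupies three cells, while a 40-byte frame fits exactly in one.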

The design complexity of the SAR places extreme demands on the hardware design and verification flow as well as on the design tools used. Verification of the hardware/software interface imposes additional requirements that must be satisfied to provide a complete design solution.

The SAR comprises an on-board RISC processor, a PCI bus interface unit to a host system, a DMA controller, a flexible memory controller, a Link I/O controller, and a SONET/SDH Framer. The on-chip RISC processor makes SAR operation intuitive and highly flexible. The Framer supports the STS-1 and STS-3c framing standards with either a serial or a parallel interface to external Physical Media Device (PMD) transceivers.

The ATM SAR Controller is at the heart of a network adapter card for a PCI based host system, integrating all the interface logic needed to yield a compact and low-cost solution. The minimum additional devices needed for a complete system are DRAM or SRAM for VCC descriptors and buffer storage, and physical layer transceiver/clock recovery devices.

A System Test and Evaluation board (STEB) provided a testbed to evaluate the ATM SAR in a Network Adapter Card environment. A Hewlett-Packard ATM Broadband Series Test System, including a 155 Mbit/second Optical Line Interface and Cell Protocol Processor, drove the STEB. The Test System integrates physical layer testing with higher layer services protocol testing.

The design team for the SAR was split across two primary sites. The team created a methodology to define how data was maintained and shared between the groups. It established directory structures, environments, tool versions, and an automated flow that regularly updated the two distributed databases. The SAR's design environment ran on Hewlett-Packard HP735 UNIX workstations to give the performance needed for the design's size and complexity.

Technology independence was a major requirement in the controller development, because the design had to be portable to customers. Thus, the SAR is described entirely in Verilog HDL. All technology-dependent structured cells (typically memory elements: RAM and ROM) were enclosed in wrappers.

Since interfacing multiple CAE tools had previously proven a painful process, the team integrated as few tools as possible in the design flow. Chronologic's VCS drove simulation, Synopsys' Design Compiler performed synthesis, Quad Design's MOTIVE did timing analysis, and Cascade's EPOCH Design Planner performed physical design, including memory compilation. A Verilog netlist served as the interface between these tools.

The configuration management (CM) system had to support multiple designers working on various parts of the chip from a single RCS directory. Each designer had to have the latest source data versions and had to be able to extract a revision history of a design element at any time. In many cases, different designers performing "what-if" analysis needed to modify the same code in different ways.

A central source database, an RCS directory, was selected as the CM system, and it proved to be an efficient, readily available solution. It served all designers and held the master versions of all source data elements.

Each designer linked to the central RCS directory from within his own working directory. Makefiles were created for simulation, synthesis, and physical design steps; all design work was executed from within Make. Makefiles were linked in a hierarchical fashion for dependency tracking through the entire design flow. The final layout database had a dependency link back to the RTL code.
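The effect of linking Makefiles hierarchically can be pictured as a walk over a dependency graph: when a source file changes, everything downstream of it is out of date. The sketch below models that idea only. The article does not show the actual Makefiles, so the target names and the graph are hypothetical.

```python
# ASSUMPTION: illustrative targets; each one lists what it is built from.
DEPENDS_ON = {
    "risc.netlist": ["risc.rtl"],
    "risc.layout": ["risc.netlist"],
    "risc.timing": ["risc.layout"],
    "pci.netlist": ["pci.rtl"],
    "pci.layout": ["pci.netlist"],
    "chip.layout": ["risc.layout", "pci.layout"],
}


def stale_targets(changed, depends_on=DEPENDS_ON):
    """Return every target that must be rebuilt after `changed` is modified."""
    stale = set()
    frontier = [changed]
    while frontier:
        node = frontier.pop()
        for target, deps in depends_on.items():
            if node in deps and target not in stale:
                stale.add(target)
                frontier.append(target)  # follow the chain downstream
    return stale
```

In this model, touching `risc.rtl` marks the RISC netlist, layout, and timing results stale along with the chip-level layout, while the PCI block is left alone. That mirrors the dependency link from the final layout database back to the RTL code.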


Figure 1. Major elements of the ATM SAR include a RISC CPU core, PCI bus interface, SONET communications interface, and buffer memory to hold ATM packets.

The complexity of the SAR demanded an intelligent partitioning of the major functional modules early in the design process. The pinout was established and a high-level floorplan of the design created and maintained through layout. First pass partitioning focused on isolation of the modules in the SAR. These modules have well defined functionality and represent the complex functional blocks (CFBs) that constitute the SAR.

By virtue of solid functional partitioning, layout can be performed on a block basis and then integrated at the top level. There is no need to perform a full-chip layout in one pass. The partitioning of the SAR into modules was required for the following reasons:

  • Fixed block methodology through layout.
  • Individual designers assigned to each block.
  • Blocks designed and tested in parallel.
  • Minimize coupling between blocks through architecture of bus structure and clean block interfaces.
  • Incremental change support.
  • Reuse of CFBs.

Fixed-block methodology

The team defined a fixed-block methodology to design each module separately. This required identical logical and physical partitioning, and all self-contained blocks supported full verification and synthesis independent of other modules. Each module was simulated, synthesized, and then laid out, and each was maintained as a separate database.

A module was "frozen" after performing layout and timing verification with post-layout parasitics. It was not revisited unless major changes in other modules impacted the interface of the "frozen" module. After all modules were frozen, they were integrated for final chip-level verification, bus routing, clock trunk generation, and ATPG.

With this flow, design hierarchy was maintained at all points in the process. By partitioning the SAR into functional modules, the design team streamlined the flow and designed in parallel, resulting in a faster cycle time.

In addition to functional partitioning, logic within the CFBs was partitioned for synthesis. The team implemented a bottom-up synthesis approach in which the size, complexity, and functionality of each sub-block were considered. Each sub-block was synthesized separately with well-defined boundary conditions. Optimal compile strategies were then implemented based on these factors. The team made minor deviations from this strategy as design issues demanded. The following concepts were adhered to as far as possible:

  • Early identification of the critical path and isolation of this path from non-critical logic.
  • Isolation of datapath elements from general glue logic.
  • Compiled modules were limited to < 7000 gates where possible, but critical path control and isolation was a higher priority.

What's a customizable standard product?

The Motorola Customizable Standard Product (CSP) concept allows a designer to integrate User Specific Logic into a standard product to customize the product for a specific application.

Motorola supplies the CFBs the designer can integrate into the design as functional units. The company takes responsibility for silicon integration of User Specific Logic and standard CFBs. This allows fast time-to-market by using the ASIC methodology.

The core logic is imported into a gate array environment where the I/O is added. This allows the company to offer fixed die sizes for ease in manufacturing, multiple technologies best suited to the application (performance, size, etc.), as well as higher levels of integration.

In the typical CSP flow, Motorola provides bus level models, timing shells and data sheets for the CFBs, standard cell or gate array libraries, evaluation boards, and applications support. The designer then designs his portion using standard third-party CAE tools, performs functional and timing verification, and then releases a gate-level netlist with a testbench to Motorola.

The final step involves the company optimizing the design with full gate-level descriptions of the CFBs in place and performing final chip integration. This includes releasing the design through Motorola's manufacturing CAD system, and then completing the physical design process, including full-chip verification. The completed part is then released for manufacture.

In addition to design partitioning, the coding style also impacted synthesis. A coding "standard" was established early in the project. It enabled performance and eased the synthesis process. It also ensured RTL- and gate-level simulations matched. The following are the primary areas for the coding style:

  • Outputs registered where possible (the RISC CPU deviated from this approach in places).
  • Collection of similar functions maintained within a module for resource sharing.
  • Avoidance of latch inference.
  • Full I/O control of asynchronous signals.
  • Logic-level minimization.
  • Avoidance of clock-gating.

Design partitioning and modularity provided a very structured approach to functional verification of the SAR. Each functional unit was designed and tested stand-alone using Chronologic's VCS Verilog simulator. Test benches were written with appropriate stubs and drivers to mimic the interface between the block and external logic. Each functional unit was fully verified with a different test before full-chip integration and simulation.

Testing the design

Regression Tests exercised specific SAR functional units and verified the units' functionality against a complete internal specification for the device. These very predictable behavioral regression tests provide pass/fail status visible on the block's external pins. Some units, such as the DMA incoming and outgoing channels, were tested in parallel using fork-join constructs in Verilog and pre-defined memory maps to configure local memory.

Confidence Tests ensured the SAR meets its specification. These tests are applied at the SAR's top level and, in some cases, at the top level of major functional blocks (RISC, PCI, etc.). Confidence tests are pseudo-random in terms of cell traffic patterns and bus availability. They test flow control, which is critical in this device.

Performance Tests verify that all specified architectural performance metrics are met. Typically, these tests impose all worst-case conditions on the device and monitor items such as major bus usage, memory bandwidth, and RISC performance.

Significant effort was placed on developing a suite of testbenches which tested the SAR device in strict conformance to the ATM/SONET and PCI standards. In addition, other tests are used, including power-on-self-test (a firmware test verifying the state of the device after power-up or reset).

The functional verification was based on Chronologic's VCS simulator and its graphical environment. The team developed Verilog tasks and functions to support software verification of the RTL description. With these tasks, software developers who are not Verilog-literate can use VCS.

This self-contained environment allowed hardware and software designers to perform full design and debug in a fast, robust environment. Since the SAR demanded an exhaustive verification environment, fast simulation time was essential to meet the schedule.

Although VCS was the design team simulator, all SAR Verilog code, including test code, was made portable across simulators. This supports Motorola's Customizable Standard Product methodology of supporting the code at future customer sites.

Memory testing was performed in firmware rather than hardware. A BIST pattern was loaded into the SAR Instruction cache. The RISC CPU wrote to, and read from, each memory. The PCI FIFOs were tested by having a host read and write cell data. The ROM was tested by reading the pattern from the ROM and comparing states.

Gate-level timing simulation was performed on the SAR, but its test suite was not as extensive as the one applied at the RTL level. The team used a bottom-up compile approach with SAR modules. This approach was driven by design complexity, but often resulted in higher quality than top-down designs.
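A firmware memory test of this kind boils down to writing a known pattern to every location, reading it back, and comparing. The sketch below shows that loop in Python. The article does not give the patterns or firmware used on the SAR, so the alternating-bit patterns, the function name, and the faulty-memory model are all assumptions made for illustration.

```python
def memory_test(mem, patterns=(0x55555555, 0xAAAAAAAA)):
    """Write each pattern to every word, read it back, and collect mismatches.

    ASSUMPTION: the alternating-bit patterns are illustrative, not the
    BIST pattern the SAR firmware actually used.
    """
    failures = []
    for pattern in patterns:
        for addr in range(len(mem)):
            mem[addr] = pattern
        for addr in range(len(mem)):
            if mem[addr] != pattern:
                failures.append((addr, pattern, mem[addr]))
    return failures


class StuckBitMemory(list):
    """Hypothetical faulty memory: bit 0 of one word is stuck at 1."""

    def __init__(self, size, bad_addr):
        super().__init__([0] * size)
        self.bad_addr = bad_addr

    def __setitem__(self, addr, value):
        if addr == self.bad_addr:
            value |= 1  # model the stuck-at-1 fault on write
        super().__setitem__(addr, value)
```

Using two complementary patterns matters here: the stuck-at-1 bit in the model is invisible to `0x55555555`, which already has bit 0 set, and is only caught by `0xAAAAAAAA`.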

Each module was partitioned into sub-blocks. The first-pass compile step had minimal control in terms of flattening, structuring, and resource sharing. Instead, it used best guess estimates for load and drive. A second compile modified the compile control through flattening and structuring. It used more accurate load and drive estimates from first-pass compile results.

Edge rate constraints were tightened for a few critical nets in the design and design rule constraints were relaxed to save area. Area and timing are critical constraints in the 450k-gate SAR. This two-pass compile flow was used early in the synthesis phase. It implemented manual characterization of each sub-block.

Relying solely on Design Compiler was unsuccessful. Cell selection was influenced by the most subtle design constraint changes. The team was careful in time-budgeting and path segmentation, especially on bi-directional signals, to accurately control cell selection on timing-critical paths.

In many cases, Design Compiler blindly extracted external delays from surrounding sub-blocks and annotated them on the design. This resulted in non-optimal logic. The team used a compile-characterize-recompile flow successfully in parts of the SAR design, but did not use the concept globally.

The process consumed the largest amount of time in the synthesis flow; nevertheless, intelligent control of constraints, accurate path segmentation, and optimal resource sharing yielded excellent results from Design Compiler.

Having completed an initial compile flow for key functional SAR blocks, the team made minor modifications to the scripts. These scripts formed the basis for the synthesis approach.

Critical paths

Next, the team modified the RTL to maintain critical paths within a module. The team added further control to the synthesis scripts to help Design Compiler fix critical path violations. After initial synthesis, the team realized datapath optimization and path grouping needed improvements.

The pipelined 50MHz RISC processor affords little flexibility for adding registers and moving logic to fix timing violations. The processor has cascaded datapath elements with large register-to-register stages in some critical paths.

The RISC pipeline and mode of operation precluded breaking up these paths. Thus, creative datapath compilation and resource sharing were required to meet timing. Synopsys' DesignWare capability, specifically the ALU support, was used extensively.

Without fast carry look-ahead on some of these datapath elements, a 50MHz RISC CPU would not have been possible. The team saved ten to fifteen nanoseconds in a critical path by using a clf adder in place of a ripple (rpl) implementation.
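To put the saving in context, a 50MHz clock has a 20 ns period, so ten to fifteen nanoseconds is half to three quarters of a cycle. The reason look-ahead helps can be seen in a bit-level model: a ripple adder's carry passes through one stage per bit, while a look-ahead adder combines generate and propagate signals in a logarithmic number of levels. The sketch below uses a generic parallel-prefix scheme to show this. It is an illustration of the principle, not the DesignWare clf implementation, and the stage counts are not gate delays.

```python
def ripple_add(a, b, width):
    """Ripple-carry add: each bit waits for the carry from the bit below."""
    carry, total = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | ((x ^ y) & carry)
    return total, width  # the carry chain is `width` stages long


def lookahead_add(a, b, width):
    """Carry look-ahead add using parallel-prefix generate/propagate."""
    mask = (1 << width) - 1
    g = a & b & mask    # bit i generates a carry
    p = (a ^ b) & mask  # bit i propagates an incoming carry
    group_g, group_p = g, p
    levels, span = 0, 1
    while span < width:
        # Merge each group with the group `span` bits below it.
        group_g = (group_g | (group_p & (group_g << span))) & mask
        group_p = (group_p & (group_p << span)) & mask
        span *= 2
        levels += 1
    carries = (group_g << 1) & mask  # carry into bit i comes from bits 0..i-1
    return (p ^ carries) & mask, levels
```

For a 32-bit operand the model's ripple chain is 32 stages long, against 5 prefix levels for the look-ahead version. Both return the sum modulo 2^width.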

Since time had been spent on specifying very accurate constraints, the use of the map_to_module directive within the HDL compiler was not required (although this was an option). Accurate constraint setting forced the implementation the team required.

In addition to optimizing RISC processor datapath elements, the team grouped and gave significant weight to certain critical paths (group_path). Design Compiler prioritized these path groups ahead of other timing paths in the design to achieve performance goals.

Since Design Compiler had only worked in the logical world, all drive strengths and timing information had been determined from statistical wire loads. The SAR used a very rich set of wire load models in the library. Models ranged from representations of small, die-size regions to larger block-level regions that gave an added level of accuracy over just having one or two wire load models.

Nevertheless, parasitic effects of interconnect were still statistical and did not account for best- and worst-case scenarios. Therefore, the design team moved the layout phase of each block forward in the design flow to provide layout parasitics to Synopsys at an early stage. Full place and route of the blocks permitted extraction of exact parasitics.

This final step, after datapath optimization and path grouping, provided superb results. The link between Synopsys and Cascade's place and route tool was extremely efficient. Design Compiler synthesized a module based on the two-pass compile (described earlier), and the resulting netlist was placed and routed.

Parasitics were extracted and annotated onto the Design Compiler database. The team performed a compile_in_place optimization in Design Compiler. It modified drive strengths of critical path devices and reduced drive strengths of devices off the critical path. The former provided design rule optimization; the latter, area optimization. Replacing the statistical wire load models with extracted parasitics removed uncertainty from timing verification.

Although running a full layout during the synthesis phase appears time-consuming, the identical logical and physical partitioning made it very reasonable. Iterations between synthesis and layout were minimal and all timing verification was performed with actual post-layout timing. The hierarchical approach in synthesis and fixed block methodology in Cascade's place and route tool obviated a one-pass full-chip layout.

For example, synthesis and layout of the 200k-gate RISC CPU completed in 24 hours with this flow, and it was batched by the Make design environment. Thus, modified RTL was synthesized, laid out, and its timing verified (with post-layout parasitics) in 24 hours with no manual intervention. Other functional modules were processed in a similar fashion, giving a three-day synthesis and layout turnaround on the 450k-gate SAR.

Test synthesis, which added scan chains to make the SAR testable, was performed immediately after the two-pass compile (described earlier) and before layout. Thus, the layout tool re-ordered the scan chain based on placement, and final in-place optimization occurred on the scan design. During logic synthesis the design team avoided inferring scan flip-flops in the logic path. Allowing the inference would prevent Test Compiler from synthesizing scan chains in the final design. To avoid it, the Design Compiler variable set_scan_style was set before compile.

To complete test synthesis successfully, the Test Compiler design rule checker (DRC) was run early in the synthesis flow and all violations in the RTL were fixed before final synthesis. This avoided optimizing a design that would later have to change to fix design rule violations.

Timing Analysis was the primary procedure for verifying SAR timing. Timing Analysis was used in three distinct modes:

  1. Estimated mode within synthesis (DesignTime).
  2. Timing driven buffer sizing (TACTIC).
  3. Final "golden" timing analysis (MOTIVE).

The tools were used at different points in the design to provide a high level of confidence in SAR timing. DesignTime from Synopsys performed estimated timing during synthesis. In this phase, statistical wire load models represented interconnect parasitic effects. Load and drive estimates from external blocks were placed on the design.

A hierarchical approach to wire load modeling was used to better mimic the floorplanning results that occurred later in the design flow. The synthesis library contained wire load models representative of areas from 0.1 mm x 0.1 mm to 3 mm x 3 mm in fine increments. Each hierarchy level in the design had a wire load attached to it based on DesignTime estimated block size.
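One plausible way to attach a wire load model to each hierarchy level is to pick the smallest model whose region covers the block's estimated area. The sketch below shows that selection rule. The article gives only the range of region sizes, so the model names, the intermediate sizes, and the selection rule itself are assumptions.

```python
# ASSUMPTION: hypothetical model names and sizes; the article states only
# that regions ranged from 0.1 mm x 0.1 mm to 3 mm x 3 mm.
WIRE_LOAD_MODELS = [
    ("wl_0p1", 0.1),  # (model name, side of the square region in mm)
    ("wl_0p5", 0.5),
    ("wl_1p0", 1.0),
    ("wl_2p0", 2.0),
    ("wl_3p0", 3.0),
]


def select_wire_load(block_area_mm2, models=WIRE_LOAD_MODELS):
    """Pick the smallest model whose region covers the estimated block area."""
    for name, side in sorted(models, key=lambda m: m[1]):
        if side * side >= block_area_mm2:
            return name
    # Larger than every region: fall back to the biggest model available.
    return max(models, key=lambda m: m[1])[0]
```

With this table, a block estimated at 0.2 mm2 would get the 0.5 mm model, and a 3.5 mm2 block the 2 mm model. A finer-grained table narrows the gap between the estimate and the region actually modeled.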

After floorplanning, the RC on most nets in these blocks agreed with the estimates for wire load models. However, the team noticed problems on heavily loaded nets that were either very long or very short. Worst and best case scenarios were not handled well and led to timing problems. First pass synthesis got design timing in the "ball-park" and physical RC information got it close to performance goals.

Timing driven buffer sizing (TDBS) is a Cascade timing analysis tool (TACTIC) feature that lets the designer size buffers on the critical path. TDBS is advantageous, because it changes the actual layout database. After first pass synthesis and floorplanning, the team used TDBS to modify drive of cells in the critical paths that were not optimal after a full place and route. TDBS increased or decreased buffer sizes as necessary and created a new netlist.

Quad Design's MOTIVE timing verifier performed final "golden" timing analysis on a post-layout netlist. The final netlist was released to mask and included all buffer size changes made by TDBS. Since this was the "golden" timing verification phase, statistical inaccuracies from wire load models were not allowed. Actual RC parasitics extracted from the layout tool were annotated onto the design database in MOTIVE using the NETLOAD statements.

In addition to ensuring violation-free timing, MOTIVE also reported global slack in the design, a powerful feature of the tool. Using it, the design team located nonviolating device pins that had very little margin.

These pins might cause problems later. Global Slack allowed the design team to find the devices in the layout database and increase their timing margins. The design team found many "tight margins" that were relaxed with minor modifications to the route.
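The idea behind a global slack report can be sketched in a few lines: slack is the required time minus the arrival time at a pin, and the pins of interest here are those that pass timing but only just. The function below is a generic illustration with a hypothetical data layout and margin threshold. It does not reproduce MOTIVE's report format or algorithm.

```python
def tight_pins(pins, margin_ns):
    """List pins that meet timing but with less than `margin_ns` of slack.

    `pins` maps a pin name to (required_ns, arrival_ns).
    ASSUMPTION: this layout and the threshold are illustrative only.
    """
    report = []
    for name, (required, arrival) in pins.items():
        slack = required - arrival
        if 0 <= slack < margin_ns:  # non-violating, but with little margin
            report.append((name, slack))
    return sorted(report, key=lambda item: item[1])  # tightest first
```

Pins with negative slack are deliberately excluded, since those are outright violations that the ordinary timing check already reports.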

Most design and verification time was spent at the block-level. Having been functionally verified and synthesized, each block was read into the Cascade EPOCH toolset.

Within EPOCH the following functions were performed on each block:

  • Initial place and route (autocompile).
  • Floorplanning.
  • Final place and route.
  • Clock trunk generation.
  • Manual global routing of timing critical nets.

For each block, an initial place and route (autocompile) was performed on the Synopsys netlist. This generated a first pass layout that was then floorplanned in Cascade.

Floorplanning optimizes placement and timing. During floorplanning, groups were moved, aspect ratios of standard cell groups and memories were changed, pins on blocks were moved based on routing, and timing critical global signals were routed manually.

This iterative process occasionally required modifications to the low-level design partitioning. The modifications required re-synthesis or modifications to the hierarchy structure within the Synopsys database to allow a lower level of resolution in floorplanning. Overall, however, very few iterations back to synthesis were required.

Once all of the blocks had been laid-out, met timing, and fit the desired die size, final chip-level integration was performed. At this stage, each functional block in the SAR was treated as a separate design and linked into the top level.

Thus, block layout remained unchanged after integration. All that remained at this stage was floorplanning of the top-level functional blocks (with respect to timing and pinout), top-level bus routing, and final power-rail-sizing (based on this floorplan and route).

Since the CSP concept involves core competency in Gate Array, the SAR core logic (LEF description), together with the I/O description, was imported into the Gate Array layout tool for final integration. The design was treated as a gate array with a fully diffused core. The team took advantage of faster cycle times in manufacturing and reuse of automatic test equipment hardware.

The SAR provided more challenges to the design flow than first envisaged. The high-performance sections of the RISC CPU and the PCI bus were the key areas requiring the most effort. For example, the CPU gave the team little flexibility in synthesis.

In addition to the design complexity, the team was severely restricted by both speed and area. The SAR has a rich feature set; thus, the design is large. In addition, because the large design had to run at high speed, the team had to control the synthesis process by trading off area for speed, and vice versa.

The SAR has many unique unrelated functions which forced unique solutions in the design flow. The most critical decisions made early-on in the design cycle proved to be design partitioning, containment of the critical path, and the use of a fixed-block methodology. If a complete simulation and layout of the SAR had to be performed every time a design change was made, this chip would never have been completed.

The capability offered to the design team through a fixed-block methodology was very powerful and made incorporating incremental changes straightforward and quick.

A great deal of effort was spent early in the design cycle (before RTL) in analyzing the system architecture. The analysis drove design partitioning which, in turn, proved critical.

The CAE tool flow proved very robust. Keeping the number of different tools in the flow to a minimum reduced the possibility of error in interfacing these tools.

Alan Gibbons is a senior staff engineer at the Customizable Standard Products operation of Motorola's Semiconductor Products Sector in Chandler, AZ. He is currently focusing on the design and development of ATM chipsets, specifically the ATM SAR.



Integrated System Design, February 1996



Copyright © 1996 - Integrated System Design Magazine
