
Emulation Technology for ASIC Core Verification

Emulation strategies vary across three ASIC cores.

By Alan Singletary


In-circuit emulation provides an effective tool for improving first-pass ASIC quality. The IBM Microelectronics team of Austin, TX, put this technology to the test in verifying several complex cores. The emulation approach used by our design team involved partitioning the design into a few large FPGA devices and combining this emulated logic with off-the-shelf devices to create a complete system.

The embedded-controller peripherals group has used the emulation technology described here to develop a number of ASIC cores for the IBM Microelectronics division's Blue Logic library. Used by both IBM designers and outside customers, the cores are built around IBM's CoreConnect on-chip buses, such as the processor local bus (PLB) and the on-chip peripheral bus (OPB).

To enhance design quality and ensure that our cores meet customer requirements, the design team recently added emulation to our verification strategy. The first set of cores verified through emulation was released to the Blue Logic library in early 1998.

Here, we describe the design flow used to produce the cores as well as a set of factors that may help you evaluate FPGA-based emulation technology. Since we chose emulation/prototyping technology from Aptix Corp. (San Jose), the focus of this discussion is on the issues associated with this technology. Development issues include the need to manually partition a large design into sections that fit within the size and I/O limitations of available FPGAs. As for the expected performance, we found that the technology enables verification of ASIC designs in a realistic system environment, including typical I/O devices and product-level software, at speeds far higher than those available from simulation.

Figure 1 - Comparison of features

Feature                 Black Box Systems   Custom FPGA Board   Aptix PCB
Cost                    -                   ++                  +
Initial Workload        ?                   -                   +
Ongoing Workload        ?                   ?                   ?
Execution Speed         -                   ++                  +
Complexity, Stability   -                   +                   ?
In-circuit potential    -                   ++                  +
Diagnostic options      +                   -                   +
Max design size         ++                  --                  -

When comparing alternatives, some factors can be predicted up front, while others will be dependent on the particular design.

What to expect from emulation

Before getting excited about emulation's benefits, it's a good idea to understand the technology's limits and costs - both monetary and otherwise. While the emulation approach we detail here is one of the lower-cost alternatives, there's significant equipment cost and engineering effort involved. That engineering effort will include some amount of emulation-specific work from the ASIC design team and may also include the need for small changes to the design itself.

It can also be difficult to understand exactly what portions of a design are being exercised in an in-circuit emulation setup. While the design team gains a great deal of confidence seeing the emulated design run with real software and peripherals, the coverage achieved may be narrow in terms of the ASIC's functional space. Finding, isolating, fixing, and validating some types of bugs can also be more time-consuming than with traditional simulation tools. This difficulty stems mainly from the complex interactions among the in-circuit hardware, the software, and the logic under test, and it is exacerbated by the enormous number of clock cycles involved. In many ways, emulation has more in common with system-level bring-up and debug than with simulation test cases. For these reasons, emulation should be viewed as an effective complement to simulation and not a complete verification plan in itself.

Despite these considerations, hardware emulation enables many unique verification and development activities. The sheer number of cycles available through emulation and the unpredictability of a realistic system environment work well in verifying real-world scenarios that simulation can't address.

In addition, behavioral-model bugs that showed up in our emulation have led to improvements in the simulation coverage for some of our current projects. The ability to see our designs run at 10 percent of actual speed allowed early firmware development and provided an aid in bringing up the actual ASIC implementation. The emulation system also acted as a platform for executive demos.

Evaluating emulation technologies

The embedded-controller peripherals group evaluated three different emulation approaches. Although we considered a wide variety of factors in choosing an emulation architecture, our choice was dominated by a few simple metrics (see Figure 1).

First, we looked at the black-box systems that automatically partition designs across large numbers of FPGAs or custom processors. The main strength of this architecture lies in its ability to handle extremely large designs, but the cost of these systems was judged to be prohibitive. We were also concerned that their relatively slow execution speeds could limit the potential for in-circuit operation with a wide range of standard peripherals.

Second, we considered breaking the design into small chunks and emulating each chunk using a custom circuit board with FPGAs. This approach has been used successfully on other projects at IBM and offers the advantages of low cost and high execution speed. A drawback is that FPGA I/O boundaries must remain fixed because the interconnect between the FPGAs is hard-wired. We had neither the time nor the confidence to deal with the uncertainties of predicting the I/O boundaries of a design that was still evolving.

The third choice was the MP4 field programmable circuit board (FPCB) from Aptix (San Jose, CA). This 24x18 board has plug-in space for Aptix FPGA modules, standard components, or any sort of custom circuitry. External devices can connect to the board directly in the plug-in area or through an I/O connector area on one edge of the board. The key to this architecture lies in the four field programmable interconnect component (FPIC) devices in the center of the board. These SRAM-based devices can be quickly programmed to provide connections between any two or more of their 936 I/Os. The connections appear electrically as a passive resistive-capacitive load, with delays of 5 to 15 nanoseconds typical for a two-point connection.

As with the custom-board approach, the technology requires manual partitioning of the design across multiple FPGAs. But because the connections between FPGAs are made through the programmable FPICs, the I/O boundaries are flexible.

How emulation works

All emulation strategies contain the same basic steps, but a close look reveals significant differences in hardware architectures, design tradeoffs, and target applications (see Figure 2). In all cases, the ASIC's hardware design language (HDL) code combines with emulation-specific libraries in a compilation process to produce an image appropriate for the emulation platform.

Figure 2 - Characteristics for comparison
Specific criteria for each step of the emulation process need to be considered.

For the emulation effort to be effective, the emulated logic must be functionally equivalent to the logic running in simulation and to the final chip. The Aptix software flow ensures this equivalency by drawing from the original design databases. Each FPGA device requires a wrapper file that instantiates the necessary HDL modules from the database. The FPGA wrappers can also include any required interface logic, such as bi-directional drivers.

The connections between the FPGAs are defined in an emulation top-level file - a pure netlist with no additional logic. If you include any devices other than FPGAs on the emulation board, you must also create HDL modules to define the I/O configuration of these devices.

The next step in the emulation flow is to run both the design files and the FPGA wrapper files through a standard synthesis process, targeting the appropriate FPGA technology. The Aptix flow supports most synthesis tools as well as FPGAs from three of the largest vendors (Altera, Xilinx, and Lucent). We found that typical synthesis times using Synopsys' (Mountain View, CA) FPGA Compiler or FPGA Express were one to two hours per FPGA.

After synthesis, the design flow moves into several steps handled by the Aptix software. The software works with the design's FPGA-specific netlist files, a top-level netlist defining the connections between the FPGA netlists, and a pinmap file describing the physical characteristics of the FPGAs and other devices in the setup. Beginning with a mapping process, the software checks the input files for consistency, determines the clocking scheme, and fixes the FPGA pin locations.

The software then creates constraint files for placing and routing the FPGAs. At this point, the software invokes vendor-specific tools for the target FPGAs. If the vendor tools place and route all devices successfully, the final step is to produce a routing for the circuit board. These steps generate binary files for the FPGAs and the Aptix FPCB, which you then download to the board.
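To give a feel for the kind of consistency checking the mapping step performs, here is a minimal Python sketch. It is not the Aptix software and does not use the real file formats, which this article does not describe. The dictionaries standing in for the top-level netlist and the pinmap, and the function name check_netlist, are hypothetical. The sketch flags nets with fewer than two endpoints and endpoints that name a port the device does not have.

```python
# Hypothetical in-memory stand-ins for the flow's input files. The real
# Aptix netlist and pinmap formats are not described in the article.
device_ports = {
    "fpga1": {"plb_req", "plb_ack", "clk"},
    "fpga2": {"plb_req", "plb_ack", "clk"},
    "sdram": {"dq0"},
}

# Top-level netlist: net name -> list of (device, port) endpoints.
nets = {
    "plb_req": [("fpga1", "plb_req"), ("fpga2", "plb_req")],
    "plb_ack": [("fpga1", "plb_ack"), ("fpga2", "plb_ackk")],  # deliberate typo
    "dq0": [("sdram", "dq0")],                                 # dangling net
}


def check_netlist(nets, device_ports):
    """Return a list of human-readable consistency errors."""
    errors = []
    for net in sorted(nets):
        endpoints = nets[net]
        # A pure interconnect net must join at least two pins.
        if len(endpoints) < 2:
            errors.append(f"{net}: fewer than two endpoints")
        # Every endpoint must exist on the named device.
        for device, port in endpoints:
            if port not in device_ports.get(device, set()):
                errors.append(f"{net}: unknown port {device}.{port}")
    return errors


for error in check_netlist(nets, device_ports):
    print(error)
```

A check of this kind is cheap to run before every build, which matters when a change in one file has to be mirrored in several others.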

This design flow entails some complexity and potential for error because design files pass back and forth among several different software packages. While all of these interfaces are fairly well defined, the programs have little or no visibility to each other. Maintaining exact coherence requires careful work, as a change in any one component usually affects the others.

Emulating three cores

We began our emulation project in the third quarter of 1998 with three cores for use in the embedded arena with the PowerPC 603 CPU: a PowerPC 60X processor local bus memory controller interface; a 64-bit memory controller with synchronous DRAM and ROM controllers; and a 32-bit PLB-to-PCI interface (see Figure 3). These cores required five large FPGA devices for emulation.

A primary goal of our emulation effort was to exercise the cores in as realistic a system environment as possible. To help meet this goal, the emulation setup implemented the cores as part of a complete, full-featured system (see Figure 4). The system's CPU and DRAM resided on custom circuit boards plugged into the Aptix board, while the emulated cores (in several FPGAs) provided the bridge between these components and the PCI bus. We modified a standard motherboard from an IBM network computer to connect via the PCI bus and to provide access to PCI and ISA peripherals, including a network connection.

Figure 3 - Design prior to emulation
The 32-bit PLB-to-PCI interface was one of the three cores targeted for emulation.

Although this emulation setup was complex, it was accessible and compact. Cabling the PCI bus to the I/O card proved to be something of a mechanical challenge, as the interface involved two different FPGAs and numerous circuit boards. We also had to synchronize clock and reset signals across the environment, a task made easier by the platform's flexible clocking architecture. A dedicated routing network on the board distributes clocks with minimal skew.

We connected a wide variety of standard I/O components into the emulation setup with largely successful results. While some tests had to be modified because of timing dependencies, the majority of I/O peripherals operated well at frequencies of 4 to 6 MHz.

We achieved two major milestones in using the I/O peripherals. The first was the ability to interface to a standard Ethernet network and load the executable image of our test software from a remote host. This operation demonstrated a substantial amount of burst traffic from a PCI master into system memory and also provided the ability to update and reload the test image quickly and easily.

The second milestone came when we ran two PCI master devices simultaneously under our test suite. The devices competed for access to main memory and generated CPU snoop and write-back activity. Successful operation for long periods of time in this mode gave us significant confidence in the integrity of the core designs.

Hardware considerations

At the time of this project, the single most significant characteristic distinguishing the Aptix emulation architecture from others on the market was the requirement to partition the design manually into modules suitable for implementation in individual FPGAs. In some emulation approaches, software tools partition the entire design across many smaller FPGAs without user intervention.

The manual partitioning process has some advantages, including division along natural boundaries, but the process requires engineering time and insight. Some designs may challenge the available partition size, which depends on FPGA resources. While FPGA gate counts and speeds are growing at impressive rates, the number of I/Os isn't. This situation is a concern because I/O count can become a partitioning bottleneck. However, the 400 to 440 I/Os available in today's largest standard FPGA packages usually allow straightforward partitioning along core or module boundaries.
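The fit check behind manual partitioning can be sketched in a few lines of Python. The 400-I/O limit below is taken from the low end of the range quoted above. The gate capacity, the partition names, and the size estimates are all hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    gates: int        # estimated logic size of the modules in this partition
    io_signals: int   # signals crossing the partition boundary


MAX_IO = 400          # low end of the 400 to 440 I/Os cited in the text
MAX_GATES = 100_000   # ASSUMPTION: usable capacity of the target FPGA


def check_fit(partitions, max_io=MAX_IO, max_gates=MAX_GATES):
    """Return (name, reason) pairs for every limit a partition exceeds."""
    problems = []
    for p in partitions:
        if p.io_signals > max_io:
            problems.append((p.name, f"I/O {p.io_signals} > {max_io}"))
        if p.gates > max_gates:
            problems.append((p.name, f"gates {p.gates} > {max_gates}"))
    return problems


# Hypothetical estimates for two candidate partitions.
partitions = [
    Partition("memory_controller", gates=80_000, io_signals=310),
    Partition("plb_pci_bridge", gates=130_000, io_signals=520),
]

for name, reason in check_fit(partitions):
    print(f"{name}: {reason}")
```

A partition that fails on both counts, like the second one here, is a candidate for splitting across two devices, with the extra boundary signals that such a split creates.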

If a core doesn't fit in a single FPGA, partitioning can be difficult. One of our cores, a 32-bit PLB-to-PCI bridge, was too large for a single FPGA in terms of both logic and I/O. The first difficulty was to find an acceptable boundary for splitting the core between two FPGAs. Then we had to create several emulation-specific files and modify other files to group the lower-level HDL modules into FPGA "packages" with the correct I/O types. Because of these changes, we lost the ability to pick up the latest source code automatically and we needed special emulation logic drops. When we added a module to provide an asynchronous interface to the PCI bus, we had to duplicate portions of the interface because the PLB-to-PCI bridge was in two parts. Debug was more difficult with the bridge in two parts, and since both FPGAs ended up with nearly 100 percent I/O utilization even after the split, wiring out internal signals for debug was difficult. Despite the engineering effort, we were eventually able to run this core in emulation and verify its functions.

Note that one benefit of passing signals between FPGAs is that the signals become easily available for debug purposes. In the system, all of the signals that pass through the FPIC devices are available to an extra set of FPICs that connect to the pods of a Hewlett-Packard (Palo Alto, CA) logic analyzer. The software allows you to select the signals to probe and automatically programs the logic analyzer channels via serial or Ethernet interfaces.

Another hardware issue with the technology involves signal delays and system operating frequencies. Delay considerations include combinatorial logic delays within the FPGAs, the time required for a signal to travel across FPGA boundaries, the number and length of delays through the FPICs, and the longest routing topology of any single net. In the best case, a net requires only one FPIC path, and typical delays might yield operating frequencies of 10 MHz or more. At least a few nets may route through two FPICs, however, and the added delay can slow the operating speed. The worst practice is to drive a signal from one FPGA and through another before the signal is registered in a third FPGA, with FPIC connections between the FPGAs. Fortunately, the 1,024-pin FPIC device provides enough routing paths that, combined with good partitioning, you can normally avoid this situation.
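The delay budget described above reduces to simple arithmetic: add the delays of every segment on the longest register-to-register path and invert the total. The Python sketch below uses the upper end of the 5-to-15-ns FPIC delay quoted earlier. The FPGA-internal delays are assumptions picked for illustration, not measured values from our setup.

```python
FPIC_NS = 15.0          # upper end of the 5 to 15 ns FPIC delay from the text
SOURCE_FPGA_NS = 40.0   # ASSUMPTION: clock-to-pad plus logic in the driving FPGA
DEST_FPGA_NS = 30.0     # ASSUMPTION: pad-to-register plus setup in the receiving FPGA
PASS_THROUGH_NS = 40.0  # ASSUMPTION: pad-to-pad delay through an intermediate FPGA


def max_frequency_mhz(segment_delays_ns):
    """Highest clock rate (MHz) at which the summed path delay fits in one period."""
    total_ns = sum(segment_delays_ns)
    return 1000.0 / total_ns


paths = {
    "one FPIC": [SOURCE_FPGA_NS, FPIC_NS, DEST_FPGA_NS],
    "two FPICs": [SOURCE_FPGA_NS, FPIC_NS, FPIC_NS, DEST_FPGA_NS],
    "through a third FPGA": [
        SOURCE_FPGA_NS, FPIC_NS, PASS_THROUGH_NS, FPIC_NS, DEST_FPGA_NS,
    ],
}

for label, delays in paths.items():
    print(f"{label}: {sum(delays):.0f} ns, {max_frequency_mhz(delays):.1f} MHz")
```

Under these assumed numbers, the extra FPIC hop costs a modest amount of speed, while routing through an intermediate FPGA costs far more, which is why that topology is worth avoiding at partitioning time.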

If routing delays prevent a design from running at the desired frequency, you can bypass the FPICs and route critical signals directly, either within the Aptix FPCB or through external cables. Register-to-register delays within the FPGAs may impose speed limits, however, especially as the devices near capacity. Timing estimation tools in the FPGA place-and-route packages can provide good delay estimates that allow some degree of optimization.

While each design must be considered separately, we believe that emulation frequencies of 2 to 4 MHz will almost always be possible with this approach. Careful partitioning and optimization can usually push speeds over 10 MHz. To keep these speeds in perspective, remember that a 4-MHz system speed is tens of thousands of times faster than workstation-based simulation.
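The speed comparison is easy to reproduce. The short Python sketch below assumes a simulator throughput of 100 clock cycles per second and a test of one billion cycles. Both figures are hypothetical and chosen only to show the scale of the difference; actual simulation rates depend heavily on the design size and the workstation.

```python
EMULATION_HZ = 4e6      # 4-MHz system speed, from the text
SIMULATION_HZ = 100.0   # ASSUMPTION: simulated clock cycles per wall-clock second


def speedup(emulation_hz, simulation_hz):
    """Ratio of emulated to simulated clock cycles per second."""
    return emulation_hz / simulation_hz


def wall_clock_seconds(cycles, rate_hz):
    """Time needed to execute a given number of clock cycles."""
    return cycles / rate_hz


cycles = 1e9  # ASSUMPTION: length of a hypothetical test, in clock cycles

print(f"speedup: {speedup(EMULATION_HZ, SIMULATION_HZ):,.0f}x")
print(f"emulation:  {wall_clock_seconds(cycles, EMULATION_HZ) / 60:.1f} minutes")
print(f"simulation: {wall_clock_seconds(cycles, SIMULATION_HZ) / 86400:.1f} days")
```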

Hardware/software integration

To test our emulated cores and other components, we ported two pieces of code and modified them to run in emulation. With the boot firmware for the I/O motherboard, for example, the modification consisted of removing some timing dependencies. This firmware served as a platform for testing our SDRAM controller core, and the emulation platform's relatively high speed made it easy to perform a test that would be impractical in simulation: varying the memory controller timings. We generated likely problem scenarios and then automated their execution with every supported combination of SDRAM timings. We flushed out several obscure problems using these tests.
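A sweep over every supported timing combination can be driven by a very small harness. The Python sketch below is not our firmware. The parameter names, the value ranges, the scenario names, and the run_scenario callback are hypothetical placeholders. In a real setup, run_scenario would program the memory controller registers and run the test on the emulated hardware.

```python
import itertools

# ASSUMPTION: illustrative timing parameters and supported values, in clocks.
TIMING_OPTIONS = {
    "cas_latency": [2, 3],
    "trcd_cycles": [2, 3],
    "trp_cycles": [2, 3],
}

SCENARIOS = ["back_to_back_writes", "read_after_refresh"]  # hypothetical names


def timing_combinations(options):
    """Yield one dict per combination of the supported timing values."""
    names = list(options)
    for values in itertools.product(*(options[name] for name in names)):
        yield dict(zip(names, values))


def run_sweep(scenarios, options, run_scenario):
    """Run every scenario under every timing combination; collect failures."""
    failures = []
    for timings in timing_combinations(options):
        for scenario in scenarios:
            if not run_scenario(scenario, timings):
                failures.append((scenario, timings))
    return failures


def fake_run_scenario(scenario, timings):
    """Stand-in for a hardware test; fails one corner case on purpose."""
    return not (
        scenario == "back_to_back_writes"
        and timings["cas_latency"] == 2
        and timings["trp_cycles"] == 2
    )


for scenario, timings in run_sweep(SCENARIOS, TIMING_OPTIONS, fake_run_scenario):
    print(f"FAIL {scenario} with {timings}")
```

Because the number of combinations is the product of the option counts, a sweep like this grows quickly, which is what makes it practical at emulation speed and impractical in simulation.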

The majority of our testing used an internal IBM hardware exerciser program called Bring-Up Driver (BUD), which has been used for years to verify PowerPC based chips and systems. Using BUD, we were able to stress the design significantly by running exercisers for all of the peripheral devices concurrently for long periods of time.

Our emulation project continued for approximately six months in an evolutionary process. We added new design functions and new tests until just before the design was released to layout. In retrospect, this process was probably not the most effective use of emulation on either end of the design cycle. When the logic is in an immature state, isolating bugs in the emulation environment is usually not the most efficient method because of the high number of variables involved. Toward the end of the design cycle, it would be beneficial to freeze the logic for several weeks of regression testing in emulation before release.

Unfortunately, neither of these ideals is practical, as bring-up of the in-circuit environment and system software requires that you start emulation early in the design cycle, and design freezes have a way of thawing. The good news is that emulation's high speed can help ease the schedule crunch toward the end of the design cycle because the emulation system can run a comprehensive set of regression tests in a few hours. Simulation regression would normally take much longer.

Emulation thus proved to be an effective complement to our simulation coverage. Our emulation system allowed us to find several serious bugs. These bugs included startup problems such as those that occurred only when the logic was run for thousands of clock cycles before executing any register setup. We also found conditions that didn't induce errors in our simulation behavioral models but did cause failures with a real device.

Additional bugs included memory controller problems associated with the timing of refresh cycles. The emulation environment represented an unusually stressful situation for these operations because DRAM refresh occupied a higher percentage of the memory bandwidth than it would on a system running at nominal frequencies. In some cases, detection of these problems led to improvements in the simulation environment to better address the timing issues.
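The refresh effect follows from the fact that a real DRAM must be refreshed on a fixed wall-clock schedule, no matter how slowly the controller is clocked. The Python sketch below works through the arithmetic. The refresh schedule (4,096 rows every 64 ms), the 10-clock cost per refresh, and the 66-MHz nominal frequency are assumptions typical of SDRAM parts of the period, not figures from our design.

```python
REFRESH_INTERVAL_S = 64e-3 / 4096   # ASSUMPTION: 4,096 rows refreshed every 64 ms
REFRESH_CYCLES = 10                 # ASSUMPTION: clocks consumed per refresh


def refresh_fraction(clock_hz,
                     interval_s=REFRESH_INTERVAL_S,
                     refresh_cycles=REFRESH_CYCLES):
    """Fraction of all memory clock cycles spent on refresh."""
    cycles_between_refreshes = clock_hz * interval_s
    return refresh_cycles / cycles_between_refreshes


NOMINAL_HZ = 66e6    # ASSUMPTION: nominal memory clock
EMULATION_HZ = 4e6   # within the 4 to 6 MHz range reported for the setup

print(f"nominal:   {refresh_fraction(NOMINAL_HZ):.1%} of cycles")
print(f"emulation: {refresh_fraction(EMULATION_HZ):.1%} of cycles")
```

With these assumed values, refresh goes from a background activity to a significant share of the bus at emulation speed, so collisions between refresh and normal accesses occur far more often per access than in the final system.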

The silicon payoff

For any ASIC verification effort, the judgement is out until the chips are in, and the verdict was good for our core development project. In the first silicon implementation (the IBM CPC 700 Memory Controller and PCI Bridge), we found no significant problems in the cores, and the first-pass chips were shippable. As an added bonus, the software porting and test development done on the emulation platform turned out to be directly applicable to bring-up, giving us a quick validation of the design quality. As a result, we had the first chips running our BUD system software within three days.

After this success, our team is using the same emulation approach on a more ambitious project involving five new cores and 400k gates in emulation. We believe that the high execution speeds and excellent in-circuit opportunities of FPGA-based emulation offer enormous potential for verifying ASIC cores before fabrication, so long as expectations are reasonable and the necessary planning is done in advance.


Alan Singletary is an advisory engineer with IBM's Microelectronics Division in Austin, TX. He has been involved in a wide range of projects including robotics, system design, and ASIC validation.

To voice an opinion on this or any other article in Integrated System Design, please e-mail your comments to mikem@isdmag.com


Copyright © 2000 Integrated System Design Magazine
