United Business Media EE Times



Hardware Emulation Accelerates HDL Functional Verification

Very large evaluation suites require more cycles than simulators can provide.

by Tom Balph and Wilson Li



With the current state of silicon technology, engineers are faced with designing complete systems consisting of multiple functional blocks, such as embedded cores, firmware, and peripheral hardware, on a single piece of silicon. Extremely competitive markets and customer demand have forced system designers to deliver higher-performance designs with greater functionality in shorter cycle times.

Functional simulation and verification of HDL is a major bottleneck in the product design cycle. Although some RTL simulators, such as VCS from Chronologic, can run as much as seven times faster than classic Verilog-XL, processing-intensive designs such as MPEG audio and video decoders need the much larger performance gains that hardware emulation, using Quickturn's hardware and associated tools, can provide.

Recently, we used hardware emulation to speed up verification of an MPEG-2 audio and video decoder designed to meet the requirements of DVD players. We found that emulation is indeed justified when absolute real-time performance isn't required, simulation is simply too slow, and the design is highly complex. The Quickturn emulation hardware and tools require an additional design flow (much of which can run alongside the traditional design flow), and sufficient resources must be dedicated to it. Training, and staff familiar with the Quickturn tools, are also required. Finally, be aware that compilations and recompilations (for example, to insert internal probes) can be time-consuming, so workstations powerful enough to complete them in a timely manner are needed.

Figure 1 MPEG-2 audio/video decoder

The MPEG-2 decoder is a large and significant design that requires emulation to reduce the verification time.

As with so many things, early attention to a good design flow and methodology (which implies proper knowledge of the tools) will minimize potential problems with memories, test vectors, and clocking strategies that require several system clocks. When using a Verilog synthesis flow, pay special attention to on-board memories. Such memories can complicate the emulation process, because logic "wrappers" must be placed around them so that their behavior matches the memory models of the target silicon. Also, since Quickturn test vectors exercise the top level of the emulation model, generating test vectors that look like vectors for "broadside" testers, while paying careful attention to setup and hold timing, can minimize problems within the Quickturn environment.

Figure 2 Simulation times for Dolby conformance suite under different methods

The audio decoder must be verified against a large suite of Dolby conformance bit streams. Emulation reduces the required verification run time to hours.

When running an emulation at some fraction of the real system clock rate, be aware of potential problems with the emulation support hardware. Real DRAM or SDRAM requires a proper refresh strategy at the lower emulation speeds, and data buffers may be susceptible to overruns or underruns because emulation changes the relative data rates. When debugging the emulation system, check the data interfaces to the emulation model first, then concentrate on the internal model.
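
A rough way to see the buffer problem is to model a FIFO whose producer keeps running at its real-time rate while the consumer is slowed to the emulation clock. The Python sketch below is a hypothetical model, not part of the decoder or its test setup: rates are in bytes per time step, and the depth and rates are illustrative values.

```python
def simulate_fifo(depth, producer_rate, consumer_rate, duration, start_level=0):
    """Track FIFO occupancy over `duration` fixed time steps.

    Rates are bytes per step (illustrative assumptions). Returns
    (final_level, overruns, underruns): the counts are the number of steps in
    which the producer found too little space or the consumer too little data.
    """
    level = start_level
    overruns = underruns = 0
    for _ in range(duration):
        # Producer: data arriving on the real-time (unscaled) clock.
        space = depth - level
        if producer_rate > space:
            overruns += 1
        level += min(producer_rate, space)
        # Consumer: data drained on the (slower) emulation clock.
        if consumer_rate > level:
            underruns += 1
        level -= min(consumer_rate, level)
    return level, overruns, underruns


print(simulate_fifo(64, 4, 4, 100))  # balanced rates
print(simulate_fifo(64, 4, 1, 100))  # consumer slowed by emulation
```

With equal rates the FIFO never overflows; slowing only the consumer, as emulation does to the chip side of an interface, produces steady overruns. One remedy is to throttle the real-time side, for example with flow control, so that both sides scale together.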

The MPEG-2 decoder

The decoder is composed of three main modules: a host interface and embedded processor, an MPEG-2 and MPEG-1 video decoder, and a multimode audio decoder (see Figure 1).

The embedded processor accepts MPEG-2 program stream packs, MPEG-1 system stream packs, or elementary stream packs (to support data from an MPEG-2 transport demultiplexer in a set-top box application). It extracts system information and parses the audio, video, subpicture, and user data into the corresponding buffers; checks for syntax errors; performs bit stream buffer management; interprets host commands for the submodules; and controls the synchronization between the audio and video decoders. The host and compressed data interfaces work in conjunction with the embedded processor to process the compressed bit stream input. The host interface gives the external host processor access to the on-board registers for configuration and control, and also gives the host read and write access to the external SDRAM.
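
The article does not describe the parsing firmware itself. As a minimal sketch of the routing step only, the Python below walks an MPEG-2 program stream and sorts PES packets into per-type buffers by their stream_id, following the start-code and PES_packet_length layout of ISO/IEC 13818-1. It assumes MPEG-2 pack headers (MPEG-1 packs are laid out differently), uses Python lists as stand-ins for the SDRAM bit-stream buffers, and leaves each PES header in place; a real parser would strip it and pass the timestamps on for audio/video synchronization.

```python
def classify(stream_id: int) -> str:
    """Map a PES stream_id to a buffer name."""
    if 0xE0 <= stream_id <= 0xEF:
        return "video"      # MPEG video streams
    if 0xC0 <= stream_id <= 0xDF:
        return "audio"      # MPEG audio streams
    if stream_id == 0xBD:
        return "private1"   # on DVD: AC-3, LPCM, and subpicture substreams
    if stream_id == 0xBF:
        return "private2"   # on DVD: navigation packets
    return "other"


def route_packets(stream: bytes) -> dict:
    """Split an MPEG-2 program stream into per-type lists of PES packet bodies."""
    buffers = {"video": [], "audio": [], "private1": [], "private2": [], "other": []}
    i = 0
    while i + 4 <= len(stream):
        if stream[i:i + 3] != b"\x00\x00\x01":
            raise ValueError(f"lost start-code sync at offset {i}")
        code = stream[i + 3]
        if code == 0xB9:                          # MPEG_program_end_code
            break
        if code == 0xBA:                          # MPEG-2 pack header
            if i + 14 > len(stream):
                raise ValueError("truncated pack header")
            i += 14 + (stream[i + 13] & 0x07)     # also skip pack_stuffing bytes
            continue
        if i + 6 > len(stream):
            raise ValueError("truncated packet header")
        length = int.from_bytes(stream[i + 4:i + 6], "big")
        end = i + 6 + length
        if end > len(stream):
            raise ValueError("truncated packet body")
        if code not in (0xBB, 0xBE):              # drop system header and padding
            buffers[classify(code)].append(stream[i + 6:end])
        i = end
    return buffers
```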

The MPEG-2 video decoder can process either MPEG-2 or MPEG-1 video elementary streams and provides full MPEG-2 video compatibility. Trick play modes are also supported (play, stop, pause, continue, step, fast forward, freeze, and slow motion). The on-screen display (OSD)/still-picture module allows the device to display 4-bit/pixel OSD data through a color look-up table (CLUT) or 16-bit 4:2:2 decoded data for still-picture display. The subpicture decoder generates both subtitle and full-screen menu overlays for the DVD player.
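
The CLUT path amounts to one table lookup per pixel. This sketch expands packed 4-bit/pixel OSD data through a 16-entry table; the nibble order (left pixel in the high nibble) and the (Y, Cb, Cr) table entries are assumptions for illustration, not details taken from the design.

```python
def expand_osd(packed: bytes, clut):
    """Expand 4-bit/pixel OSD data to colors through a 16-entry CLUT.

    Assumes two pixels per byte with the left pixel in the high nibble.
    """
    if len(clut) != 16:
        raise ValueError("a 4-bit index needs a 16-entry CLUT")
    pixels = []
    for byte in packed:
        pixels.append(clut[byte >> 4])    # left pixel
        pixels.append(clut[byte & 0x0F])  # right pixel
    return pixels


# Hypothetical CLUT: a 16-step gray ramp as (Y, Cb, Cr) tuples.
gray_clut = [(16 + 14 * n, 128, 128) for n in range(16)]
print(expand_osd(bytes([0x0F]), gray_clut))
```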

The video generator section mixes video, subpicture, and OSD/still picture together and generates CCIR-656 output video (4:2:2). It also performs 3:4 and 9:16 conversion, as well as vertical and horizontal filtering.
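
In a CCIR-656 (now ITU-R BT.656) stream, 4:2:2 samples are multiplexed in the order Cb, Y, Cr, Y, and each line carries end-of-active-video (EAV) and start-of-active-video (SAV) timing references of the form FF 00 00 XY. The XY byte encodes the field bit F, the vertical-blanking bit V, and the H bit (1 for EAV, 0 for SAV), plus four protection bits. The small Python sketch below shows how that byte is formed; it says nothing about how the decoder's video generator actually builds it.

```python
def ccir656_xy(f: int, v: int, h: int) -> int:
    """Build the XY byte of a CCIR-656 timing reference.

    f: field (0 = field 1, 1 = field 2); v: 1 during vertical blanking;
    h: 1 for EAV, 0 for SAV. Bits P3..P0 are protection bits over F, V, H.
    """
    f, v, h = f & 1, v & 1, h & 1
    p3, p2, p1, p0 = v ^ h, f ^ h, f ^ v, f ^ v ^ h
    return (0x80 | f << 6 | v << 5 | h << 4
            | p3 << 3 | p2 << 2 | p1 << 1 | p0)


def timing_reference(f: int, v: int, h: int) -> bytes:
    """Four-word EAV/SAV sequence: FF 00 00 XY."""
    return bytes([0xFF, 0x00, 0x00, ccir656_xy(f, v, h)])


print(timing_reference(0, 0, 1).hex())  # EAV, field 1, active video
```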

The multimode audio decoder generates serial digital audio output from AC-3 compressed audio data, MPEG compressed audio data, or linear PCM. External to the decoder, a host processor provides overall setup and control. The DMA block provides a byte-wide interface, supplying the raw bit streams that are decoded for video and audio playback. Intermediate levels of buffer storage are provided by the 16-Mbit SDRAM.

Simulation time problem

Since the decoder uses a hybrid approach to provide a cost-effective yet flexible decoding solution for the DVD player, both programmable and hardwired circuits are employed. We used a programmable core to parse the program stream and to implement play control commands from the external host processor. The video decoder includes a microcontroller and hardwired circuits that implement functions required for DVD, like trick play modes, subpicture decoding, and MPEG-2 and MPEG-1 video decoding. The multimode audio decoder is based on a small programmable RISC core and has several functions implemented in dedicated hardware, such as the CRC circuit.
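
AC-3 syncframes carry CRC words defined by the generator polynomial x^16 + x^15 + x^2 + 1 in the ATSC A/52 standard. The article doesn't describe the decoder's CRC circuit, so the following is only a bit-serial software model of that polynomial (the same shift-and-XOR that a hardware LFSR performs each clock), with an initial register value of 0 assumed. Which portions of the frame each CRC covers, and how the check is sequenced, are defined in A/52 and are not modeled here.

```python
AC3_CRC_POLY = 0x8005  # x^16 + x^15 + x^2 + 1, with the x^16 term implied


def crc16(data: bytes, crc: int = 0) -> int:
    """MSB-first CRC-16 using the AC-3 generator polynomial."""
    for byte in data:
        crc ^= byte << 8                 # bring the next byte into the top bits
        for _ in range(8):               # one shift per bit, as an LFSR would
            if crc & 0x8000:
                crc = ((crc << 1) ^ AC3_CRC_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


msg = b"example payload"
check = crc16(msg)
print(hex(check), crc16(msg + check.to_bytes(2, "big")))
```

Appending the CRC to the message, most significant byte first, makes the computed remainder zero, which is the property a hardware checker can test for.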

Because of the complexity of multiple programmable cores and the firmware involved, intensive testing and verification are necessary in order to ensure that all of the components work properly, both independently and together.

In addition to functional verification, compliance with standards is another critical part of the design. The multimode audio decoder supports Dolby AC-3, and Dolby offers a test suite for AC-3 audio that the design must pass to attain certification. The DVD Consortium also provides a test stream disc for testing DVD requirements, such as bit stream syntax, different audio modes, audio and video synchronization, subpicture data, and aspect ratio conversion. For MPEG compliance, numerous long test bit streams are available.

The simulation environment was similar to the target system. In simulation, a host interface is used to program all the internal registers and monitor status registers, a compressed bit stream is fed into the chip with a dummy DMA controller, a 16-Mbit synchronous DRAM is connected for data storage, and data is captured as audio and video output for analysis.

Figure 3 Simplified design and emulation flow

The emulation process can be done in parallel with part of the normal design flow, but it's an added function in the overall design flow.

With more than 250,000 gates and more than 50 kbytes of RAM or ROM in the decoder, RTL simulation runs at only a few clock cycles per second on a SPARC 20 workstation. At that speed, it takes 120 machine-years to run the Dolby conformance suite just once, assuming no problems are found (see Figure 2). Gate-level simulation is even slower: 465 machine-years. For RTL simulation, we used Chronologic's VCS to speed things up. It runs about six to ten times faster than Verilog-XL, but it still requires a significant amount of time (12 to 20 machine-years).

The DVD Consortium recently released its test stream disc, which contains about 65 minutes of real-time audio and video. With Verilog-XL simulation on a SPARC 20, one test cycle would take 807 machine-years to complete. Even with a new UltraSPARC running Chronologic's simulator, it would take 29 machine-years.

Clearly, simulation alone was impractical for functional verification and standards compliance testing. Previous experience led us to choose Quickturn emulation to speed up functional verification. Using a clock of 1 to 4 MHz, emulation cut verification time from years to days.
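
The run-time figures above are related by simple arithmetic. In this Python sketch the machine-year numbers come from the text; the RTL simulation rate of 3 cycles/s is an assumption standing in for "a few cycles per second," so the emulation times it produces are estimates only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def speedup_years(baseline_years: float, speedup: float) -> float:
    """Run time after a simulator speedup, in machine-years."""
    return baseline_years / speedup


def cycle_count(years: float, sim_rate_hz: float) -> float:
    """Clock cycles simulated in `years` at `sim_rate_hz` cycles per second."""
    return years * SECONDS_PER_YEAR * sim_rate_hz


def hours_at(cycles: float, clock_hz: float) -> float:
    """Wall-clock hours needed to run `cycles` at `clock_hz`."""
    return cycles / clock_hz / 3600


RTL_YEARS = 120.0           # Dolby suite, RTL on a SPARC 20 (from the text)
ASSUMED_RTL_RATE_HZ = 3.0   # assumption: "a few cycles per second"

print(speedup_years(RTL_YEARS, 10), speedup_years(RTL_YEARS, 6))  # 12.0 20.0
cycles = cycle_count(RTL_YEARS, ASSUMED_RTL_RATE_HZ)
for mhz in (1, 4):
    print(f"{mhz} MHz: {hours_at(cycles, mhz * 1e6):.1f} h")  # 3.2 h, then 0.8 h
```

Under this assumption the Dolby suite comes to roughly 0.8 to 3.2 hours of emulation, in line with the hours shown for emulation in Figure 2.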

Emulation process flow

In our normal design flow, RTL is written in Verilog, which is then synthesized to the target silicon circuit and process technology. The design is simulated at both the RT and netlist levels for debugging and verification. Adding emulation to the mix requires a separate flow, but much of it can occur in parallel with the normal design flow (see Figure 3).

First, it's necessary to produce a gate-level netlist by performing an additional synthesis that targets a library with no timing or area constraints. If the design uses no internal memories, the process is very straightforward. However, large designs typically have internal memories (variations of both RAMs and ROMs) that are specific to the target technology. The emulation process must install emulation memory blocks that use the basic RAM and ROM models provided by Quickturn. The design's complexity thus increases, because logic "wrappers" must be placed around the memory models so that their behavior matches that of the models in the target silicon.
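
A memory wrapper adapts the emulator's generic memory primitive to the behavior of the target library's RAM. The article doesn't give either model, so this Python sketch assumes both: a hypothetical generic RAM with a combinational read and an active-high write enable, and a hypothetical target RAM with a registered (one-cycle) read, an active-low write enable, and read-before-write behavior when a read and write hit the same address. In practice the wrapper would be Verilog logic around the Quickturn RAM model; Python is used here only to show the timing the wrapper must reproduce.

```python
class GenericRam:
    """Hypothetical emulator RAM primitive: combinational read, write when we=1."""

    def __init__(self, depth: int, width: int):
        self.mem = [0] * depth
        self.mask = (1 << width) - 1

    def read(self, addr: int) -> int:
        return self.mem[addr]

    def clock(self, addr: int, data: int, we: bool) -> None:
        if we:
            self.mem[addr] = data & self.mask


class TargetRamWrapper:
    """Wraps GenericRam to mimic a hypothetical target-library RAM:
    registered read (data appears after the clock edge) and active-low write."""

    def __init__(self, depth: int, width: int):
        self.ram = GenericRam(depth, width)
        self.q = 0  # output register added by the wrapper

    def clock(self, addr: int, data: int, we_n: int) -> int:
        # Read-before-write: capture the old contents, then perform the write.
        self.q = self.ram.read(addr)
        self.ram.clock(addr, data, we=(we_n == 0))
        return self.q
```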

Second, Quickturn allows for a set of test vectors that can be fed into the tools for hardware debugging. These test vectors can be generated during verification simulations on the gate-level netlist. However, generating the test vectors in a format suitable for the tools requires some special effort.
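
One common form of this preparation is converting a simulation's event trace into broadside vectors: one row per clock cycle holding the value of every primary input at a fixed strobe time. The article doesn't specify the Quickturn vector format, so the rows here are a hypothetical plain string of 0/1 values, and the input names, period, and strobe offset are illustrative.

```python
def to_broadside(events, inputs, period, strobe_offset, n_cycles, initial=None):
    """Sample an event trace once per clock period.

    events: iterable of (time, signal, value) transitions.
    inputs: ordered list of input names (one column each).
    Returns one string per cycle with each input's value at
    cycle * period + strobe_offset.
    """
    state = {name: 0 for name in inputs}   # assume inputs start at 0
    state.update(initial or {})
    ordered = sorted(events, key=lambda e: e[0])
    rows, k = [], 0
    for cycle in range(n_cycles):
        t_sample = cycle * period + strobe_offset
        # Apply every transition at or before this cycle's strobe time.
        while k < len(ordered) and ordered[k][0] <= t_sample:
            _, name, value = ordered[k]
            if name in state:              # ignore outputs and internal nets
                state[name] = value
            k += 1
        rows.append("".join(str(state[name]) for name in inputs))
    return rows


trace = [(0, "rst_n", 0), (120, "rst_n", 1), (130, "din", 1), (260, "din", 0)]
print(to_broadside(trace, ["rst_n", "din"], period=100, strobe_offset=50, n_cycles=4))
```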

In the actual Quickturn environment, the gate-level netlist is first imported into the tools. After generating any required memory blocks, the design moves into compilation, where clock information is generated and hardware partitioning and I/O information, as well as any other constraints, are added. The compilation actually maps the netlist into a large number of Xilinx FPGAs located on circuit boards; programs the interconnections among the FPGAs (including clock distribution and other crucial signals); and positions, places, and routes the individual FPGAs on the board.

The output of the compiler is then ready to be downloaded to the hardware for debugging and emulation. Up to this point, the Quickturn tools are running on the host workstations and network.

The vector debugging area of the emulation process imports the vectors provided by the host simulation tools and translates them into a format for the Quickturn environment. If the vectors are prepared properly, the process can be a simple one. The hardware can now be cabled, set up, and checked by means of vector debugging. This step verifies both the hardware setup and the functionality of the compiled database.

The hardware should then be ready for in-circuit emulation. The hardware is cabled to the target system, where it will run at a typical maximum clock frequency of 1 to 4 MHz. The target system provides the primary clock (or clocks), and the Quickturn environment looks like a top-level black box, or I/O footprint, of the emulated function. The tools provide a logic analyzer function that is a tremendous aid to debugging any problems. The top-level I/O and internal probe points can be monitored with the logic analyzer. If problems occur, internal probe points can be added by recompiling the database. This process can turn into an iterative loop while identifying and isolating any problem areas in the design.
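
A logic analyzer's capture can be modeled as a ring buffer that holds the most recent samples until a trigger condition fires, then keeps recording until the capture depth is reached. This Python sketch is a generic model of that behavior; it does not represent the Quickturn analyzer's actual interface, and the sample format and trigger predicate are whatever the caller supplies.

```python
from collections import deque


def capture(samples, trigger, depth, pre_trigger):
    """Return a window of up to `depth` samples around the first trigger.

    The window holds up to `pre_trigger` samples seen before the trigger,
    the trigger sample itself, then post-trigger samples until the window
    is full or the input ends. Returns None if the trigger never fires.
    """
    if pre_trigger >= depth:
        raise ValueError("pre_trigger must be smaller than depth")
    history = deque(maxlen=pre_trigger)   # ring buffer of pre-trigger samples
    it = iter(samples)
    for sample in it:
        if trigger(sample):
            window = list(history) + [sample]
            for later in it:
                if len(window) >= depth:
                    break
                window.append(later)
            return window
        history.append(sample)
    return None
```

For example, a trigger such as `lambda s: s["sdram_cmd"] == "REFRESH"` (a hypothetical probe name) would capture the samples leading up to and following the first refresh command.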

Debugging and emulation experience

After generating the compiled placed-and-routed netlist, the actual hands-on lab work begins. Guided by the tools, the user must first put the cabling between the target interface module (TIM) and the host in place. After the hardware configuration is checked, the TIM is connected to the user-specific target system, from which the actual emulation of the design is controlled and monitored (see Figure 4).

Figure 4 Total emulation system

The emulation box is networked to the workstation environment. Target hardware is connected to the emulation box to provide control, feed compressed bit streams to the decoder, and capture expanded results.

The Quickturn tools help expedite the entire process by providing connection lists as well as a diagnostic tool that actually tests the connections and shows any incorrect connections.

Running vectors

The vectors can now be run against the actual hardware. Vectors are normally used to verify operation and to test the interface and operation of any embedded memories. The vectors can also be used to generate some desired simulation results, but normally the results are driven by the target environment.

If vectors aren't generated properly, they can cause some problems during testing. Initially, we had some problems with setup and hold times. We solved them by using a strobed set of vectors generated in Verilog, paying special attention to all of our clocks and setup and hold times (that is, inputs were always stable across positive clock edges).
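
The rule described here, that inputs never change inside the setup/hold window around a rising clock edge, can be checked mechanically on a vector set. In this Python sketch, times are in arbitrary units, the setup and hold values are parameters rather than the design's real numbers, and "strobed" vectors are modeled as inputs that change a fixed offset (here half a period) after each edge.

```python
def strobed_change_times(n_cycles, period, offset):
    """Input transition times for strobed vectors: `offset` after each edge."""
    return [c * period + offset for c in range(n_cycles)]


def check_setup_hold(clock_edges, input_changes, t_setup, t_hold):
    """Flag input transitions that fall inside any edge's setup/hold window.

    clock_edges: times of the active (rising) clock edges.
    input_changes: list of (time, signal) transitions.
    Returns a list of (signal, change_time, edge_time) violations.
    """
    violations = []
    for t_change, sig in input_changes:
        for t_edge in clock_edges:
            # Illegal if the input moves after (edge - setup) and before (edge + hold).
            if t_edge - t_setup < t_change < t_edge + t_hold:
                violations.append((sig, t_change, t_edge))
    return violations


edges = [0, 100, 200, 300]
changes = [(t, "din") for t in strobed_change_times(3, 100, 50)]
print(check_setup_hold(edges, changes, t_setup=10, t_hold=5))  # mid-cycle changes pass
```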

Connecting the target system

The target system is connected to the TIM through I/O cables. Once vectors have confirmed the functionality of the database in hardware, the hardware is switched over to the actual interface cables for in-circuit emulation. The target system normally won't run at actual system speed unless its normal maximum speed is below several megahertz.

The purpose of our setup was to feed compressed bit streams to the MPEG-2 decoders and capture the resulting expanded output. We used a PC to control the emulation environment and also to supply the bit streams (see Figure 4 again). A PCI board in the PC had custom logic on an FPGA that could read data from and write data to the target microprocessor interface and also supply a buffered byte-wide data stream to the target DMA interface. The PC also ran a custom software debugger (written in C) that allowed us to control the target (set up and confirm registers) as well as the bit streams. One reason we used a PC was that we needed large-capacity disk storage to hold the long video bit streams.

The target board provided the physical connections to the cables, the clock source (multiple clocks were required), the hardware system reset, simple interface logic blocks, and power supply connections. The clock source was a PLA designed as a synchronous counter and driven by a single clock generator. The clocks were on a separate cable and terminated to provide minimum skew and good signal integrity.
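
Deriving several clocks from one synchronous counter keeps every derived clock aligned to edges of the same oscillator. The article doesn't give the PLA equations or the clock ratios, so this Python sketch assumes even divide ratios with a 50% duty cycle and a counter that wraps at the least common multiple of the ratios, so that every derived clock restarts together.

```python
from math import lcm


def derived_clocks(n_ticks, divisors):
    """Model a synchronous counter clocked by one oscillator.

    Returns a 0/1 waveform per divisor, sampled once per master-clock tick.
    Each derived clock is high for the first half of its period, so all of
    them rise together whenever the counter wraps to zero.
    """
    if any(d < 2 or d % 2 for d in divisors):
        raise ValueError("this sketch assumes even divide ratios")
    modulus = lcm(*divisors)        # counter length shared by all outputs
    waves = {d: [] for d in divisors}
    count = 0
    for _ in range(n_ticks):
        for d in divisors:
            waves[d].append(1 if count % d < d // 2 else 0)
        count = (count + 1) % modulus
    return waves


print(derived_clocks(8, [2, 4]))
```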

Residing on the target board was the 16-Mbit SDRAM that the MPEG decoder required. Use of any DRAM presents a special problem to the emulation environment because the DRAM is running at "real time," which in our case was the emulation clock speed (1 to 4 MHz). The refresh strategy for the SDRAM had to be changed from the actual target design to compensate for the much lower clock rate used in emulation (the memory clock in real system usage was 81 MHz).
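
The need to change the refresh strategy follows from how a controller typically times refresh: it counts its own clock cycles between auto-refresh commands. Assuming a typical 16-Mbit SDRAM requirement of 4,096 refresh cycles every 64 ms (the article doesn't give the part's specification), this Python sketch shows what happens when a count chosen for the 81-MHz system clock is kept at an emulation clock of 2 MHz.

```python
def refresh_interval_cycles(clock_hz: int, refresh_period_us: int = 64_000,
                            rows: int = 4096) -> int:
    """Clock cycles between auto-refresh commands, rounded down so that
    refresh is never late. Integer math avoids floating-point rounding."""
    return clock_hz * refresh_period_us // (rows * 1_000_000)


def actual_interval_us(cycles: int, clock_hz: int) -> float:
    """Real time between refreshes when the counter runs at clock_hz."""
    return cycles / clock_hz * 1e6


system_count = refresh_interval_cycles(81_000_000)   # count for the real system
print(system_count, refresh_interval_cycles(2_000_000))
print(actual_interval_us(system_count, 2_000_000))   # same count at 2 MHz
```

The 1,265-cycle count that suits 81 MHz stretches to about 632 µs per refresh at 2 MHz, roughly 40 times the 15.625-µs average the assumed part requires, whereas a count of 31 cycles keeps refresh on schedule.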

Downstream from the target board was a capture system for video data called Viewstore 6000. This box can accept video streams in CCIR656 format at low speed, store them on a disk, and then display the video frames in real time.

Debugging the system setup

The first area we debugged was the wire-wrapped connections on the TIM. The signals from the target board are connected to the TIM by standard cables. The cable signals are then mapped to specific I/O signals on the TIM logic boards by compilation. In turn, the user must wire-wrap between sets of pins on the back of the TIM box to complete this mapping.

The Quickturn tools provide not only listings to do the wire wrapping, but also actual procedures that will electrically test the connections. Once the wire wrap is done, further recompilations are constrained to retain the same wire-wrap mapping.
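
The article doesn't say how the Quickturn diagnostic tests the connections. A generic way to check a set of point-to-point wires is a walking-ones test: drive one wire high at a time and record which receivers see it. This Python sketch classifies the results; the `read_back` callback is a hypothetical hook standing in for the real drive-and-sense hardware.

```python
def walking_ones_test(n_wires, read_back):
    """Classify each wire from a walking-ones test.

    read_back(i) must return the set of receiver indices that read high
    when only wire i is driven high (all other wires driven low).
    Returns a list of (wire, fault description) pairs.
    """
    faults = []
    for i in range(n_wires):
        seen = set(read_back(i))
        if not seen:
            faults.append((i, "open"))
        elif len(seen) > 1:
            faults.append((i, f"short to {sorted(seen - {i})}"))
        elif seen != {i}:
            faults.append((i, f"miswired to {seen.pop()}"))
    return faults


# Simulated board: wires 1 and 2 swapped, wire 3 open.
wiring = {0: {0}, 1: {2}, 2: {1}, 3: set()}
print(walking_ones_test(4, lambda i: wiring[i]))
```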

In our particular environment, the next debugging step was to review the SDRAM refresh operation. The SDRAM controller on the MPEG-2 decoder entered an initialization sequence and then started refresh cycles on a regular basis after reset was exited. We observed this function with the logic analyzer to verify that reset was occurring (and releasing) properly and that the SDRAM controller was alive. We couldn't verify reads or writes at this point, because the MPEG-2 decoder wasn't yet programmed for operation.

Following our initial evaluation of the SDRAM refreshing, we had to debug the PC interface. Getting the total environment to read from and write to registers correctly, as well as send data reliably to the target system, consumed much of our early debugging time. Only preliminary debugging of the PC components (both software and hardware) was possible before integrating them into the emulation environment. We found some software errors, FPGA logic errors, and timing problems between the PC and emulator by an interactive process. The logic analyzer with internal probe points compiled into the design was critical to eliminating the problems.

After we could reliably read from and write to the control registers, we could then read and write data indirectly from or to the SDRAM. The SDRAM accesses were also debugged through the PC interface.
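
Reaching the SDRAM indirectly through control registers usually means loading an address register and then streaming words through a data register. The decoder's real register map isn't given in the article, so this Python sketch uses a hypothetical one (an ADDR register at 0x10 and an auto-incrementing DATA register at 0x14) and a mock device in place of the PC interface and emulator.

```python
class MockDecoder:
    """Hypothetical register model: ADDR selects an SDRAM word; each DATA
    access reads or writes that word and then increments ADDR."""
    ADDR, DATA = 0x10, 0x14

    def __init__(self, sdram_words: int):
        self.sdram = [0] * sdram_words
        self.addr = 0

    def write_reg(self, reg: int, value: int) -> None:
        if reg == self.ADDR:
            self.addr = value
        elif reg == self.DATA:
            self.sdram[self.addr] = value
            self.addr += 1
        else:
            raise ValueError(f"unknown register {reg:#x}")

    def read_reg(self, reg: int) -> int:
        if reg == self.ADDR:
            return self.addr
        if reg == self.DATA:
            value = self.sdram[self.addr]
            self.addr += 1
            return value
        raise ValueError(f"unknown register {reg:#x}")


def write_block(dev, start, words):
    """Load ADDR once, then stream words through DATA."""
    dev.write_reg(dev.ADDR, start)
    for word in words:
        dev.write_reg(dev.DATA, word)


def read_block(dev, start, count):
    """Load ADDR once, then read `count` words back through DATA."""
    dev.write_reg(dev.ADDR, start)
    return [dev.read_reg(dev.DATA) for _ in range(count)]


dev = MockDecoder(256)
write_block(dev, 0x20, [1, 2, 3])
print(read_block(dev, 0x20, 3))
```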

The final external interface was the video data collection to the Viewstore 6000. The CCIR656 interface normally runs at 27 MHz, but in emulation it ran at about 2 or 3 MHz. We did have some problems capturing data at this very low rate, but they were unrelated to the environment.

To debug the video output, we created a small secondary design to drive the video. This design was a simple color bar generator that would display known good outputs, and in this manner we were able to isolate any problem strictly to the video output interface.
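
A color bar generator produces a fixed, easily recognized picture. The article doesn't describe the generator used, so this Python sketch generates one active line of standard 75% color bars as 8-bit 4:2:2 samples in Cb, Y, Cr, Y order, converting RGB with the BT.601 coefficients. The 720-sample line width and the bar set are assumptions.

```python
BARS_RGB = [  # 75%-amplitude bars; R, G, B in the range 0..1
    ("white", 0.75, 0.75, 0.75), ("yellow", 0.75, 0.75, 0.0),
    ("cyan", 0.0, 0.75, 0.75), ("green", 0.0, 0.75, 0.0),
    ("magenta", 0.75, 0.0, 0.75), ("red", 0.75, 0.0, 0.0),
    ("blue", 0.0, 0.0, 0.75), ("black", 0.0, 0.0, 0.0),
]


def to_ycbcr(r, g, b):
    """BT.601 RGB (0..1) to 8-bit studio-range (Y, Cb, Cr), rounded."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772
    cr = (r - y) / 1.402
    return (int(16 + 219 * y + 0.5),
            int(128 + 224 * cb + 0.5),
            int(128 + 224 * cr + 0.5))


def color_bar_line(width=720):
    """One active line of 4:2:2 samples in Cb, Y, Cr, Y order."""
    bar_width = width // len(BARS_RGB)
    out = []
    for x in range(0, width, 2):  # one Cb/Cr pair per two luma samples
        bar = min(x // bar_width, len(BARS_RGB) - 1)
        y, cb, cr = to_ycbcr(*BARS_RGB[bar][1:])
        out += [cb, y, cr, y]
    return out


line = color_bar_line()
print(len(line), line[:4])
```

Because every sample value is known in advance, any difference between the captured frames and these values points at the video output path rather than the decoder core.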

We tested all the external interfaces (microprocessor, DMA, SDRAM, and video) one by one and made sure that they were functional. After eliminating any problems relating to the support equipment and interfaces, we were free to focus on the real issue: verifying the correctness of the core decoder.

Evaluating the core model

We then ran tests on bit streams that were only several frames long. The results were captured on the Viewstore and then displayed on an actual monitor through the Viewstore. We were thus able to evaluate the results of the MPEG video decoding function. By the time we were getting reasonable results and could move to running longer and more extensive bit streams, the project was deferred because of a higher-priority program. As a result, we can't offer conclusions about problems we found during emulation.


Tom Balph is a member of the technical staff at the Motorola Semiconductor Sector in Tempe, Ariz. He has over 20 years of experience in high-speed logic design, arithmetic processing, and bus design. As a team leader, he works on system-level design and architecture for consumer applications.

Wilson Li is a member of the technical staff at the Motorola Semiconductor Sector in Tempe. He recently moved to the United States from Hong Kong and China, where he worked on consumer applications and MPEG-2 design.

To voice an opinion on this or any Integrated System Design article, please email your message to miker@isdmag.com.


integrated system design  April 1998






Copyright © 2000 Integrated System Design
