Design Automation

Putting Multi-Threaded Behavioral Simulation to Work

Multi-threaded behavioral simulation was a research topic in the 1980s. In the 1990s it has come of age and is ready to go to work.

by Lionel Bening


Multi-threaded behavioral simulation on parallel computer architectures is no longer a research topic; it is now ready for production. However, efficient multi-threaded simulation depends on observing three fundamental rules:

  • Minimize the serial and maximize the parallel.

  • Balance the parallel threads across parallel CPUs.

  • Scale the number of parallel CPUs in proportion to the number of parallel threads.

Behavioral simulation has many serial components. The host computer's software and hardware can start and stop parallel threads only so fast, and that limit adds serial overhead. In addition, the simulation software has its own serial operations, including inter-thread communication, input-output, and time management.
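One standard way to quantify why these serial components matter is Amdahl's law, which the article itself does not cite. The C sketch below assumes the serial work is a fixed fraction of the single-CPU run time and that the rest divides perfectly across CPUs; the function name is illustrative only.

```c
#include <assert.h>

/* Amdahl's law: speedup on `cpus` processors when a fraction
 * `serial_fraction` of the single-CPU run time cannot run in parallel.
 * Assumes the parallel part divides perfectly across the CPUs. */
double amdahl_speedup(double serial_fraction, int cpus)
{
    assert(serial_fraction >= 0.0 && serial_fraction <= 1.0);
    assert(cpus > 0);
    double parallel_fraction = 1.0 - serial_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / cpus);
}
```

For example, with 10% of the run time spent in serial work, four CPUs give only about 3.1 times speedup, and no number of CPUs can push it past 10 times. That is the arithmetic behind "minimize the serial."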

The best performance results from "balanced" multi-threaded simulation across the available CPUs. Any time a CPU sits idle after finishing its threads early comes directly out of the simulation speedup. Even if a multi-threaded simulation has minimum serial overhead and the threads are balanced, the question remains of how well it will "scale" as more parallel CPUs are added. For example, if a simulation model has four long parallel threads, using more than four CPUs will provide little, if any, speedup advantage.
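The balance and scaling points can be made concrete with a small model. The article does not say how threads are mapped to CPUs, so the C sketch below assumes a common heuristic, longest-processing-time (LPT): sort threads by length and give each to the least-loaded CPU. It ignores serial overhead, and `balanced_speedup` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator: sort thread lengths in descending order. */
static int by_length_desc(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x < y) - (x > y);
}

/* Estimated speedup of n threads with the given run lengths on `cpus`
 * processors, ignoring serial overhead.  Threads go longest-first to
 * the least-loaded CPU; the run ends when the busiest CPU finishes. */
double balanced_speedup(const double *lengths, int n, int cpus)
{
    assert(n > 0 && cpus > 0);
    double *sorted = malloc((size_t)n * sizeof *sorted);
    double *load = calloc((size_t)cpus, sizeof *load);
    assert(sorted != NULL && load != NULL);

    double total = 0.0;                 /* single-CPU run time */
    for (int i = 0; i < n; i++) {
        sorted[i] = lengths[i];
        total += lengths[i];
    }
    qsort(sorted, (size_t)n, sizeof *sorted, by_length_desc);

    for (int i = 0; i < n; i++) {
        int least = 0;                  /* find the least-loaded CPU */
        for (int c = 1; c < cpus; c++)
            if (load[c] < load[least])
                least = c;
        load[least] += sorted[i];
    }

    double makespan = 0.0;              /* finish time of busiest CPU */
    for (int c = 0; c < cpus; c++)
        if (load[c] > makespan)
            makespan = load[c];

    free(sorted);
    free(load);
    return total / makespan;
}
```

With four equal threads, 4 CPUs give a speedup of 4, 8 CPUs still give 4 (the extra CPUs stay idle), and 3 CPUs give only 2, because one CPU must run two threads while the others wait.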

HDL drives synthesis
At the Convex Technology Center (Richardson, TX), we use behavioral HDL system design for all system simulations. The behavioral HDL drives the synthesis tool to produce gates. If synthesis does not produce the gates we want, we create them by hand and use Boolean equivalence checking to compare the behavioral HDL against the hand-laid gates. Either way, synthesis or Boolean equivalence checking ensures that the behavioral system simulation model matches the gate description.

The ASIC design boundary
We perform multi-threaded simulation of our ASICs using our C models, and we simulate the communication between ASICs in the vendor's host simulator. Two attributes of current ASIC technology make the ASIC a natural boundary between serial and multi-threaded simulation: (1) limited port counts and (2) the large amount of logic on each ASIC. The ports of each ASIC become the interface between our models and the Verilog simulator. Because the models communicate with the simulator only through events at the ASIC boundaries, multi-threaded performance benefits from the designers' work to fit their designs within the ASIC package port constraints: fewer ports mean fewer host simulator events and therefore less serial overhead. As a result, more of the simulation run time is spent in multi-threaded execution of machine instructions derived from the ASIC models. The designers' efforts to push ever-increasing amounts of logic onto newer ASIC technologies also put more logic into the multi-threaded portion of the simulation.

In-house Verilog-to-C translation
Our designers simulate system designs using Verilog HDL models. When system simulations outgrow the available host machine cycles, our designers use in-house translation tools to automatically convert the hierarchy of Verilog modules that represents each ASIC design into a single C model for that ASIC. Our in-house C models simulate over 5 times faster than the fastest vendor-compiled Verilog HDL ASIC models.

Programming techniques
The Verilog simulator executes serially. It makes serial calls to ASIC C model instances when triggered by clock-edge events. If the designer translates the Verilog to C using the translator's "parallel" option, the C model does not execute when called. Rather, it adds a scheduling notice to the "thread queue." The scheduling notice specifies an entry point to the C model (a pointer to its instance storage) and a code indicating the clock-edge event that initiated the call from the host simulator.
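The article does not publish the translator's data structures. The C sketch below shows one way a thread queue of scheduling notices could look. The names `sched_notice`, `thread_queue`, `tq_schedule`, `tq_run_serial`, the edge codes, and the toy model are all hypothetical, and the function-pointer field is an assumption added so that different ASIC model types can share one queue.

```c
#include <assert.h>
#include <stddef.h>

/* Clock-edge event codes (hypothetical encoding). */
enum edge_code { EDGE_POS = 1, EDGE_NEG = 2 };

/* One scheduling notice: the model to run, its instance storage, and
 * the clock-edge event that caused the host simulator to call it. */
struct sched_notice {
    void (*entry)(void *instance, int edge);
    void *instance;
    int edge;
};

#define THREAD_QUEUE_MAX 256

struct thread_queue {
    struct sched_notice notice[THREAD_QUEUE_MAX];
    size_t count;
};

/* Called serially from the host simulator's clock-edge callback for a
 * model translated with the "parallel" option: record the call
 * instead of evaluating the model now. */
void tq_schedule(struct thread_queue *q, void (*entry)(void *, int),
                 void *instance, int edge)
{
    assert(q->count < THREAD_QUEUE_MAX);
    q->notice[q->count++] = (struct sched_notice){ entry, instance, edge };
}

/* Single-threaded drain of the queue, useful as a reference when
 * checking that a multi-threaded run gives the same results. */
void tq_run_serial(struct thread_queue *q)
{
    for (size_t i = 0; i < q->count; i++)
        q->notice[i].entry(q->notice[i].instance, q->notice[i].edge);
    q->count = 0;
}

/* Toy model for illustration only: counts the clock edges it sees. */
struct toy_asic { int posedges, negedges; };

void toy_asic_eval(void *instance, int edge)
{
    struct toy_asic *m = instance;
    if (edge == EDGE_POS)
        m->posedges++;
    else if (edge == EDGE_NEG)
        m->negedges++;
}
```

Recording the call instead of executing it is what turns a series of serial calls from the simulator into a batch of independent work that can later be spread across CPUs.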

To simulate the models in a multi-threaded simulation, we add a special thread.v Verilog model to the system simulation. This thread.v model calls a "run thread queue" C function 1 picosecond after the controlling clock edge(s) that called the model. The run thread queue initiates multi-threaded calls to the C models according to the scheduling notices in the thread queue.
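A "run thread queue" function could be sketched with POSIX threads as follows. This is an assumption about structure, not the author's code: workers claim notices through a mutex-protected index, and the function returns only after every model has finished, so the serial Verilog simulator can safely resume. The names `run_thread_queue`, `dispatch`, and `toy_asic` are hypothetical, and the hookup from thread.v through the Verilog programming interface is not shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* A scheduling notice: model entry point, instance storage, and
 * clock-edge code (field layout assumed for this sketch). */
struct sched_notice {
    void (*entry)(void *instance, int edge);
    void *instance;
    int edge;
};

/* State shared by the workers during one run_thread_queue() call. */
struct dispatch {
    struct sched_notice *notice;
    size_t count;
    size_t next;            /* index of the next unclaimed notice */
    pthread_mutex_t lock;   /* protects `next` */
};

/* Worker: repeatedly claim the next notice and run that model. */
static void *worker(void *arg)
{
    struct dispatch *d = arg;
    for (;;) {
        pthread_mutex_lock(&d->lock);
        size_t i = d->next < d->count ? d->next++ : d->count;
        pthread_mutex_unlock(&d->lock);
        if (i == d->count)
            return NULL;    /* queue drained */
        d->notice[i].entry(d->notice[i].instance, d->notice[i].edge);
    }
}

/* Run all queued notices on `nthreads` threads; return only when every
 * model has finished, so the serial simulator can continue. */
void run_thread_queue(struct sched_notice *notice, size_t count, int nthreads)
{
    enum { MAX_THREADS = 16 };
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    struct dispatch d;
    d.notice = notice;
    d.count = count;
    d.next = 0;
    pthread_mutex_init(&d.lock, NULL);

    pthread_t tid[MAX_THREADS];
    for (int t = 0; t < nthreads; t++) {
        int rc = pthread_create(&tid[t], NULL, worker, &d);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);     /* wait for all models */
    pthread_mutex_destroy(&d.lock);
}

/* Toy model for illustration: counts how often it was evaluated. */
struct toy_asic { int evals; };

void toy_asic_eval(void *instance, int edge)
{
    (void)edge;
    ((struct toy_asic *)instance)->evals++;
}
```

Because each worker pulls the next notice as soon as it is free, ASIC models of different sizes balance themselves somewhat across threads; the join at the end is the point where multi-threaded execution hands control back to the serial simulator.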


Figures 1 through 4. Speedup of 2-, 3-, 4-, and 6-threaded simulations over single-threaded simulations.

The graphs in Figures 1 through 4 present the speedup for 2-, 3-, 4-, and 6-threaded simulations over single-threaded simulations. Using the simulator-simulator, we varied ASIC sizes and counts and ran 72 simulations to develop the timing data for each graph. The graphs are consistent with our experience. For example, note how the speedup of the 3-threaded simulation in Figure 2 dips at 16 ASICs. With the "ASIC count" axis stepping in multiples of four, 16 ASICs split into three parallel threads of five ASICs each, with one ASIC left over to run serially. The 6-threaded simulation shows similar dips at ASIC counts that do not divide well among six threads. The conclusion from the graphs is that system designs with more and larger ASICs are better suited to multi-threaded simulation, and this holds more strongly as more CPUs are applied to run more simulation threads.
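The dip at 16 ASICs follows from simple packing arithmetic. The C sketch below is an idealized model, assuming all ASIC models take equal time and ignoring serial overhead: the run then takes ceil(ASICs / threads) model-evaluation times instead of one per ASIC. The function name is hypothetical.

```c
#include <assert.h>

/* Idealized speedup for `asics` equal-sized ASIC models run on
 * `threads` threads with no serial overhead: the parallel run takes
 * ceil(asics / threads) model evaluations instead of `asics`. */
double ideal_asic_speedup(int asics, int threads)
{
    assert(asics > 0 && threads > 0);
    int rounds = (asics + threads - 1) / threads;  /* ceil(asics/threads) */
    return (double)asics / rounds;
}
```

For three threads this model gives 3.0 at 12 ASICs but only about 2.67 at 16 ASICs, the same kind of dip described for Figure 2; measured runs also pay the serial overhead discussed earlier.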

There are three main "Critical Sections" in the C model that must be executed serially:

  • Where the C model calls the simulator to pass changed states at its outputs. The simulator does not know that the C models beneath it are executing in parallel. If the C models call the simulator while running multi-threaded, output states get scrambled between the various ASIC instances.

  • Where commodity chip models in C operate on a shared functional memory model.

  • Where the C model instances make calls to do file output, such as for the $display system task in the original Verilog. Fortunately, designers use these system tasks only in very exceptional cases, so breaks from multi-threaded operation for file output are very rare.
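One common way to keep such calls serial is a mutex around each critical section. The C sketch below assumes POSIX threads and guards a stand-in for the simulator's output interface; `sim_put_output`, `model_drive_output`, and the toy thread function are hypothetical, since a real host simulator has its own programming interface. The same pattern would apply to $display file output and to a shared functional memory model, with one lock or separate locks.

```c
#include <assert.h>
#include <pthread.h>

/* One lock serializes every call from the C models back into the
 * single-threaded host simulator. */
static pthread_mutex_t sim_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the host simulator's output-update entry point
 * (hypothetical).  It is not thread-safe on its own. */
static long sim_events_received;

static void sim_put_output(int port, unsigned value)
{
    (void)port;
    (void)value;
    sim_events_received++;
}

/* Critical section: pass one changed output state to the simulator. */
void model_drive_output(int port, unsigned value)
{
    int rc = pthread_mutex_lock(&sim_lock);
    assert(rc == 0);
    sim_put_output(port, value);
    rc = pthread_mutex_unlock(&sim_lock);
    assert(rc == 0);
    (void)rc;
}

/* Toy "ASIC" thread for illustration: drives *arg output changes. */
void *toy_asic_outputs(void *arg)
{
    int n = *(const int *)arg;
    for (int i = 0; i < n; i++)
        model_drive_output(i % 32, (unsigned)i);
    return NULL;
}

/* Number of output events the stand-in simulator has received. */
long sim_event_count(void)
{
    pthread_mutex_lock(&sim_lock);
    long n = sim_events_received;
    pthread_mutex_unlock(&sim_lock);
    return n;
}
```

Every locked call is serial time in the Amdahl sense, which is one more reason the narrow port interface at the ASIC boundary helps: fewer output events mean fewer trips through the lock.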

Developing common sense
To move multi-threaded parallel simulation into production, we apply it where it best fits the system simulation model. This requires experience in running a variety of simulation model sizes.

To build this experience quickly, we developed a "simulator-simulator." With this tool, we vary the sizes and the number of ASICs in each run and time each run with microsecond resolution. The simulator-simulator uses the same multi-threading management software as the real logic simulator. It omits inter-ASIC events but still models actual simulation timings quite closely.
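The core of a simulator-simulator can be sketched in a few dozen lines. Unlike the tool described here, this C sketch does not reuse a real simulator's multi-threading manager: it assumes POSIX threads and `clock_gettime`, replaces each ASIC model with synthetic busy work of a chosen size, assigns ASICs to threads round-robin, and, like the original, omits inter-ASIC events. All names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in microseconds. */
static int64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Synthetic ASIC model: `size` units of busy work, no real logic.
 * Returns a value so the compiler cannot discard the loop. */
static uint32_t fake_asic_eval(uint32_t size, uint32_t seed)
{
    uint32_t x = seed;
    for (uint32_t i = 0; i < size; i++)
        x = x * 1664525u + 1013904223u;     /* linear congruential step */
    return x;
}

/* The ASICs one thread evaluates: indices first, first+stride, ... */
struct slice {
    const uint32_t *sizes;
    int count, first, stride;
    uint32_t sum;
};

static void *run_slice(void *arg)
{
    struct slice *s = arg;
    s->sum = 0;
    for (int i = s->first; i < s->count; i += s->stride)
        s->sum ^= fake_asic_eval(s->sizes[i], (uint32_t)i);
    return NULL;
}

/* Time one simulated clock cycle: evaluate `count` synthetic ASICs on
 * `threads` threads.  Returns elapsed microseconds and stores an XOR
 * checksum of the model results (independent of thread count). */
int64_t time_cycle(const uint32_t *sizes, int count, int threads,
                   uint32_t *checksum)
{
    enum { MAX_THREADS = 16 };
    assert(threads > 0 && threads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    struct slice s[MAX_THREADS];

    int64_t start = now_us();
    for (int t = 0; t < threads; t++) {
        s[t] = (struct slice){ sizes, count, t, threads, 0 };
        int rc = pthread_create(&tid[t], NULL, run_slice, &s[t]);
        assert(rc == 0);
        (void)rc;
    }
    *checksum = 0;
    for (int t = 0; t < threads; t++) {
        pthread_join(tid[t], NULL);
        *checksum ^= s[t].sum;
    }
    return now_us() - start;
}
```

Sweeping the `sizes` array and the thread count, and dividing the 1-thread time by the N-thread time, produces speedup curves of the kind plotted in Figures 1 through 4; the checksum confirms that every configuration did the same work.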

Table 1. Cost comparisons

                            One                  4-CPU                Four
                            uniprocessor         multi-threaded       uniprocessors
Memory                      $10,000              $10,000              $40,000
Licenses                    $25,000              $25,000              $100,000
Total cost                  $35,000              $35,000              $140,000
Simulation completion rate  1 every 120 minutes  1 every 40 minutes   4 every 120 minutes
Cost-up vs. speedup
In "Cost-Effective Parallel Computing" (Computer, February 1995), Wood and Hill point out that the speedup from multi-threaded computing may not increase in proportion to the number of CPUs added, yet still be cost-effective. They point out that when 4 simulations run independently on 4 uniprocessor hosts, each host needs sufficient memory.

When running each simulation, one at a time, on 4 CPUs of a multi-threaded host, only one memory image of each simulation is needed at a time. Thus, multi-threaded simulations reduce the memory cost to a fourth of that needed for simulations on 4 uniprocessors. In addition, just as with memory, we can use each vendor simulation license more effectively by running simulations multi-threaded on a 4-CPU host (compared with running the simulations on 4 uniprocessors).

For example, a simulation model that runs in 120 minutes using 200 megabytes of memory on a uniprocessor host runs three times as fast (in 40 minutes) in the same amount of memory when run multi-threaded on a 4-CPU host.
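The cost argument reduces to throughput per dollar. The C sketch below simply recomputes Table 1's figures; note that the table lists only memory and license costs, so CPU hardware is not part of the comparison. The `config` struct and function names are illustrative.

```c
#include <assert.h>

/* One column of Table 1: memory and license cost in dollars, and the
 * completion rate as simulations finished per batch of minutes. */
struct config {
    const char *name;
    double memory_cost;
    double license_cost;
    double sims_per_batch;
    double minutes_per_batch;
};

/* Completed simulations per hour. */
double sims_per_hour(const struct config *c)
{
    assert(c->minutes_per_batch > 0.0);
    return c->sims_per_batch * 60.0 / c->minutes_per_batch;
}

/* Simulations per hour for every $100,000 spent on the items Table 1
 * lists (memory and licenses only; CPU hardware is not included). */
double sims_per_hour_per_100k(const struct config *c)
{
    double cost = c->memory_cost + c->license_cost;
    assert(cost > 0.0);
    return sims_per_hour(c) * 100000.0 / cost;
}
```

With the table's numbers, the 4-CPU host completes 1.5 simulations per hour for the same $35,000 that yields 0.5 per hour on one uniprocessor, about three times the throughput per dollar. Four uniprocessors complete 2 per hour, but at four times the cost, so their throughput per dollar is no better than a single uniprocessor's.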
Behavioral simulation modeling definition
Since "behavioral" appears in the title, and different designers may have different interpretations of what it means, a few words about "behavioral" level simulation modeling as it is used here are in order. In a behavioral HDL model, a designer...

1. Uses more procedural, programming-style HDL language elements:

  • Case statements. In one of our designs, a single case...endcase procedural block consists of over 5000 source lines.
  • If-else statements.
  • Procedural assignment statements. A (blocking) procedural assignment simulates faster than a non-blocking assignment or a signal assignment.

2. Operates on multi-bit variables.

3. Uses little or no gate-level cell instantiations.
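The article does not show what its translator emits, but a C sketch can suggest why these behavioral constructs translate well. Below, a case...endcase on a multi-bit selector becomes a C switch on multi-bit values, and blocking assignments become direct stores, while non-blocking assignments need staged next-state copies and a commit step. That staging is one plausible reason for the speed difference noted above, not a description of the Convex translator; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two 8-bit registers, as a translator might lay out instance storage. */
struct regs {
    uint8_t a;
    uint8_t b;
};

/* Blocking assignments (a = b; b = a;): each store takes effect at
 * once, so the C code is a direct sequence of assignments. */
void eval_blocking(struct regs *r)
{
    r->a = r->b;
    r->b = r->a;            /* reads the value just stored in a */
}

/* Non-blocking assignments (a <= b; b <= a;): every right-hand side
 * reads the old values, so the C code stages next-state copies and
 * commits them afterwards, costing extra loads and stores. */
void eval_nonblocking(struct regs *r)
{
    uint8_t a_next = r->b;
    uint8_t b_next = r->a;
    r->a = a_next;
    r->b = b_next;
}

/* A case...endcase on a 2-bit selector maps onto a C switch that
 * operates on multi-bit values directly. */
uint8_t alu(unsigned op, uint8_t x, uint8_t y)
{
    assert(op <= 3);        /* 2-bit selector */
    switch (op) {
    case 0:  return (uint8_t)(x + y);
    case 1:  return (uint8_t)(x - y);
    case 2:  return (uint8_t)(x & y);
    default: return (uint8_t)(x | y);
    }
}
```

A gate-level netlist of the same ALU would instead become many single-bit operations, which is why models with little gate-level instantiation translate into compact, fast C.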

Table 1 shows the same price for a multi-threading and a single-threaded simulator license, and shows no volume discount for multiple licenses. It is reasonable for a simulation vendor to charge more for a multi-threading simulator. However, to sell the multi-threading feature in its simulator, the vendor must allow the customer to share significantly in the advantage of multi-threading.

Here are the ideas about multi-threaded simulation that I'd like to leave with you:

  • The ASIC boundary is a good place to partition a system design for multi-threaded simulation.

  • A simulator-simulator can be a useful tool for predicting which system simulations will perform best when run multi-threaded.

  • System designs consisting of more and larger ASICs get the best performance speedup from multi-threaded simulation.

  • Consider cost of memory and licenses when looking at the speedup factor due to multi-threading.

References

1. D. A. Wood and M. D. Hill, "Cost-Effective Parallel Computing," Computer, Vol. 28, No. 2, Feb. 1995, pp. 69-72.

Lionel Bening is on the CAD technical staff at the Convex Technology Center of Hewlett-Packard (Richardson, TX). His work there focuses on evaluating, integrating, and augmenting HDL and simulation tools.



integrated system design  February 1996



Copyright © 1996 - Integrated System Design Magazine
