Design Automation

Putting Multi-Threaded Behavioral Simulation to Work

Multi-threaded behavioral simulation was a research topic in the 1980s. In the 1990s it has come of age and is ready to go to work.

by Lionel Bening

Multi-threaded behavioral simulation on parallel computer architectures is no longer a research topic; it is now ready for production. However, the following fundamental rules need to be observed for efficient multi-threaded simulation:
Behavioral simulation has many serial components. The maximum rate at which the host computer software and hardware can start and stop parallel threads contributes some serial overhead. In addition, the simulation software has its own serial operations, including inter-thread communication, input-output, and time management.

The best performance results from "balanced" multi-threaded simulation across available CPUs. The extent to which parallel CPUs finish their threads early and are idle detracts directly from the simulation speedup.

Even if a multi-threaded simulation has minimum serial overhead and the threads are balanced, the question remains of how well it will "scale" as more parallel CPUs are added. For example, if a simulation model has four long parallel threads, using more than four CPUs will provide little, if any, speedup advantage.

HDL drives synthesis

At the Convex Technology Center (Richardson, TX), we use behavioral HDL system design for all system simulations. The behavioral HDL drives the synthesis tool to produce gates. If it does not provide the gates we want, we create them by hand and use Boolean equivalence checking to compare the behavioral HDL against the hand-laid gates. Synthesis or Boolean equivalence checking ensures that the behavioral system simulation model matches the gate description.

The ASIC design boundary

We perform a multi-threaded simulation of our ASIC using our C models. We simulate communications between ASICs using the vendor host simulator. Two current ASIC technology attributes make the ASIC a natural boundary between serial and multi-threaded simulations: (1) port limitations and (2) the large amount of logic on each ASIC.

The ports of each ASIC become the interface between our models and the Verilog simulator. By communicating to the simulator through events at the ASIC boundaries, multi-threaded simulation performance benefits from the designers' fitting their designs in the ASIC package port constraints.
By minimizing host simulator event communication, simulation overhead is reduced. In addition, more of the simulation run-time is spent in multi-threaded execution of machine instructions derived from the ASIC models. The designers' efforts to push ever-increasing amounts of logic onto newer ASIC technologies help put more logic into the multi-threaded simulations.

In-house Verilog-to-C translation

Our designers simulate system designs using Verilog HDL models. As system simulations exceed available host machine cycles, our designers use in-house translation tools to automatically convert a hierarchy of many Verilog modules (that represent each ASIC design) into a single C model for each ASIC. Our in-house C models simulate over 5 times faster than the fastest vendor-compiled Verilog HDL ASIC models.

Programming techniques

The Verilog simulator executes serially. It makes serial calls to ASIC C model instances when triggered by clock-edge events. If the designer translates the Verilog to C using the translator's "parallel" option, the C model does not execute when called. Rather, it adds a scheduling notice to the "thread queue." The scheduling notice specifies an entry point to the C model (a pointer to its instance storage) and a code indicating the clock-edge event that initiated the call from the host simulator.
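The deferred-call scheme described above might look roughly like the following C sketch. Everything in it is hypothetical: the article does not show its data structures, so the names (`sched_notice`, `thread_queue`, `tq_post`, `tq_drain`), the fixed queue capacity, and the choice to store a function pointer alongside the instance pointer are assumptions made for illustration. The drain step also runs the notices serially here, whereas a production version would hand them to worker threads.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; the article does not publish its structures. */

enum clock_edge { EDGE_POS = 0, EDGE_NEG = 1 };

/* Assumed shape of a translated ASIC C model entry point. */
typedef void (*model_entry_fn)(void *instance, enum clock_edge edge);

/* One scheduling notice: which model instance to run, and for which clock edge. */
struct sched_notice {
    model_entry_fn entry;     /* entry point of the C model (assumed) */
    void *instance;           /* pointer to the model's instance storage */
    enum clock_edge edge;     /* code for the clock-edge event */
};

#define THREAD_QUEUE_CAP 64   /* arbitrary capacity chosen for the sketch */

struct thread_queue {
    struct sched_notice notices[THREAD_QUEUE_CAP];
    size_t count;
};

/* Called serially from the host simulator: records the work instead of running it.
 * Returns 0 on success, -1 if the queue is full. */
static int tq_post(struct thread_queue *q, model_entry_fn entry,
                   void *instance, enum clock_edge edge)
{
    assert(q != NULL && entry != NULL);
    if (q->count >= THREAD_QUEUE_CAP)
        return -1;
    q->notices[q->count].entry = entry;
    q->notices[q->count].instance = instance;
    q->notices[q->count].edge = edge;
    q->count++;
    return 0;
}

/* Runs every queued notice and empties the queue. Serial here for clarity;
 * a multi-threaded version would distribute the notices across worker threads.
 * Returns the number of notices executed. */
static size_t tq_drain(struct thread_queue *q)
{
    size_t ran = 0;
    for (size_t i = 0; i < q->count; i++) {
        q->notices[i].entry(q->notices[i].instance, q->notices[i].edge);
        ran++;
    }
    q->count = 0;
    return ran;
}

/* Toy stand-in for a translated ASIC model: counts how often it was evaluated. */
struct toy_asic { int evals; };

static void toy_asic_eval(void *instance, enum clock_edge edge)
{
    struct toy_asic *asic = instance;
    (void)edge;               /* the toy model ignores which edge fired */
    asic->evals++;
}
```

Posting a notice for a `toy_asic` leaves its `evals` counter untouched until `tq_drain` is called, which mirrors the point that the C model "does not execute when called."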
To simulate the models in a multi-threaded simulation, we add a special
Figures 1 through 4. The graphs in Figures 1 through 4 present the speedup for 2-, 3-, 4-, and 6-threaded simulations over single-threaded simulations. Using the simulator-simulator, we varied ASIC sizes and counts and ran 72 simulations to develop timing data for each graph.

The graphs are consistent with our experience. For example, note how speedup dips at 16 ASICs for the 3-threaded simulation in Figure 2. The dip shows how an ASIC count that is a multiple of four, such as 16, can leave one ASIC to run serially when the other 15 ASICs run in three parallel threads that are each 5 ASICs long. The 6-threaded simulation shows similar speedup dips where ASIC counts are not a good fit for 6-threaded simulation.

The conclusion from the graphs is that system designs with more and larger ASICs are suited for multi-threaded simulation. This becomes more true as the number of CPUs applied to running simulation threads increases.

There are three main "Critical Sections" in the C model that must be executed serially:
Developing common sense

To move multi-threaded parallel simulation into production, we apply it where it best fits the system simulation model. This requires experience in running a variety of simulation model sizes. To build this experience quickly, we developed a "simulator-simulator." With this tool, we vary the sizes and the number of ASICs in each run, and time each run with microsecond resolution. In the simulator-simulator, the multi-threading management software is the same as that used in the real logic simulator. The simulator-simulator omits inter-ASIC events, but still models actual simulation timings quite closely.
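The 16-ASIC dip noted for the 3-threaded case can be reproduced with a far cruder model than the simulator-simulator. The sketch below is not the author's tool. It assumes that all ASICs take equal time to evaluate, that there is no serial overhead, and that leftover ASICs cost one extra round. The function names are invented for illustration.

```c
#include <stddef.h>

/* Number of evaluation rounds when equal-sized ASICs are spread over threads:
 * ceil(asics / threads). A partially filled last round still costs a full round. */
static size_t parallel_rounds(size_t asics, size_t threads)
{
    return (asics + threads - 1) / threads;
}

/* Idealized speedup over a single thread, ignoring all serial overhead:
 * one thread needs `asics` rounds, the parallel run needs parallel_rounds(). */
static double ideal_speedup(size_t asics, size_t threads)
{
    if (asics == 0 || threads == 0)
        return 0.0;
    return (double)asics / (double)parallel_rounds(asics, threads);
}
```

Under these assumptions, 15 ASICs on three threads need 5 rounds for a speedup of 3.0, while 16 ASICs need 6 rounds for a speedup of about 2.67, which is the shape of the dip. The same model gives 4 ASICs a speedup of 4.0 on either four or six threads, in line with the rule that extra CPUs beyond the number of long threads add little.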
When running each simulation, one at a time, on 4 CPUs of a multi-threaded host, only one memory image of each simulation is needed at a time. Thus, multi-threaded simulations reduce the memory cost to a fourth of that needed for simulations on 4 uniprocessors. In addition, just as with memory, we can use each vendor simulation license more effectively by running simulations multi-threaded on a 4-CPU host (compared with running the simulations on 4 uniprocessors). For example, a simulation model that runs in 120 minutes using 200 megabytes of memory on a uniprocessor host runs three times as fast, in the same amount of memory, when running multi-threaded on a 4-CPU host.
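The arithmetic behind this comparison can be made explicit. The following sketch assumes a batch of identical simulations, one license per running simulation, and a uniprocessor farm that runs the whole batch concurrently. These assumptions and all names are illustrative and do not come from the article's Table 1.

```c
/* Resources needed to finish a batch of identical simulations (hypothetical model). */
struct host_cost {
    double batch_minutes;   /* wall-clock time until the whole batch is done */
    double peak_memory_mb;  /* memory images resident at the same time */
    int licenses;           /* simulator licenses in use at the same time */
};

/* One uniprocessor per simulation, all running concurrently. */
static struct host_cost uniprocessor_farm(int sims, double run_minutes,
                                          double memory_mb)
{
    struct host_cost c;
    c.batch_minutes = run_minutes;
    c.peak_memory_mb = sims * memory_mb;
    c.licenses = sims;
    return c;
}

/* One multi-CPU host running the simulations one at a time, each sped up. */
static struct host_cost multithreaded_host(int sims, double run_minutes,
                                           double memory_mb, double speedup)
{
    struct host_cost c;
    c.batch_minutes = sims * run_minutes / speedup;
    c.peak_memory_mb = memory_mb;
    c.licenses = 1;
    return c;
}
```

With the article's figures (120 minutes, 200 megabytes, a speedup of three) and a batch of four runs, the farm holds 800 megabytes and four licenses for 120 minutes. The multi-threaded host holds 200 megabytes and one license, returns each result in 40 minutes, and finishes the batch in 160 minutes.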
Table 1 shows the same price for a multi-threading and single-threaded simulator license, but shows no volume discount for multiple licenses. It is reasonable for a simulation vendor to charge more for the multi-threading simulator. However, to sell the multi-threading feature in its simulator, the simulation vendor must allow the customer to share significantly in the advantage of multi-threading.

Here are the ideas about multi-threaded simulation that I'd like to leave with you:
References
1. D. A. Wood and M. D. Hill, "Cost-Effective Parallel Computing," COMPUTER, Vol. 28, No. 2, Feb. 1995, pp. 69-72.
Lionel Bening is on the CAD technical staff at the Convex Technology Center of Hewlett-Packard (Richardson, TX). His work there focuses on evaluating, integrating, and augmenting HDL and simulation tools.

Integrated System Design, February 1996. Copyright © 1996 Integrated System Design Magazine.