special section
This installment of our EDA benchmark series returns to simulation. The benchmarks show that though a 300-MHz PC performs about the same as a 300-MHz Ultra 60 SPARCstation, a 400-MHz PC offers significant performance gains over the pricey Ultra 60--providing evidence of the PC's superiority on one of EDA's most demanding tasks. On such benchmarks as the 1.3-million-gate RISC gate-level simulation, every machine tested surpassed the Ultra 60 in performance--including the Compaq 300-MHz and 400-MHz machines, which you can get for less than a third the price of an Ultra 60. The best PCs, those with Adaptec RAID controllers, can surpass the performance of an Ultra 60 by as much as 70 percent on certain benchmarks. The Verilog-XL simulation benchmarks used in the tests are the same as those we ran in the first installment of the series ["EDA Platform Benchmark: Simulation," March]. We ran them again to correct erroneous results that seemed to give the Ultra 2 a big performance advantage on I/O-intensive tasks. In addition, we took advantage of the opportunity to put the newly available 400-MHz PCs through their paces.
Recycling the benchmarks
Several circuits were large enough to require a significant amount of paging to disk, and two of them were especially I/O bound. One was a behavioral memory model that dumped about 120 Mbytes of VCD waveform data in its short run, and the other was the 1.3M RISC CPU circuit with a data structure size of over 840 Mbytes. The RISC CPU had a fairly simple design replicated 256 times to get 1.3 million gates. On the Ultra 60, the results of the new benchmark runs showed a wide divergence from the original measurements for the two I/O-bound simulations. Before reporting the actual figures, we need to explain several differences between the old and new benchmarking conditions. One change is that the original tests on the Ultra 2 used Verilog-XL 2.6.4, whereas the new tests used version 2.6.7. We believe that the newer version is slightly faster than the old one, so the upgrade should have helped the performance.
The machines
To see how the performance of the two SPARCstations would compare, we ran the simulation benchmarks on the Ultra 60 (using, remember, Verilog-XL 2.6.7) and measured the results using the original method (see Figure 1). The results agreed to within 4 percent in all cases. Only the 5,000-gate RISC behavioral and mixed-level simulations showed a slower time for the Ultra 60, with a divergence of no more than 0.6 percent. The 5,000-gate simulations were so short (with a run time of less than 10 seconds) that the margin of error in measuring the runs probably accounted for the difference. On all the other benchmarks, the Ultra 60 showed superior performance. As mentioned earlier, the upgrade from Verilog-XL 2.6.4 to 2.6.7 might have accounted for some or all of the improvement. Whatever the reason, the results agreed well enough to say that the Ultra 60 qualifies as a reasonable substitute for the Ultra 2.
Only one machine remained, then, from the original tests--a Compaq 5100 with a single 300-MHz Pentium II, 512-kbyte L2 cache, 512-Mbyte EDO RAM, dual-bus Reliance Computer memory controllers, and a Wide Ultra SCSI hard drive. It was generally 5 to 10 percent faster than any of the next three fastest PCs used in the original benchmarks, so it represents the best of the 300-MHz class. As reported in the first benchmark installment, the Compaq machine's advantage over the other 300-MHz PCs probably lies in its use of the dual Reliance memory controllers.
The other machines in the new simulation tests were four 400-MHz PCs that all rely on Intel's 440BX chip set with a 100-MHz system bus and 512 Mbytes of SDRAM. We used a Hewlett-Packard Kayak XU, a Compaq Professional Workstation AP400, and an IBM Intellistation M Pro. The only major differences among the 400-MHz PCs were in their disk controllers and the use of ECC (CRC) on the L2 cache. We dealt with both of those differences in our synthesis benchmarks ["EDA Platform Benchmark: Synthesis," June], in which the Adaptec Array-1000CA RAID 0 controller turned out to be a decisive factor for I/O-bound synthesis runs. The HP and IBM machines use the Adaptec controller, but the Compaq relies on NT to manage the RAID subsystem. The HP and IBM machines both showed a clear advantage on the largest benchmarks.
As for the L2 ECC, we've investigated further since the synthesis tests and found that the HP machine has it hardwired off, the Compaq machine has it hardwired on, and the IBM PC allows you to turn it on or off through BIOS adjustments. Having the L2 ECC off gave the machine a slight speed boost, but Intel recommends that it always be kept on to ensure error-free operation. In our new measurements, we quantified the speed advantage obtained by turning the L2 ECC off.
A gift of time
It turns out that Sun hasn't developed some magical I/O channel that handles disk transactions at almost twice the speed of the PC. Our new benchmark runs showed that the Ultra 60 was indeed faster on disk I/O, but nowhere near twice as fast (more on the exact figures later). The erroneous results in the first simulation tests came from the way Verilog-XL measures elapsed time on different platforms. Verilog-XL reports the times taken for compilation, linking, and running. Summing those times seemed a straightforward way to measure the total benchmark time on each platform. The problem is that the C library calls made by Verilog-XL don't measure the same quantities under the different operating systems. Understanding what the times represent requires a look at the various timing alternatives on both Unix and NT.
Alternative times
On the other hand, time was apparently far from the minds of NT developers, who didn't include a high-resolution time command of any sort. The NT Time /T command provides a resolution of only minutes, which makes it dandy as a kitchen timer but inadequate for measuring the small differences we often see from one computing platform to another. For the synthesis tests we ran in our last benchmark installment, we used a small C program on NT to measure wall clock time, but we've since discovered the joys of Microsoft's NT Resource Kit. Among the many useful items in the kit is the TimeThis utility, which returns the wall clock time interval between "go" and "done."
Wall clock time is clearly the best measurement for the benchmarks because it represents the actual wait time you experience when running an EDA task. The challenge is to ensure that the wall clock times from both Unix and NT are equivalent. Aside from the fact that the resolution differs among the timing sources (Unix time, Verilog-XL, and TimeThis provide resolutions of seconds, tenths of a second, and thousandths of a second, respectively), the numbers don't always add up. Say, for example, that we tell Verilog-XL to simulate a circuit that doesn't exist and add the Unix time command to see how long the operation takes. We type the following information at the Solaris command line:
    time verilog my_circuit.v
We get this response:
    0.0u 0.0s 0:11 0% 0 0k 0 + 0io 0pf + 0w
The first 0.0 represents user time (CPU time), the second 0.0 represents system time (the time spent handling operating system overhead tasks), and the 0:11 represents elapsed wall clock time. The first two times make sense because Verilog-XL did nothing with the nonexistent circuit, but where did the 11 seconds of wall clock time come from? That's the time needed to load the executable, get a license, figure out there's nothing to do, and quit.
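To make the three fields concrete, here is a minimal Python sketch that pulls the user, system, and elapsed values out of a csh-style time summary line like the one above. The function name parse_csh_time is our own invention for illustration, and the parser assumes the elapsed field is printed as [hours:]minutes:seconds; it is not part of Verilog-XL or Solaris.

```python
import re

def parse_csh_time(line):
    """Parse the first three fields of a csh-style `time` summary line."""
    match = re.match(r"\s*([\d.]+)u\s+([\d.]+)s\s+([\d:.]+)", line)
    if match is None:
        raise ValueError(f"unrecognized time output: {line!r}")
    user = float(match.group(1))
    system = float(match.group(2))
    # Assumption: elapsed time is printed as [hours:]minutes:seconds.
    elapsed = 0.0
    for part in match.group(3).split(":"):
        elapsed = elapsed * 60 + float(part)
    return {"user": user, "system": system, "elapsed": elapsed}

# The summary line quoted in the article for the nonexistent circuit.
result = parse_csh_time("0.0u 0.0s 0:11 0% 0 0k 0 + 0io 0pf + 0w")
# Wall clock seconds that neither user nor system time accounts for.
unaccounted = result["elapsed"] - (result["user"] + result["system"])
print(result, unaccounted)
```

For the line quoted above, all of the elapsed time lands in the unaccounted figure, which is the load, license, and quit overhead the text describes.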
For comparison purposes, consider a similar operation on NT:
    TimeThis verilog my_circuit.v
The response is:
    TimeThis : Command Line : verilog my_circuit.v
Now there are 43 seconds for which we have to account. It turns out that the overburdened PC on which we ran the experiment took about 43 seconds to get underway and quit. Any of the benchmarked PCs would have taken much less time. To extend the experiment, we ran Verilog-XL with a short script named pause.v:
    file pause.v
The plan was to run Verilog-XL, pause in interactive mode, let it "idle" for about 5 minutes, then issue the verilog command to continue. On Solaris the sequence is:
    time verilog pause.v
The user and system times using time are still 0.0, and Verilog-XL reports no billing for compilation or simulation times. The 0.1-second link time given by Verilog-XL seemed to have gotten a free ride, which makes sense because there was little to do in the link phase. Nonetheless, the wall clock value from time reported more than 5 minutes elapsed time. Note that the 5-minute pause is inexact because it was timed with a wrist watch. On NT the sequence is:
    timethis verilog pause.v
The interesting part of the experiment is that within a few tenths of a second, the times reported by Verilog-XL on NT add up to the elapsed time given by TimeThis. On Solaris, Verilog-XL reports times that correlate with the user and system times given by time. Unfortunately, those times don't correlate with the total elapsed time experienced by the user. The bottom line: The only way to account for all the time is to use the wall clock times as reported by Unix time and TimeThis on NT.
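The pause experiment is easy to reproduce in miniature. The following Python sketch is ours, not the harness used for the benchmarks: it times a child process that merely sleeps, standing in for the idling simulator, and reports wall clock time alongside the CPU time charged to the child. The helper name time_command is hypothetical, and the children_user and children_system fields of os.times() are meaningful only on Unix-like systems (they read as zero on Windows).

```python
import os
import subprocess
import sys
import time

def time_command(argv):
    """Run argv and return (wall, user, system) seconds for the child process."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=False)
    wall = time.perf_counter() - start
    after = os.times()
    # CPU time charged to waited-for children (Unix only; zero on Windows).
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return wall, user, system

# Stand-in for the paused simulator: a child that sleeps and does no real work.
idle_child = [sys.executable, "-c", "import time; time.sleep(0.5)"]
wall, user, system = time_command(idle_child)
print(f"wall={wall:.2f}s user={user:.2f}s system={system:.2f}s")
```

As with the paused Verilog-XL run, the user and system figures stay small while the wall clock value includes the whole idle period, which is why only the wall clock number reflects what the user actually waits for.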
The new results
The "user + system" values are the sum of the user and system times from Unix time (and don't include disk I/O time). Thus the original time measurements from Verilog-XL should agree with the value for the user + system times, and they do--with the notable exception of the 5k RISC behavioral benchmark. The big discrepancy there comes from the fact that the user + system value doesn't include the time required for loading the application or getting a license token via the LAN. The 5k RISC behavioral run is so short (2.0 seconds for the user + system time) that the time required to load Verilog-XL and get a license overwhelms the actual run time. The "wall clock time" values do include the time taken for such activities, as well as for disk I/O, and thus create some distortions of their own. The spikes in the 5k RISC benchmarks, for example, are potentially misleading. The values differ by so much only because the times for those benchmarks are so short that the load and license times represent a huge percentage of the overall run time. We determined by examining non-I/O-intensive benchmarks that the load and license times totaled about 2.5 seconds. By subtracting that amount from the wall clock time, we generated the "adjusted wall clock time" values that present a more reasonable view of the short benchmarks. For the longer benchmarks, the 2.5-second adjustment had no appreciable effect on the wall clock time. The most important aspect of Figure 2 is the insight it gives into the disk I/O time that was missing in our original simulation tests. The wall clock time in the behavioral memory benchmark was approximately 28 percent longer than that using the original measurement method, and the 1.3M RISC gate-level benchmark time increased by a whopping 62 percent. Our original benchmark article thus clearly gave the Ultra 2 an undeserved advantage in the disk-intensive benchmarks. 
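The adjustment is simple arithmetic, sketched below in Python. The 2.5-second overhead is the figure quoted above; the two wall clock inputs are hypothetical values chosen only to contrast a very short run with a long one (the 4.5-second case assumes a 2.0-second run plus the overhead and nothing else), and the function names are our own.

```python
LOAD_AND_LICENSE_S = 2.5  # load + license overhead quoted in the article

def adjusted_wall_clock(wall_s, overhead_s=LOAD_AND_LICENSE_S):
    """Subtract the fixed startup overhead from a wall clock measurement."""
    return wall_s - overhead_s

def overhead_share(wall_s, overhead_s=LOAD_AND_LICENSE_S):
    """Fraction of the wall clock time spent on startup overhead."""
    return overhead_s / wall_s

# Hypothetical wall clock times: a very short run and an hour-long run.
for label, wall_s in [("short run", 4.5), ("long run", 3600.0)]:
    adjusted = adjusted_wall_clock(wall_s)
    share_pct = round(100 * overhead_share(wall_s), 2)
    print(f"{label}: adjusted={adjusted}s, overhead={share_pct}% of wall clock")
```

Under these assumed inputs the overhead is more than half of the short run's wall clock time but a negligible slice of the long run, which is why the adjustment matters only for the 5k RISC benchmarks.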
Figure 3, then, compares the absolute times of the old and new measurement methods of the behavioral memory and 1.3M RISC gate-level benchmarks. The behavioral memory benchmark measured 6 seconds longer under the new system, and the 1.3M RISC gate-level benchmark was almost 52 minutes longer.
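As a rough cross-check, the percentage and absolute increases quoted for the two benchmarks can be combined to estimate the underlying run times. The Python sketch below is a back-of-the-envelope inference from rounded figures ("approximately 28 percent," "almost 52 minutes"), not a set of measured values, and the helper name implied_times is hypothetical.

```python
def implied_times(extra, percent_longer):
    """Return (old, new) times given the absolute and relative increase."""
    old = extra / (percent_longer / 100.0)
    return old, old + extra

# Behavioral memory: about 6 seconds and 28 percent longer (units: seconds).
mem_old_s, mem_new_s = implied_times(6.0, 28.0)
# 1.3M RISC gate level: almost 52 minutes and 62 percent longer (units: minutes).
risc_old_min, risc_new_min = implied_times(52.0, 62.0)

print(f"behavioral memory: ~{mem_old_s:.1f} s old, ~{mem_new_s:.1f} s new")
print(f"1.3M RISC gate level: ~{risc_old_min:.1f} min old, ~{risc_new_min:.1f} min new")
```

Because the inputs are rounded, the outputs should be read only as orders of magnitude: a behavioral memory run of a few tens of seconds and a gate-level RISC run of well over an hour.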
SPARCstation and PC meet again
On the behavioral memory benchmark, the Ultra 60 still has about a 20 percent edge over the 300-MHz PC. It even beats one of the 400-MHz PCs on that benchmark. However, the very best results come from the 400-MHz PCs that use the Adaptec RAID controller. The Ultra 60's performance plummets on the 1.3M RISC gate-level benchmark, though. Even the 300-MHz PC surpasses its performance by about 17 percent, and the 400-MHz PCs simply leave it in the dust. Even without the Adaptec RAID controller, the 400-MHz Compaq PC beats the Ultra 60's time by 54 percent--a stunning result for a platform that's less than one third the cost. It's possible that Sun will leapfrog the PC competition by introducing a 500-MHz SPARCstation, so we'll look forward to benchmarking such a system in future installments to find out where Sun will fit on the price/performance curve. Another interesting aspect of Figure 4 is the overall performance comparison among the different platforms. The Ultra 60 and 300-MHz Compaq results intertwine across the benchmarks, giving the impression of approximately equivalent performance. The 400-MHz PCs, though, establish clear superiority. Remember the days when we just assumed that RISC workstations were the fastest platforms around?
Comparing the PCs
The effect of the Adaptec RAID controller is quite noticeable. The improved disk I/O speed boosts the HP Kayak XU and the IBM Intellistation M Pro by significant margins on the behavioral memory and 1.3M RISC gate-level simulations. We don't see the 50 percent advantage delivered on some of the synthesis tests, but the Adaptec controller continues to impress. One other feature of interest is the pair of values for the IBM PC. The "ECC on" and "ECC off" values refer to whether the ECC function for the L2 cache was on or off. As described earlier, the IBM machine is the only PC that allows users to change that function. In the synthesis benchmark installment last month, we noted that having the L2 ECC off seemed to give a PC about a 2 percent advantage, but we didn't have firm numbers to support that estimation. Figure 5 shows that turning off the L2 ECC consistently improves performance--but by an underwhelming 0.7 percent for the simulation tests. That tiny advantage amounts to about an hour's savings on a week-long simulation run, which is clearly insignificant. Consider this: What if the lack of ECC on the L2 cache results in an error during that run, which then cascades through the simulation, rendering the results garbage? We can't justify risking a week's time (or even an overnight run) to gain 0.7 percent. You have to decide whether the L2 ECC issue is critical enough to push you toward IBM, Compaq, or some other PC vendor. But wherever you go for your PC, our advice is this: Demand the Adaptec RAID controller.
James Lee is a senior consulting engineer at Seva Technologies, Inc. in Fremont, Calif. He has 12 years' experience working with Verilog and was one of the first employees at Gateway Design Automation, which developed Verilog. Before joining Seva, he was with Cadence Design Systems. He's the author of Verilog Quickstart and is also a part-time instructor in Verilog at the University of California at Santa Cruz.
Bob Peterson is a freelance writer based in Monterey, Calif. Formerly the assistant managing editor of EDN, he has written on a wide variety of technical topics for many publications and companies for the past 16 years.
To voice an opinion on this or any Integrated System Design article, please email your message to miker@isdmag.com.
Integrated System Design, September 1998
Copyright © 2000 Integrated System Design