special section
EDA Platform Benchmark: Simulation and Synthesis II
We revisit the last set of benchmarks, determining which machine simulates and synthesizes better than all the rest--or do we?
by James Lee and Bob Peterson
Using both synthesis and simulation tests, this benchmark installment puts IBM and Compaq Pentium/Windows NT systems toe to toe with each other and with a Sun workstation. Though all three machines are worthy competitors, we will declare one the winner, albeit a qualified one.
The results presented here come from our last round of benchmark tests, in which we didn't compare platforms (see "EDA Platform Benchmark: Simulation and Synthesis at the Same Time," November, p. 50). A software bug led us not only to omit the absolute run times, but also to remove altogether the results of the Sun workstation from the synthesis benchmarks. We thus compared each machine only to itself. This time, we're presenting that benchmark's
results, only now including both the run times (which are comparably skewed) and normalized results. Since such a winner-loser exercise is a little silly, though, given the wide range of considerations that affect your EDA platform choices, we continue to report the caveats and trade-offs involved in the benchmark results.
The overall winner, the IBM Intellistation Z Pro, yielded the best times on over half the tests we ran, performing particularly well on the large simulations. But that doesn't tell the
whole story. The Sun Ultra 60--running 90 MHz slower than the IBM--performed the best on half of the small simulations, though, again, we couldn't compare its performance on the synthesis benchmarks to that of the PCs. Similarly, despite a bug in the BIOS, the Compaq Professional Workstation SP700 ran respectably on all tests and outperformed the IBM on two of the synthesis benchmarks. In short, the three machines aren't that far apart in performance; each offers its own strengths. The choice is up to you.
A year's perspective
Before continuing with the benchmark results in this installment, we want to step back and take a brief look at the way the benchmarks have evolved. We began with the idea to test small, medium, and large tasks that would give a broad picture of the EDA capabilities of PCs running under NT.
Before we ran the first benchmarks, we were guessing about just how small and how large the benchmarks should be, and it turned out that we guessed low. Even the 300-MHz PCs
included in the first benchmark test delivered surprisingly short execution times. Those times rapidly improved as we progressed to 400- and 450-MHz systems.
As a result, we eventually dropped the smallest benchmarks because their short execution times were difficult to measure accurately. Additionally, from the very beginning, we added increasingly larger benchmarks that would provide a greater challenge to the hardware. Amazed at the hardware's capabilities, we pushed the limits of large memory complements
and ran simultaneous synthesis and simulation tasks on dual-processor systems.
We realize that most designers aren't pushing the EDA envelope. At the same time, we believe that all EDA users benefit by seeing the effects of doing so. Even if you don't run simulation tasks that require 1.5 Gbytes of RAM--the maximum used in the tests--benchmarks that challenge 1.5-Gbyte systems indicate the same kinds of trade-offs you face when you challenge 500-Mbyte systems. Equally important, if you're challenging
500-Mbyte systems today, next year you'll be chafing under the limits of 1.5 Gbytes.
What's in a name?
One final item of perspective demands attention: What should we call the systems that contain Pentium processors and run Windows NT? Given the Pentium processor's stunning performance on our benchmarks, Intel deserves a lot of credit, and the company prefers that we call Pentium-based systems "Intel architecture (IA) workstations." (Remember when such systems were "IBM compatibles"?)
We've routinely referred to Intel architecture workstations as PCs, but are they truly personal computers when they run wide-ranging EDA tasks on a network? And in view of the performance turned in by today's best PCs, do they not deserve to be called workstations rather than PCs?
We've sometimes referred to these systems as NT machines to emphasize the nature of the software environment, just as we refer to workstations made by Sun and other companies as Unix machines. Depending on what we want to
emphasize, we will refer to the systems by various terms, usually "PCs." For sheer handiness, it's hard to beat "PC," though some might say that the name is no longer appropriate.
Synthesizing synthesis
In our last benchmark installment, we reported that we couldn't compare the synthesis results of the PCs and Sun workstations because of differences in the versions of Design Compiler on the two platforms. At the time, Synopsys informed us that the run times were skewed, but the synthesis
results were valid.
Figure 1. Small simulations
Grouping the simulation benchmarks that require less than 512 Mbytes of RAM lets us fairly compare all the hardware platforms over the six small simulations that range from 15 seconds to over 32 minutes (a). Normalizing the results with the fastest machine at 1 shows the relative rankings of the machines (b). The difference between the fastest and next-fastest machines is usually less than 30 seconds.
Synopsys has since clarified the situation. The 98.08 version of Design Compiler that we used for the benchmarks suffered an initialization problem when employing multiple Designware libraries. The software kept looping back unnecessarily, looking for Foundation Library licenses. Though the loop wasn't endless, it was quite long and significantly lengthened run times under some circumstances.
After detecting the run-time problem in internal benchmarks, Synopsys repaired the bug and provided the updated version to customers within four weeks after the bug hit the streets--before the vast majority of its customers even installed the original software, much less encountered slower run times. (Out of 3,000 Synopsys sites, only 18 downloaded the fix.) Unfortunately, the timing for our benchmarks practically invited the bug in for a feast, as we rushed through the narrow window between the
availability of the latest hardware and our publishing deadline.
Figure 2. Large simulations
In the 1.3-Mbyte RISC benchmarks, processor speed is critical for the Compaq, IBM, and Sun machines, but frequent pages to disk greatly slow the baseline PC (a). The 800k RISC benchmark, though, demands little more than 512 Mbytes, so here processor speed is the critical factor for all four systems. Thus the IBM Intellistation Z Pro is the clear winner in all three cases, as the normalized results show (b).
It turns out that the same bug infected both the NT and Solaris ports of DC--the 98.08 version was "pathologically consistent," as Synopsys wryly observes. But that doesn't mean that the synthesis run times are comparable across platforms. Our results indicate that the Ultra 60 must take longer to process the operating system calls
associated with the initialization problem. Although not all of the synthesis benchmarks show big performance losses on the Ultra 60, we have no way of determining the extent of the initialization effects. To be fair to Sun, we present the synthesis benchmarks without the Ultra 60. The simulation benchmarks still compare the Ultra 60 to the PCs, however.
Aside from the support associated with the licensing loop, Synopsys says that the company's support staff has seen no unusual activity surrounding the new NT
version of Design Compiler. This version has been in general release only since the third quarter of 1998, but it has apparently raised few immediate integration issues. Synopsys points out that they can't easily tell how many customers are using Design Compiler on NT because the licenses are transparent across platforms. Their efforts may be paying off in hassle-free integration.
Meeting of the machines
The benchmarks tested the following four machines:
- Sun Microsystems Ultra 60--two 360-MHz UltraSPARC II processors; 4-Mbyte secondary cache per processor; 1.5-Gbyte RAM; one 9-Gbyte 7,200-RPM disk drive; Solaris 2.5.1
- Compaq Professional Workstation SP700--two 400-MHz Pentium II Xeon processors; 1-Mbyte L2 cache per processor; 1.5-Gbyte PC100 SDRAM; dual 100-MHz memory buses; dual PCI buses; Mylex DAC960 disk array controller with 64-Mbyte cache; three 4-Gbyte 10,000-RPM disk drives configured as RAID 0; Windows NT 4.0, build 1381, Service Pack 3
- IBM Intellistation Z Pro--two 450-MHz Pentium II Xeon processors; 1-Mbyte L2 cache per processor; Adaptec Array1000CA disk controller; 1-Gbyte PC100 SDRAM; three 9.1-Gbyte 10,000-RPM disk drives configured as RAID 0; Windows NT 4.0, build 1381, Service Pack 3
- IBM Intellistation M Pro (baseline PC)--two 400-MHz Pentium II processors; 512-Kbyte L2 cache per processor; Adaptec Array1000CA disk controller; 512-Mbyte PC100 SDRAM; two 10,000-RPM disk drives configured as RAID 0; Windows NT 4.0, build 1381, Service Pack 3
The Pentium systems on the list fall into two categories, the two faster machines sharing several features. The Compaq Professional and IBM Intellistation Z Pro both incorporate Intel's Xeon processor, which allows the use of the 1-Mbyte L2 cache that both systems utilize. Additionally, the Xeon L2 cache bus operates at the same speed as the processor core. To provide a baseline for comparison, we include the older IBM Intellistation M Pro, a slower non-Xeon system with a smaller
main memory and L2 cache.
We must also note that after we ran the benchmark tests, Compaq informed us of a bug in the prerelease Professional Workstation's BIOS that affects memory access. The existence of the bug explains slower-than-expected results on some of our benchmarks. This isn't the first time we've encountered BIOS bugs in the systems we benchmarked, because the manufacturers usually provide us the systems on a prerelease basis to meet our publishing deadline. The bugs are normally
detected and fixed before we finish a benchmark session, but in this case Compaq must suffer with the results until our next benchmark installment.
Figure 3. Normalized synthesis results
The IBM Intellistation Z Pro again turns in the best overall performance in both the small and large synthesis benchmarks, though the Compaq workstation did surprisingly well considering the BIOS bug in the model we tested.
As for the benchmarks described in our last installment, we've kept everything the same (for explanations of the original benchmarks, see "EDA Platform Benchmark: Simulation," March, p. 62; "EDA Platform Benchmark: Synthesis," July, p. 56; and the November benchmark). Last time we eliminated two simulation benchmarks (the Life48 gate-level and behavioral) that were too short to be interesting. We kept the short memory behavioral benchmark,
though, because its large waveform dump sometimes makes it interesting. (As it turns out, it wasn't this time.) For the remaining Life benchmarks, we designed the stimulus to repeat 50 times (like last time) to make the runs take longer than the subminute times we got in our original simulation benchmark tests.
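The idea of repeating a short stimulus so that the run is long enough to time reliably can be sketched in a few lines of Python. This is only an illustration of the principle, not the harness we used: the function names and the `short_task` stand-in are hypothetical, and the only figure taken from the article is the repeat count of 50.

```python
import time


def time_repeated(workload, repeats=50):
    """Run `workload` `repeats` times; return (total_seconds, seconds_per_pass)."""
    start = time.perf_counter()
    for _ in range(repeats):
        workload()
    total = time.perf_counter() - start
    return total, total / repeats


def short_task():
    # Hypothetical stand-in for one pass of a short simulation stimulus.
    return sum(i * i for i in range(10_000))


total, per_pass = time_repeated(short_task, repeats=50)
print(f"total: {total:.4f} s, per pass: {per_pass:.6f} s")
```

Timing the whole loop rather than a single pass spreads any fixed measurement overhead across all 50 passes, which is why a longer run gives a steadier number.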
Note again that the Talisman benchmark included here is only a portion (called Hod) of the full Talisman graphics engine. In future benchmark tests we hope to use more of the Talisman Verilog
circuit.
Sim city
In contrast to our previous report on this set of benchmarks, we now present actual simulation run times rather than just relative rankings for each machine. To focus on both mainstream and high-end usage models, we divided the benchmarks into two groups: those that fit in 512 Mbytes of RAM and those that require more memory (denoted here as "small" and "large" simulations). The machines we tested determined the division. With the baseline 400-MHz IBM Intellistation M Pro
containing 512 Mbytes of RAM, we wanted to see how well its processor held up against 400- and 450-MHz Xeon machines that sport 1 or 1.5 Gbytes of RAM. To meet that goal, we had to eliminate memory as a critical variable. The 512-Mbyte limit evened the playing field.
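The small/large division amounts to a simple partition on each benchmark's peak memory footprint. The Python sketch below shows one way to express it, under stated assumptions: the benchmark names and footprints are hypothetical placeholders rather than measured values, and we assume "fits in 512 Mbytes" means strictly less than 512 Mbytes.

```python
SMALL_LIMIT_MB = 512  # RAM in the baseline PC, used as the small/large boundary


def split_by_memory(peak_mb, limit_mb=SMALL_LIMIT_MB):
    """Return (small, large) lists of benchmark names, split at limit_mb."""
    small = sorted(name for name, mb in peak_mb.items() if mb < limit_mb)
    large = sorted(name for name, mb in peak_mb.items() if mb >= limit_mb)
    return small, large


# Hypothetical peak memory footprints in Mbytes; NOT measured values.
example_peak_mb = {"bench_a": 120, "bench_b": 480, "bench_c": 800, "bench_d": 1300}

small, large = split_by_memory(example_peak_mb)
print("small:", small)
print("large:", large)
```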
Figure 1a shows the absolute run times for the under-512-Mbyte simulation benchmarks, and Figure 1b presents the results in our traditional normalized format. Even though the 360-MHz Ultra 60 is up against 400- and 450-MHz competitors, it
holds its own surprisingly well--it actually turns in the fastest times on three of the benchmarks. Nonetheless, it turned in significantly poorer times on two of the other three benchmarks, which makes it altogether a mixed bag. The 450-MHz IBM Intellistation Z Pro achieved the fastest times on the other three benchmarks (the baseline 400-MHz IBM machine tied it on the Life128 behavioral test).
Oddly enough, the 450-MHz IBM Xeon machine and the 400-MHz IBM non-Xeon machine yielded almost identical run
times on both the gate-level and behavioral Life128 benchmarks. Life128 represents an unusual case in which link time dominates. Further, although we would normally expect the gate-level simulation to run slower than the behavioral simulation, in this case the reverse is true. The bottom line is that the Life128 database is odd. Since it doesn't play to Verilog-XL's "sweet spot," it's rather unpredictable.
One thing is certain: The Life128 benchmark thrashes memory. The larger cache on the Xeon
processor therefore provided no advantage. The benchmark turned into a memory bandwidth test, producing the tie between the 400- and 450-MHz machines, since both have the same memory bandwidth (100 MHz).
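A toy model makes the tie easier to see. The Python sketch below assumes, as a deliberate simplification, that run time is set by whichever is slower: the processor working through its cycles or the memory bus delivering its transfers. The cycle and transfer counts are hypothetical, chosen only to show the effect; the clock and bus speeds are the ones quoted for the two IBM machines.

```python
def modeled_run_time(cpu_cycles, cpu_mhz, mem_transfers, bus_mhz):
    """Toy model: run time is the larger of compute time and memory time (seconds)."""
    compute_s = cpu_cycles / (cpu_mhz * 1e6)
    memory_s = mem_transfers / (bus_mhz * 1e6)
    return max(compute_s, memory_s)


# Hypothetical workload sizes; NOT derived from the Life128 benchmark.
CPU_CYCLES = 3.6e9

# Memory-bound case: both machines wait on the same 100-MHz memory bus.
slow_bound = modeled_run_time(CPU_CYCLES, 400, 2.0e9, 100)
fast_bound = modeled_run_time(CPU_CYCLES, 450, 2.0e9, 100)

# Compute-bound case: the faster clock shows through.
slow_free = modeled_run_time(CPU_CYCLES, 400, 0.5e9, 100)
fast_free = modeled_run_time(CPU_CYCLES, 450, 0.5e9, 100)

print(f"memory-bound:  400 MHz {slow_bound:.1f} s, 450 MHz {fast_bound:.1f} s")
print(f"compute-bound: 400 MHz {slow_free:.1f} s, 450 MHz {fast_free:.1f} s")
```

In the memory-bound case the model returns the same time for both clock speeds, which mirrors the tie we saw. Real machines overlap computation and memory access in more complicated ways, so treat this as intuition rather than prediction.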
In addition to the odd results on the Life128 benchmark, Figure 1 shows that Compaq's 400-MHz Xeon machine turned in consistently slower times than even the 400-MHz non-Xeon machine. Given the high memory bandwidth we've seen from Compaq workstations in previous tests (thanks to dual memory buses), we
believe that the results reflect the memory-related BIOS bug that Compaq mentioned.
Under the circumstances, declaring an overall winner of the "mainstream" simulation tests becomes a matter of averages. The average normalized result for the Ultra 60 is 93.5 percent, compared with 96.7 percent for the IBM Intellistation Z Pro--so that makes the IBM machine the mainstream simulation winner.
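For readers who want to apply the same scoring to their own measurements, here is a minimal Python sketch of the arithmetic. It reflects our reading of the normalized format: the fastest machine on each benchmark scores 1, every other machine scores the fastest time divided by its own time, and the overall figure is the plain average of those scores. The machine names and run times are hypothetical and are not the measured results from Figure 1.

```python
def normalize(times_by_machine):
    """Scale run times so the fastest machine scores 1.0 and slower ones score less."""
    fastest = min(times_by_machine.values())
    return {machine: fastest / seconds for machine, seconds in times_by_machine.items()}


def average_normalized(results):
    """Average each machine's normalized score over every benchmark it ran."""
    scores = {}
    for times_by_machine in results.values():
        for machine, score in normalize(times_by_machine).items():
            scores.setdefault(machine, []).append(score)
    return {machine: sum(vals) / len(vals) for machine, vals in scores.items()}


# Hypothetical run times in seconds; NOT the measured results from Figure 1.
example_results = {
    "bench_1": {"machine_a": 100.0, "machine_b": 125.0},
    "bench_2": {"machine_a": 80.0, "machine_b": 40.0},
}

averages = average_normalized(example_results)
for machine, score in sorted(averages.items()):
    print(f"{machine}: {score:.1%}")
```

In this made-up example machine_b wins on average even though it loses bench_1, which is the same kind of mixed outcome that forces a decision by averages in the first place.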
The IBM Intellistation Z Pro also proves the fastest in the large simulation benchmarks--and without a close
challenge from the Ultra 60 (see Figure 2). The outcome is surprising given that the IBM system contains half a gigabyte of RAM less than the Ultra 60. As we've seen in previous benchmark tests, the amount of RAM generally makes all the difference in the large simulation runs, but the extra memory didn't boost the Ultra 60 into the winner's circle this time. Unfortunately for the Ultra 60, it turns out that the benchmarks require about 800 Mbytes of RAM at any given time--more than 512 Mbytes but less than 1.5
Gbytes. The Compaq Professional Workstation SP700 averages about 7 percent slower on the tests--again a clear sign of trouble in BIOSville.
Synthetic results
As we explained earlier, listing the absolute run times for the synthesis tests would make little sense because our 98.08 preproduction version of Design Compiler spent a lot of extra time in initialization loops on some of the benchmarks, hitting the Ultra 60 especially hard. For that reason, we present only our traditional normalized
results for the synthesis tests and don't include the Ultra 60. As in the presentations of the simulation results, the synthesis benchmarks are divided into mainstream and large sizes along the 512-Mbyte memory boundary, but we bundled them all into one graph.
Figure 3 shows the results for both the mainstream (TORCH Dpath and Talisman) and large (rpu256 and Decompress) synthesis benchmarks. After eliminating the shortest synthesis benchmarks in our earlier installments, we found ourselves with only
two benchmarks that fit into 512 Mbytes of RAM. To obtain a broader view, we would clearly have to add more small synthesis tasks.
In the meantime, the mainstream synthesis benchmarks reveal no clear winner, even though the IBM Intellistation Z Pro's 450-MHz processor speed should give it the advantage. The fast IBM system falls down on the Dpath benchmark, yielding to the 400-MHz Compaq Professional Workstation SP700, which in turn falls down on the Talisman benchmark.
On the Decompress benchmark
in Figure 3, the 400-MHz IBM Intellistation M Pro baseline system manages to equal the performance of the top-of-the-line 450-MHz IBM Intellistation Z Pro, even though the latter also contains twice as much RAM. The Decompress benchmark apparently strikes a balance between being memory- and processor-bound on a system with 512 Mbytes of RAM. The benchmark needs a little more memory than that, but not much and not very often (hardly ever, in fact), so the label "large" is a stretch. Like the Life128
simulation benchmark, the Decompress benchmark is a memory bandwidth test. Since both the 400- and 450-MHz machines offer the same memory bandwidth, they perform equally well.
Champion apparent
Obviously, the IBM Intellistation Z Pro wins the benchmark tests in this installment. On average, the Z Pro ran the fastest on both the mainstream and large benchmarks for both simulation and synthesis. Still, the machine wasn't consistently faster on every benchmark, even on the mainstream benchmarks, in which the system's 450-MHz Xeon processor should have been decisive.
We also have to commend the Sun Ultra 60 for keeping up and even getting ahead on some benchmarks with a 360-MHz processor. By the same token, the Compaq 400-MHz system with its BIOS bug did amazingly well compared with the 450-MHz system. All bugs aside, we wonder about the performance versus cost differences between a 450-MHz PC with 512 Mbytes of RAM and a 400-MHz PC with 1-Gbyte RAM.
In the end, when you pick an EDA workstation, you must deal with all the trade-offs we profiled in our last benchmark installment--performance versus purchase price versus cost of ownership. At least as far as the performance trade-off goes, any of the workstations profiled here will make an excellent choice.
Contributing editor James Lee is a senior consulting engineer at Seva Technologies, Inc. in Fremont, Calif. He has 12 years' experience working with Verilog and was one of the first employees at Gateway Design
Automation, which developed Verilog. Prior to joining Seva, he was with Cadence Design Systems. He's the author of Verilog Quickstart and is also a part-time instructor in Verilog at the University of California at Santa Cruz.
Bob Peterson is a freelance writer based in Monterey, Calif. Formerly the assistant managing editor of EDN, he has written on a wide variety of technical topics for many publications and companies for the past 16 years.
To voice an opinion on this or any Integrated System Design article, please email your message to miker@isdmag.com.
Integrated System Design, January 1999
Copyright © 2000
Integrated System Design