United Business Media EE Times



 



EDA Platform Benchmark: When Should You Upgrade EDA Hardware?

The choice of an EDA platform is becoming both easier and harder--easier because the new machines perform well, and harder because the tradeoffs are tough to gauge.

by James Lee and Bob Peterson

While we would all love to do our work on the most powerful workstation available at any given moment, corporate engineering life is hardly known for that sort of indulgence. Faster processors blessed with bigger chunks of DRAM come to market hourly, it seems, but no matter how well you argue the case for improving your design productivity with another 100 MHz of processor power, you may have to aim your arguments at next year's budget.

Nonetheless, your productivity probably does depend on getting a new workstation from time to time, if not every year. So the questions are: Upgrade now? If not, when? How do you know when a new machine will pay its way?

The answers lie in this installment of our EDA platform benchmarking series. (For previous installments in this series please see March 1998, p. 62, July 1998, p. 56, September 1998, p. 44, November 1998, p. 50, and July 1999, p. 68.) To get useful answers, we tested a wide range of EDA applications, and we benchmarked high-end platforms as well as more typical systems. By comparing the run times for the typical and high-end platforms, you can see how much time you might save every day if you had a workstation with a faster processor or more memory.

We more-or-less arbitrarily pegged our typical system at 256 Mbytes of DRAM. Our basic assumption is that the average designer possesses that much memory in a Unix or NT system (or both) on his or her desktop, while systems with more memory tend to be available as back-office machines. Although we routinely show the value of large memory capacities in these EDA benchmark tests, even big system-on-a-chip (SOC) designs usually involve many designers working separately on relatively small blocks of the chip. Our 256-Mbyte memory should accommodate the average synthesis and simulation tasks for these SOC blocks as well as the average 200,000-gate chip design.

New to this round of benchmark tests are two PA-RISC-based HP-UX machines that turn in exceptionally strong results if you use a configuration option called chatr to set the virtual page size larger than the default. In the same vein, we have found a number of non-name-brand NT machines that ship with the disk interface's Ultra DMA option turned off; you can more than double these machines' disk I/O performance simply by turning Ultra DMA on.

The persistence of memory

The primary characteristic of our typical workstation is the amount of DRAM included (256 Mbytes). Memory has been a continuing and provocative theme in these platform benchmarks. In case you are just joining us, earlier benchmarks have shown that a big chunk of memory can buy a great deal of performance for large simulation and synthesis tasks. The key is to have enough DRAM to accommodate the task you are running so you avoid paging to disk. Even with a good hardware RAID system, the performance hit from paging is staggering. Investing in a gigabyte or so of DRAM is well worth the cost--especially at today's low prices--if your tasks have a big memory footprint.

Not all EDA tasks have a big memory footprint, even though it sometimes seems that way. The benchmarks in this installment and the previous one involved place-and-route tasks that were suitable for Snaketech's Cellsnake place-and-route tool (Snaketech U.S., San Jose), which handles designs that today are considered relatively modest in size--100,000 to 200,000 gates. Our typical 256-Mbyte system is perfectly adequate for these jobs. As shown by our recent design effort at Seva/Intrinsix Corp. (Fremont, Calif.), a 200,000-gate design requires less than 256 Mbytes of DRAM for simulation.

You face a different situation when you run a big EDA task. Simulating a 1.3 million-gate design overwhelms a 256-Mbyte system, for example. In one test that we ran on a non-name-brand PC, the CPU utilization dropped to about 6 percent, while the disk worked up a sweat trying to meet paging demands. At 6 percent CPU utilization for a 400-MHz processor, we reached the equivalent performance of a 24-MHz processor. That leaves a lot of megahertz sitting around with nothing to do.
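
The arithmetic behind that 24-MHz figure can be sketched in a few lines of Python. This is only an illustration: the function name is ours, and it assumes that useful throughput scales linearly with CPU utilization, which is a simplification.

```python
def effective_mhz(clock_mhz: float, cpu_utilization: float) -> float:
    """Clock rate that would deliver the same work at 100 percent utilization.

    Assumes throughput scales linearly with utilization (a simplification).
    """
    return clock_mhz * cpu_utilization


# Values from the paging test described above: a 400-MHz processor
# running at about 6 percent CPU utilization.
print(round(effective_mhz(400, 0.06)))  # 24
```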

Note in this regard that NT by default sets a system's virtual memory only slightly higher than the size of the physical memory. This default is a good idea because it causes a too-demanding task to blow up immediately rather than thrashing the disk for several days. When you experience the blow up, you can check out the mismatch between the memory and the application's needs and decide whether to change the default or move to a system that has more memory.

On big EDA tasks, bigger memories put idle megahertz back to work, and you can place much bigger memories inside workstations than ever before--up to a point. While gigabyte chunks of DRAM are readily available, you have to consider how many of those chunks you can actually put on a workstation. Even 64-bit Unix workstations aren't built to take advantage of anything near their full 64-bit memory-addressing space, but 18 billion Gbytes is a capacity to which only the foolhardy aspire--for a few more years, anyway. Within that upper limit, workstation vendors can easily expand memory capacity beyond the 2- to 4-Gbyte capacities currently possible. But what of 32-bit Windows NT systems?

The NT question

In every one of our benchmarking installments, we have posed the thorny question of whether Windows NT is okay for EDA. The question involves performance, reliability, convenience, and memory issues. Time after time, the benchmarks have shown that NT certainly passes the performance test, judging by the relative performance of the same task running on roughly equivalent NT and Unix systems.

Figure 1 - Simulation performance
By picking out the three system architectures included here--represented by HP-UX, NT, and Solaris machines--you can see performance patterns emerge. The best-performing machine on each benchmark gets a value of 100 percent, and the other machines are ranked by their performance relative to the best machine.

Our experience has also shown that NT is more reliable than many designers think. Our informal tests have shown that an NT system can run a single EDA task for weeks without slow-downs or crashes. The typical benchmarks we run give us shorter time windows in which to judge NT's stability, but we did run a long simulation in the current benchmark tests (the disk-thrashing exercise on the non-name-brand PC mentioned earlier). After 394 hours--more than 16 days--the simulation completed successfully. In other NT tests, we have run several rounds of simulation/synthesis/place-and-route benchmarks over four days of continuous running and had no stability problems.

The only NT crashes we have seen in our benchmark tests have come from specific errors in the way we set application options or from license problems. We've experienced absolutely no NT crashes of mysterious origin in the EDA tests.

Figure 2 - Performance of typical machines
Assuming that a typical desktop EDA system has 256 Mbytes of DRAM, we grouped the benchmarks that require less memory than that and show the results here for several desktop machines. Most of these systems contain more than 256 Mbytes of DRAM, but the extra memory makes no difference to benchmark performance.

We have seen stability problems on NT machines in other instances, however. In our very first round of EDA benchmark tests, a PC from a large mail-order company randomly produced incorrect results, though we ascribed this problem to an unknown assault that had left a hole in the side of the shipping carton. A replacement machine handled the benchmarks adequately, but then immediately succumbed to the blue screen of death. Some of the benchmarks place considerable stress on the systems, and only this NT machine crashed in the process--clearly a vendor-specific problem.

The only other mysterious NT-related stability problem we've seen is with an office machine at Seva/Intrinsix. When this system is running an instant-messenger program and several network drives are mounted, X windows spontaneously begin to close and network drives unmount. Closing the instant-messenger program eliminates the problem. We can speculate that since all these services are running over TCP/IP, the problem results from poor TCP/IP stack management. However, we haven't confirmed that diagnosis.

As noted in the last benchmark installment, we believe that a lot of the reliability problems associated with NT actually come from office applications rather than the OS itself. If you run both the office and EDA applications on the same platform--one of the attractions of an NT system--you probably take little comfort in knowing that Verilog-XL wouldn't crash by itself in the seventh hour of your 8-hour simulation.

Figure 3 - Summary of typical machine performance
Based on a collection of small benchmarks that are good candidates for running on a desktop system, this chart shows that the 500-MHz Compaq SP700 and the 550-MHz IBM Intellistation Z Pro achieve the best overall performance.

At the end of the day, no matter what applications you run, you have to hope that the OS will keep a mere spreadsheet or instant-messaging program from pulling system resources out from under your EDA task. Rather than just hoping for this level of performance, however, consider utilizing the Perceptive User Workaround: Simply close applications that you find hostile to EDA tasks. Unfortunately, this approach deflates the benefit of having a single workstation that handles both office and EDA tasks from your desktop, one of NT's primary selling points.

We can make a list of other NT shortcomings, including the need to add a lot of extra software to gain Unix-equivalent convenience. Many people have also noted that NT could benefit from a more robust file system, among other features. In fact, Gary Smith, principal EDA analyst at Dataquest, Inc., said at this year's Design Automation Conference, "Microsoft has not come out with any of the feature sets needed for the OS in EDA." That's rather overstating the case, as you know if you have seen the impressive EDA benchmark results that NT platforms routinely exhibit, but we understand what Smith means. Convenience counts, and Unix has it. Moreover, EDA users have invested billions of dollars in Unix hardware and software, and companies such as Sun and HP have done an extraordinary job of making that investment bulletproof.

Microsoft has also done an extraordinary job of persisting in pursuing targeted markets. As Ajay Sikka, Microsoft's marketing manager for engineering, commented at DAC, "Our focus is to show we're in it for the long run." But does that long-run focus hold a bit of irony for a 32-bit OS? After all, a 32-bit memory space has the inherent "gotcha" of a 4-Gbyte memory limit. Even if you don't put that much physical DRAM in a 32-bit system, you can't normally push the virtual memory space any higher than the address space allows, so you can't rely on paging to disk to save you. And what is this talk about NT having a 31-bit address space?

Expanding the NT memory space

The math implies that the biggest memory you can theoretically couple with a 32-bit OS is 4 Gbytes. It turns out, though, that NT4 on an Intel platform has a maximum user space of 2 Gbytes, which represents a 31-bit address space. Thus, only 2 Gbytes is directly accessible to an application. NT's developers kept an extra bit for themselves, devoting 2 Gbytes of additional address range to system space that is accessible only to the Windows NT executive software. We currently use the NT4.0 sp3 release for running these benchmarks.
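
For readers who want to check the address-space math, here is a minimal Python sketch. The helper name is hypothetical, and it assumes the binary convention of 2^30 bytes per Gbyte.

```python
GBYTE = 2 ** 30  # bytes per Gbyte, binary convention (an assumption)


def address_space_gbytes(bits: int) -> float:
    """Size in Gbytes of the space reachable with the given number of address bits."""
    return 2 ** bits / GBYTE


total = address_space_gbytes(32)  # full 32-bit space
user = address_space_gbytes(31)   # NT4 user space on Intel
system = total - user             # range reserved for the NT executive

print(f"total {total:g}, user {user:g}, system {system:g} Gbytes")
```

Run as written, this reports 4 Gbytes in total, split evenly into 2 Gbytes of user space and 2 Gbytes of system space.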

Figure 4 - Performance of high-end machines
This chart shows the relative performance of the most powerful systems we tested on our largest EDA benchmarks. HP's 440-MHz J5000 HP-UX system turns in a stellar performance, aside from a severe let down on our one large place-and-route benchmark, the Talisman hod.

The vast majority of EDA jobs will continue to fit nicely inside a 2-Gbyte space for another year or so, but we are already seeing tasks that can make full use of 2 Gbytes of DRAM. Fortunately, the PC memory-space picture is more complicated than NT4's 2-Gbyte limit. The NT Server 4.0 Enterprise Edition already provides a flat virtual address space for applications to grow to 3 Gbytes--a limit that all but the largest designs will find roomy for the time being.

It's also theoretically possible to exploit Intel's extended server memory architecture, which allows applications to access more than 4 Gbytes of memory. The architecture encompasses a new 36-bit mode (page size extension, PSE36), an existing 36-bit mode (physical address extension, PAE36), 36-bit caches, and chip sets that support greater than 4 Gbytes of memory (450NX). The Intel PSE36 driver provides access to memory above 4 Gbytes (up to a total of 32 Gbytes) as a RAM disk via an API that leverages existing Win32 APIs. This scheme therefore avoids the performance hit of paging to a physical disk, but an application's code base must be modified to utilize the additional memory.

Figure 5 - Summary of high-end machine performance
Looking exclusively at large benchmarks that are appropriate for running on an EDA server, this chart shows that HP's 440-MHz J5000 offers superior performance despite running at a slower processor speed than the other two machines.

Windows 2000, the successor to Windows NT, offers a better approach based on PAE36 for 32-bit Intel processors and extended addressing support for Compaq's Alpha processors. Microsoft's address windowing extensions (AWE) API set opens up a 64-Gbyte memory space for the Intel architecture and a 32-Gbyte space for the Alpha. Under this scheme, all the physical memory in the system can be treated as general-purpose memory. The operating system can use this memory for caching and virtual memory management without major changes.

Applications can also use the memory above 4 Gbytes, without changes, but only in 4-Gbyte chunks. Applications that require more physical memory than is provided by the 4-Gbyte virtual address space can use the AWE APIs to map from one designated window into the physical memory region allocated for that window. Microsoft claims several advantages for the AWE approach. The AWE memory allocations are both finer grained and faster than they would be for a complete process space creation or fork. Microsoft describes the mapping process as exceptionally fast due to direct use of the underlying hardware's native capabilities.

Windows 2000 will have even more native hardware capabilities to play with when Intel's 64-bit Merced processor makes its way to the real world. Windows 2000 is said to support 64-bit addressing on the Merced and Alpha platforms. That should make NT's memory limitations a fading memory before the limits have any impact on your EDA tasks.

Get set right or go slow

Hardware imbalances were on our minds as we watched our non-name-brand PC slog through its 394-hour simulation disk-thrashing encounter. Seva/Intrinsix purchased this 400-MHz, 256-Mbyte NT machine from a local computer store to handle some office chores. Out of curiosity, we ran some of the EDA benchmarks on the system. The machine's 256-Mbyte memory was certainly a factor in the marathon simulation run, given that a PC with 512 Mbytes of memory completed the simulation in less than an hour. But we also found enormous differences in the non-name-brand PC's disk I/O performance compared to the EDA-ready NT machines we get from Compaq, HP, and IBM. While a 500-MHz Compaq SP700 took only 13 seconds to run our behavioral memory benchmark (a short simulation that involves dumping a 120-Mbyte VCD file to disk), for example, the generic system took 46 seconds. The results reflect a huge disparity in run times.

Our first thought: Here's a good reason to buy name-brand hardware. The major workstation vendors tune their systems for high performance. The generic vendors may not target such specific users.

Our second thought: How could the generic machine be so slow? We immediately suspected that the system wasn't using Ultra DMA because this IDE disk I/O option enables burst transfers at twice the normal rate. If the option were turned off, that would account for most of the performance shortfall.

Our local PC vendor sent over a tech-support person who confirmed our suspicion. After turning on the Ultra DMA, we found that the generic machine ran the behavioral memory benchmark in just 19 seconds--a 142 percent improvement! Doubling the DMA burst rate more than doubled the overall performance on the benchmark because Verilog-XL didn't have to sit idle waiting for disk I/O as frequently. On the much larger simulation test, this machine spent the vast majority of its marathon run swapping data to and from disk. Using Ultra DMA probably would have eliminated at least nine days from the 16-day total run time.
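
The improvement figure and the days-saved estimate follow from simple arithmetic, sketched here in Python. The function name is ours, and the extrapolation to the long run assumes, optimistically, that the whole 394-hour run would speed up by the same ratio as the short benchmark.

```python
def percent_improvement(old_seconds: float, new_seconds: float) -> float:
    """Throughput gain, in percent, when run time drops from old to new."""
    return (old_seconds / new_seconds - 1) * 100


# Behavioral memory benchmark on the generic PC: 46 s before, 19 s after.
gain = percent_improvement(46, 19)
print(f"{gain:.0f} percent improvement")

# Assumption: the same 46-to-19 ratio applies to the entire 394-hour run.
hours_saved = 394 * (1 - 19 / 46)
print(f"about {hours_saved / 24:.1f} days saved")
```

Under that assumption the estimate lands between nine and ten days, consistent with the "at least nine days" figure above.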

After this experience, we checked two other NT machines in the Seva/Intrinsix offices (neither of which has seen use in EDA benchmarking) and found that both have Ultra DMA disks but neither was enabled to use Ultra DMA. Windows NT has Ultra DMA turned off by default, so it pays to check. Bear in mind that if Windows crashed and you reinstalled the OS, you might have gone back to the default non-Ultra-DMA setting.

To check on your disk I/O, look in the NT control panel under (counter-intuitively) "SCSI adapters," then look at the properties for the IDE controller. If you are running an Ultra DMA drive, the Ultra DMA option should be enabled. Incidentally, Windows 95 and Windows 98 also ignore Ultra DMA by default. If you use one of these OSs and you have an Ultra DMA drive, then look in the device manager under "disk drives" and look at the properties for your Ultra DMA disk. Make sure the check box for the "DMA" option is selected.

Because our curiosity was piqued, we tried out some benchmarks on a non-brand-name Celeron-based system and got a number of odd failures on the 1.3 million-gate simulation. It turned out that the PC vendor had overclocked this system via the BIOS. When we reduced the clock speed a bit, the 1.3 million-gate benchmark ran fine. We recommend this highly demanding benchmark to PC users as a way to verify that a machine can produce valid results under stressful conditions. Otherwise, how do you know?

Set HP-UX for lots of chatr

If you run any big EDA jobs under HP-UX 11, another adjustment can give you impressive performance gains: increasing the size of virtual memory pages. The command you need is chatr.

Before we benchmarked the 400- and 440-MHz PA-8500 workstations that HP contributed to our EDA tests, the company told us to expect these machines to outperform NT machines at similar processor speeds. We didn't find the expected performance advantage--until we used chatr to increase the virtual memory page sizes from the default values to 4 Mbytes and 1 Mbyte for data pages and instruction pages, respectively. The command line is:

chatr +pi 1M +pd 4M verilog.exe

where +pi and +pd tell chatr where to set the instruction and data page sizes and verilog.exe is the targeted executable. A performance expert at HP gave us these values, so we were operating on faith here.
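
If you need to apply the same settings to several executables, a small helper can build the command line for you. The Python sketch below is only an illustration: the function name and defaults are ours, it uses only the +pi and +pd options quoted above, and it merely prints the command rather than running it.

```python
def chatr_command(executable: str, instr_page: str = "1M", data_page: str = "4M") -> list:
    """Build the chatr argument list shown above; nothing is executed here."""
    return ["chatr", "+pi", instr_page, "+pd", data_page, executable]


cmd = chatr_command("verilog.exe")
print(" ".join(cmd))  # chatr +pi 1M +pd 4M verilog.exe
```

On an HP-UX system, the resulting list could be handed to Python's subprocess.run to execute the command. We have not tested that step.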

After we set chatr, the workstations' performance on large simulation tasks improved dramatically. On the 1.3M RISC benchmark, for example, the 440-MHz system improved by a whopping 45 percent (the largest gain we saw). HP told us to expect a 40 percent overall improvement after using chatr, and while we didn't see an average gain quite that large, the improvement was still impressive. Across all the big simulation tasks (four designs, each involving more than 256 Mbytes of data), the 440-MHz system improved by an average of 30.25 percent after using chatr.

The bigger virtual page sizes clearly boosted performance on the large simulation tasks, but didn't hurt performance on the smaller tasks. One short task took a fraction of a second longer after using chatr, but that result was within our margin of measurement error. On average, using chatr improved the small simulation tasks by 3 percent.

Why doesn't Cadence just ship the HP-UX port of Verilog-XL with the page sizes set higher? The Verilog-XL version used in these benchmarks is actually for HP-UX 10.20 and we were using HP-UX 11. The two OS versions are binary compatible, but version 11 obviously takes good advantage of the large page sizes.

The chatr performance gains weren't as pronounced on the synthesis and place-and-route benchmarks. After the 440-MHz machine's 30 percent average improvement on simulation, the gains of approximately 10 percent on synthesis and 5 percent on place and route were a little disappointing. Still, even a 5 percent improvement is significant if it's free. The HP performance expert didn't know what chatr values to use for Design Compiler and Cellsnake, so we simply applied the same values used for Verilog-XL. We didn't have time to experiment with the values, but we would love to know if tuning would produce a bigger advantage for the HP-UX machines on synthesis and place and route.

A wide range

This round of benchmark tests included two of the high-end PCs spotlighted in our recent place-and-route benchmarks (see July 1999, p. 56), the two HP-UX machines, and a collection of machines from other benchmarking tests. Here are the pertinent details for this installment's featured machines:

  • IBM Intellistation Z Pro--Providing the fastest processor clock speed at 550 MHz, this NT system includes dual-Pentium III Xeon processors, 2 Gbytes of DRAM, and an IDE RAID controller (in contrast to the SCSI RAID controller included in previous IBM systems). Note that this is the same IBM system we tested in the last benchmark installment, but to match the configurations of the other systems in those tests, we had removed one processor, slowed the processor clock to 500 MHz via the BIOS, and removed half the DRAM. For the tests in this round of benchmarks, we restored the Z Pro to full strength.

  • Compaq SP700--With the second fastest processor clock speed at 500 MHz, this NT system has one Pentium III Xeon processor, 2 Gbytes of DRAM, a Mylex SCSI RAID controller card, and a 512-kbyte L2 cache. This machine also has dual memory buses and dual PCI buses.

  • Hewlett-Packard Visualize P500--HP was the only vendor to answer our request for a "typical" NT engineering machine with 256 Mbytes of DRAM. We could declare this 500-MHz machine as the performance winner in the "typical" category by default, but in the interests of extracting some useful information, we compared the P500 to other machines on smaller benchmarks that don't challenge the 256-Mbyte limit.

  • Hewlett-Packard Visualize J5000--Along with two PA-8500 processors running at 440 MHz, this HP-UX machine includes 2 Gbytes of 120-MHz SDRAM. The 64-bit PA-RISC processor has an on-board 1.5-Mbyte cache. (It's interesting to note that as a result of HP's partnership with Intel in developing the latter's Merced architecture, the PA-8500 promises binary compatibility with the upcoming 64-bit Intel processor.)

  • Hewlett-Packard Visualize C3000--This HP-UX machine relies on a single PA-8500 processor running at 400 MHz. The system contains 1.2 Gbytes of 120-MHz SDRAM.

In addition to those high-end machines, we included several other systems to represent various classes of system performance. In a couple of cases, these systems are left over from earlier benchmark tests. We are identifying the older systems generically to avoid any chance that they could be taken to represent a manufacturer's current technology. At the same time, we want to thank Compaq and IBM for generously allowing us to continue using their older systems to provide a baseline against which to measure the performance of newer machines. Thanks also to Hewlett-Packard for providing the "typical" PC configuration we requested for this round of benchmarks. Here are the system class representatives:

  • 300-MHz PC--This NT machine with 512 Mbytes of DRAM was a star performer in our benchmarks just two years ago. It's the only machine remaining in our tests that has a 66-MHz front-side bus.

  • 400-MHz PC--This NT machine has 512 Mbytes of DRAM and dual processors, but we haven't seen any significant advantage from the latter in these single-threaded benchmarks. The machine thus represents the 400-MHz PC class pretty well.

  • Sun Ultra 60--This Solaris machine from the previous benchmark tests contained a pair of 360-MHz UltraSPARC II processors and 1.5 Gbytes of DRAM. Any attempt to "generic-ize" this system would be futile, so we include it with the caveat that it doesn't represent Sun's best workstation offerings. We ran no new tests on the Ultra 60 in this round of benchmarks, and newer workstations weren't available to us for testing.

Figure 6 - Simulation performance summary by class
This chart summarizes the results shown in Figure 1 by averaging each machine's performance across all the simulation benchmarks. By picking out a class of machine that's similar to the one you use, you can see how your simulation performance might improve if you upgrade to a faster machine.

The benchmarks that ran on these machines were as varied as the machines themselves. All the benchmarks were described in our previous EDA benchmark reports, and you can find details at www.isdmag.com/edabenchmark . The simulation benchmarks used Cadence Verilog-XL and included Behavioral Memory, Life 128 Gate Level, Life 128 Behavioral, 800k RISC Behavioral, 1.3M RISC Behavioral, 800k RISC Gate Level, 800k RISC Mixed, 1.3M RISC Gate Level, and 1.3M RISC Mixed. The synthesis benchmarks used Synopsys Design Compiler and included Torch Dpath, Talisman, RPU 256, and Decompress. The place-and-route benchmarks used Snaketech Cellsnake and included CTS, DES, PIC, and Talisman hod.

With workstation memory capacities growing at a phenomenal rate, our most data-intensive benchmark of 800 Mbytes no longer pushes the memory envelope for many machines. In future tests, we will introduce benchmarks that break the 1-Gbyte boundary and range towards 2 Gbytes of data.

Simulation results

As we've added machines to our benchmark tests, we've dealt with increasingly complex results. Figure 1, for example, shows the simulation performance for eight machines. For each of the benchmarks, every machine's performance is plotted against the performance of the fastest machine (charted at 100 percent).
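
The normalization used in these charts can be sketched as follows. This Python snippet is an illustration, not the script behind the figures: the function name is ours, and it assumes that relative performance is the best run time divided by each machine's run time. The sample data are the three Behavioral Memory run times quoted earlier in this article, so "100 percent" here means best of these three only.

```python
def relative_performance(run_times: dict) -> dict:
    """Map each machine's run time in seconds to a percentage of the fastest."""
    best = min(run_times.values())
    return {name: 100 * best / seconds for name, seconds in run_times.items()}


# Behavioral Memory run times quoted in this article (seconds).
times = {
    "500-MHz Compaq SP700": 13,
    "generic PC, Ultra DMA off": 46,
    "generic PC, Ultra DMA on": 19,
}

rel = relative_performance(times)
for name, pct in rel.items():
    print(f"{name}: {pct:.0f} percent")
```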

At first glance, the chart seems wildly chaotic--some machines improve on a specific benchmark, while others decline. But on closer examination this chaos actually reflects a great deal of order, especially when you distinguish the different system architectures from one another. The most startling aspect of the results is the dominance of the HP-UX machines. On nearly all the large simulation tasks (those to the right side of the chart), the 440-MHz HP-UX machine turns in the best performance, followed closely by the 400-MHz HP-UX machine. The 440-MHz machine even wins on a couple of the smaller benchmarks, though it falls down on the short behavioral simulations. The results for both of the HP-UX machines include the effects of increasing the virtual memory page sizes via chatr.

Where the 440-MHz PA-RISC beats the 550-MHz Pentium III, you can see the effects of both a superior 64-bit processor architecture and a fast system architecture. The PA-8500's large on-chip cache and fast external memory bus (120 MHz compared to 100 MHz for the Pentium) give the HP-UX machines an advantage--at least as long as Verilog-XL has large memory pages with which to play.

Without using chatr to boost the HP-UX machines' performance, IBM's 550-MHz NT machine would have won on every benchmark except the shortest one, the 800k RISC behavioral simulation. The 360-MHz Sun Ultra 60 turned in the best performance on this short benchmark. We don't fully understand how the 360-MHz UltraSPARC beat the rest of the field. However, at the very least, our work indicates that processor clock speed doesn't always relate to EDA performance in a linear way.

The chart also shows why we run so many different benchmarks. Different machines and especially machines with different architectures achieve a wide range of results depending on the benchmark. Understanding why performance differs among machines poses many interesting questions, but most important is the realization that you can gain a true picture of a machine's capabilities only by comparing performance across a range of benchmarks. We will return to the simulation results later when we take up the all-important question: When does it make sense to upgrade an EDA workstation?

The typical EDA challenge

Our basic assumption about EDA work is that you probably run jobs involving fewer than 256 Mbytes of data on your desk, while bigger jobs are relegated to a big server or compute farm. This generalization, like all generalizations, is often untrue, but it gives us a useful way of classifying performance.

We also grouped benchmarks that involve fewer than 256 Mbytes of data (see Figure 2). Some of the machines included in the chart have more DRAM than that, but the additional memory makes no difference to the benchmark results. Rather, performance depends on factors such as processor speed, memory I/O, and disk I/O. As with the simulation results previously discussed, some of the machines vary a great deal in their relative performance from benchmark to benchmark.

Figure 3 further clarifies the results by averaging each machine's performance across all the moderately sized benchmarks. The half-a-point advantage demonstrated by the 500-MHz Compaq PC over the 550-MHz IBM machine means that the two systems finished essentially dead even. The ability of the Compaq system to keep up with the seemingly faster IBM system isn't surprising if you're familiar with Compaq's dual-bus system architecture. What did surprise us was the disk performance: we expected the IDE RAID controller in the IBM system to provide a smaller advantage than the SCSI RAID in the Compaq SP700, yet the IBM machine managed the fastest time on the 120-Mbyte disk dump in the Behavioral Memory benchmark.

The high-end winner

Moving on from the typical EDA tasks to larger ones, our conclusions depend on the results for our top three featured machines across benchmarks involving more than 256 Mbytes of data (see Figure 4). For these tasks, memory matters up to about a gigabyte and each of these machines has 2 Gbytes. Thus, as with the typical roundup, winning the high-end challenge depends on compute power and I/O.

Figure 7 - Synthesis performance summary by class
The results on the synthesis benchmarks show an even wider divergence between slow and fast machines than we saw on simulation. Upgrading from the 300-MHz NT system to the 440-MHz HP-UX machine could cut your synthesis run times in half.

The 440-MHz J5000 HP-UX machine dominated most of the high-end benchmarks but fell off the edge on the Talisman hod place-and-route task, which is by far the largest of the place-and-route benchmarks. As mentioned earlier, we can't help but wonder whether setting a different virtual memory page size via chatr would have given the J5000 a boost on Talisman hod as well as on the smaller place-and-route benchmarks.

Further evaluation of the high-end results reveals the 440-MHz HP J5000 as the clear overall winner by almost a 6 percent margin (see Figure 5). Bear in mind that this superiority is evident only if you restrict your view to the largest benchmarks, an appropriate view if you accept the idea that a machine such as the J5000 would be a server for large EDA jobs rather than a desktop machine running smaller tasks.

If you broaden your consideration to include all of the benchmarks, however, the J5000 no longer comes out the clear winner (more on the exact numbers later). The reason lies in the J5000's relatively slower performance on many of the smaller benchmarks, but especially on two of the place-and-route benchmarks. The high-end collection includes only the largest of the latter benchmarks, the Talisman hod.

To complicate the picture, though, the place-and-route run times can vary by as much as 10 percent because of the randomness built into the process. Further, unless you specialize in place-and-route work, you almost certainly perform that task much less often than simulation and synthesis. As for synthesis, Synopsys informed us that the first version of the Design Compiler HP-UX port that we used had some of the compiler optimizations turned off, so Synopsys supplied us with a patch that enabled the optimizations. We reran the synthesis benchmarks for the HP-UX machines and saw a significant increase in performance; the results included in this report come from the correctly optimized software. Given the HP-UX machines' mystifying performance shortfall on all but one of the place-and-route benchmarks, we can only speculate about whether the Snaketech software was fully optimized for HP-UX.
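One way to keep the 10 percent place-and-route variation in mind when comparing two machines is to treat any gap smaller than that band as a tie. The Python sketch below is a rough illustration only: the function name is ours, the run times are hypothetical, and we assume the variation is measured relative to the slower of the two runs, which the benchmark data does not specify.

```python
def difference_is_meaningful(time_a, time_b, noise_pct=10):
    """Return True if two run times differ by more than the run-to-run noise.

    noise_pct is the assumed variation band, expressed as a percentage
    of the slower run (an assumption, not a benchmark definition).
    """
    slower, faster = max(time_a, time_b), min(time_a, time_b)
    return (slower - faster) * 100 > noise_pct * slower

# Hypothetical place-and-route run times in minutes.
print(difference_is_meaningful(100, 94))  # 6 percent gap: inside the noise band
print(difference_is_meaningful(100, 66))  # 34 percent gap: outside the noise band
```

A single threshold like this is crude; repeating each run several times and comparing averages would give a firmer answer.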

In the end, we believe that the J5000 well deserves the high-end EDA crown.

Figure 8 - Place-and-route performance summary by class
Place-and-route results show the fewest performance differences from one class of machine to another, possibly because none of the place-and-route benchmarks is very big. All the same, upgrading from the slowest to the fastest machine should improve your run times by a third.

When to upgrade?

Finding the best overall EDA machine is not the only reason to benchmark so many machines across so many types of EDA tasks. The results also let us build a set of profiles that show the various classes of EDA performance. Using this data, you can find the class of machine most similar to the one you use and see how your simulation performance would probably improve if you upgraded to one of the faster machines (see Figure 6). The figure averages the results for several machines across all the simulation benchmarks.

If you are using a 300-MHz NT machine, you may be able to run simulations about 40 percent faster by upgrading to a 550-MHz PC or the 440-MHz HP-UX system. If your 300-MHz system has 512 Mbytes of DRAM or less and you run bigger EDA tasks, you can undoubtedly benefit from adding more memory.

How much is a 40 percent improvement? If you run small EDA tasks, shaving 4 minutes off a 10-minute run is nice, but it won't change your life. On the other hand, cutting an 8-hour run to less than 5 hours could indeed change your life for the better.
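The arithmetic behind those two cases can be sketched in a few lines of Python. The function name is ours, and the sketch reads "40 percent faster" as a 40 percent cut in run time, which is the reading the 10-minute and 8-hour examples imply.

```python
def upgraded_run_time(minutes, improvement_pct):
    """Return the run time in minutes after cutting it by improvement_pct percent."""
    return minutes * (100 - improvement_pct) / 100

short_run = upgraded_run_time(10, 40)      # the 10-minute task
long_run = upgraded_run_time(8 * 60, 40)   # the 8-hour task, in minutes

print(short_run)       # 6.0 minutes, so 4 minutes saved
print(long_run / 60)   # 4.8 hours, so just under 5 hours
```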

Similarly, the averaged synthesis results reveal that upgrading from a 300-MHz PC to the 500- or 550-MHz PCs gives you about 40 percent better performance, while the 440-MHz HP-UX machine offers a full 50 percent gain (see Figure 7). For place-and-route tasks, by upgrading from 300 to 500 MHz (the fastest machine on the place-and-route benchmarks) you could expect about a 34 percent improvement (see Figure 8).
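To turn these percentages into a rough estimate for your own workload, you could weight them by the hours you spend on each kind of task. The Python sketch below assumes a hypothetical weekly workload. The table holds the best gain quoted for each task type when starting from a 300-MHz machine, and those gains do not all come from the same target machine, so treat the total as an upper-end estimate rather than a prediction for any one system.

```python
# Best run-time reductions quoted for each task type, in percent,
# starting from a 300-MHz machine. They come from different target machines.
REDUCTION_PCT = {
    "simulation": 40,       # 550-MHz PC or 440-MHz HP-UX
    "synthesis": 50,        # 440-MHz HP-UX (about 40 on the 500/550-MHz PCs)
    "place_and_route": 34,  # 500-MHz PC
}

def hours_saved(weekly_hours, reduction_pct=REDUCTION_PCT):
    """Estimate hours saved per week from current hours spent per task type."""
    return sum(hours * reduction_pct[task] / 100
               for task, hours in weekly_hours.items())

# Hypothetical workload in hours per week; not taken from the benchmarks.
workload = {"simulation": 10, "synthesis": 6, "place_and_route": 2}
print(round(hours_saved(workload), 2))  # roughly 7.7 of 18 hours
```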

Finally, if you run all sorts of EDA tasks, you may gain almost 40 percent overall by going from 300 MHz to one of the three fastest machines (see Figure 9). Remember that the results include both large and small EDA tasks, leading to a three-way draw for the first-place ranking. Contrast that with the high-end winner for the exclusively large benchmarks, where the 440-MHz HP-UX machine proved fastest (see Figure 5).

Also recall that each class of performance in these comparisons is represented by a high-end machine, irrespective of the machine's actual speed. These highly tuned systems include RAID subsystems and other performance advantages. You can bet that they have their Ultra DMA turned on, too.

Figure 9 - Average EDA performance by class
Low-, medium-, and high-performance classes become evident when performance is averaged across all of our EDA benchmarks. Note that the average results of the four best machines are close enough to show them all equally deserving of the top performance ranking across both large and small benchmarks. This outcome contrasts with the clear advantage shown by the 440-MHz HP-UX machine on the group of large benchmarks shown in Figure 5.

How to upgrade?

If you remember our characterization of typical and high-end EDA work--typical tasks performed on the desktop and high-end tasks handled by a server or compute farm--you may be wondering why we're talking about trading in a 300-MHz machine for a 550-MHz machine. Depending on the type of EDA work you do, a 550-MHz desktop machine might make a lot of sense. Certainly the time savings of the faster, big-memory machines are almost irresistible.

Upgrading makes sense, agrees Peter Denyer, who handles EDA technical issues for Sun, "but don't put it on the desktop!" Denyer recites good reasons for building a compute farm instead: much higher machine utilization, better use of EDA tool licenses because the farm's load manager tracks license use, and easier backup of design data because you know exactly where the data is. Denyer also points out that in this environment, the performance of the workstation on your desk becomes largely irrelevant. All you need is a relatively modest desktop machine to provide a gateway to the compute farm. Denyer says that a compute farm containing five machines can provide a good start.

We have to agree with the logic of compute farms. Sun has thousands of Solaris machines in multiple compute farms for its internal use, and Compaq is running a substantial compute farm consisting of Pentium-based machines. The utilization of the compute power in these facilities is enormously higher than that of desktop systems. Designers take advantage of the available application "dial tone" to refine designs more than they otherwise would.

And yet, most designers continue to run EDA applications on their desktops. If you are among that group, you can use the benchmarks presented here to determine when you can justify upgrading that desktop machine. In the future, we plan to add cost information to these reports so that you will also have a cost/performance view for evaluating your choices. We expect to find that the highly attractive prices of today's powerful systems will simplify the upgrade question.


The authors wish to thank Alex Cellier for his assistance in conducting the benchmark tests.

James Lee is a principal consulting engineer at Seva/Intrinsix. He has 12 years' experience working with Verilog and was one of the first employees at Gateway Design Automation, which developed Verilog. Prior to joining Seva, he was with Cadence Design Systems. James is also a part-time instructor in Verilog at the University of California, Santa Cruz. The second edition of his book Verilog Quickstart has recently been published.

Bob Peterson is a free-lance writer in Monterey, California. Formerly the assistant managing editor of EDN magazine, he has written on a wide variety of technical topics for many publications and companies over the past 17 years.

To voice an opinion on this or any other article in Integrated System Design, please e-mail your comments to mikem@isdmag.com.


Copyright © 2000 Integrated System Design Magazine
