United Business Media EE Times



 



special section

EDA Platform Benchmark: Simulation and Synthesis at the Same Time

Dual-process benchmarks justify Sun's preeminent position in the EDA world, but speedy PC hardware could make up for some of the deficiencies of Windows NT.

by James Lee and Bob Peterson



This installment of our EDA benchmark series examines the benefits and limitations of dual-processor systems. By running simulation and synthesis jobs independently and then simultaneously, we found that Windows NT takes little advantage of the dual-processor hardware offered by PC vendors in a field long ago mastered by Sun Microsystems.

Listing 1 Benchmark script for Solaris

#!/bin/csh
#
# runall shell script file for
# isd simulation and synthesis benchmarks
#
# Overall flow
# run the verilog tests with the timelogs going
# into timelog1p
# run the Synopsys tests with the timelogs going
# into timelog1p
# Note that the synopsys tests will delete the
# synopsys cache between runs
# run the verilog tests in the background with the
# timelogs going into timelog2p
# run the Synopsys tests in the background with the
# timelogs going into timelog2p
#
#
mkdir timelog1p
mkdir timelog2p
set logroot=$cwd
#
#
cd verilog
csh runall.csh $logroot/timelog1p
cd ..
cd synopsys
csh runall.csh $logroot/timelog1p
cd ..
cd verilog
csh runall.csh $logroot/timelog2p &
cd ..
cd synopsys
csh runall.csh $logroot/timelog2p &
cd ..

As originally conceived, the series was going to show the entire ASIC design flow on Windows NT. In three installments during the year, we planned to cover the main tools in that flow: simulation, synthesis, and layout. By comparing the performance of PCs running NT with a Sun workstation running Solaris, we expected to test the EDA fitness of both the PC hardware and the NT operating system.

When we planned the series in November 1997, we were optimistic that the appropriate tools would soon be available for NT, and we had reasons for that optimism. Cadence had released a beta version of Verilog-XL for NT, and Synopsys was preparing Design Compiler for that operating system, so we saw the first two major pieces of the ASIC flow falling into place on the PC. Verilog-XL was indeed ready for our benchmarks, and in the nick of time Synopsys came through with a beta version of Design Compiler for NT that turned out to be impressively solid.

That brought us to August of this year, when we were still looking for place-and-route tools--any at all--that ran on both Solaris and NT. We asked the big EDA vendors, then we asked the small ones, but nobody was ready. Our fallback position was to at least test a floorplanning tool, but we had no luck there either. Our conclusion: The PC isn't yet ready for the complete ASIC design flow.

We'll keep asking for place-and-route tools. When they arrive, we'll devote another benchmark installment to that crucial task. In the meantime, we decided to take this opportunity to deal with issues that came up in the course of running the simulation and synthesis benchmarks. We also wanted to test the latest PC hardware, which offers faster processors and the possibility of more DRAM.

Moreover, for the first time in our benchmark efforts, Sun agreed to offer active support, so we didn't have to beg a Sun workstation from someone else. The following benchmark results thus feature the latest 360-MHz Sun workstation, as well as the best PCs available today. We welcome Sun's participation.

Let the bashing begin
We began by putting both Verilog-XL and Design Compiler together on one platform for a simultaneous bash fest. Unix's ability to run multiple tasks is unquestioned, and we were curious to see how well NT supported two big simultaneous EDA tasks. Even with dual processors to carry the load, would the PCs perform poorly in comparison with the Sun workstation?

It turned out that the two tasks didn't bash each other to death, unless you count memory thrashing on the benchmarks that challenged the PC's memory capacity. As in our previous benchmark findings, we saw dramatic performance drops arising from memory limitations. Given their ability to include far more DRAM than the PC, Sun workstations have always offered an advantage in that regard. We were hoping, then, to test the Sun workstation with its full complement of 2 Gbytes of DRAM, but a problem with some of the memory from a leasing company limited us to 1.5 Gbytes.

Note that we made no effort to keep the resources equal on the various platforms we benchmarked. We simply asked the hardware vendors to give us their best machines--although our earlier experience with memory-bound benchmarks led us to request large complements of DRAM. Specifically, we suggested 1 Gbyte of DRAM on the PCs.

Don't forget the memory
After comparing the 1.5-Gbyte Sun workstation and PCs containing 512 Mbytes, 1 Gbyte, and 1.5 Gbytes, we can only admonish PC vendors to put more memory slots on their motherboards. We understand that PC100 SDRAM has extremely demanding timing requirements that make it impossible to line up a large number of memory slots across the motherboard. But the PC can address more DRAM, DRAM is relatively cheap, and EDA users benefit enormously from every additional megabyte. At this point, though, we should say that users benefit from every additional gigabyte; if you're running big simulation or synthesis jobs, or both--and you hope to run big placement and routing jobs someday soon--you're talking gigabytes.

Even if you have the slots, can you get the gigabytes? One of our vendors offered us the latest 450-MHz Pentium II Xeon processor-based system with no problem, but they let us have only 1 Gbyte's worth of 256-Mbyte PC100 DIMMs for a few days. DIMMs generally seem to be in short supply. As our results indicate, though, the big DIMMs are worth the wait.

In that regard, we have to hand it to Compaq for simplifying the PC memory challenge. The Compaq Professional Workstation SP700 has eight memory slots on its motherboard, compared to four for the typical high-end PCs we've seen. The extra slots allow you to use more memory or use lower-priced, lower-capacity DIMMs.

The machines
These benchmark tests used four systems:

  • Compaq Professional Workstation SP700--two 400-MHz Pentium II Xeon processors; 1-Mbyte L2 cache per processor; 1.5-Gbyte PC100 SDRAM; dual 100-MHz memory buses; dual PCI buses; Mylex DAC960 disk array controller with 64-Mbyte cache; three 4-Gbyte 10,000-RPM disk drives configured as RAID 0; Windows NT 4.0, build 1381, Service Pack 3
  • IBM Intellistation Z Pro--two 450-MHz Pentium II Xeon processors; 1-Mbyte L2 cache per processor; Adaptec Array1000CA disk controller; 1-Gbyte PC100 SDRAM; three 9.1-Gbyte 10,000-RPM disk drives configured as RAID 0; Windows NT 4.0, build 1381, Service Pack 3
  • IBM Intellistation M Pro--two 400-MHz Pentium II processors; 512-kbyte L2 cache per processor; Adaptec Array1000CA disk controller; 512-Mbyte PC100 SDRAM; two 10,000-RPM disk drives configured as RAID 0; Windows NT 4.0, build 1381, Service Pack 3
  • Sun Ultra 60--two 360-MHz UltraSPARC II processors; 4-Mbyte L2 (secondary) cache per processor; 1.5-Gbyte DRAM; one 9-Gbyte 7,200-RPM disk drive; Solaris 2.5.1

Listing 2 Benchmark script for NT

@ REM 
@ REM runall shell script file for
@ REM isd simulation and synthesis benchmarks
@ REM 
@ REM Overall flow
@ REM run the verilog tests with the timelogs
@ REM going into timelog1p
@ REM run the Synopsys tests with the timelogs
@ REM going into timelog1p
@ REM Note that the synopsys tests will
@ REM delete the synopsys cache between runs
@ REM run the verilog tests in the background
@ REM with the timelogs going into timelog2p
@ REM run the Synopsys tests in the background
@ REM with the timelogs going into timelog2p
@ REM 
@ REM 
mkdir timelog1p
mkdir timelog2p
@ REM 
@ REM 

@ REM %~p0 expands to the directory path of this script, ending in a backslash
set cwd=%~p0
set path=%path%;%cwd%\bin

cd verilog
call runall.bat %cwd%timelog1p
cd ..
cd synopsys
call runall.bat %cwd%timelog1p
cd ..

cd verilog
start cmd /c runall.bat %cwd%timelog2p
cd ..
cd synopsys
start cmd /c runall.bat %cwd%timelog2p
cd ..

Note that the Compaq Professional Workstation SP700 and the IBM Intellistation Z Pro both incorporate Intel's Pentium II Xeon processor, configured here with a 1-Mbyte L2 cache per processor. Additionally, the Xeon's L2 cache bus operates at the same speed as the processor core.

Note too that the IBM Intellistation M Pro is a holdover from our previous benchmark installment. We used it to provide a baseline for a non-Xeon system with a smaller main memory and L2 cache.

Figure 1 Dual- and single-process performance

The performance for running each benchmark with other benchmarks and by itself is compared here by dividing the average dual-process run times by the average single-process run times. The large divergences for the 512-Mbyte PC on the biggest benchmarks indicate that the machine was severely hampered by its small memory.

Sun's participation in the tests brings us the benefits of the latest Sun workstation for benchmarking as well as an open channel to the company's viewpoint on EDA platforms. As indicated by the letters to the editor printed in this magazine over the past few months, we've received a lot of viewpoints on EDA platforms from a lot of sources.

In the previous benchmark reports, we've sometimes waxed ecstatic over the performance of the PCs. The reason was simple: We were impressed. Relatively inexpensive PCs performed superbly, and the PC industry deserves a lot of credit for achieving their performance. Does that mean that you should pluck your tried-and-true Unix workstations off of your network and replace them with PCs? Of course not. Should you expect to do so over the next 10 years? Will Linux make the PC the preeminent EDA machine? Will NT evolve into a super EDA OS? We don't know.

NT: Some practical considerations
Today, for many reasons that designers already know, it's difficult to conceive of serious design work going on without Unix workstations. So if we sing the praises of the PC and even observe at times that NT performed well despite the worst misconceptions of its creators, please bear in mind as an ever-present caveat that EDA work encompasses far more than running a 1.3-million-gate simulation impressively fast. Designers must consider the availability of all the tools they want to use, for instance, and the ease of running those tools across a network. And when they consider cost, just as when they buy a car, they have to compare cars and look at the cost of driving themselves where they want to go, not just once, but over time.

Before we continue with information about the current round of benchmarks, we need to clear up a few points about our previous benchmark installments. For example, a Sun spokesman observed that our benchmarks are almost entirely based on integer computations. Logic simulation is a purely integer-based task, and synthesis uses only a small percentage of floating-point operations. When we eventually benchmark place-and-route tasks, we'll be working with more integer calculations.

On integer-centric tasks, performance scales with processor clock speed, giving the 400- and 450-MHz PCs a distinct advantage over 300- and 360-MHz Ultra 60s. Sun contends that floating-point performance is higher on the Ultra 60, but because our benchmarks are almost entirely integer-based, they fail to illuminate that advantage. Sun points out that as more designers perform deep-submicron design tasks such as LRC extraction, floating-point performance will become more important and, it says, the Ultra 60 will strut its stuff.

Sun has also pointed out two problems with the pricing comparisons we made in previous benchmark installments. First, it complained that the price range we attributed to the Ultra 60 workstation used in our two most recent benchmark installments was off by about a factor of 2 (which still left the Ultra 60 at a much higher price than the 400-MHz PCs we were testing at the time). When we mentioned that we had the invoice showing the system's exact purchase price, they told us that we must have bought the system on the first day it became available. By the time the benchmark article actually appeared in print, the price was much lower. The steep decline of the price curve is a hazard of any new product price comparison and affects the just-introduced PCs as well as the Sun workstation, but the timing was most unfavorable for Sun.

Comparing cars with buses
Sun's second challenge to our price comparisons raises a more interesting point. It points out that comparing the Ultra 60 with a PC is like comparing a bus with a car. It's true that the Ultra 60's features and cost reflect a more extensive range of capabilities than a desktop PC would typically offer, but the Ultra 60 was the most appropriate Sun workstation available to us, and we felt fortunate to be able to work with a new machine that offered the best performance we were likely to obtain at the time.

The new Ultra 60 used in the benchmark tests of this installment offers a somewhat more reasonable parallel to high-end PCs, with its 360-MHz processors improving on the 300-MHz processors in the previous model. The price is comparable, too, given the expense of the higher complement of memory. Still, Sun points out that the typical ASIC designer would be more likely to get an Ultra 10, whose 1-Gbyte DRAM and 330-MHz UltraSPARC IIi processor achieve 85 to 90 percent of the Ultra 60's performance on integer-oriented tasks, it says. The Ultra 10's cost is half that of the Ultra 60, and in contrast to the high-end PCs, you get all those transcendent workstation benefits as a bonus.

Among the benefits is easy system manageability, specifically the ability to manage the machine across a network. NT has a gaping hole in that part of its services portfolio. Sun justifiably contends that any cost comparison between PCs and Sun workstations should consider cost of ownership as well as purchase price. Given Unix's manageability features, Sun has a good story to tell about minimizing cost of ownership.

Figure 2 Worst-case dual- and average single-process performance

A benchmark can run much slower when competing with other tasks, as shown using the same test arrangements and calculations described for Figure 1 but employing worst-case values for the dual-process runs.

An aspect of manageability that shouldn't be overlooked is Unix's support for telecommuting. When you work on your PC at home, you can easily work on a Word or Excel document, but do you have enough horsepower at home to run significant EDA tasks? Even if you do, you have to consider the cost of the EDA software, licenses, and large databases needed on the corporate network. Inevitably, you need a way to run EDA jobs remotely on the corporate network from home. You can do that easily from a Unix--but not an NT--system.

The PC makers are working hard to fill in their story on the manageability and interoperability side, and they've made good progress. At the same time, as author James Lee has pointed out ("Linux Has What It Takes for EDA," September, p. 60), getting the manageability pieces to fall into place takes a lot of work for PC users. Whereas Sun workstation users generally install four pieces of software (Solaris plus a C compiler, Perl, and Tcl/Tk), to obtain similar functionality PC users have to install nine pieces of software along with Windows NT. (Intriguingly, Linux requires installation of only one software package: Linux itself.) Clearly, PCs are not equivalent to Sun workstations.

Figure 3 1.3M RISC gate-level simulation

Using only the biggest simulation benchmark, the 1.3-million gate RISC design at the gate level, the baseline PC runs much slower on average when competing with another task, but the other machines turn in comparable performance in either case.

We do wish to pose some rhetorical questions, however: How much of the PC-versus-workstation debate really concerns Unix versus NT? If the major EDA vendors offered their tools for the x86 port of Solaris or Linux, would we suddenly find ourselves with nothing to discuss? There was a time when EDA vendors such as Daisy Systems built their own hardware because general-purpose computers were either too big or too puny for EDA tasks. Companies like Sun changed all that with cost-effective computers that were just right. Will PCs someday change all that again?

Figure 4 Decompress synthesis

The largest synthesis benchmark, decompress, gives similar results to those in Figure 3. In this case, however, the Compaq machine actually runs the synthesis faster on average when competing with other benchmarks than when running alone.

As Microsoft has demonstrated to the devastation of so many competitors, the company is dogged in developing its software to meet the market's requirements. NT already meets many EDA requirements. In our benchmarking work, we've seen some NT failures, but we've also seen evidence that NT performs better than many EDA users expect.

NT: The good news
A common idea among Unix fans is that application performance on NT gradually degrades over long periods of time. To test this idea, James ran an informal long-term Verilog-XL exercise on the 400-MHz IBM and Compaq systems from our second simulation benchmark installment ("EDA Platform Benchmark: Simulation II," September, p. 44).

The test relied exclusively on the 1.3-million-gate (1.3M) RISC gate-level benchmark used from the beginning of our simulation benchmark installments. The design consists of 256 iterations of a simple RISC processor running a short program, which takes approximately four hours on the test machines (for the details, see www.isdmag.com/edabenchmark).

For the long-term test, James modified the test bench so that after a fixed amount of simulation, it pauses, prints the system time, and then begins again. Essentially, the test bench hits the reset button so that Verilog-XL continues to work without pause. This cycle continued for 400 automatic repetitions over the course of a week, and then James restarted the entire sequence.

After the first week, as test karma would have it, the Verilog-XL license expired. Without rebooting the PCs, James fixed the license and restarted the simulation. The simulation went on for three more weeks, for a total run time of four weeks without rebooting. Although the PCs weren't being used for any other work, James did peruse the time logs periodically, so the systems didn't remain completely undisturbed.

Even though the run times differ by a few minutes every 9 or 10 runs, we found no performance degradation over time. The primary goal of the test was to make a simple check for degradation on NT, so we haven't analyzed the results to find out why the periodic time variations occur. Future benchmark installments will cover the test more thoroughly. We also hope to perform more extensive longevity tests involving multiple concurrent tasks. In the initial test, however, NT performed perfectly.
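One simple way to check a time log like this for gradual degradation is to fit a straight line to the run times and look at the slope. The Python sketch below is illustrative only: the function name `drift_per_run` and the sample log are hypothetical, not part of the benchmark kit, and the sample imitates a steady run time with a longer run every tenth repetition.

```python
def drift_per_run(run_times):
    """Least-squares slope of run time against repetition index (seconds per run)."""
    n = len(run_times)
    if n < 2:
        raise ValueError("need at least two run times")
    mean_x = (n - 1) / 2
    mean_y = sum(run_times) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (t - mean_y) for i, t in enumerate(run_times))
    return sxy / sxx


# Hypothetical log: a steady 1,500-second run with a longer run every tenth repetition.
example_log = [1500.0 + (180.0 if i % 10 == 9 else 0.0) for i in range(400)]
slope = drift_per_run(example_log)
print(f"drift: {slope:+.4f} seconds per run")
```

A slope close to zero means the periodic variations average out and the run times are not creeping upward; a steadily positive slope would point to degradation.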

We also like the ease of installing software on an NT machine using an installation wizard. In a few minutes, for example, the wizard that came with Verilog-XL gave us a default installation that worked fine. On Solaris, we had to edit files to set up the user environment and other functions (a process that an installation kit and a full connection to our LAN would have simplified somewhat). However, after software installation the NT machine requires rebooting, and the Sun workstation doesn't. Even so, installing NT on only a few machines is easier.

NT: The bad news
Before NT enthusiasts become carried away with NT's accomplishments, we must report some problems that we had with NT on the current benchmarks. For the benchmarks, we originally received PCs from three vendors who had pre-installed NT at the factory. The first thing we noticed after booting every one of them was a message saying that we should check the event log for an error message. The apparent need to have a message proclaiming the existence of another message baffles us. Why not just tell us the bad news immediately and be done with it?

When we found the actual error message, it said: "The Server service terminated with the following error: Not enough server storage is available to process this command." We have no idea what command that message refers to, but we received a similar message on one of our previous NT platforms, and reinstalling Service Pack 3 magically made the message go away. This time, therefore, we just ignored it.

A bigger problem occurred when our system administrator attempted to change one PC's default DHCP LAN setup to the static IP arrangement we use for all our test machines. The system administrator has made the change many times, but this time he must have done something horribly wrong because the system crashed and couldn't be revived.

Figure 5 Single-process ping-pong

When a single task--here, the 1.3M RISC gate-level simulation--runs on a dual-processor NT system, NT creates an unnecessary overhead load by constantly shifting the task from one processor to the other, as the screen capture of the NT task manager shows. Ping-ponging, evidently, defeats the benefit of the larger cache on the Xeon processor.

When we tried to reinstall NT, the system would access the installation CD-ROM, say that it was checking the system's hardware configuration, then crash again. Later we found a mismatch in the driver for the disk controller. When a disk controller that matched the driver was swapped in, the system began working again, but only at half speed. We decided that including that system in the benchmark tests made no sense. We chalk the problem up to NT because the system administrator should never have been able to obliterate the operating system with a simple change to the network setup.

Dual-processor challenges
The Sun workstations and most of the high-end PCs we've benchmarked have contained dual processors, yet none of the benchmarks have examined the usefulness (or, it turns out, the handicap) of the second processor. Neither Verilog-XL nor Design Compiler is configured to take specific advantage of a second processor, so the only way to determine what benefits that processor provides is to benchmark the concurrent execution of two different tasks.

Conceptually, running such a benchmark is simple: start two tasks simultaneously and measure how long each of them takes to finish. Practically, though, the test is fraught with inconsistencies. It's difficult to know how much load each task is placing on the system's resources. When one of the tasks finishes before the other, the test is no longer measuring dual-processor performance at all. One way to minimize the latter problem is to use a long-running task as a relatively constant load on one processor and then measure the length of time required for the second processor to complete other tasks.

Working with those ideas, the benchmarks presented in this installment use two methods to explore dual-processor performance. In the first round of tests, we ran our existing short and long simulation and synthesis benchmarks first separately and then at the same time and measured how long each benchmark took. In the second round of tests, we played one long-running synthesis task against a long and repetitive simulation.

Similar scripts
Starting simultaneous processes on the Sun was easy using the script shown in Listing 1, especially given the Unix shell's ampersand operator, which runs a command in the background so that tasks can execute simultaneously. But after hearing a Microsoft representative say that "scripting is almost a failing of Unix, not a virtue" (see the transcript of the Linux vs. NT Shootout forum at www.isdmag.com/linuxvsnt.html), we began to worry about NT's scripting capabilities. Specifically, we wondered whether NT's DOS-flavored scripting language provided a way to deal with simultaneous processes. As a result, we were delighted to find the NT Start command, which enabled us to use a simple script (see Listing 2) almost identical to the one used on Solaris. Note the use of Call and Start for different purposes: Call runs a batch file and waits for it to finish, whereas Start launches it in a separate process and returns immediately.
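For readers who want a platform-neutral version of the same idea, the following Python sketch starts two jobs at once and records how long each takes. It is not the script we ran: the function `run_concurrently` is a hypothetical helper, and the two jobs are short stand-ins for the simulation and synthesis runs.

```python
import subprocess
import sys
import time


def run_concurrently(commands):
    """Start every command at once; return {name: elapsed seconds}."""
    start = time.perf_counter()
    pending = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    elapsed = {}
    while pending:
        for name, proc in list(pending.items()):
            if proc.poll() is not None:  # this job has finished
                elapsed[name] = time.perf_counter() - start
                del pending[name]
        time.sleep(0.01)  # polling interval limits the timing resolution
    return elapsed


# Stand-in jobs (hypothetical): two short sleeps in place of the real tools.
jobs = {
    "simulation": [sys.executable, "-c", "import time; time.sleep(0.2)"],
    "synthesis": [sys.executable, "-c", "import time; time.sleep(0.1)"],
}
timings = run_concurrently(jobs)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.2f} s")
```

All jobs share one start timestamp, so the small delay between launching the first and last job is charged to the later ones; for runs lasting hours, that skew is negligible.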

When we ran the first round of dual-processor tests, we didn't include the entire suite of simulation and synthesis benchmarks we used before (see "EDA Platform Benchmark: Simulation," March, p. 62, and "EDA Platform Benchmark: Synthesis," July, p. 56; also, note that the rpu256 design used in the original and current synthesis benchmarks is in fact the 1.3M RISC design used for the simulation benchmarks). To simplify the tests and make them more accurate, we deleted tasks that took less than 20 minutes to run. Short tasks raise accuracy questions because of the resolution of the timing measurements.

Figure 6 Talisman synthesis

Running the long Talisman synthesis task both in competition with a long simulation task (the 1.3M RISC gate-level benchmark) and alone provides a more accurate picture than the dual-task tests involving all of the synthesis and simulation benchmarks. On the 512-Mbyte machine, the insufficient memory greatly degraded the dual-process performance.

We also decided that the short tasks no longer serve a useful purpose. We originally included them because we weren't sure how well NT platforms would handle big EDA tasks. Since NT turned out to handle huge tasks rather well, the short tasks are extraneous.

A new benchmark
Finally, we added a large synthesis benchmark: decompress, a behavioral image decompression engine that uses Synopsys's Behavioral Compiler.

Counterintuitively, dual-processor NT systems sometimes run a task faster if you run a second task at the same time. The competition for system resources sometimes seems to bring out the best in NT--even when the system lacks sufficient DRAM to keep the two concurrent tasks in memory.

For the simulation tests, we used Verilog-XL version 2.6.24 on the Sun workstation and 2.7_beta.2 on the PC. The Design Compiler version we used on both the Sun and the PC was 98.08. A footnote on the benchmark results is that Synopsys informed us that the preproduction version of Design Compiler we used on the PC might not provide accurate cross-platform run-time comparisons on certain test cases. Our previous beta version furnished comparable run times (so the previous synthesis benchmarks are valid), but we can't draw parallels between the PC and Sun synthesis times in the current tests.

We can, however, evaluate the difference between single- and dual-processor performance on each platform. We ran each benchmark by itself on each platform to establish a baseline, then ran the simulation and synthesis benchmarks simultaneously and logged the run times. Because of variability in the starting and finishing times for the various tasks, we analyzed the results in terms of average and worst-case run times.

The initial dual-processor results
Figure 1 shows a plot of the average dual-process run times divided by the average single-process run times for each platform. The higher the value, the slower the dual-process time was relative to the single-process time. The large divergences for the 512-Mbyte PC occur on the biggest benchmarks, indicating that the machine was severely hampered by its small memory. The dual-process test therefore amplified the memory limitations we saw on our previous simulation and synthesis benchmarks.

Figure 2 shows the worst-case dual-process run times divided by the average single-process run times for each platform. The chart offers insight into the question of how much you suffer if you run two tasks at the same time. The biggest benchmarks can cause great suffering if the machine runs out of memory.
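The two ratios plotted in Figures 1 and 2 are simple to reproduce from a set of logged run times. The Python sketch below shows the calculation; the function name `slowdown_ratios` and the run times are hypothetical and are not taken from our results.

```python
from statistics import mean


def slowdown_ratios(single_times, dual_times):
    """Return (average ratio, worst-case ratio) of dual- to single-process run time."""
    baseline = mean(single_times)
    return mean(dual_times) / baseline, max(dual_times) / baseline


# Hypothetical run times in seconds for one benchmark on one machine.
single = [3600.0, 3640.0, 3560.0]
dual = [3900.0, 4500.0, 4200.0]
avg_ratio, worst_ratio = slowdown_ratios(single, dual)
print(f"average: {avg_ratio:.2f}, worst case: {worst_ratio:.2f}")
```

A ratio of 1.0 means the benchmark ran no slower with competition than alone; a ratio below 1.0 means it actually ran faster.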

To focus further on the memory challenge, Figures 3 and 4 show the ratios of dual-process/single-process run times for the two biggest benchmarks: the 1.3M RISC gate-level simulation and the decompress synthesis. For the former, the Compaq and Sun machines turned in the same performance in either case, and the IBM PC was only slightly slower running both suites. A curious circumstance emerges from Figure 4. On average, the Compaq Professional Workstation SP700 actually runs the decompress benchmark faster when another task (simulation) is running than when the decompress synthesis is running by itself. This machine seems to thrive on competition.

We believe that the reason for the odd result is that when a task is running alone on an NT system containing two processors, the task ping-pongs from one to the other. Using the NT task manager to observe a single task running (see Figure 5) shows the complementary load swings on each processor.

The operating system apparently sees the unused capacity on the second processor and shifts the job over, only to notice a tantalizing amount of unused capacity on the first processor. Unable to resist this unused capacity, NT shifts the job back to the first processor, and the idiotic cycle begins again. (However, memory is a variable in the equation, and the memory limitations overrode the advantage gained by obviating the ping-pong effect for the 1.3M RISC gate-level simulation.)

If you're using an NT machine, you might consider running some stray application along with your EDA job to goad the latter into running faster. If the PC is for personal use, the second processor is mostly useful for running Office applications anyway. So run an EDA job on one processor, do a small chore such as word processing on the second processor to avoid putting too much demand on memory resources, and you'll finish everything faster.

Pushing the two-processor load
In the second round of this series of tests, we ran a large synthesis job at the same time as the largest simulation run. This approach provides a better basis for comparing dual- and single-process performance by eliminating the main variables inherent in the dual-task tests involving all of the synthesis and simulation benchmarks: the CPU, memory, and I/O load of the other task. The synthesis consisted of a portion of the Talisman graphics accelerator provided to us in Verilog code by Microsoft. Synthesizing the code three times with run times on the order of three hours took more than a day on our test systems, so the job provided a continuous workload that logged three intermediate run times. The simulation job consisted of the 1.3M RISC gate-level benchmark repeated over and over as described earlier in the NT longevity test. The simulation run times are on the order of four hours.

Figure 7 1.3M RISC simulation

Shown here is the ratio of the dual- and single-process performance for the 1.3M gate-level simulation running with the Talisman synthesis and alone. Surprisingly, the 512-Mbyte PC ran the simulation faster with competition than alone, a result of eliminating the ping-pong effect.

The test comprised four basic steps:

  1. We started the Talisman synthesis and 1.3M RISC simulation together.
  2. We repeated the Talisman synthesis three times while the simulation repeated. We had each task log its run times on each repetition, obtaining times for the tasks when they're running together.
  3. The simulation continued to repeat after Talisman finished. When the simulation run times reached a consistent, minimum value, we used this time as a single-process baseline for the 1.3M RISC simulation.
  4. Finally, we ran the Talisman synthesis alone to set a baseline for the task.

As explained earlier, we can't compare the absolute synthesis times across the Sun and PC platforms, but that was unimportant in this case. We were able to compare run times for synthesis alone against the run times for synthesis with constant competition. In Figure 6, we again used the ratio of the two times to make the comparison.

Because the synthesis job needs 700 Mbytes of virtual memory and the simulation job needs 800 Mbytes, the two jobs together crunched the 512-Mbyte PC and again wiped out any benefit that accrued from eliminating the ping-pong effect. The other systems handled the simultaneous loads with only minor degradation, as they did on the 1.3M RISC simulation shown in Figure 3.
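The memory arithmetic behind that result can be laid out in a few lines of Python. This is a rough sketch: it assumes binary gigabytes (1 Gbyte = 1,024 Mbytes), and the helper `demand_ratio` is a hypothetical name. Because virtual memory size overstates the working set a job keeps resident, a ratio somewhat above 1.0 does not by itself imply thrashing.

```python
MBYTES_PER_GBYTE = 1024  # assumption: binary gigabytes

# Virtual memory demand quoted in the text, in Mbytes.
SYNTHESIS_MB = 700
SIMULATION_MB = 800

# Installed DRAM from the machine list, in Mbytes.
machines = {
    "Compaq SP700": 1.5 * MBYTES_PER_GBYTE,
    "IBM Intellistation Z Pro": 1 * MBYTES_PER_GBYTE,
    "IBM Intellistation M Pro": 512,
    "Sun Ultra 60": 1.5 * MBYTES_PER_GBYTE,
}


def demand_ratio(dram_mb, jobs_mb):
    """Combined virtual memory demand divided by installed DRAM."""
    return sum(jobs_mb) / dram_mb


ratios = {
    name: demand_ratio(dram, [SYNTHESIS_MB, SIMULATION_MB])
    for name, dram in machines.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x installed DRAM")
```

On these figures the 512-Mbyte machine faces a combined demand of nearly three times its DRAM, while the 1.5-Gbyte machines can hold both jobs at once.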

Figure 7 similarly compares the 1.3M RISC simulation alone against the simulation with competition. As in the synthesis set, the tests produce an odd result: the 512-Mbyte NT machine takes less time to run the simulation when loaded with the synthesis task. We see the anomaly as further evidence of NT's poor ability to manage the dual processors. The Sun workstation, on the other hand, turns in fairly consistent performance running one task or two--probably one of the benefits of the upgrade from SunOS to Solaris.

An inspiring trend
Lastly, we want to offer a collection of results accumulated across many platforms since the beginning of the benchmark series. The primary goal of the series was to determine whether PCs (specifically NT-based PCs) can run serious EDA jobs. When we began, we were concerned that we might overwhelm the PC with a million-gate simulation. Since then, the 1.3-million-gate simulation run has become a staple of the tests.

Figure 8 Performance improvements

Over the course of a year's worth of benchmark tests, we've seen run times for the 1.3M RISC gate-level simulation improve dramatically. Faster processors and more memory account for the 154 percent improvement.

Figure 8 shows the run times for the 1.3M RISC gate-level simulation on eight different systems. The transition from the slowest time of 8,184 seconds to the latest time of 3,222 seconds represents a speed improvement of 154 percent (the run now finishes about 2.5 times as fast) in the space of less than a year. The PC has certainly proven its ability to run serious EDA jobs.
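The 154 percent figure is a gain in speed, not a cut in run time, and the two are easy to confuse. The short Python sketch below works both numbers from the two run times quoted above; the function names are ours, chosen for illustration.

```python
def speed_improvement_percent(old_seconds, new_seconds):
    """Percent gain in speed: how much more work per unit time."""
    return (old_seconds / new_seconds - 1.0) * 100.0


def run_time_reduction_percent(old_seconds, new_seconds):
    """Percent of the original run time that was eliminated."""
    return (old_seconds - new_seconds) / old_seconds * 100.0


slowest, latest = 8184, 3222  # seconds, from Figure 8
print(f"speed improvement:  {speed_improvement_percent(slowest, latest):.0f} percent")
print(f"run-time reduction: {run_time_reduction_percent(slowest, latest):.1f} percent")
```

The same pair of measurements therefore yields about 154 percent when expressed as added speed and about 61 percent when expressed as time saved.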

From the perspective of hundreds of benchmarking hours with many different tasks and conditions, we offer a few conclusions about NT PCs and EDA.

On the hardware side:

On the EDA tool side:

On the operating system side:

If Microsoft can make NT more robust, improve its multitasking capabilities, simplify manageability, and make remote operation transparent, the PC will be ready for the EDA big time. Until then, the excellent PC hardware will occupy only a niche (albeit an important and rapidly expanding niche), with Sun ascendant.


The authors wish to thank Ramprasad Rangarajan for his expert assistance in conducting the benchmark tests. Ram is a project manager at Seva Technologies, and his many years of experience with Synopsys Design Compiler helped make these tests possible.

Contributing editor James Lee is a senior consulting engineer at Seva Technologies, Inc. in Fremont, Calif. He has 12 years' experience working with Verilog and was one of the first employees at Gateway Design Automation, which developed Verilog. Prior to joining Seva, he was with Cadence Design Systems. He's the author of Verilog Quickstart and is also a part-time instructor in Verilog at the University of California at Santa Cruz.

Bob Peterson is a freelance writer based in Monterey, Calif. Formerly the assistant managing editor of EDN, he has written on a wide variety of technical topics for many publications and companies for the past 16 years.

To voice an opinion on this or any Integrated System Design article, please email your message to miker@isdmag.com.


integrated system design  November 1998



Copyright © 2000 Integrated System Design
