United Business Media EE Times



Special Section

EDA Platform Benchmark: Simulation

Pentium II-based workstations running Windows NT offer a powerful platform for today's demanding EDA applications.

by James Lee and John Miklosz



The value of the PC as an EDA platform has been debated since the first PCs hit the street and the first simple schematic capture package running under DOS made its debut. That was some 20 years ago, give or take a couple of years, and we've come a long way since then. Today, Pentium II and Pentium Pro platforms are available, offering a choice of one to four processors running Windows NT at 300 MHz, with a half gigabyte of RAM, Wide Ultra SCSI hard disk drives that provide several gigabytes of storage (and if one drive isn't enough, you can use RAID), and your choice of fast graphics cards, with Accelerated Graphics Port (AGP) quickly becoming available.

Table 1 Hardware configurations
Workstation               Specifications
Compaq 5100               300-MHz single-Pentium II, 512-kbyte L2 cache, 512-Mbyte EDO RAM, dual Reliance Computer memory controllers, Wide Ultra SCSI hard drive
Dell Workstation 400      300-MHz single-Pentium II, 512-kbyte L2 cache, 512-Mbyte EDO RAM, Intel 440FX PCIset (memory controller), Wide Ultra SCSI hard drive
Gateway E5000             300-MHz single-Pentium II, 512-kbyte L2 cache, 512-Mbyte SDRAM, Intel 440LX AGPset (memory controller), Wide Ultra SCSI hard drive
Hewlett-Packard Kayak     300-MHz dual-Pentium II, 512-kbyte L2 cache, 512-Mbyte SDRAM, Intel 440LX AGPset (memory controller), Wide Ultra SCSI hard drive
IBM Intellistation        300-MHz single-Pentium II, 512-kbyte L2 cache, 512-Mbyte SDRAM, Intel 440LX AGPset (memory controller), EIDE hard drive, two 4.5-Gbyte Wide Ultra SCSI hard drives configured as RAID
Sun Ultra 2               300-MHz UltraSPARC II, 2-Mbyte cache, 512-Mbyte RAM, Wide Ultra SCSI hard drive

Nevertheless, the debate goes on and Unix remains the undisputed leader in the EDA industry. According to Collett International, Inc. in Santa Clara, Calif., a well-known consulting firm in the EDA market, over 65 percent of all the design teams in North America are currently using Unix. According to data from the EDA Consortium (San Jose), Unix-based tools garnered more than 90 percent of new license revenues in the third quarter of 1997, compared with only 10 percent for all DOS, Windows 3.x, Windows 95, and Windows NT tools. On the hardware side, Sun Microsystems, Inc. (Palo Alto, Calif.) enjoys the premier position, accounting for more than 40 percent of the workstations shipped in 1996, according to International Data Corp. in Framingham, Mass.

A world of opportunity

Despite Unix's entrenched advantage and Sun's strong market position, there are several compelling reasons to consider Intel/Windows NT-based platforms. The Pentium II and Pentium Pro processors allow scalable systems and offer performance that matches, and in some cases exceeds, the performance of SPARC processors. Furthermore, Windows NT is a robust, scalable operating system, and EDA tool vendors have been aggressively porting their tools to it. With Windows NT, the world of PC software is open to a designer, and it's no longer necessary to switch between a technical workstation for EDA tasks and a PC for ordinary office tasks such as e-mail, spreadsheets, and word processing. Windows NT-based machines are considerably less expensive than Unix-based machines: A study of business users by Deloitte & Touche, for example, reports cost savings of nearly 40 percent for workstations running Windows NT compared with those running Unix. Furthermore, future platforms will be based on the IA-64 architecture and the 64-bit version of Windows NT, which should provide an enormous jump in performance.

Because of those advantages, the growth in Windows 3.x, Windows 95, and Windows NT tools has been dramatic--roughly 160 percent between 1996 and 1997, according to data from the EDA Consortium. (Although the percentage growth is dramatic, the numbers are still small, with new license revenues of only $39 million in the third quarter of 1997.) But the pace will continue and even accelerate, and Collett International predicts that Windows NT-based EDA software revenues will be equivalent to those of Unix-based EDA software in 2001.

The migration from Unix-based platforms to Intel/Windows NT-based platforms isn't without its challenges, however. They involve ease of use, data and application migration, interoperability, staffing and training, networking, platform stability, and functionality that lags behind that of Unix. But over and above those challenges, the fundamental issue of platform performance still needs to be resolved. Quite simply: Do Pentium-based workstations measure up to the challenges of EDA?

Setting up the benchmark

In a world void of budget constraints, we would have acquired workstations for benchmarking the same way you would acquire workstations for your design team--we would have bought or leased them. Since buying a half-dozen Pentium II workstations was never really an option, and leasing the specific workstations we wanted to benchmark proved too difficult, we asked the vendors participating in the benchmark to supply us with the units we needed. Those vendors were Compaq, Dell, Gateway, Hewlett-Packard, and IBM.

We specified these units at 300 MHz, with a single Pentium II processor, 512 kbytes of level-two cache, and 512 Mbytes of RAM. The memory type--EDO or synchronous DRAM--was not specified. Although we didn't specify the type of disk drive (EIDE or SCSI), the graphics controller, the monitor, or the network interface card, we did point out that these machines were going to be benchmarked as EDA workstations and left it up to the individual vendors to outfit them as they deemed appropriate for this type of application (see Table 1).

Since our objective was to benchmark the machines only to determine their computational performance, we didn't concern ourselves with the graphics or peripheral performance. As it turned out, however, we should have paid more attention to I/O and disk drive performance because they had an impact on overall performance in a couple of our benchmark cases.

Obtaining the machines from the vendors has its disadvantages, because all of the machines might not have the same "backgrounds." In other words, a machine we received could have been factory-fresh and right off the production line, or it could have a questionable ancestry because it was used as a demo machine and bounced around from site to site. Or it could have been carefully tweaked by the vendor because it was going to be used in a benchmark. Whereas the first two scenarios are realistic and would duplicate your experience, depending on whether you bought or leased a brand new workstation or leased one that's used, the latter scenario is anything but representative of the real world. As far as we know, none of the workstations used in this benchmark received any special attention. We do know that at least one of them was used as a demo model.

Table 2 Model size
Simulation        Approximate gates    Data structure size (bytes)
Memory            --                   38,185,764
5k RISC           5k
  Gate-level                           4,717,992
  Behavioral                           626,596
40k RISC          40k
  Gate-level                           27,258,552
  Behavioral                           2,570,616
800k RISC         800k
  Gate-level                           523,832,052
  Behavioral                           49,198,916
1.3M RISC         1.3M
  Gate-level                           841,291,104
  Behavioral                           82,005,736
Life48            354k (225k)*
  Gate-level                           76,857,720
  Behavioral                           22,048,984
Life128           976k (600k)*
  Gate-level                           203,740,452
  Behavioral                           61,566,480
* Lee's estimate of the total gates

Five Pentium II machines running Windows NT were benchmarked at Seva Technologies, Inc. in Fremont, Calif.: the Compaq 5100, Dell Workstation 400, Gateway E5000, Hewlett-Packard Kayak, and IBM Intellistation. We were able to account for the differences we observed among the first four machines, and on the whole, those four vendors agreed with our analysis.

Unfortunately, Sun would not provide us with a 300-MHz Ultra 2 (based on the UltraSPARC II processor) to use in the benchmarks, and it proved impossible to lease one because of the high demand for those workstations. Our solution was to use a machine that Compaq itself had used in a series of benchmarks.

The benchmarks

There are a couple of criteria that need to be applied when selecting benchmark suites. The first is that the benchmark suite should, ideally, exhaustively exercise the hardware you're benchmarking. Because you can configure a PC using a virtually unlimited array of hardware in the form of disk drives, graphics cards, and network interface cards, our benchmark was restricted to exercising the basic Pentium II and memory architecture of the machines using benchmark suites that were computation- and memory-intensive. The most significant differences we observed between the benchmark suites, in fact, occurred when the benchmark involved significant writes to disk (which we hadn't taken into account).

Figure 1 Simulation of RISC models

Although performance differences of the various Pentium workstations may not be significant for small designs and short simulation times, they are when the simulation--behavioral (a) or gate-level (b)--exceeds 10 minutes, and even more so when it takes more than an hour.

The second criterion is that the benchmarks should be readily available and, ideally, in the public domain. Unfortunately, good EDA benchmark suites or circuits are hard to come by, so we used a mix of proprietary, vendor, and publicly available benchmarks.

The proprietary benchmark was a behavioral memory model developed by Seva for a 128-Mbit virtual channel SDRAM. (For details about the data size of this model and the other models used, see Table 2.) The benchmark involved some significant disk writes (about 120 Mbytes) and clearly revealed some differences among the disk drives used in the benchmarked platforms.

The vendor benchmark was a basic RISC processor used for demonstration purposes and developed by author James Lee several years ago while he was at San Jose-based Cadence Design Systems. The design comprises about 5,000 gates, and the benchmark consisted of simulating the design at the gate and behavioral levels and as a mixed gate- and behavioral-level simulation, in which the ALU was simulated at the behavioral level (the ALU represented about 1,600 gates). This design was then replicated eight times (to yield approximately 40,000 gates), 160 times (to yield approximately 800,000 gates), and 256 times (to yield approximately 1.3 million gates); and the same gate-level, behavioral, and mixed simulations were run. (For convenience, the four benchmarks are referred to as 5k, 40k, 800k, and 1.3M RISC.) You can obtain the Verilog code for this design directly from Integrated System Design's Web site ( www.isdmag.com/edabenchmark ).
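
The replicated design sizes follow directly from the base gate count. A minimal sketch of that arithmetic, using the approximate figure of 5,000 gates quoted above (the function and constant names are ours and purely illustrative):

```python
# Approximate gate count of the base RISC design, as quoted in the text.
BASE_GATES = 5_000


def replicated_gates(copies: int, base_gates: int = BASE_GATES) -> int:
    """Return the approximate gate count for `copies` instances of the design."""
    return copies * base_gates


# Replication factors used for the 5k, 40k, 800k, and 1.3M RISC benchmarks.
for copies in (1, 8, 160, 256):
    print(f"{copies:>3} copies -> {replicated_gates(copies):,} gates")
```

Note that 256 copies works out to 1,280,000 gates, which is rounded to 1.3 million in the benchmark name.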

The public-domain benchmark we used is called Life and was developed at MIT as part of its Reconfigurable Architecture Workstation (RAW) project. The RAW benchmark suite was designed to facilitate the comparison, validation, and improvement of reconfigurable computing systems. With that objective in mind, the benchmarks were designed to be portable to any reconfigurable computer as a behavioral Verilog netlist, small and easy to understand, and parameterizable to generate designs that would consume a range of hardware resources. Each benchmark was designed in both C and behavioral Verilog.

Figure 2 Simulation of memory and Life models

Because the times required for the behavioral memory model and the Game of Life simulations were two minutes or less, performance differences among the workstations are not very pronounced at either the behavioral (a) or gate (b) level.

The 34 benchmarks in the RAW suite run the gamut of algorithms found in general-purpose computing. They include source code and synthesized netlists in the range of a few thousand to a million gates for binary heap, bubble sort, merge sort, DES encryption, integer FFT, Jacobi relaxation, integer matrix multiplication, and Conway's Game of Life, as well as netlists (only) for shortest path, multiplicative shortest path, and transitive-closure graph problems.

The Life benchmark, which was chosen because of the size of its data structure, implements Conway's Game of Life program in FPGA hardware. Five versions of Life are contained in the suite consisting of 1, 6, 32, 48, and 128 elements. For our benchmark, we used Life48 and Life128, which represent designs of approximately 354,000 and 976,000 total gates, as detailed in the RAW documentation. It should be noted, though, that author James Lee estimates the total gates for the two designs at closer to 225,000 and 600,000. Additional details and the necessary downloadable Verilog code can be found on the RAW Web site (http://www.cag.lcs.mit.edu/raw).
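
For readers unfamiliar with the computation behind the Life benchmark, the following sketch shows one generation of Conway's Game of Life. It is not the RAW code, which is written in C and behavioral Verilog for a fixed number of elements; this version assumes an unbounded grid and represents the board as a set of live-cell coordinates, and `life_step` is a name chosen here for illustration.

```python
from collections import Counter


def life_step(live):
    """Return the next generation, given a set of live (x, y) cells."""
    # Count how many live neighbors each cell has.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next if it has exactly 3 live neighbors,
    # or if it has exactly 2 and is already alive.
    return {
        cell
        for cell, n in neighbor_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


# A horizontal "blinker" becomes vertical after one generation.
blinker = {(0, 1), (1, 1), (2, 1)}
next_generation = life_step(blinker)
```

Every cell is updated from its eight neighbors in each generation, which is why the hardware versions scale naturally with the number of elements.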

We ran the benchmark cases consecutively in batch mode, and we ran each case three times to obtain an average. We ran the 40k RISC design concurrently with a second 40k RISC design and then repetitively and concurrently with the 800k RISC. We chose the number of 40k RISC iterations to complete in the same amount of time estimated for the 800k RISC.
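
The three-run averaging can be sketched as a small timing harness. This is an illustration only: the original batch scripts are not described in the article, and `average_runtime` and `run_case` are hypothetical names standing in for whatever launches one simulation.

```python
import statistics
import time
from typing import Callable


def average_runtime(run_case: Callable[[], None], repeats: int = 3) -> float:
    """Run `run_case` `repeats` times and return the mean wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_case()  # stand-in for launching one benchmark case
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


# Example with a trivial stand-in workload instead of a real simulation.
mean_seconds = average_runtime(lambda: sum(range(100_000)))
```

Averaging over several runs smooths out run-to-run variation from caching and background activity, although three runs is too few to say much about the spread.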

We ran the two concurrent benchmarks with one simulation in the foreground and the other in the background, and we gave the foreground simulation the maximum foreground performance boost available under Windows NT. We took this approach under the assumption that a typical user would be running the simulations in the background and would want to give any foreground applications, such as coding or word processing, priority. We also ran the benchmarks with the Pentium II-based workstations running Windows NT connected to a Unix server to get a feel for using this heterogeneous environment.

Figures 1 and 2 present the raw data for the simulation benchmarks. Because the results plotted in this straightforward manner don't show the differences in performance adequately, we've also plotted the performance of each workstation, including the Sun Ultra 2, normalized to the performance of the Compaq 5100 (see Figures 3 and 4). The results for the concurrent simulations are shown in Figure 5.
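
Normalizing to a reference machine is simple to express in code. The sketch below assumes that relative performance is the reference machine's run time divided by each machine's run time, so values above 1.0 mean faster than the reference; the article does not spell out the formula. The run times and the "Workstation B" and "Workstation C" labels are made-up placeholders, not measured results.

```python
def normalize_to_reference(times_s, reference):
    """Return relative performance: reference time / machine time.

    Values above 1.0 indicate a machine faster than the reference.
    """
    ref_time = times_s[reference]
    return {name: ref_time / t for name, t in times_s.items()}


# Hypothetical run times in seconds (NOT data from the benchmark).
example_times = {
    "Compaq 5100": 100.0,
    "Workstation B": 110.0,
    "Workstation C": 90.0,
}
relative = normalize_to_reference(example_times, "Compaq 5100")

for name, score in relative.items():
    print(f"{name}: {score:.2f}")
```

Plotting these ratios rather than raw seconds puts short and long simulations on the same scale, which is what makes small percentage differences visible.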

Figure 3 Normalized simulation performance for the RISC benchmarks

The performance differences between the Pentium II and UltraSPARC II workstations become clearer when the simulation results are plotted normalized to the performance of one of the workstations. The Compaq 5100 was used as the reference because it was consistently the best performer among the Pentium-based workstations.

The data in all of these charts (and the results obtained from running the benchmarks in our Unix server-NT client configuration, which aren't shown) can be summarized as follows:

  • The Compaq 5100, with some exceptions, was 5 to 10 percent faster than the Gateway, HP, and IBM workstations.
  • The Gateway, HP, and IBM workstations, with a couple of exceptions, were roughly equivalent in performance.
  • The Dell workstation performed significantly below average.
  • The Sun workstation performed about 10 percent faster than the Compaq machine in the majority of the benchmarks and significantly better in the larger behavioral benchmarks.
  • The degradation in performance when running the concurrent simulations in the Windows NT environment was substantially the same for all the machines.
  • The only significant differences seen in the benchmark performance in the client/server configuration involved data dumps and were primarily affected by the speed of the LAN connection, which was only 10 Mbits/s.

After carefully reviewing the architectures and configurations of the workstations, we concluded that the results, including the exceptions, can be accounted for by the differences in the memory controllers and memory types used and by the differences in I/O performance and disk drives.

Figure 4 Normalized simulation performance for memory and Life benchmark

The Sun Ultra 2 (UltraSPARC II) came in far ahead of the Pentium II-based workstations in the behavioral memory benchmark because of the large amount of I/O used in data dumps and the Sun's apparently more efficient handling of I/O.

The overall differences in performance, except for the difference due to I/O performance, are most likely attributable to differences in memory architectures, and in particular, memory controllers. The Compaq 5100 uses two memory controllers from Reliance Computer Corp.; the IBM, Hewlett-Packard, and Gateway workstations use Intel's 440LX AGPset chip set; and the Dell workstation uses Intel's 440FX PCIset.

The Compaq 5100, like the other members of Compaq's workstation line, uses what Compaq calls a Highly Parallel System Architecture to maximize system bandwidth. Most Pentium II workstations running Windows NT support two CPUs for concurrent instruction processing, but the overall system bandwidth is limited because each CPU must compete for access to critical subsystems, such as memory and I/O. Compaq's parallel architecture addresses the need for greater overall system bandwidth by using dual memory controllers and dual-peer PCI buses.

The dual memory controllers process memory requests in parallel, significantly increasing overall memory bandwidth. Each of the memory controllers has a bandwidth of 533 Mbytes/s, and with the memory distributed equally between two DRAM banks, the total memory bandwidth increases to 1.07 Gbytes/s. In contrast, the other Pentium II workstations with only one memory controller can offer memory bandwidths of only 533 Mbytes/s.

Figure 5 Concurrent simulation

Not surprisingly, the simulation performance slowed noticeably when two simulations were run concurrently. Because the foreground application was given the maximum boost under Windows NT, the simulations running in the background took about three times longer for the smaller design and four times longer for the larger design. All of the Pentium II workstations showed roughly the same amount of degradation.

Because of the increased memory bandwidth, Compaq opted to use EDO DRAM rather than the somewhat more expensive, albeit faster, SDRAM. With SDRAM, the Compaq 5100 might have achieved even higher performance. Compaq also uses dual-peer PCI buses in its workstations to increase system I/O bandwidth. A single PCI bus provides an I/O bandwidth of 133 Mbytes/s, which must be shared by peripherals such as the graphics controller and hard disk drive. With dual-peer PCI buses, each bus can provide its peak bandwidth in parallel with the other, allowing twice the bandwidth of single-bus architectures, or an aggregate I/O bandwidth of 267 Mbytes/s.
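
The bandwidth figures quoted for the memory controllers and PCI buses can be reproduced with a little arithmetic. The sketch below assumes a 64-bit memory path clocked at about 66.67 MHz and a 32-bit PCI bus clocked at about 33.33 MHz; those clock rates and widths are our assumptions about hardware of this class and are not stated in the article. These are peak figures, not sustained throughput.

```python
def peak_bandwidth_mbytes(clock_mhz, bus_width_bits, channels=1):
    """Peak bandwidth in Mbytes/s: clock rate x bytes per transfer x channels."""
    return clock_mhz * (bus_width_bits / 8) * channels


MEMORY_CLOCK_MHZ = 200 / 3   # assumed ~66.67-MHz memory bus
PCI_CLOCK_MHZ = 100 / 3      # assumed ~33.33-MHz PCI bus

single_memory = peak_bandwidth_mbytes(MEMORY_CLOCK_MHZ, 64)       # about 533
dual_memory = peak_bandwidth_mbytes(MEMORY_CLOCK_MHZ, 64, 2)      # about 1,067
single_pci = peak_bandwidth_mbytes(PCI_CLOCK_MHZ, 32)             # about 133
dual_pci = peak_bandwidth_mbytes(PCI_CLOCK_MHZ, 32, 2)            # about 267

print(f"Memory: {single_memory:.0f} -> {dual_memory:.0f} Mbytes/s")
print(f"PCI:    {single_pci:.0f} -> {dual_pci:.0f} Mbytes/s")
```

Doubling the number of independent controllers or buses doubles the peak only when traffic is spread evenly across them, which is why the memory is distributed equally between the two DRAM banks.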

In contrast to Compaq's use of a non-Intel memory controller, the Gateway, Hewlett-Packard, and IBM workstations use the Intel 440LX AGPset. The 440LX is the first in a series of AGP chip sets that, according to Intel, are intended to optimize the performance of the Pentium II processor for business and home users.

The 440LX extends the system bandwidth to the graphics controller, complementing Intel's Double Independent Bus architecture, which consists of a processor-to-cache bus and a processor-to-memory bus. It optimizes the system bandwidth and concurrency through the implementation of Quad Port Acceleration (QPA) which provides four-port concurrent arbitration of the processor bus, graphics, PCI bus, and SDRAM memory subsystem. AGP gives PCs the head room to handle memory-intensive 3D graphics applications.

Had we benchmarked performance that was more graphics-oriented, rather than computation- and memory-oriented, it's not clear that the Compaq 5100 would have shown an advantage over the Gateway, Hewlett-Packard, or IBM workstations. This is a difficult call, however, because the 440LX is balanced to optimize graphics performance, the PCI bus, and the memory subsystem. On the other hand, the Compaq machine has a dual-memory controller to increase memory bandwidth and dual-peer PCI buses to increase the bandwidth for PCI devices, but it doesn't offer AGP for graphics.

The 440FX PCIset is an integrated chip set intended to deliver Pentium II and Pentium Pro processor performance to mainstream business systems at an "affordable price," according to Intel. The 440FX consists of three components: a PCI bridge and memory controller, a data bus accelerator, and a PCI-ISA-IDE accelerator. The memory controller supports FPM (fast page mode), EDO, and burst EDO DRAM, but not SDRAM. The worst performer in our benchmarks, the Dell Workstation 400, was the only machine using that controller, although the Compaq 5100 also uses EDO RAM. It's our opinion that the combination of the controller and EDO RAM negatively affected the Dell workstation's performance. (Dell does, however, make workstations with the 440LX chip set, but we don't know if they also use EDO RAM.)

Impact of I/O on performance

Although the performance variations caused by differences in memory architectures and memory types are readily apparent from the data for overall simulation time, accounting for some of the exceptions requires a closer look at the three elements that make up the total simulation time--compilation time, link time, and run time. Though differences in compilation time don't significantly affect overall simulation time, the differences in link and run times do, because they each involve different amounts of I/O and writes to disk.

An EDA Migration Strategy by Wayne Flourney
Like many engineers, those at Compaq have been living in two worlds: the world of Unix and the world of PCs. We needed to run all of our EDA applications, as well as corporate office applications such as word processing, spreadsheets, and project management tools. The first ran on Unix-based workstations, and the latter on PCs.

Several years ago, we tried to run EDA applications on our own Intel-based machines. At the time they didn't include the features that were standard on workstations, such as large amounts of memory, advanced system architecture, and network interfaces. We added extra hardware options to the standard PC configuration, but it was difficult to get the tool vendors to port their applications to our nonstandard platform.

But hardware was only part of the problem. The operating system was still Unix, because it was the only viable alternative for EDA applications at the time. The PC versions of Unix didn't provide the same functions as our Unix workstations, and our company-standard office applications still wouldn't run on our PC Unix machines. Eventually we realized that the problems we encountered running on a platform like x86/Unix outweighed the net cost advantage we might gain over expensive RISC or Unix systems.

Today, the picture is totally different. The performance of a Pentium Pro or Pentium II Windows NT-based workstation now rivals the performance of even the fastest RISC workstations, and the combination of price and performance outstrips any Unix machine. Furthermore, Windows NT-based workstations now come equipped with advanced system architectures and built-in network interfaces. Memory capacities on a Windows NT workstation now rival those of Unix workstations, and using industry standard dual in-line memory modules is much more cost-effective. Windows NT has the features and robustness that make it a suitable operating environment for EDA applications.

Because there's a solid base of Intel/Windows NT products now suitable for EDA, application vendors are becoming interested in making their applications available on Windows NT. Momentum is building and the indications are that Windows NT will rapidly become an industry-standard platform for EDA.

Planning the move

In our environment, we have a dozen or more applications that we support for electrical and mechanical design. We've developed an extensive supporting infrastructure for our applications and computing environment, which consists of more than 100,000 lines of internally developed software. Although we have considerable knowledge of Unix, we had relatively little knowledge of the Microsoft world at the onset.

Knowing the size of the problem but not knowing what the Unix and Windows NT interoperability issues might be, we divided the applications by function or design discipline with the intent of minimizing interoperability issues wherever possible.

In most of our product divisions, there's an organizational separation between ASIC designers and board or system designers. Our system designers perform extensive simulation of new ASIC designs, but they rarely use other ASIC design tools. ASIC designers, on the other hand, use just about every tool we support in addition to logic simulation tools. We also realized that roughly 70 percent of our computational resources are consumed by logic simulation. It became obvious that we needed to separate logic simulation from ASIC design in our migration plans.

First, we decided to tackle an application that was easy to migrate, then go after logic simulation, which would yield the maximum benefits. We used the first application to learn about Windows NT and the associated interoperability issues and to explore potential installation and distribution issues.

We decided to move our signal integrity applications to Windows NT first, then move into the next phase of our Windows NT transition, which entailed changing over a large part of our simulation computational resources. At this point, we had installed a 60-CPU compute farm based on the Compaq Professional Workstation 8000, which has dual 200-MHz Pentium Pro processors but supports up to four processors. These systems are currently available in our production design environment alongside our Unix-based systems.

Making your move

With so much at stake, a migration from Unix to Windows NT must be planned so that it doesn't disrupt the design flow or compromise your stringent project deadlines. The answer is to gradually phase out the proprietary environment and begin to migrate all your electronic development to Windows NT.

When you first consider migrating to Windows NT, it may seem like a daunting task. Like Compaq, large companies tend to be sophisticated users with a significant investment in internally developed tools and utilities. Typically, the more sophisticated you are as an EDA user and as a company, the larger the task. The first thing to do is to divide the task and determine a migration plan that works for your organization. Factors to consider include:

  • the tools or base applications in your design environment
  • a suitable operating environment
  • the supporting infrastructure for the applications
  • the supporting infrastructure for the operating environment

In most cases, migrating the supporting infrastructure is actually a greater effort than migrating the applications themselves, because of the need to migrate many home-grown lines of complex code, tools, and libraries that have been created to allow the transfer of complex data across applications and diverse platforms.

There are many other factors that can influence how and when you might make the transition from Unix to Windows NT:

  • the availability of the required EDA applications on Windows NT
  • the types of design disciplines (electronic design, schematic capture, signal integrity, logic simulation, and ASIC design) or design applications supported
  • the number of users supported for each discipline
  • the complexity of each application and the supporting infrastructure
  • the requirements imposed on the operating environment for each discipline based on how each application is used
  • the company organization or, more specifically, the organization of the various engineering design disciplines
  • the skill set of support staff
  • the design area that will benefit the most in the shortest possible time

One approach is to divide the applications by function or design discipline to minimize interoperability issues wherever possible. Some of the disciplines, such as signal integrity analysis, require just one tool. Others, such as ASIC design, require many tools. In addition, some disciplines like signal integrity analysis tend to have fewer engineers with very specialized skills, whereas other disciplines tend to encompass larger numbers of engineers. Another approach might be to separate ASIC and system designers organizationally.

In general, it's best to migrate the least complex application first. Doing so will leave you better prepared to face the issues that may arise at later stages of the migration plan. Once you've successfully migrated a simple design process, the next step would be to migrate the application that will yield the most significant improvement in performance or productivity.

Ultimately, your organization will be running smoothly in an integrated Unix-Windows NT environment that gives you the performance you need and significant savings in hardware and support costs, reduces the need to develop expertise in both Unix and Windows NT, and protects your previous and future investments.


Wayne Flourney is the director of engineering at Compaq Computer Corp.'s workstation division in Houston.

Let's first examine the performance of the Pentium II machines and the Sun Ultra 2 for the behavioral memory model. This simulation involved a data dump of about 130 Mbytes for each simulation, and the I/O and disk drive performance played a major role in determining the benchmarked performance. The Gateway E5000 and HP Kayak had faster disk drives than the Compaq 5100 and therefore showed better overall performance on the behavioral memory benchmark.

The IBM Intellistation also shows performance comparable to that of the Compaq machine for the 800k RISC gate-level simulation and superior to that of the Compaq machine for the 1.3M RISC gate-level simulation. In these two cases, the data structure size (approximately 524 Mbytes and 841 Mbytes) exceeded the 512 Mbytes of memory available to the simulation and required some paging to disk. The RAID used in the IBM workstation (the only one so equipped) appears to have given it an advantage.
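
The comparison between data structure size and installed memory can be checked against the gate-level sizes in Table 2. The sketch below follows the article's arithmetic and treats a Mbyte as 1,000,000 bytes; that unit choice is our reading of the figures above, and `exceeds_ram` is an illustrative helper, not part of any tool.

```python
RAM_MBYTES = 512

# Gate-level data structure sizes in bytes, taken from Table 2.
GATE_LEVEL_BYTES = {
    "5k RISC": 4_717_992,
    "40k RISC": 27_258_552,
    "800k RISC": 523_832_052,
    "1.3M RISC": 841_291_104,
    "Life48": 76_857_720,
    "Life128": 203_740_452,
}


def exceeds_ram(size_bytes, ram_mbytes=RAM_MBYTES, bytes_per_mbyte=1_000_000):
    """Return True if a data structure is larger than the installed RAM."""
    return size_bytes > ram_mbytes * bytes_per_mbyte


oversized = [name for name, size in GATE_LEVEL_BYTES.items() if exceeds_ram(size)]
print(oversized)
```

Only the 800k and 1.3M RISC gate-level models cross the threshold under this assumption. If a Mbyte is instead taken as 2**20 bytes, the 800k model falls just under 512 Mbytes; the operating system and the simulator itself also occupy memory, so this check alone does not determine exactly when paging begins.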

What's especially interesting, however, is that for the behavioral memory benchmark, the Sun workstation "blew away" all of the Pentium II machines, including the Compaq 5100. The reasons start to become clear when the link time and the actual simulation run time are broken out for the Compaq 5100 and Sun Ultra 2 (see Figure 6a).

Because the link process for the behavioral memory model is essentially computation- and memory-intensive and accounts for only a small portion of the total simulation time, the differences in computational and memory performance between the Compaq 5100 and the Sun Ultra 2 aren't significant. But the difference in run time, which involves a large data dump to disk, gives the Sun machine a major advantage--roughly two times--over the other machines.

RISC and Life analysis

Although less dramatic, the difference is also apparent in the RISC and Life benchmarks, where some paging to disk is required because of the large data structure sizes. For these cases, especially in the gate-level simulations, the performance advantage of the Compaq machine lay in its superior computational and memory performance or memory allocation during linking.

To look at I/O performance and the effect of the disk drive on the benchmark performance, we swapped the 7,000-rpm drive used in the Compaq machine with a 10,000-rpm drive. We also created two partitions on the disk, one NTFS and one FAT, to evaluate NTFS versus FAT. The benchmark performance improved somewhat with the faster disk, and the FAT partition was slightly faster than the NTFS partition for the data dump.

The observation that NTFS is slower than FAT was also supported by some experiments on the IBM RAID, in which we were unable to make it write any faster than the internal IDE disk (which was FAT).

Another interesting observation involves the 1.3M RISC gate-level simulation. Verilog-XL uses the full (peak) virtual memory during linking when it builds the data structure. About 90 percent of the memory then goes unused if the design is gate-level. When comparing link times, the IBM was the fastest among the Pentium II machines, with a link time of about 1,000 seconds (see Figure 6b) and a simulation time of about 5,500 seconds. But the Sun Ultra 2 came in with a link time of less than 300 seconds and a simulation time of about 4,800 seconds. Again, it was a big winner because of the superior I/O performance.
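The gap is easier to judge when each phase is expressed as a ratio. The short Python sketch below uses only the approximate figures quoted above. Because the Sun link time is given only as "less than 300 seconds," the link ratio it computes is a lower bound. The dictionary and function names are ours, chosen purely for illustration.

```python
# Approximate times (seconds) quoted in the text for the 1.3M RISC
# gate-level simulation. The Sun link time is stated only as
# "less than 300 seconds", so 300 is used here as an upper bound.
ibm = {"link": 1000, "simulation": 5500}
sun = {"link": 300, "simulation": 4800}


def speedup(phase):
    """Return how many times faster the Sun was than the IBM for a phase."""
    return ibm[phase] / sun[phase]


if __name__ == "__main__":
    for phase in ("link", "simulation"):
        print(f"{phase}: Sun faster by a factor of about {speedup(phase):.2f}")
```

On these figures the Sun's lead is far larger during linking (a factor of at least 3.3) than during simulation (a factor of roughly 1.15).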

Figure 6 Link and run times

Comparing the link and simulation run times for the Compaq and Sun machines (a) clarifies why the latter far outperformed all the Pentium II machines in the memory benchmark. Run time accounts for most of the total simulation time, and because the simulation requires a large data dump to disk, the Ultra 2 has a big advantage. For the 1.3M RISC gate-level simulation (b), the Ultra 2 won out, again because of its superior I/O performance.

Because the Sun Ultra 2, the Compaq 5100, and all but one of the other Pentium II machines were equipped with Wide Ultra SCSI disk drives (remember that the IBM Intellistation used an EIDE drive for its first level of mass storage), the Sun may have performed better because of its caching algorithms and its processor-memory-I/O interface. According to Sun, Solaris uses two special algorithms, silo and elevator, to speed disk writes. Silo tags the FIFO buffer queue with "high watermark" and "low watermark" thresholds. The buffer accepts I/O requests from the application program until the queue reaches the high watermark, at which point the application is held off. The buffer then drains until it reaches the low watermark, whereupon the application is restarted and resumes sending I/O to the operating system.
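The watermark scheme can be sketched in a few lines of Python. This is a toy model of the behavior as described to us, not Solaris code: the class name, method names, and threshold values are hypothetical, and a real kernel would block the writing process rather than return a flag.

```python
from collections import deque


class WatermarkQueue:
    """Toy FIFO write queue with high/low watermark flow control.

    Illustrative only: names and structure are assumptions, not Solaris's.
    """

    def __init__(self, low, high):
        if not 0 <= low < high:
            raise ValueError("need 0 <= low < high")
        self.low = low
        self.high = high
        self.pending = deque()
        self.accepting = True  # False while the application is held off

    def submit(self, request):
        """Queue a write; return False if the application must wait."""
        if not self.accepting:
            return False
        self.pending.append(request)
        if len(self.pending) >= self.high:
            # High watermark reached: stop accepting further requests.
            self.accepting = False
        return True

    def drain_one(self):
        """Complete the oldest write; resume the writer at the low watermark."""
        if not self.pending:
            return None
        done = self.pending.popleft()
        if not self.accepting and len(self.pending) <= self.low:
            # Low watermark reached: let the application continue.
            self.accepting = True
        return done
```

The gap between the two thresholds is what matters. The writer isn't restarted the moment a single slot frees up, so the disk gets to work through a long run of queued writes without interruption.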

Instead of writing the queued I/O requests in exactly the order in which they were sent, the elevator algorithm looks at where the blocks reside on the disk and schedules the writes accordingly. The algorithm first schedules all of the writes that lie in one direction of head travel and then schedules the writes that lie in the reverse direction. Throughput is increased because it takes less time to keep the head moving from track to track in the same direction than to reverse its direction.
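A minimal Python sketch of elevator ordering follows. It assumes block numbers map directly to head position and that the head starts by sweeping upward. Both are simplifications of ours, and the function names are hypothetical rather than anything taken from Solaris.

```python
def elevator_order(pending_blocks, head_position):
    """Order writes as one upward sweep followed by one downward sweep."""
    upward = sorted(b for b in pending_blocks if b >= head_position)
    downward = sorted(
        (b for b in pending_blocks if b < head_position), reverse=True
    )
    return upward + downward


def head_travel(order, head_position):
    """Total distance the head moves when servicing blocks in this order."""
    total = 0
    for block in order:
        total += abs(block - head_position)
        head_position = block
    return total


if __name__ == "__main__":
    requests = [95, 10, 60, 20, 80]  # arrival (FIFO) order
    start = 50
    swept = elevator_order(requests, start)
    print("elevator order:", swept)
    print("FIFO travel:", head_travel(requests, start))
    print("elevator travel:", head_travel(swept, start))
```

For this made-up queue, servicing the requests in arrival order moves the head 280 block positions, whereas the elevator order (60, 80, 95, 20, 10) moves it only 130.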

Another advantage the Sun Ultra 2 may have is in the way information is transferred from memory to I/O. Sun interfaces the SCSI controller and the memory using its UltraSPARC Port Architecture (UPA). The UPA is based on a crossbar switch with a bandwidth of 100 MHz and results in a total throughput of 1.6 Gbytes/s. In addition, the bus between the processor and the memory in a Pentium II machine is the same as the bus between the processor and I/O, namely, a 33-Mbyte/s PCI bus. Here, Sun also uses its 100-MHz UPA crossbar switch.

What's still not clear is how much of the Ultra 2's advantage in I/O performance is due to its hardware architecture and how much is due to the Solaris operating system and the disk write algorithms it implements. Either way, it's clear that if your simulations require more than 512 Mbytes of memory and you have to page to disk (or need to perform large data dumps), you'll fall off a higher cliff with Windows NT-based machines than with Unix-based machines running Solaris.

Vendor benchmarks

It's common practice for a vendor to run a benchmark that measures its own products against its competitors' products. If the vendor's hardware or software doesn't match the competitors', the benchmark never sees the light of day. But if the benchmark comes out in the vendor's favor, it's almost certain that it will be used in the vendor's promotional materials and sales presentations.

A vendor's benchmarks should be viewed with skepticism because they are, after all, self-serving and not completed under independent, unbiased supervision. Nevertheless, they can provide some additional insight if they agree with independent benchmarks. With that view in mind, we looked at several benchmarks that Compaq ran to compare its 5100 workstation under Windows NT with several other workstations, in particular, the Sun Ultra 2 (Model 2300) running Solaris.

Some Personal Observations on Configuring a Heterogeneous Unix-NT Environment
For our benchmark tests, we kept all the necessary applications and data on the individual workstations so that no other factors would affect performance. The only outside resource we used for the benchmark tests was Seva Technologies' regular license server, which resides on a Unix machine (a Sun workstation).

However, most EDA environments won't be configured as stand-alone workstations, and you need to consider several factors if you're going to set up a mixed Unix-NT environment. (We believe that the environment needs to be mixed because all the tools you might want to use still aren't available for Windows NT.) Questions include:

  • Where should data be stored--on each PC or on a server? Should the server be running Windows NT or Unix?
  • Where should the applications be stored--on each individual PC or on the server? Again, should the server be running Windows NT or Unix?
  • Should waveform or other data dumps be local or remote?
  • Should you use NTFS or FAT file systems?
  • Should the license server be a Windows NT or Unix machine?

Based on our experience with the benchmark tests run with remote applications and data and some observations Seva made while working at the same time with a client who's doing simulations on PCs, we have some suggestions.

The benchmarks indicate that the disk I/O performance of the Sun Ultra 2 was superior to that of any of the Pentium-based workstations running Windows NT, so it seems that Sun workstations running Solaris are a better option than Windows NT servers.

From a data management perspective (version control, security, backup, and such), we know that it makes sense to store the data on a server. With large designs and short simulation times, though, the overhead involved in reading the source code from the server each time you want to run a simulation during initial debugging could be significant. The Seva client running simulations on PCs came up with a good compromise that uses a set of simulation scripts to copy the data from the server to a local work space before simulation. This approach provides fast turnaround time for simulations and permits data management on the server.
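A staging step of this kind might look like the following Python sketch. We did not see the client's actual scripts, so everything here is an assumption for illustration: the function name, the `sim_work_` prefix, and the idea that the server share is already mounted as an ordinary directory.

```python
import shutil
import tempfile
from pathlib import Path


def stage_design(server_dir, local_root=None):
    """Copy a design tree from the server mount into a fresh local work space.

    server_dir: path to the design directory on the mounted server share.
    local_root: optional local directory in which to create the work space;
                the system temporary directory is used when it is None.
    Returns the path of the local copy.
    """
    source = Path(server_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"design directory not found: {source}")
    # A fresh, uniquely named directory avoids clobbering an earlier run.
    work_root = Path(tempfile.mkdtemp(prefix="sim_work_", dir=local_root))
    work_dir = work_root / source.name
    shutil.copytree(source, work_dir)
    return work_dir
```

A simulation script would call `stage_design` once, run the simulator from inside the returned directory so that waveform dumps also land on the local disk, and copy back to the server only the results worth keeping.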

PCs can be configured with sufficient disk space for your applications, and it's faster to access the applications from a local disk. The downside is that if your engineering team is growing and dynamic, you could spend considerable time installing the applications on each machine. If you install the application once on a server (preferably a Unix server) and use a PC file-sharing program, such as Samba, the PCs become more interchangeable.

Our benchmarks, for example, required as much as 10 seconds of additional run time when taking the applications off a slow server using a 10-Mbit/s network. Because you'll probably be using a faster server than we had available and a 100-Mbit/s network, this time should be negligible (compared with the total simulation time) and should save a significant amount of setup time.

For waveform data dumps, it's much faster to dump to a local disk. If you copy the data from a server to a local temporary workspace on the local PC, the data would naturally dump back to the local PC.

Our tests showed FAT to be slightly faster than NTFS for data dumps. Because NTFS is a more robust file system, though, it makes sense for the server, and the PCs can still be FAT. An alternative would be to have some NTFS and some FAT partitions and use the FAT partition as a temporary work space to hold the data.

Most EDA software requires the use of a license manager. Globetrotter FlexLM, for example, is available for both Windows NT and Unix machines. Because the license server has a low impact on the system, where the license manager resides probably doesn't matter. The best reason to put it on a Unix server is that you can always query the server to check on the license. If the license is on a disk that's also served, then the user only needs to mount the proper disk.

Without a log-on server, you would need separate accounts for each person on each PC. Using Windows NT's network server log-on and remotely serving the data and applications enables a user to sit down at any PC on the network, log on, have his or her log-on scripts on the log-on server, mount the proper remote disks with the applications and data, and start to work. The simulation scripts can copy the necessary data to the local temporary work place on disk, giving optimum performance with minimal administration. If the log-on server also uses Samba (as is possible with the newest releases), the system administrator needs only to create a Unix account and copy the proper log-on scripts for new users. Although it's possible to maintain dual log-ons for Unix and Windows NT with separate servers, this setup can lead to problems.

--J.L.

Compaq benchmarked several EDA packages. Among them were three simulation packages that are highly computation-intensive and so come closest to the benchmarks that we ran. These are VCS from Viewlogic Systems, Inc. in Marlboro, Mass.; Verilog-XL from Cadence Design Systems; and QuickHDL from Mentor Graphics Corp. in Beaverton, Ore. We ran most of our Verilog-XL benchmarks singly, whereas Compaq ran four simulation jobs concurrently and looked at both single- and dual-processor workstations.

Figure 7 VCS and Verilog-XL benchmarks

For its VCS benchmark (a), Compaq configured the 5100 with only 128 Mbytes of RAM, whereas the Sun Ultra 2 had 256 Mbytes. For both the single- and dual-processor machines, the 5100 was roughly 10 to 20 percent slower than the Ultra 2. For its Verilog-XL benchmark (b), the workstations were configured with 512 Mbytes of RAM. In the single-CPU configuration, the Ultra 2 was about 5 percent faster than the 5100, and in the dual-CPU configuration about 10 percent faster.

Gate-level simulation

The benchmark Compaq used consisted of a gate-level Pentium chip set (three chips of approximately 120,000 gates each) and included a bus-functional Pentium model. A bus-cycle simulation of the Pentium chip set was performed in regression mode.

Figure 8 QuickHDL Benchmark

For Compaq's QuickHDL benchmark, all the workstations were configured with 512 Mbytes of RAM. In this case, the 5100 was about 7 percent faster than the Ultra 2 in the single-CPU configuration but about 2 percent slower in the dual-CPU configuration.

For the VCS benchmark, the Pentium-based workstations, including the Compaq 5100, were configured with only 128 Mbytes of RAM, whereas the Sun Ultra 2 had 256 Mbytes. The compiled executable size was 12.4 Mbytes on the Compaq 5100 and 19.9 Mbytes on the Sun Ultra 2. The Windows NT machine had peak memory usage (including the operating system) of 75 Mbytes, and the machine running Solaris had 80 Mbytes. For both the single- and dual-processor machines, the Compaq 5100 was roughly 10 to 20 percent slower than the Sun Ultra 2 (see Figure 7a).

For the Verilog-XL benchmark, all the workstations were configured with 512 Mbytes of RAM. Each of the concurrent simulation jobs used 112.5 Mbytes of memory for the data structure, and the total memory usage was 450 Mbytes in both the Compaq and Sun workstations. In the single-CPU configuration, the Ultra 2 was about 5 percent faster than the 5100, and in the two-CPU configuration about 10 percent faster (see Figure 7b).

For the QuickHDL benchmark, all of the workstations were again configured with 512 Mbytes of RAM. Both the Compaq 5100 and the Sun Ultra 2 used 832 Mbytes of memory (including virtual memory). In this case, the 5100 was about 7 percent faster than the Ultra 2 in the single-CPU configuration but about 2 percent slower in the two-CPU configuration (see Figure 8).

These results for a single-CPU configuration agree with our observations that the Ultra 2 was about 10 percent faster, on average, than the Compaq 5100.

Some conclusions

It's clear from our benchmarks, as well as those done by Compaq, that 300-MHz Pentium II-based platforms are a match for 300-MHz UltraSPARC II-based workstations and have the horsepower for EDA applications, including those as computation- and memory-intensive as Verilog simulation. Pentium II-based platforms also offer excellent performance for their price and currently enjoy a significant price advantage of roughly two to three times over the UltraSPARC II-based platform. However, there are several things to consider when moving from Unix to Windows NT.

We Still Have a Hang-Up
Everyone who uses a PC--whether it's running DOS, Windows 3.x, or Windows 95--has experienced hang-ups. Though hang-ups have become progressively fewer and fewer, everyone considering Windows NT for critical engineering applications should understand that it hasn't made hang-ups a thing of the past. Unfortunately, we, too, suffered them during our benchmarking exercise.

Four out of the six PCs we received from the vendors caused no problems throughout the duration of our benchmark tests. A fifth PC, however, which appeared to have suffered some rough handling during shipping, experienced several types of crashes or errors when we ran our benchmarks. In one incident, the machine didn't crash, but the simulation results were different.

The sixth PC ran all of the benchmarks with no problems but hung at other times when we were just looking at the results of the simulations. By "hung," we mean that the mouse pointer would freeze and we couldn't get the system to respond in any way. After rebooting, the system would work well until it hung again unexpectedly.

What caused the hang-ups wasn't some strange, unseen force at work at Seva's offices. We experienced the same problem during another project as well, when two identical Pentium PCs from a local vendor, both of which had been configured and burned in by the vendor, exhibited similar random hang-ups.

Consequently, my personal failure rate was three out of nine machines, or 33 percent. Therefore my recommendation would be that you consider buying a few spare machines and do some good testing.

--J.L.

Caveats

The first consideration is design size. Currently, Windows NT is limited to 500 Mbytes per process, and the Pentium II can accommodate a total address space of only 1 Gbyte. Windows NT 5.0 will soon overcome Unix's current limitation of 4 Gbytes of address space, but that version is not yet available. What's more, Solaris is already a 64-bit operating system, whereas Windows NT is still 32-bit.

Sun's Ultra 2 workstations have roughly a 50 percent advantage in floating-point performance over equivalent-speed Pentium II workstations. That advantage isn't significant in digital simulations like Verilog, perhaps, but it's definitely an advantage if you're doing Spice simulations, for example.

Windows NT servers are still not equivalent to Solaris servers, and it's always an advantage to have the compute farm servers fully compatible with the front-end machines. If the simulation environment includes standard Unix tools--such as make, perl, awk, and sed--shell scripts, or C programs, it may be difficult to deploy a mixed Unix-NT environment.

Just because Pentium II-based workstations are significantly less expensive than the current generation of Ultra 2 workstations, you shouldn't jump to the conclusion that Windows NT-based EDA software will ever be significantly less expensive than Unix-based EDA software. Conversely, even though you can now obtain a wealth of "shrink-wrapped" office productivity software that will run under NT, the equivalent for Unix-based workstations will be a long time coming.

Furthermore, no one should expect Sun to stand still in the face of the competition posed by Intel and Microsoft and the challenges presented by Pentium-based Windows NT workstations, especially in terms of price and performance.

*Conway's Game of Life is represented on a 2D array of cells, each cell being alive or dead at any given time. The program begins with an initial configuration for the cells, and henceforth obeys the following set of rules: A living cell remains alive if it has exactly two or three living neighbors; otherwise it dies. A dead cell becomes alive if it has exactly three living neighbors; otherwise it remains dead.
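For readers who want to see the rules in action, here is a minimal Python sketch of a single generation. It is not the benchmark code; Python is used purely for illustration. Treating cells beyond the edge of the array as dead is our assumption, since the rules above don't say how edges are handled.

```python
def life_step(grid):
    """Return the next generation of a Game of Life grid.

    grid is a list of equal-length rows of 0 (dead) and 1 (alive).
    Cells outside the array are treated as dead (an assumption).
    """
    rows, cols = len(grid), len(grid[0])

    def live_neighbors(r, c):
        count = 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) == (0, 0):
                    continue  # a cell is not its own neighbor
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    count += grid[rr][cc]
        return count

    next_grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            n = live_neighbors(r, c)
            if grid[r][c]:
                row.append(1 if n in (2, 3) else 0)  # survival rule
            else:
                row.append(1 if n == 3 else 0)  # birth rule
        next_grid.append(row)
    return next_grid
```

A row of three living cells (a "blinker") makes a quick check: one call to `life_step` turns it into a column of three, and a second call turns it back.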


James Lee is a senior consulting engineer at Seva Technologies, Inc. in Fremont, Calif. He has 12 years' experience working with Verilog and was one of the first employees at Gateway Design Automation, which developed Verilog. Prior to joining Seva, he was with Cadence Design Systems. He has written a book on Verilog and is a part-time instructor in Verilog at the University of California, Santa Cruz.

Contributing Editor John Miklosz has 17 years' experience working on electronics publications, including several years as the editor-in-chief of Computer Design.

To voice an opinion on this or any Integrated System Design article, please e-mail your message to miker@isdmag.com.


integrated system design  March 1998



Copyright © 2000 Integrated System Design
