benchmark
EDA Platform Benchmark: Place and Route
The Windows NT platform continues to perform in an EDA environment, this time handling the key place-and-route phase with ease.
by James Lee and Peggy Aycinena
Our regular readers will welcome this next installment in our EDA benchmark series. Over the past year, we have evaluated various design tools on Windows and Unix platforms, running them through their paces on a variety of NT machines and a Sun workstation. Other segments of this series related to the ASIC design flow covered simulation (March 1998, p. 62 and September 1998, p. 44), synthesis (July 1998, p. 56), and simultaneous simulation and synthesis (November 1998, p. 50).
Our efforts in this month's benchmark focus on place-and-route tools. Interestingly enough, when the series began in March 1998, we intended to examine the entire ASIC design flow. Until recently, however, we lacked access to a viable place-and-route tool designed to run on both platforms. Now Snaketech U.S. (San Jose) has made available to us the place-and-route tool evaluated here, and we have moved forward to frame our tests and examine performance. With this benchmark, we complete our first review of the major elements of the ASIC design flow tool suite: simulation, synthesis, and place and route.
Our results for the Snaketech place-and-route tool reveal that the Compaq SP700 was slightly faster on all four designs we tested. However, a performance advantage of at most 10 percent over the IBM and HP machines doesn't allow us to declare the Compaq a clear winner. Overall, all three NT machines handled the place-and-route process gracefully.
The history
We started to run this series of benchmarks early last year and from the beginning, the process has been an eye-opener.
Running these benchmarks for a magazine has developed into a slightly different process than a traditional benchmark; we weren't quite sure what we were looking for. We knew we would be able to run a couple of thousand gates on the NT machines when we started, but we never dreamed we would be running a simulation of over a million gates. It looked like the NT machines were running on par with a Unix workstation when we matched machines of equal megahertz and ran one EDA task at a time.
Initially our work at Seva addressed the impression that NT suffered instabilities when handling demanding EDA applications. We started out suspecting that the operating system couldn't handle a complex two-day simulation, and we waited for the system to crash. However, NT performed well beyond our expectations, and over time we proceeded to run bigger and bigger designs without OS problems. Eventually we were able to run several month-long simulations, an outcome we had definitely not expected.
Why were we so surprised? We had previously thought that the rigorous demands of EDA tools would tax the NT operating system beyond its abilities. However, after evaluating the results, we concluded that it's non-engineering office applications that make NT unstable, particularly when comparing the performance of Linux and NT. In fact, we now believe that EDA applications are in general well hardened, whereas non-EDA applications are not; non-EDA applications are often shipped with latent, tolerated bugs. Engineers seem to accept crashes in everyday non-EDA applications, but they definitely don't accept crashes during simulation and synthesis.
Today's design engineers often need to use a variety of non-engineering applications, such as word processing and spreadsheet tools, on a daily basis. For example, engineers frequently jump into Excel on the spur of the moment for quick calculations during data analysis and evaluation. These applications are an important part of an engineer's office productivity and an undeniable part of today's engineering desktop. In fact, the availability of these applications in Windows is one of the factors that make NT attractive as an engineering operating system.
The question then arises: How many engineers are actually using workstations for non-design applications, such as word processing or spreadsheets, as well as for EDA applications? A recent ISD survey indicated that 46 percent of our readers resort to maintaining two machines on their desktops: one to provide EDA workstation functionality and one to provide office productivity. In fact, some applications and features don't even exist in a Unix-compatible format. As far as word processing is concerned, even loyal Unix fans must admit that a simple cut-and-paste maneuver is faster and easier in a Windows environment than in a Unix setting.
An important distinction, then, is between EDA workstation tasks and office automation tasks. In an ideal world, all applications (administrative, EDA, and a host of others) would run on a single platform. In reality, a design engineer running a single design application could probably get away with a single Windows machine that handles that one design need along with the rest of his or her requirements. However, it's rare that a design engineer needs only a single design function.
In fact, later this year we expect to explore EDA applications that will undoubtedly overwhelm the capacity of any Windows platform; we predict that these large-scale projects will need a RISC processor and a 64-bit operating system. And when design demands grow to the point of networking four or five machines, we expect to find no contest at all between a group of Windows workstations and a comparable group of powerful Unix machines.
The haunted benchmarks
We should mention here a now-famous benchmark series that compared a Linux machine running the Apache Web server with an NT machine running Microsoft BackOffice. The companion benchmark pitted a Linux machine with a Samba file server against an NT-based file server. The two benchmarks were funded by Microsoft, and the results caused quite a stir when the Microsoft solutions won hands down over the competition. However, after inspecting the specifications and parameters of the tests and reviewing the results, the engineering public arrived at a slightly different conclusion: several superbly tuned Microsoft solutions had gone up against de-tuned non-Microsoft solutions and, not surprisingly, won.
The fallout from these benchmarks appears to have caused a major shift in thinking in the engineering world about benchmarking, and precipitated a reduced level of respect for such corporate-sponsored testing. Some of that fallout may haunt readers of this benchmark and we would like to address those concerns.
We acknowledge that parts of the benchmark series conducted here have been partially funded by Microsoft. However, both the magazine and Seva Technologies are independent entities; we are doing our best to conduct an objective study, which admittedly may require a leap of faith for some in the industry. Note that we make our benchmark procedures, code, and test designs readily available on the ISD Web site. We invite follow-on studies from industry players who wish to evaluate our findings, and we always welcome feedback on any results published in this series. We offer this proactive stance as evidence of our objectivity and of our efforts to produce a technically credible benchmark.
Looking forward
Framing our current benchmark in the context of future work, we hope to approach benchmarks in a couple of different ways, principally by separating our investigations into two categories. The first, which we will call "typical," will evaluate an average EDA workstation that an engineer might have on the desktop, along with the viability of various tools on that platform. The second, which we will term "high-end," will include tools and platforms used in server-level configurations with 512 Mbytes of RAM and above.
In pursuing this dual strategy, we will essentially be able to declare more than one winner for a particular benchmark. And we look forward to additional complexity in tools and platform arrays for future installments.
In addition, we will further delineate upcoming benchmarks along the lines of tools and platforms. The tool portion of the strategy would allow us to examine new releases of current offerings as they occur, along with initial offerings from new players in this volatile EDA tools market.
The platform segment of the strategy will explore the latest versions of currently marketed processors, or brand-new releases, as well as updated versions of operating systems as the various vendors release them. Our idea is to revisit the earlier benchmarks using these latest processors and platforms. The engineering time required in these cases would be minimal; unlike evaluating new tools, most of the effort would go into setting up the benchmarks and analyzing the results. We look forward to these benchmarks because they will allow in-depth analysis of industry products as they evolve. This combination of tool and platform reviews will offer our readers guidance in purchasing the appropriate EDA workstation to match the required EDA tool, and so we will be broadening our test suite with more tools going forward.
Our current definition of a typical desktop configuration includes 256 Mbytes of RAM, while the high-end server type of machine comes with between 512 Mbytes and 1 Gbyte of RAM. Although these designations (typical versus high-end) are arbitrary, they should allow vendors to offer workstations from various product lines representing distinctly different price ranges. This discussion must be grounded in the understanding that most vendors already consider EDA applications to be high-end, and vendors' sales efforts generally target EDA end users only with product lines that run on high-end machines.
Therefore, in requesting several levels of functionality from the vendors, we will undoubtedly see low-end configurations of high-end machines in contrast to high-end configurations of high-end machines. In fact, that ties in nicely with our experience in some of our previous benchmarks. Although we have reported some success using off-the-shelf systems from local computer retailers, in general we had less success with those systems than with systems configured at the outset to meet the demands of EDA applications.
The two-fold query
Here, then, with the completion of the ASIC design flow overview, we are in a position to pose two crucial questions regarding NT versus the traditionally accepted Unix environment for EDA tools.
The first query asks if EDA is now "real" on NT workstations. Is NT a robust enough operating system to remain stable and viable throughout the stress and strain of running EDA applications?
The second query concerns the nature of the performance metrics for the ASIC design flow when it resides in its entirety on an NT platform. We have had a chance to examine simulation, synthesis, and simultaneous simulation and synthesis.
We were impressed with NT's ability to handle the demands of Verilog-XL, the first simulation tool in our series. We then successfully added Synopsys Design Compiler, and NT started to look like a real EDA operating system. So we approached this last piece, place and route, with great interest, anxious to discover whether NT was up to the task of handling the final major segment of the total ASIC design flow.
The machines
Our goal has always been to test EDA tools on the latest hardware. In that spirit, we wanted to use the 500-MHz processors now available. However, for this benchmark go-round we ran into an unforeseen timing problem in procuring all of the hardware. In these busy weeks before DAC, the latest Sun offering, the UltraSparc IIi, turned out to be unavailable for our project: the EDA vendors and the rest of the design world have locked up all of the review models to evaluate their applications in anticipation of DAC. We therefore couldn't obtain these Unix machines either as loaners or as rentals; purchasing the workstation was never a viable option, which will not surprise our readers.
In spite of the missing Unix piece in this month's benchmark, we feel that examining the place-and-route tool's performance on our Windows machines will nonetheless provide valuable insight. In future benchmark articles we look forward to returning to our previous strategy of evaluating EDA tools on a range of NT platforms versus the performance of those same tools on one or more Unix platforms.
The machines we are using in this benchmark are loaners. Up to this point we have been using whatever machines the vendors could make available within the minimal configuration specifications we provided. In this case our request was for a 500-MHz processor and 1 Gbyte of RAM.
Going forward we would like to move to a new strategy: making the benchmark application suites available to our hardware vendors and suppliers as soon as possible so that they can tune the machines as they see fit. We believe this will allow the suppliers to present us with an optimum configuration for the intended application, a situation that accurately mirrors the procedures vendors follow in answering sales requests from customers. Vendors always try to evaluate the needs and resources of the eventual end users before deciding which mix of hardware, operating system, and price to recommend for purchase.
Eventually, we would like to carry our work far enough to derive price/performance metrics for these machines. And since this is our first experience with the Snaketech place-and-route tool, it's intriguing to consider future possibilities once we have more place-and-route tools from the mainline vendors to evaluate.
Our decision to use single-processor machines for this benchmark arose from our past attempts at multiprocessor benchmarking. We found that the additional processor sometimes helped and sometimes hindered; results were consistently unpredictable when we ran single applications on dual-processor workstations. Characterizing multiprocessor machines needs more work and is a possibility for future benchmarking.
The current benchmark included machines from IBM, Hewlett-Packard, and Compaq. Interestingly enough, IBM provided an Intellistation Z Pro with dual 550-MHz Pentium III Xeon processors and 2 Gbytes of RAM. We applaud the enthusiasm with which IBM provided this heavy-duty workstation. However, to match the configurations of the other benchmark hardware, we kneecapped the machine down to our original specification of a single-processor 500-MHz workstation with 1 Gbyte of RAM. The system was configured with an IDE RAID controller, an enhanced ATA interface, and 512 Kbytes of L2 cache.
Our HP machine was a Visualize X500 with a single 500-MHz Pentium III Xeon processor, a 512-Kbyte L2 cache, 512 Mbytes of RAM, and an Adaptec RAID controller. We had initially assumed that the system had been configured to our specification of 1 Gbyte of RAM. However, the smaller amount of RAM turned out to be inconsequential; one of our key conclusions is that place and route is not a memory-intensive process. The Compaq SP700 was configured with a single 500-MHz Pentium III Xeon processor, a 512-Kbyte L2 cache, 2 Gbytes of RAM, and a Mylex RAID controller card.
The only major differences among the three 500-MHz workstations lay in their disk controllers. We originally asked the vendors to provide RAID solutions to avoid repeating our previous experience, in which the speed of disk I/O affected the outcome. Last year one vendor provided hardware RAID and another provided software RAID; the software RAID offered no advantage at all, and only the hardware RAID solutions were valuable. Fortunately, disk I/O isn't a major issue for our place-and-route design tasks, so in the end we overlooked the differences among disk controllers.
Place and route
The current offering from Snaketech consists of Cellsnake, a place-and-route system for custom standard-cell-based ICs, and Gatesnake, a place-and-route system for gate arrays and sea-of-gate implementations. These tools come out of the Swiss Federal Institute of Technology and run under a variety of NT and Unix flavors.
The Snaketech tool used in this article is the first ASIC place-and-route tool available to us on both NT and the traditional Unix EDA platform. From that standpoint, the Snaketech offering was our only choice, although our readers may not have heard of it because it doesn't come from one of the industry leaders. To date, Snaketech has provided us with enthusiastic phone support, offering as much help as possible considering that the benchmarking process is a far cry from a typical customer installation.
Seva Technologies, as a markedly international company, has language skills spanning English, Russian, Hindi, and, importantly in this case, French. Snaketech, a French company, has only recently opened offices in the United States. To our relief, our French-speaking staff engineer worked very successfully with the Snaketech phone support personnel in French. This is a small footnote, but an important one for anyone in engineering design who might be interested in actually using international EDA tool offerings.
Library formats
In our previous work with Verilog and Synopsys, the input and output library formats posed no problem. With Verilog-XL, our input and output formats were Verilog; to run Synopsys Design Compiler, we needed only a Synopsys library. In previous benchmarks, we used the Synopsys library from Silicon Access.
For this benchmark, Artisan Components, Inc. set us up with its TSMC 0.25-µm library, which is quickly becoming an industry standard, occupying the position previously held by the LSI 100 class libraries. Using this library kept our engineering up to date, and we experienced few problems using Synopsys Design Compiler with the Artisan TSMC 0.25-µm library.
Artisan provides libraries in several standard formats, including Verilog, Synopsys, and LEF (library exchange format), the industry-standard format for place-and-route tools. Things were moving along nicely until we discovered that Snaketech doesn't accept libraries in LEF, although it does provide translators. Navigating the rocky road from logic design through place and route therefore became an unexpected challenge in using the Snaketech tool. Although frustrating, this impasse reflects the reality of the engineering effort required to make the tools interface.
Our solution entailed regenerating the design from the Verilog source with the Artisan library, running the design through the Synopsys tool, and creating an output that the Snaketech tool would accept. The challenge was to find a format overlap that would allow the Synopsys output to be read as Snaketech input. Again, we recognize that these real-world design issues bear on whether the ASIC design flow is actually achievable on an NT platform.
Eventually we found that EDIF was the common format between the two tools; it was a relief to discover that an industry standard would suffice. We then went into the synthesizer and tried to write out the EDIF, but the tool complained that there were no schematics for the design. We asked the Synopsys tool to generate the schematics and write the design, but it complained again, this time that the library didn't have a scale. We then realized that EDIF has two forms: a schematic form and a netlist form. So we pored through the Synopsys documentation and, once we finally abandoned the Synopsys default of writing schematics, we simply wrote out an EDIF netlist. With that step we had liftoff.
A practical note that may be of use to our readers: the variable and value that must be set to produce netlists only out of Synopsys Design Compiler are

edifout_netlist_only = "true"
The intersection among the Synopsys and Snaketech tools and the Artisan library reflects a common conundrum: What path does an engineer need to follow to achieve compatibility? In a large enterprise, an engineer can usually count on a hefty CAD support organization to groom and trim libraries and to provide expertise on various tools. In smaller companies, design engineers often provide their own CAD support, and our library problems in this benchmark illustrate that situation.
In the past we have discussed the glue that holds these flows together: shell scripts, Perl scripts, and the expertise of in-house CAD departments, as companies try to use more and more shrink-wrapped software, including libraries. The question remains, though: How do we work our way through these components without a heavy in-house CAD support organization? Basically, we are a small engineering firm trying to pull off a benchmark, as opposed to a design team in a large enterprise trying to complete a big design. Our approach to these design challenges is therefore somewhat simplified, although the problems posed remain the same.
Typically, those of our customers who do design, synthesis, and layout in-house employ engineers dedicated to library development, library-tool agreement, and timing agreement across tools. They have more engineering resources at each level of the process than can be found in a small company. But we believe that the design process is not yet shrink-wrapped enough to allow designers to, for instance, simply buy an NT workstation, purchase licenses for the myriad requisite tools, and crank out an ASIC in the garage as sole practitioners. Too many supporting activities are needed at every step along the way. And as we add more tools to the benchmark suite, we expect to uncover even more of the problems that arise when a design team tries to work through the entire process.
The test cases
Our first step was to identify appropriate test cases and to ensure that the designs were viable for the tools in use here. We examined tutorial offerings from Snaketech and revisited previous benchmark designs to find a half dozen or so that would give us measurable run times. We eventually settled on four small or moderate designs to run and evaluate on each machine.
We wanted test cases with run times of at least 10 minutes; smaller designs would have made it too difficult to produce accurate measurements. We therefore went with two categories of designs: three small designs requiring 1 to 2 hours of run time, and one moderate design with a run time in the 12-to-14-hour range. Naturally, these run times rule out huge designs of a million gates or more, so we opted for designs in the 100,000-gate range or less. The restriction reflects the realistic constraints of benchmarking. In the future, building on our current experience, we hope to explore place and route with much larger designs.
Table: Four designs

| Design | Cells | Nets | Pins | Area (µm²) | Occupancy (%) |
|---|---|---|---|---|---|
| CTS | 2,727 | 2,702 | 9,561 | 384,912 | 39 |
| DES | 1,783 | 1,937 | 7,314 | 530,265 | 29 |
| PIC | 1,113 | 1,138 | 3,469 | 114,816 | 46 |
| Talisman hod | 21,071 | 22,540 | 68,101 | 8,092,031 | 24 |
However, even restricting our work to small-to-moderate designs demanded respect for engineering time. Good benchmarking requires that a particular design be run a number of times on a particular machine; a single pass is insufficient to yield reliable data. For a small design requiring one hour of run time, iterating through the process can take a full working day by the time the test is run, the data is gathered, and the test is reset and run again. For a larger design with a ten-hour run time, the process can stretch into round-the-clock work, with staff needed to monitor the iterations to achieve a minimum of three runs. If we run multiple tools, the engineering time increases accordingly.
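The scheduling arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the half hour per run for gathering data and resetting the test is an assumed figure, not one we measured, and `campaign_hours` is a hypothetical helper. With five passes on a one-hour design and three on a ten-hour design (the run counts used in this benchmark), it shows why the small designs fill a working day while the large one spills well past 24 hours.

```python
# Rough estimate of wall-clock time for a repeated benchmark campaign.
# ASSUMPTION: a fixed per-run overhead (data gathering plus test reset) of
# half an hour; this is an illustrative number, not a measured one.


def campaign_hours(run_hours: float, runs: int, overhead_hours: float = 0.5) -> float:
    """Hours needed for `runs` sequential runs, each followed by a reset."""
    if runs < 1:
        raise ValueError("a campaign needs at least one run")
    return runs * (run_hours + overhead_hours)


if __name__ == "__main__":
    small = campaign_hours(1.0, runs=5)   # small design, five passes
    large = campaign_hours(10.0, runs=3)  # large design, three-run minimum
    print(f"small design: {small:.1f} hours")
    print(f"large design: {large:.1f} hours ({large / 24:.2f} days)")
```

Changing the assumed overhead shifts the totals, but any nonzero overhead keeps the three-run large-design campaign beyond a single day.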
Figure 1: Bigger is bigger. The three small test case designs are characterized by fewer than 3,000 cells. The larger test case, although not in the million-gate range, represents a significantly larger design.
We knew Verilog-XL and Design Compiler well enough to predict which parts of the hardware they would exercise. Unfortunately, we lacked that experience with Snaketech, since both the company and the tool were new to us, which added to the challenge of this benchmark.
The first of the four designs we used was the small clock tree synthesis tutorial design (CTS) provided by Snaketech (see the table). We had experience with the second small design, the Data Encryption Standard (DES) module, from previous benchmarks; we obtained this test case from the Reconfigurable Architecture Workstation (RAW) project at the Massachusetts Institute of Technology. Our third design, a model of the PIC 16C5X RISC processor designed by Tom Coonan at Scientific Atlanta, is a relatively small circuit but has the highest cell occupancy of the four.
Finally, we chose the Microsoft Talisman graphics accelerator design to represent the moderate-sized design exercising the place-and-route tool.
The area of a design is the product of the length and width associated with its cells. Ideally we like to keep cell occupancy on the chip to 50 percent or below to provide adequate area for routing. Had this been a real design, we might have squeezed the area committed to routing; in the benchmark setting, however, we allowed ourselves the luxury of extra area (see Figure 1). Although none of these designs lies in the million-gate range, we felt that the larger Talisman hod test case was large enough to adequately exercise the place-and-route tool.
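A minimal Python sketch of the 50 percent rule of thumb, applied to the occupancy figures from the table. The dictionary simply restates the table (reading the Occupancy column as percent); the helper names and the default `ceiling` parameter are illustrative, not part of any tool.

```python
# Cell occupancy, in percent, for the four test designs (from the table).
OCCUPANCY_PERCENT = {
    "CTS": 39,
    "DES": 29,
    "PIC": 46,
    "Talisman hod": 24,
}


def routing_headroom(occupancy: dict, ceiling: int = 50) -> dict:
    """Percentage points each design sits below the occupancy ceiling."""
    return {name: ceiling - pct for name, pct in occupancy.items()}


def over_ceiling(occupancy: dict, ceiling: int = 50) -> list:
    """Designs whose occupancy exceeds the ceiling, sorted by name."""
    return sorted(name for name, pct in occupancy.items() if pct > ceiling)


if __name__ == "__main__":
    for name, margin in routing_headroom(OCCUPANCY_PERCENT).items():
        print(f"{name:>13}: {margin} points below 50 percent")
    print("over ceiling:", over_ceiling(OCCUPANCY_PERCENT) or "none")
```

All four designs clear the ceiling; PIC, at 46 percent, has the least headroom.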
By a nose
Our protocol included processing each design three to five times to compute average run times and results. The three small designs were quick enough, running anywhere from 30 to 90 minutes on average, to let us run each of them five times, further refining the average run-time figures (see Figure 2). The larger design, however, required an average run time of 14 hours, and time constraints restricted the Talisman hod runs to the original plan of three rounds of testing.
Our initial results led immediately to an interesting conclusion. Place-and-route tools do benefit from fast computers, but they need nowhere near the amount of memory that simulation and synthesis tools required in our past benchmark studies. In that earlier work we needed lots of system memory and lots of run time. Place-and-route tools, on the other hand, didn't consume even 500 Mbytes of memory; our designs needed anywhere from 40 to 400 Mbytes.
Our initial results indicate that the Compaq workstation was consistently faster than the other two machines on all four designs (see Figure 2). In earlier benchmarks we had noted Compaq processing efficiencies, which we attributed to the RCC chip set unique to the Compaq workstation. However, although the RCC chip set gives the Compaq more memory bandwidth and perhaps a slight speed advantage, the variation in run times isn't large enough to proclaim the Compaq the obvious choice.
Our standard protocol for previous studies has always included running three rounds of design flow testing. In all of those previous benchmarks, the deviation between run times for the same design on the same machine was never greater than 1 to 2 percent. In this benchmark, however, we found a run-set deviation of anywhere from 4 to 10 percent on all machines. To verify the larger deviations, we increased the number of runs on the smaller designs to five, as mentioned. The run-time deviations didn't change (see Figure 3): we continued to see deviations of 4 to 10 percent, which led to yet another conclusion.
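The article does not pin down a formula for "run-set deviation," so the Python sketch below shows two common choices, assuming either could be meant: the range (slowest minus fastest run) as a percentage of the mean, and the coefficient of variation (sample standard deviation over the mean). The five run times in the example are invented for illustration; they are not our measurements.

```python
from statistics import mean, stdev


def range_percent(run_times):
    """Spread between slowest and fastest run, as a percent of the mean."""
    if len(run_times) < 2:
        raise ValueError("need at least two runs")
    return 100.0 * (max(run_times) - min(run_times)) / mean(run_times)


def cv_percent(run_times):
    """Coefficient of variation: sample standard deviation / mean, in percent."""
    if len(run_times) < 2:
        raise ValueError("need at least two runs")
    return 100.0 * stdev(run_times) / mean(run_times)


if __name__ == "__main__":
    minutes = [60, 62, 58, 63, 57]  # hypothetical run times for one design
    print(f"range: {range_percent(minutes):.1f}%")  # 6 / 60 -> 10.0%
    print(f"CV:    {cv_percent(minutes):.1f}%")
```

On the same data the two metrics differ by more than a factor of two, which is one reason a deviation figure is only meaningful alongside its definition.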
We know that place and route is a much more random process than either simulation or synthesis. The randomness associated with circuit placement is compounded by the randomness associated with routing within the IC. Because we fed our input into the place-and-route tool with no access to intermediate states, the variability of the output combined the variability of the placement and the variability of the routing.
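A toy Monte Carlo model in Python illustrates how two independent sources of variation combine. It assumes, purely for illustration, that placement and routing each scale a 60-minute run time by an independent Gaussian factor with 3 and 4 percent standard deviations; none of these numbers or the model itself come from the Snaketech tool. Rerunning with the placement held fixed mimics the common-placement experiment that would isolate the routing share.

```python
import random
from statistics import mean, stdev


def simulate_run_times(n, placement_sd=0.03, routing_sd=0.04, base_minutes=60.0,
                       fixed_placement=False, seed=1):
    """Toy model: run time = base * (1 + placement noise) * (1 + routing noise).

    ASSUMPTION: both noise terms are independent Gaussians; this is not a
    model of any real place-and-route algorithm.
    """
    rng = random.Random(seed)
    shared_placement = rng.gauss(0.0, placement_sd)  # reused when placement is fixed
    times = []
    for _ in range(n):
        p = shared_placement if fixed_placement else rng.gauss(0.0, placement_sd)
        r = rng.gauss(0.0, routing_sd)
        times.append(base_minutes * (1.0 + p) * (1.0 + r))
    return times


def relative_sd_percent(times):
    """Sample standard deviation as a percent of the mean."""
    return 100.0 * stdev(times) / mean(times)


if __name__ == "__main__":
    both = simulate_run_times(20_000)
    routing_only = simulate_run_times(20_000, fixed_placement=True)
    print(f"placement + routing varying: {relative_sd_percent(both):.2f}%")
    print(f"placement held fixed:        {relative_sd_percent(routing_only):.2f}%")
```

Because independent variances add, the combined spread comes out near sqrt(3² + 4²) = 5 percent, larger than either source alone; fixing the placement brings it back to roughly the 4 percent routing term.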
Figure 2: Signs of the times. The run times ranged from 30 minutes to 14 hours. The Compaq consistently processed faster than the IBM and HP, although for the smaller designs the deviation across machines was insufficient to declare a clear winner.
Because we were working with the demo version of the Snaketech offering, we didn't have the scripts that would let us save out the placement configuration. With that access, we could have isolated the randomness of the routing step on the three machines by feeding a common placement solution into the routing portion of the tool. However, that would have required additional licensing from Snaketech, a time-consuming and, from our standpoint, unnecessary process. Instead, we fed the raw netlist into the place-and-route tool and received the output in one step. Undoubtedly we have left on the table some data that would distinguish the randomness of the two steps of place and route, but we feel that our results are clear despite this qualification.
Again, given the deviation we observed, we can't declare a solid winner. The Compaq consistently produced shorter run times, but a 4 to 10 percent deviation is too broad a window to support a definitive conclusion. Our results do reveal, however, that on the DES design the "wheels fell off" the HP machine. Given our time constraints, it wasn't possible to rerun the tests to verify these HP results.
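The reasoning behind withholding a verdict can be written as a simple decision rule. The Python sketch below is one hypothetical way to state it, not a statistical test we applied: call a machine the winner only if its mean run time beats the other's by more than the observed run-to-run spread. The run times are invented for illustration.

```python
from statistics import mean


def clear_winner(times_a, times_b, run_spread_percent):
    """Return "A" or "B" if one machine's mean run time is faster by more than
    the run-to-run spread (in percent of the slower mean); otherwise None."""
    mean_a, mean_b = mean(times_a), mean(times_b)
    gap_percent = 100.0 * abs(mean_a - mean_b) / max(mean_a, mean_b)
    if gap_percent <= run_spread_percent:
        return None  # difference is within run-to-run noise
    return "A" if mean_a < mean_b else "B"


if __name__ == "__main__":
    machine_a = [54, 55, 56]  # hypothetical run times, minutes
    machine_b = [60, 60, 60]
    print(clear_winner(machine_a, machine_b, run_spread_percent=10))  # None
    print(clear_winner(machine_a, machine_b, run_spread_percent=4))   # A
```

With a gap of about 8 percent, the verdict flips depending on whether the spread is taken as 4 or 10 percent, the two ends of the range reported here.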
The IBM results aren't surprising either: the machine's performance was hampered by its dual-processor operating system configuration, especially since we had reconfigured the machine down to a single processor.
Toolish impressions
We found the Snaketech place-and-route tool simple and easy to use. It arrived on a single CD that included both the tool and the manual. We learned that running the tool in batch mode was distinctly faster than using the graphical user interface, though the company markets the tool partly on its ease of use in GUI mode. We always run our benchmarks in batch mode, however, because it improves our ability to time and control the design process.
In our experience, most EDA tools run via scripts. Once the tool is familiar to its users, it is easier to ship the script off to the computer and let it work for a while than to sit in an interactive mode monitoring progress along the way. In the case of place and route, although it might be useful to see the critical parts of the circuit midway through the design, our repertoire of benchmark scripts nudged us toward bypassing mid-process analysis.
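A batch-mode timing harness of this kind can be sketched with Python's standard `subprocess` and `time` modules. The command shown is a stand-in; Snaketech's actual batch invocation is not reproduced here, so substitute your own tool command and script. Each run's wall-clock time is recorded, and a non-zero exit status (for example, from a crash) stops the series.

```python
import subprocess
import sys
import time


def time_batch_runs(command, runs=3):
    """Run `command` (a list of program arguments) `runs` times in sequence
    and return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # PLACEHOLDER command: a trivial Python process stands in for the real
    # place-and-route batch job, whose command line is not given here.
    stand_in = [sys.executable, "-c", "pass"]
    for i, seconds in enumerate(time_batch_runs(stand_in, runs=3), start=1):
        print(f"run {i}: {seconds:.3f} s")
```

Timing the whole process from outside, rather than relying on a GUI, is what makes repeated runs directly comparable.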
Another interesting observation concerned the time required to refresh the screen: the user must wait for the screen to refresh before it will accept more input. Along these lines, a future benchmark might examine the graphics performance of place-and-route tools as well as of waveform tools.
It should be noted that the Snaketech tool works comfortably with designs of 100,000 gates or fewer. We will be interested in evaluating tools from the mainline EDA vendors when they become available, as they may have larger design capacity.
We were left with one final impression. Folk wisdom has it that an NT machine can't get through a job requiring over 24 hours of run time without crashing. As in earlier benchmarks, that wasn't our experience here. The crashes we did experience were clearly due to our inexperience with the Snaketech tool, not to instabilities in the operating systems on the various machines. For our final runs, we actually left the workstations processing successfully for over 48 hours.
Our first evaluation covers three small designs, with average run times ranging from half an hour to an hour on the three workstations. Given statistical ranges within plus or minus 2 percent, performance is roughly equal across the three machines in the benchmark. With the larger design needing 12 to 14 hours for completion, we saw greater run-time variation across the machines and could see that the Compaq was clearly running faster than the IBM and HP machines. Again, we attribute this result to the memory bandwidth of the RCC chip set in the Compaq machine.
The run times across run sets show a deviation of up to 10 percent on the same machine processing the same design (see Figure 3). As mentioned, the place-and-route process involves far more randomness than simulation and synthesis, which converge on much more nearly unique solutions for a design. From run to run, the tool evidently finds markedly different yet successful solutions for a particular place-and-route design, in contrast to the smaller set of successful solutions that emerge from simulation and synthesis. For reasons we could not determine, the DES design was the least deterministic, with a run-time variation of up to 10 percent. The PIC design hovered at slightly less than 8 percent, the Talisman hod at just over 5 percent, and the CTS design at 4 percent from run to run.
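These percentages express how far repeated runs of one design on one machine can drift apart. Exactly how such a figure is computed is not spelled out, so the Python sketch below shows two plausible definitions as an assumption: the spread between the fastest and slowest runs, and the largest deviation from the mean, both as percentages. The function names are hypothetical.

```python
from statistics import mean


def run_to_run_spread(times):
    """Percent spread of repeated runs: (slowest - fastest) / fastest * 100."""
    if len(times) < 2:
        raise ValueError("need at least two runs to measure variation")
    fastest, slowest = min(times), max(times)
    return (slowest - fastest) / fastest * 100.0


def spread_vs_mean(times):
    """Alternative: largest deviation from the mean, as a percent of the mean."""
    if len(times) < 2:
        raise ValueError("need at least two runs to measure variation")
    m = mean(times)
    return max(abs(t - m) for t in times) / m * 100.0
```

For example, runs of 60 and 66 minutes give a `run_to_run_spread` of 10 percent but a `spread_vs_mean` of under 5 percent, so a quoted variation figure is only comparable with others computed the same way.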
Figure 3. Deviant behavior
The 4 to 10 percent run-time deviations across the benchmark stemmed from the randomness inherent in the place-and-route design processes.
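Why would the same tool on the same machine take different times? We don't know what algorithms the Snaketech tool uses, but simulated annealing is a classic randomized placement technique and illustrates the effect. In the toy Python sketch below (all names hypothetical), cells sit in a single row, and the annealer stops once it has gone `patience` moves without improving its best wirelength. Different random seeds therefore yield different, equally legal placements after different numbers of moves, with the move count standing in for run time.

```python
import math
import random


def wirelength(order, nets):
    """Total 1-D wirelength: sum of |pos(a) - pos(b)| over two-pin nets."""
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in nets)


def anneal_placement(cells, nets, seed, temp=5.0, cooling=0.995, patience=500):
    """Toy simulated-annealing placer for cells in a single row.

    Each move swaps two random cells; a worse placement is accepted with
    probability exp(-delta / temp). The run stops after `patience`
    consecutive moves without a new best cost, so the number of moves
    depends on the random seed.
    """
    rng = random.Random(seed)
    order = list(cells)
    rng.shuffle(order)
    cost = wirelength(order, nets)
    best_order, best_cost = list(order), cost
    moves = since_best = 0
    while since_best < patience:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        new_cost = wirelength(order, nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost  # accept the swap
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
        if cost < best_cost:
            best_order, best_cost = list(order), cost
            since_best = 0
        else:
            since_best += 1
        temp = max(temp * cooling, 1e-3)  # geometric cooling with a floor
        moves += 1
    return best_order, best_cost, moves


if __name__ == "__main__":
    cells = list(range(12))
    nets = [(i, i + 1) for i in range(11)] + [(0, 11), (3, 8)]
    for seed in (1, 2, 3):
        order, cost, moves = anneal_placement(cells, nets, seed)
        print(f"seed {seed}: wirelength {cost}, moves {moves}")
```

A production placer is vastly more elaborate, but the same principle applies: a random starting point and random moves mean no two runs need follow the same path to a solution.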
It should be noted that a 10 percent time deviation on a 14-hour run is of little interest, amounting to less than 90 minutes either way. Similarly, a 10 percent time deviation on a one-hour run, just six minutes, is of even less concern. Such variations aren't going to make or break the usefulness of the tool or the platform. On a 7-hour design run, however, 10 percent adds about 42 minutes, pushing the run close to 8 hours. In that case, an iteration of a design would spill past the normal engineer's 8-hour workday, forcing overtime to gather data from one run and initiate the next, or postponing the gathering of results to the next day. Employees and employers alike would therefore be concerned about a design process in the 7-hour range that tended to run long by as much as 10 percent.
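The arithmetic behind the workday argument is easy to check. The Python sketch below computes the extra minutes a 10 percent overrun adds (6, 42, and 84 minutes for 1-, 7-, and 14-hour runs) and how far a run would spill past an 8-hour day. The `start_hour` parameter, for how late in the day the run is launched, is our own illustrative assumption, and the function names are hypothetical.

```python
WORKDAY_HOURS = 8.0  # the "normal engineer's 8-hour workday" from the text


def extra_minutes(nominal_hours, deviation=0.10):
    """Minutes added when a run comes in `deviation` (default 10%) long."""
    return nominal_hours * deviation * 60.0


def overrun_minutes(nominal_hours, deviation=0.10, start_hour=0.0):
    """Minutes past the end of the workday (0 if the run fits) for a run
    that starts `start_hour` hours into the day and runs `deviation` long.
    `start_hour` is an illustrative assumption, not a measured value."""
    finish = start_hour + nominal_hours * (1.0 + deviation)
    return max(0.0, (finish - WORKDAY_HOURS) * 60.0)


if __name__ == "__main__":
    for hours in (1.0, 7.0, 14.0):
        print(f"{hours:4.1f} h run: +{extra_minutes(hours):.0f} min")
    # A 7-hour run that comes in 10 percent long ends at 7.7 h, so it
    # overruns the day if launched more than 0.3 h after the day starts.
    print(overrun_minutes(7.0, start_hour=0.5))
```

Even launched first thing in the morning, a 7-hour run that comes in 10 percent long leaves only about 18 minutes to collect results and start the next iteration.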
Overall, the data didn't allow us to declare an obvious winner among the benchmarked workstations. The Compaq was generally fastest, but not markedly so. The HP performed within range of the Compaq. We attributed the IBM results to the effects of the dual processor, despite our kneecapping the machine to a single-processor configuration. As a result, we decided to run one last round of tests on that machine.
Two heads worse than one?
In an attempt to compare apples to apples (the ideal situation in benchmarking), we decided to run the suite of designs on the IBM using both the 500-MHz and the 550-MHz configurations. In measuring the run times for the various designs on these two configurations, we discovered some curious results (see Figure 4). We measured less than a 10 percent run-time variation between the two configurations for the smaller designs; the larger Talisman hod design, surprisingly enough, yielded no difference at all between the run times for the two clock settings.
We believe we have an explanation. The kneecapping process does nominally reduce the dual-processor machine to a single-processor configuration. However, when a design is running in that configuration, the operating system still lets the processing ping-pong back and forth between the two processors.
Figure 4. Hertz enough?
Reconfiguring the IBM from 500 MHz to its original 550-MHz setting yielded minor differences in run times for the smaller designs. The reconfiguration had no impact at all on the larger design.
We confirmed this observation by monitoring the Task Manager window for the two processors, which showed alternating activity. At any particular moment, one processor was running at 100 percent and the other at 0 percent; a moment later, those readings reversed. The two processors were sharing the design task in an alternating serial manner. This result ties into an observation from earlier benchmarks: dual-processor availability is not always a plus in a workstation configuration. The switching back and forth between processors increased design times, because each switch left the job running on a processor with a cold cache.
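One possible remedy for this ping-ponging is to pin a long-running job to a single processor so that its cache stays warm. Win32 exposes this through the `SetProcessAffinityMask` call; the Python sketch below shows the same idea with `os.sched_setaffinity`, which exists only on Linux, so treat it as an illustration of the technique rather than something to run on the NT machines. `pin_to_one_cpu` is a hypothetical helper.

```python
import os


def pin_to_one_cpu(pid=0):
    """Restrict a process (pid 0 = the calling process) to a single CPU.

    Keeping a long single-threaded job on one processor avoids bouncing it
    between processors with cold caches. Linux-only: os.sched_setaffinity
    is not available on Windows or macOS. Returns the CPU set now in effect.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity control not available on this platform")
    allowed = os.sched_getaffinity(pid)
    target = min(allowed)  # lowest-numbered CPU the process may use
    os.sched_setaffinity(pid, {target})
    return os.sched_getaffinity(pid)
```

Pinning trades away the second processor entirely, which is acceptable only when the job is single-threaded, as the Task Manager readings suggest this one was.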
These tests may be memory-bandwidth-limited after all. Even though the designs use less than half a gigabyte of memory, they use more memory than the cache holds, and changing the processor clock speed doesn't change memory speed. This conclusion agrees with our earlier observation that the Compaq, with its RCC chip set, was slightly faster.
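A crude two-part model makes the memory-bandwidth argument concrete. It assumes, as a simplification, that a run's time splits into a CPU-bound share that scales with clock speed and a memory-bound share that doesn't. Under that assumption, a fully CPU-bound job would finish about 9 percent sooner at 550 MHz than at 500 MHz (1 - 500/550 ≈ 0.091), while an unchanged run time, as with the Talisman hod design, implies a CPU-bound share near zero. The Python function names are hypothetical.

```python
def predicted_runtime(t_base, cpu_fraction, base_mhz, new_mhz):
    """Predict run time at new_mhz from a run measured at base_mhz.

    Simplified model (an assumption): the cpu_fraction share of t_base
    scales inversely with core clock; the remaining share is memory-bound
    and does not change when only the processor clock changes.
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    cpu_part = t_base * cpu_fraction * (base_mhz / new_mhz)
    mem_part = t_base * (1.0 - cpu_fraction)
    return cpu_part + mem_part


def implied_cpu_fraction(t_base, t_new, base_mhz, new_mhz):
    """Invert the model: the CPU-bound share that explains two measured times."""
    max_saving = 1.0 - base_mhz / new_mhz  # fractional saving if fully CPU-bound
    return (1.0 - t_new / t_base) / max_saving


if __name__ == "__main__":
    # Fully CPU-bound vs. fully memory-bound job, 500 -> 550 MHz.
    print(predicted_runtime(100.0, 1.0, 500, 550))
    print(predicted_runtime(100.0, 0.0, 500, 550))
    # Identical run times at both clocks imply no CPU-bound share.
    print(implied_cpu_fraction(100.0, 100.0, 500, 550))
```

The model ignores caches, chip-set differences, and the processor ping-ponging described above, so it can only suggest, not prove, that these runs were memory-bound.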
Looking ahead
In setting the stage for the next installments in this benchmark series, we should note that we are changing the rules of the game. Our strategy will be to release our test cases to the hardware vendors in advance, allowing them to configure their machines for the best performance, as they see it, on the designs and tools to be used. We'll thus be moving to a software benchmark that runs on the lowest-cost, most efficiently configured platforms. We'll also require the hardware vendors to procure their own libraries, just as we have done in this series up to now. The choice and source of the libraries will be at the discretion of the hardware vendors.
For instance, up to this point we have demanded a great deal of memory from the hardware vendors: a minimum of a gigabyte. As it turns out, the place-and-route tools needed less than half of that capacity. Had the hardware vendors been able to configure their machines according to the tool and design needs, they could have provided a less expensive machine, an attractive scenario for the hardware customer as well.
Each tool makes use of the computing resources in its own way. Benchmarking combinations of tools and platforms helps identify the right hardware and software for a particular end user, a principle best summarized as "right tool, right computer." The argument applies to the place-and-route design engineer as well as to the simulation and synthesis teams. The distinction should be made, however, that the place-and-route workstation differs markedly from the workstation needed for the other segments of the ASIC design flow. The systems here are overendowed with memory for the place-and-route solutions we are seeking. Had the additional $500 spent on memory been funneled instead into upgrading the CPU, the machines would have been better tuned for this particular benchmark. At the same time, we have been working toward a single desktop machine that can handle simulation, synthesis, and place-and-route tasks. Realistically, engineers don't generally perform all three of these tasks, but knowing that the machines can handle the entire ASIC design flow would certainly make for easier purchasing decisions. In the real world of engineering, evaluating the economics of the design tools, hardware and software alike, is an important consideration that must work in concert with evaluating the tools' technical effectiveness.
The authors wish to thank Alex Cellier for his assistance in conducting the benchmark tests.
Contributing editor James Lee is a senior consulting engineer at Seva Technologies, Inc. in Fremont, Calif. He has 12 years' experience working with Verilog and was one of the first employees at Gateway Design Automation, which developed Verilog. Prior to joining Seva, he worked for Cadence Design Systems. He's the author of Verilog Quickstart and is also a part-time instructor in Verilog at the University of California at Santa Cruz.
To voice an opinion on this or any Integrated System Design article, please email your message to jeff@isdmag.com.