United Business Media EE Times



Design Automation

Tool Time for Processor Design

HP's workstation group uses automation in designing the PA-8000 microprocessor.

by Brian Arnold


Brian Arnold's article is the first in a series of three that detail the design of Hewlett-Packard's PA-8000 system. The second and third installments will appear in the February and March issues. The three articles are extracted from presentations being made at the 1997 Design SuperCon conference in Santa Clara, CA, in January 1997.

Designing a high-performance microprocessor such as the PA-8000 demands careful design-automation strategies, in particular synthesis-based strategies that can prove useful for many types of logic designs. In designing the PA-8000 microprocessor, our design team at Hewlett-Packard Co. (Fort Collins, CO) employed a variety of design styles ranging from full custom to standard cell.

One of the important features of the design is the incorporation of automation into the design process. The first level of automation inputs changes via schematic into the place and route tool, freeing the designer from having to manually place and route the circuit blocks. The next level incorporates synthesis to create the schematic and netlist. At this level, every aspect of the design has been, or will be, scripted and the creation of a new design is simply a matter of typing a few commands and waiting for the tools to perform their functions.

The intent of the automation is to increase the productivity, reliability, and flexibility of engineers and, thereby, to increase the figure of merit of field-effect transistors (FETs) per engineer-day. Automation greatly speeds up the design change process, especially in the latter parts of the design cycle.

Automation begins with the library

Detailed development of a standard cell library is the key to design automation. We designed the library for the PA-8000 to be used by engineers who design using manual techniques and by those who use automation tools. Therefore, the design tools influenced the library contents, and vice versa. The synthesis tool, Design Compiler from Synopsys Inc. (Mountain View, CA), directed the logical contents and the strength spectrum of the cells in the library.

In addition, cell design must accommodate the needs of the routing tool because routing ability, or porosity, is the most precious resource in the place-and-route phase. We use Cell3 from Cadence Design Systems Inc. (San Jose, CA). If the router cannot take full advantage of the lower metal levels, the block will increase in size and routing channels will have to be added.

The design of the PA-8000 used five different types of standard cell libraries: static, driver, dynamic, custom, and datapath. The first four types were used in synthesized blocks, while the last was used only in hand-designed blocks.

The static library ranges from inverters and basic logic gates to complex, seven-input logic functions, as well as a rich set of latch-based sequential elements. This library is especially important for the PA-8000 because HP processor designs rely heavily on the attributes of latches. Latch-based design in a synthesis environment is challenging because the synthesis tool, particularly Design Compiler, is currently unable to properly handle timing through a latch-based design.

The driver library includes scan circuitry. The test circuitry can preload and extract data as well as sample one or two data points during full-chip operation. The driver cells rely on a bit-slice approach because their height is much greater than their width. They contain several power buses throughout the cell, and signals common to all driver cells are routed in a horizontal overlay metal that runs across each cell. The driver cells are typically placed on the horizontal perimeter of the block, with the core-type standard-cell logic in the center. Since the driver cells are placed side by side, very little horizontal routing is needed. Instead of horizontal routing, driver cells contain open vertical tracks so routing can go through the cell from core logic to ports.

The dynamic library uses HP's Dynamic Logic Circuit Design conformed to a standard cell structure. This library incorporates a wide variety of logical functions whose implementations were adapted to be compatible with synthesis and layout. The dynamic library was primarily used on timing-critical paths and blocks. Dynamic circuits are much faster but also consume more power and are more sensitive to noise than are their static counterparts.

The custom library is a catch-all containing useful functions that were not included in the other libraries. An artwork template was provided to make it easier to lay out cells that conform to all process-imposed and Cell3-imposed design rules. Several exotic cells were created for the custom library because we felt that the synthesis tool could benefit by using certain logical functions, and it turned out that the tool used them fairly extensively.

Synthesis tool flow

Design Compiler accepts as input some form of HDL, one or more standard cell libraries, and user-generated constraints that describe the environment surrounding the HDL code. We achieved good results in this design by writing Verilog or VHDL code to the following guidelines:

  • We wrote small, easy-to-manage chunks of code that contained only related logic. Design Compiler can manage designs that are much larger than 3 to 5 kgates, but the amount of time required to compile large designs can become excessive. Designs of about 1.5 kgates or less work best.

  • We took advantage of hierarchy and wrote smaller, easier-to-manage chunks of code. A large design can be composed of many smaller pieces, and hierarchy helps to minimize unrelated logic and keep compile times low. Providing Design Compiler with unrelated logic does not improve the synthesis results. We saved time overall by putting the unrelated logic in another module.

  • We used case statements rather than if-then-else structures, and used parallel_case and full_case compiler directives where applicable. This practice prevented the area and delay penalties that result from unintentional inferences of priority encoders.

  • We took advantage of DesignWare operators such as +, -, >, and <. Synopsys provided a set of components for this purpose. In addition to achieving higher performance, the use of DesignWare components significantly reduced compile times.

  • We improved performance by using a latch-based design. Latch benefits included phase stealing, fast MUXing, and qualification capabilities. As mentioned earlier, a drawback to using multi-input and qualified latches with Design Compiler was its inability to infer or time through these latches. Consequently, we had to rely on other timing tools to assure that performance goals were met.

Figure 1. The synthesis flow shows the many places for design iterations.

The PA-8000 microprocessor
Although the PA-8000 is binary compatible with HP's previous microprocessors, the new processor does not leverage any circuitry from previous HP processors. HP designers were therefore free to create the microarchitecture features that promoted the highest performance (see figure).

Currently running at clock speeds up to 180 MHz, the PA-8000 microprocessor offers performance of 20.2 SPECfp95 and 11.8 SPECint95 for a uniprocessor system containing 1-Mbyte instruction and 1-Mbyte data off-chip caches. Two 64-bit integer ALUs and two 64-bit shift/merge units support integer operations, while two floating-point multiply-and-accumulate (FPMAC) units and two divide/square-root units support floating-point applications.

The most notable feature of the chip is the large instruction reorder buffer (56 entries), which serves as the central control unit. To address the taken-branch penalty and increase fetch bandwidth, the PA-8000 incorporates a branch target address cache. This structure associates the address of a branch instruction with the address of its target. To reduce the number of mispredicted branches, the PA-8000 implements a branch-prediction algorithm in hardware with a 256-entry branch history table.

The PA-8000 is fabricated in HP's 0.5-µm, 3.3-V CMOS process, with a 0.28-µm L-effective. This process uses five metal layers: two for tight-pitch routing and local interconnect, two for low-RC global routing, and a final layer for clock and power-supply routing. The die is 17.68 x 19.1 mm, contains 3.8 million transistors, and has 704 I/Os. Approximately 75 percent of the chip is either full-custom or semi-custom.

PA-8000 functional block diagram

Synthesis methodology

Defining the constraints imposed on each block is one of the most important steps in block synthesis (see Figure 1). In this design, we found that difficult constraints were signal-specific and required a substantial amount of communication with connected block owners. Global signals also required special attention, especially those signals whose inputs and outputs were logically connected and not interrupted by a sequential element. Any amount of RC delay was likely to put these signals at risk of timing failures.

Therefore, we had to evaluate our synthesis methodology carefully. We decided to use an HDL- or logic-driven hierarchy, and we followed the sequence recommended by Synopsys: characterize, write script, and compile. This sequence essentially took a snapshot of environmental conditions and looked at each level of hierarchy individually without the encumbrance of unrelated logic.

To fully automate the updating process, we developed a set of scripts that allowed us to translate a change in the HDL into a newly synthesized design that meets timing requirements without user interaction. Creating these scripts, however, is not always easy. There are two approaches to generating the scripts: rifle or shotgun. In the focused (rifle) approach, we try one option at a time, observe the results, and adjust the methodology accordingly. In the broader (shotgun) approach, we aim in the general direction of the solution, try many options, analyze the results, and select the best results for further processing.

Each of these methods adjusts parameters such as flattening, structuring, clock skew, area, path_groups, groups, medium-effort compiles, high-effort compiles, and incremental compiles. We can quickly develop combinations of possibilities by creating an assortment of scripts. These scripts perform one or two compile options each, and we can launch them in parallel--provided we have enough workstations and compiler licenses.
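The shotgun approach described above can be sketched in a few lines of Python. This is only an illustration, not our actual scripts: the option names echo the parameters listed in the text but are not real Design Compiler commands, and `mock_slack_violation` is a hypothetical stand-in for whatever timing report a real flow would parse.

```python
from itertools import product

# Hypothetical option space. The names echo parameters mentioned in the
# article but are NOT real Design Compiler commands or flags.
OPTIONS = {
    "flatten": [False, True],
    "structure": [False, True],
    "effort": ["medium", "high"],
}


def enumerate_runs(options):
    """Return one dict per combination of option values."""
    names = sorted(options)
    return [dict(zip(names, values))
            for values in product(*(options[n] for n in names))]


def pick_best(runs, evaluate):
    """Score every run with `evaluate` (lower is better) and return the winner."""
    return min(runs, key=evaluate)


def mock_slack_violation(run):
    """Stand-in for parsing a timing report; the numbers are invented."""
    penalty = 3.0
    if run["flatten"]:
        penalty -= 1.0
    if run["structure"]:
        penalty -= 0.5
    if run["effort"] == "high":
        penalty -= 1.0
    return penalty


runs = enumerate_runs(OPTIONS)
best = pick_best(runs, mock_slack_violation)
print(len(runs), best)
```

In a real flow each entry in `runs` would become its own script launched on a separate workstation, and the scoring step would run only after all of the jobs had finished.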

Floorplan, place and route

After performing synthesis, we created a floorplan for each block. In this phase of the design, we had to pay close attention to global routing blockages, global-driven port placement, and boundary conditions dictated by neighboring blocks that would determine the placement of cells, wires, and ports. Our ultimate goal was to create a floorplan that resulted in a highly utilized block with enough "cover-your-rear" tracks and cells left over to fix anything we may have overlooked the first time. The floorplans for the PA-RISC processors were created using an in-house graphical interface tool.

Next, Cell3 was used to place and route the individual blocks of the PA-8000 design. As in the synthesis process, the end-goal of place and route was to script the entire process so that a change in the netlist or floorplan was automatically reflected in the final design, simply by issuing a few commands.

Many design practices influenced place and route. One area, testability, was important because the optimal locations for scan chain placement can change the logic. The initial scan chain was removed from the netlist so it would not affect the placement. After the cells were placed in the blocks, we constructed the scan chain. Once the best possible scan solution was determined, a design exchange format (DEF) equivalent representation was input into Cell3, which routed the scan chain along with all the other nets.
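The article does not say how the scan solution was chosen once the cells were placed. As one possible illustration, the sketch below orders placed scan cells with a greedy nearest-neighbor walk starting from the scan-in port. The heuristic, the cell names, and the coordinates are all assumptions made for the example, not the method used on the PA-8000.

```python
import math


def order_scan_chain(cells, start):
    """Greedy nearest-neighbor ordering of placed scan cells.

    cells: dict mapping cell name -> (x, y) placement coordinates.
    start: (x, y) of the scan-in port.
    Returns the list of cell names in stitched order.
    """
    remaining = dict(cells)
    position = start
    chain = []
    while remaining:
        # Pick the closest unvisited cell; ties break on name for determinism.
        name = min(remaining,
                   key=lambda n: (math.dist(position, remaining[n]), n))
        chain.append(name)
        position = remaining.pop(name)
    return chain


# Hypothetical placement, in arbitrary units.
placed = {"ff_a": (10, 0), "ff_b": (1, 0), "ff_c": (5, 0), "ff_d": (6, 4)}
print(order_scan_chain(placed, start=(0, 0)))
```

A greedy walk is simple but not optimal. The point is only that the chain order is derived from the placement rather than fixed in the netlist beforehand.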

Spare-cell insertion was another important design practice. We have found this practice necessary for most designs, and the only time we skip it is when we are 100 percent sure that a block contains no bugs. Usually, we would insert spare cells that represent a statistical sampling of the cells that compose the block. By including them in the schematic as normal cells and tying the inputs off to one of the supply rails, we kept connectivity from driving the cells' placement and generally obtained a good distribution of spare cells.
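One way to read "a statistical sampling of the cells that compose the block" is to allocate spares in proportion to how often each cell type appears. The sketch below does that with largest-remainder rounding. The function name, the cell names, and the budget are hypothetical, and the rounding rule is an assumption rather than something stated in the text.

```python
from collections import Counter


def choose_spares(netlist_cells, spare_budget):
    """Allocate spare cells in proportion to each cell type's usage.

    netlist_cells: iterable of cell-type names, one per instance.
    spare_budget: total number of spare cells to insert.
    Uses largest-remainder rounding so the counts sum to the budget.
    """
    usage = Counter(netlist_cells)
    total = sum(usage.values())
    if total == 0 or spare_budget <= 0:
        return {}
    exact = {t: spare_budget * n / total for t, n in usage.items()}
    spares = {t: int(v) for t, v in exact.items()}
    leftover = spare_budget - sum(spares.values())
    # Hand the remaining spares to the types with the largest fractional part.
    by_remainder = sorted(exact, key=lambda t: (-(exact[t] - spares[t]), t))
    for t in by_remainder[:leftover]:
        spares[t] += 1
    return {t: n for t, n in spares.items() if n > 0}


# Hypothetical block: 50 NAND gates, 30 inverters, 20 latches.
block = ["nand2"] * 50 + ["inv"] * 30 + ["latch"] * 20
print(choose_spares(block, spare_budget=10))
```

The sketch covers only the selection step. Tying the spare inputs to a supply rail happens in the schematic, as described above.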

After placement, we added fill cells to occupy all legal placement areas not consumed by standard cells. Fill cells were a good place to insert substrate connections as well as bypass capacitance. Because the amount of capacitance depends on the size of the fill cells, the width of the fill cells varied from a single pitch to several pitches. The fill cell-insertion priority was rank ordered, with the largest first. As space availability decreased, the size of the fill cells also decreased until all of the available placement space was consumed.
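The largest-first insertion order can be illustrated with a short greedy routine. It is a sketch under assumptions: the fill-cell widths (1, 2, and 4 pitches) and the gap size are invented for the example, and a real flow would apply the same idea to every empty segment of every row.

```python
def fill_gap(gap_pitches, fill_widths):
    """Fill one empty row segment, trying the widest fill cell first.

    gap_pitches: width of the empty segment, in placement pitches.
    fill_widths: available fill-cell widths, in pitches (must include 1
                 for every gap to be fully consumed).
    Returns the list of fill-cell widths placed, widest first.
    """
    placed = []
    remaining = gap_pitches
    for width in sorted(fill_widths, reverse=True):
        # Place as many cells of this width as still fit.
        count, remaining = divmod(remaining, width)
        placed.extend([width] * count)
    return placed


# Hypothetical 11-pitch gap and fill-cell widths.
print(fill_gap(11, fill_widths=[1, 2, 4]))
```

Because the widest cells go in first, most of the gap ends up in the cells that carry the most bypass capacitance, and the single-pitch cell mops up whatever is left.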

Design verification

Although design verification is generally covered last when describing design flows, we have found that it should not be the last step taken. Rather, we performed incremental verification steps along the way, starting before the first synthesis run. In the first step of verification, we compared the outputs of our RTL simulations to a golden model.
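Comparing simulation outputs against a golden model amounts to a cycle-by-cycle diff. The sketch below shows the idea only: the trace format (one dict of signal values per cycle) and the signal names are assumptions made for the example, not the format of our simulators.

```python
def first_mismatch(rtl_trace, golden_trace):
    """Compare per-cycle output records and return the first difference.

    Each trace is a list of dicts mapping signal name -> value, one per cycle.
    Returns None if the traces agree, else (cycle, signal, rtl, golden).
    """
    if len(rtl_trace) != len(golden_trace):
        raise ValueError("traces cover different numbers of cycles")
    for cycle, (rtl, golden) in enumerate(zip(rtl_trace, golden_trace)):
        # Check signals in a fixed order so the report is deterministic.
        for signal in sorted(golden):
            if rtl.get(signal) != golden[signal]:
                return (cycle, signal, rtl.get(signal), golden[signal])
    return None


# Hypothetical three-cycle traces with one injected difference.
golden = [{"pc": 0, "hit": 1}, {"pc": 4, "hit": 1}, {"pc": 8, "hit": 0}]
rtl = [{"pc": 0, "hit": 1}, {"pc": 4, "hit": 0}, {"pc": 8, "hit": 0}]
print(first_mismatch(rtl, golden))
```

Reporting the earliest mismatching cycle keeps debugging focused on the first point where the design and the model diverge.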

Next, we compared the RTL description against the first output of synthesis. If we encountered any errors, we immediately looked for errors in the functional descriptions of the standard cell library.

We then compared the schematics against the artwork to verify that the composition was carried out properly. As a last safety-valve check, we verified the artwork against the RTL description. Running simulations on artwork verified that the scan chain worked properly. Since the scan chain was connected after placement, the schematic did not contain the correct connectivity, and simulations on the schematic were meaningless. Therefore, by running a few tests that involved the scan chain, we were assured of the scan chain's validity.

By automating designs using the same tools and scripts throughout the design process, we are able to make changes faster. In the past, we have found that once the automation chain is broken by manual changes, it is difficult to get back to the automation loop; therefore, the design flow needs to accommodate all design styles and include "push-button" change implementation. The alternatives are restarting the entire layout and generating a new mask set for every change.

Brian Arnold is a design engineer at Hewlett-Packard Co. (Fort Collins, CO).

To voice an opinion on this or any Integrated System Design article, please e-mail your message to michael@asic.com.


integrated system design  January 1997



Copyright © 1996 Integrated System Design Magazine

All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.