United Business Media EE Times





ASIC Technology

Designing the UltraSPARC-I

Advanced tools and design methodology were essential, but a major aspect of the design involved tradeoffs and special considerations.

By
Shrenik Mehta, Robert Garner, Hemraj Hingarh,
Dennis Chen, Dave Greenhill, and Peter Fu


Today's high-performance microprocessors demand an extremely high degree of automation throughout the design process. This CAD requirement applies to all design phases, from high-level architectural design and performance evaluation down to physical design and verification. A vast arsenal of CAD tools is therefore essential to allow a large number of design engineers to work on a unified design database.

Sun Microsystems' (Mountain View, CA) UltraSPARC-I 64-bit microprocessor is an example of a robust design methodology based on state-of-the-art tools (see Figure 1). The methodology included (1) a new design approach that leveraged previous experience and technologies, (2) a tight focus on execution and time-to-market, (3) high clock rates that demanded new circuit design and CAD know-how, (4) a portable design methodology that adapted to process technology changes if necessary, and (5) a need to leverage the robust circuit methodology.

This meant using cell-based design where possible and full-custom design of critical blocks, such as caches, translation lookup buffers (TLBs), register files, fast adders, arithmetic logic units (ALUs), fast comparators, phase-locked loops (PLLs), and I/O. Here, full-custom schematic and layout design were used because speed, function, or area precluded a standard cell design.

The design process started with architectural modeling, critical path analysis, and SPECmark performance evaluation. Once the architectural specification was stabilized, the design was partitioned into structural units based on timing, functional, and area constraints. For purposes of logic synthesis, simulation, and verification, each structural unit was coded in the Verilog hardware description language (HDL). The HDL representation is a register-transfer-level (RTL) description. A gate-level description of each unit was synthesized from its RTL description both automatically and manually, depending on performance requirements. Within a unit, control and datapath logic were designed separately (see Figure 2).

Custom blocks, such as megacells and memory cells, were designed in parallel with the RTL design. Initial physical design iterations were performed using layout bounding boxes that were generated from schematics. Each of these steps involved extensive verification. After assembly, unit layouts were extracted and timing analysis was performed for each unit.

Concurrently with unit and block design activities, standard cell and datapath libraries were created with special programs that automatically generated schematics and layout. Input to a generator program is a text file describing the desired cell parameters, such as number of inputs and output strength. The generators allowed cell libraries to easily accommodate evolving CMOS technology design rules. Also, the entire library was characterized and verified automatically using in-house tools.
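
A generator of this kind can be pictured as a small program that reads a parameter file and emits one cell description per entry. The sketch below is illustrative only: the text format (`nand inputs=2 drive=4`), the `CellSpec` type, and the `nand2_x4` naming scheme are assumptions made for the example, not the format of the in-house generators described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CellSpec:
    """One requested cell: logic function, input count, output drive strength."""
    function: str
    inputs: int
    drive: int


def parse_cell_specs(text: str) -> List[CellSpec]:
    """Parse lines such as 'nand inputs=2 drive=4' (hypothetical format)."""
    specs = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        function, *fields = line.split()
        params = dict(field.split("=", 1) for field in fields)
        specs.append(CellSpec(function, int(params["inputs"]), int(params["drive"])))
    return specs


def cell_name(spec: CellSpec) -> str:
    """Build a library cell name from its parameters (hypothetical scheme)."""
    return f"{spec.function}{spec.inputs}_x{spec.drive}"


if __name__ == "__main__":
    library_request = """
    # function  parameters
    nand inputs=2 drive=1
    nand inputs=3 drive=4
    nor  inputs=2 drive=2
    """
    for spec in parse_cell_specs(library_request):
        print(cell_name(spec), spec)
```

Because the cells are described by parameters rather than drawn by hand, rerunning such a generator against updated design rules regenerates the whole library, which is the property the text credits for accommodating an evolving process.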

After each individual unit was designed and verified, chip-level assembly was performed, followed by layout extraction and full-chip timing analysis. Paths that violated timing requirements were carefully examined, and redesign options were evaluated and implemented. The entire design went through numerous iterations until the target performance criteria were met.

Figure 1. The design methodology focused on the use of full-custom design on critical blocks. They included caches, translation lookup buffers (TLBs), register files, fast adders, ALUs, fast comparators, PLLs, and I/O. Full-custom schematic and layout designs were used since speed, function, or area precluded standard cell design.

Construct by correction

A major aspect of that newer methodology involved so-called "construct by correction." It is defined as constantly iterating the chip design and frequently measuring each iteration's success--in effect, putting the entire chip together more often. This methodology includes continuous iterations of complete routing, physical composition, timing, and functionality at regular intervals.

Construct by correction provides continuous improvement to meet chip area and timing goals. To minimize chip timing, high priority was given to routing key signals. Through the iterations, the design team kept the logical and physical hierarchy in sync. This process simplified the final physical verification of the chip.

Examples of the advanced design tools used in chip development included Mentor Graphics Corp.'s (Wilsonville, OR) GDT developer for cell library development; MicroRoute for chip assembly; CheckMate for timing verification; Lsim for switch-level verification; and Datapath, Mentor Graphics' compiler, for datapath layout optimization.

UltraSPARC-I was built on a 0.5-µm process at Texas Instruments Inc. (Dallas, TX). However, the process was still being developed when the chip was embryonic. Although TI's process was not yet in full production, the design team had to build the cell libraries.

GDT developer enabled designers to develop an extensive cell library that was easily modified as process details solidified. With its open environment and accessible database, the tool complied with UltraSPARC-I design guidelines. Eighty percent of the cells were developed with this tool. As TI refined the 0.5-µm process, chip designers updated the entire library and automatically turned out the new versions--over 2,000 cells.

Figure 2. The design process started with architectural modeling, critical path analysis, and SPECmark performance evaluation. Once the design's architectural specification was stabilized, the design was partitioned into structural units based on timing, functional, and area constraints. For purposes of logic synthesis, simulation, and verification, each structural unit was coded in Verilog.

Using GDT developer, designers were able to easily fine-tune the cell libraries. This productivity increase not only streamlined chip development but also contributed to the overall design performance. Also, since the tool accommodated rapid cell creation, it allowed design engineers to expand their solution space and to create a faster, more compact design.

MicroRoute was used to help pack the five million transistors into the 315-mm² die. The tool's strong procedural interface made it easy for the design team to capture routing specifications. For example, several programs delineated the global preroute guidelines for power and clock runs, including physical spacing. Using these custom programs, the design team easily updated the layout, quickly moving to the next revision as the physical design progressed.

Mentor Graphics also helped chip designers extend MicroRoute's capabilities for the design. Early on, the design team decided to employ a fourth layer of metal to improve performance. At the time, MicroRoute supported only three layers of metal, but Sun Microsystems and Mentor Graphics added metal-four capability.

Figure 3. Hierarchical approach used to independently verify various sub-units with a high level of confidence and then integrate them into the full-chip simulation environment.

During timing verification, a difficult task was HSpice netlist extraction--HSpice is a product from Meta-Software Inc. (Campbell, CA). The design team needed to extract sufficient parasitics from the physical layout for accurate evaluation of potential deep submicron effects. CheckMate gave the team flexibility to exactly balance speed versus accuracy, allowing in-depth timing verification without swamping the database and significantly slowing verification.

The design team used Lsim for switch-level verification of large custom blocks such as caches, TLBs, and register files. As for datapath design, UltraSPARC-I contains over 20 large datapaths. Manually constructing all of them would have consumed months of engineering effort. With Mentor Graphics' Datapath, engineers automatically generated a majority of the datapaths from structural descriptions, typically going from RTL to the physical design within 24 hours.

Designers experimented with different datapath structures, quickly zeroing in on the optimal configuration. Approximately 90 percent of the chip's datapaths were generated with the tool. By automating the bulk of the datapath design, the team was free to focus on the critical paths that demanded handcrafting.

Hierarchical approach

Functional verification represented one of the project's most ambitious and challenging efforts. Three key verification goals were set: (1) to achieve fully functional first silicon that could boot the multi-user Solaris operating system and run OpenWindows in a host system, (2) to guarantee SPARC V8 32-bit compatibility, and (3) to guarantee SPARC V9 64-bit compliance. Timing verification was performed independently of functional verification, using static timing analysis.

A hierarchical approach was used to independently verify sub-units with a high level of confidence and then integrate them into the full-chip simulation environment (see Figure 3). Assuring basic functionality in a host system required simulating the processor and associated ASICs to eliminate system bugs. A comprehensive multi-processor simulation environment was also developed.

It was also critical to validate the total system design and improve test coverage. The key factors were (1) the complexity arising from a new, more powerful processor and associated ASICs, (2) new firmware, and (3) a new release of the operating system. Validating the system required bringing up the operating system and running application programs.

Testing the large instruction streams in an operating system is not practical with traditional simulation and requires hardware emulation. The decision to emulate the design in-circuit was based on success with the earlier MicroSPARC II emulation.

Two Verilog simulators were used: Verilog-XL, from Cadence Design Systems Inc. (San Jose, CA), during the early stages and Verilog Compiled Simulator (VCS), from Viewlogic Inc.'s Chronologic Simulation Group (Los Altos, CA), for stand-alone tests. VCS was used once the design grew and stabilized. It performed full-chip and multi-processor simulations and regressions, using both RTL and gate netlists.

VCS was chosen because it was the fastest commercially available Verilog simulator. Speed is important because the number of cycles simulated per second determines the total number of simulation cycles that can be run in a given time. Also, compiled-code simulators require less memory than interpretive simulators.
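
The relationship between simulator speed and regression capacity is simple arithmetic, sketched below. The function names and all numeric rates are hypothetical, chosen only to show the calculation; they are not measured figures for Verilog-XL or VCS.

```python
def total_cycles(cycles_per_second: float, wall_clock_hours: float) -> float:
    """Simulation cycles completed in a fixed amount of wall-clock time."""
    return cycles_per_second * wall_clock_hours * 3600


def hours_needed(target_cycles: float, cycles_per_second: float) -> float:
    """Wall-clock hours required to run a given number of simulation cycles."""
    return target_cycles / cycles_per_second / 3600


if __name__ == "__main__":
    # Hypothetical rates for two simulators running the same model.
    slow_rate, fast_rate = 5.0, 20.0  # simulated cycles per second
    overnight = 12.0                  # hours available for a regression run
    print("slow simulator:", total_cycles(slow_rate, overnight), "cycles")
    print("fast simulator:", total_cycles(fast_rate, overnight), "cycles")
```

For a fixed regression window, the cycle count scales linearly with simulator speed, so a simulator several times faster runs a correspondingly larger regression suite each night.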

Unit-level simulation

The chip's full-custom design style demanded that libraries as well as individual functional units be completely verified. Units employed one of two simulation and verification approaches. Integer and floating-point units, which execute instructions, could use SPARC assembly code for functional tests in a Verilog stub-model environment. Bus-oriented units, such as the load/store unit and the external memory subsystem cache controller, used a common driver and stimulus generator checker program (CUDL/MSG). The unit test environments, with their low-level interfaces, provided better control of signals and bus transactions.

After stand-alone tests verified sub-units, the strategy was to verify the chip in several system environments. The default simulation environment included a fully integrated UltraSPARC-I design with simple behavioral models for the rest of the system. Another environment modeled the uniprocessor system and instantiated Verilog models corresponding to the real system ASICs. A third environment instantiated four UltraSPARCs for multi-processor simulation. In each environment, different processor modes were simulated to enable or disable various processor features.

Gate-level validation

Gate-level simulation verified the integrity of the netlist after synthesis, place & route, and layout. Since control blocks, datapaths, and megacells used different design flows, slightly different netlist versions were used at different stages of the design cycle. Also, since emulation tools require structural netlists with no behavioral code, gate-level simulations verified correctness of the netlists delivered to the emulation team.

Figure 4. Emulation methodology consisted of four major phases: pre-configuration, configuration, testbed, and in-circuit emulation.

Early in the design, gate-level verification was employed for the design's control blocks by substituting synthesized versions of the blocks into the chip netlist. As the design stabilized and layout began, netlists extracted from layout were incorporated into the gate-level simulations. A flexible, modular methodology allowed the units' netlists to be moved from pre-layout to post-layout over a number of design releases.

Hardware emulation

The emulation methodology consisted of four major phases: pre-configuration, configuration, testbed, and in-circuit emulation (ICE). Figure 4 shows the relationship between phases. A total of 1.2 million gates emulated the processor and system ASICs. The Enterprise hardware emulator from Quickturn Design Systems Inc. (Mountain View, CA) was used. An automated in-house flow mapped the UltraSPARC-I design's custom library cells to Quickturn emulation primitives. MEM cards implemented large memory arrays: caches, register files, etc. Over 3,000 probes--distributed over five built-in logic analyzers, one per Enterprise box--and three DAS systems, from Tektronix Inc. (Beaverton, OR), were used to analyze failures in place of traditional debugging tools.

The Tektronix DAS's sophisticated trigger programming capability helped track instruction flow through the pipe and other critical pipe events. Unix booted three weeks prior to tapeout, and the process uncovered one discrepancy between the UltraSPARC-I RTL and gate-level models. Emulation also found problems in the firmware and software for the new system.

Hardware emulation supplemented verification: it was useful for post-silicon system debug and for verifying subsequent tapeouts. A key advantage of emulation was that it gave the team a common focus, booting the operating system.

UltraSPARC 64-bit microprocessor
UltraSPARC is a 64-bit SPARC V9 processor with four-way instruction dispatch, superscalar processing, and advanced multimedia capabilities. It has a tightly-coupled instruction prefetch and dispatch unit, integer execution unit (IEU), floating-point/graphics unit (FGU), load/store unit, and memory unit. It has external cache control and system interface logic on board. The chip was designed to maximize system efficiency and promote optimal throughput when executing complex, memory-intensive applications while maintaining full binary compatibility with all existing SPARC applications.

The microprocessor has 16 Kbytes of on-chip instruction cache and 16 Kbytes of on-chip data cache. It contains about 5.2 million transistors. The chip is fabricated on an advanced four-layer metal 0.5-µm process (see figure).

Other components complete the UltraSPARC chipset. A pair of data buffers connects UltraSPARC to the system and isolates second-level cache activity from the system bus. Each data buffer is a 70-kgate gate array containing logic and datapaths that allow overlapping of operations, resulting in shorter miss latencies and larger bandwidths to and from the system. The external cache is implemented using standard, pipelined 1-Mbit SRAMs that are organized as 32K words by 36 bits. As is traditional with Sun's systems, the core logic of the system was implemented in ASICs. The ASICs totaled approximately 240 kgates of logic and 13 Kbits of SRAM.


UltraSPARC block diagram

Static timing analysis

PEARL, from Cadence Design Systems (San Jose, CA), was used for static timing. The tool traces paths to determine the minimum and maximum times a signal can change state; thus, no input stimulus was required.
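
The idea of tracing paths without stimulus can be sketched as a propagation of earliest and latest arrival times through a delay graph. This is a generic textbook illustration, not PEARL's algorithm; the `Edge` tuple layout, the `arrival_windows` function, and the delays in the example are assumptions.

```python
from typing import Dict, List, Tuple

# (source node, sink node, minimum delay, maximum delay); units are arbitrary.
Edge = Tuple[str, str, float, float]


def arrival_windows(
    topo_order: List[str], edges: List[Edge], start_points: List[str]
) -> Dict[str, Tuple[float, float]]:
    """Return (earliest, latest) signal arrival times for every reachable node.

    topo_order must list nodes so that every source precedes its sinks.
    """
    fanin: Dict[str, List[Tuple[str, float, float]]] = {}
    for src, dst, d_min, d_max in edges:
        fanin.setdefault(dst, []).append((src, d_min, d_max))

    windows: Dict[str, Tuple[float, float]] = {}
    for node in topo_order:
        if node in start_points:
            windows[node] = (0.0, 0.0)  # signals launch at time zero
            continue
        arrivals = [
            (windows[src][0] + d_min, windows[src][1] + d_max)
            for src, d_min, d_max in fanin.get(node, [])
            if src in windows
        ]
        if arrivals:
            earliest = min(a[0] for a in arrivals)  # shortest path into the node
            latest = max(a[1] for a in arrivals)    # longest path into the node
            windows[node] = (earliest, latest)
    return windows


if __name__ == "__main__":
    edges = [
        ("a", "n1", 1.0, 2.0),
        ("b", "n1", 0.5, 3.0),
        ("n1", "out", 1.0, 1.5),
    ]
    print(arrival_windows(["a", "b", "n1", "out"], edges, ["a", "b"]))
```

Every structural path contributes to the result whether or not any real input pattern could exercise it, which is why the analysis needs no stimulus and also why it can report paths that are never functionally active.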

Timing analysis identified maximum operating speed and range of operating conditions for which clocking hazards are avoided. In edge-triggered systems, the tasks are tied to two phenomena: zero- and double-clocking. Zero-clocking occurs when combinational logic in the design is too slow to produce a valid set of outputs within a given clock period. Hence, the phenomenon determines maximum operating speed. By contrast, double-clocking occurs when combinational logic is fast enough to produce more than one set of outputs for a given clock period. Besides the speed of combinational logic, clock skew and timing parameters of the state elements directly affect the two phenomena.
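
The two hazards correspond to the usual setup-style and hold-style inequalities for an edge-triggered path. The sketch below uses the standard textbook form of those checks; the function and parameter names and the numeric values in the example are assumptions and are not taken from the UltraSPARC-I timing flow.

```python
def min_clock_period(
    clk_to_q_max: float, logic_delay_max: float, setup_time: float, clock_skew: float
) -> float:
    """Shortest clock period at which the slowest path still meets setup."""
    return clk_to_q_max + logic_delay_max + setup_time + clock_skew


def zero_clocking(
    clock_period: float,
    clk_to_q_max: float,
    logic_delay_max: float,
    setup_time: float,
    clock_skew: float,
) -> bool:
    """True when the logic is too slow: data misses the capturing clock edge."""
    return (
        min_clock_period(clk_to_q_max, logic_delay_max, setup_time, clock_skew)
        > clock_period
    )


def double_clocking(
    clk_to_q_min: float, logic_delay_min: float, hold_time: float, clock_skew: float
) -> bool:
    """True when the logic is too fast: new data races through on the same edge."""
    return clk_to_q_min + logic_delay_min < hold_time + clock_skew


if __name__ == "__main__":
    # All values in nanoseconds and purely illustrative.
    print(min_clock_period(0.5, 4.0, 0.25, 0.25))
    print(zero_clocking(6.0, 0.5, 4.0, 0.25, 0.25))
    print(double_clocking(0.25, 0.25, 0.25, 0.5))
```

Note that clock skew appears on the penalty side of both inequalities, and that lengthening the clock period relieves zero-clocking but does nothing for double-clocking, which depends only on minimum delays.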

"False paths" are a problem inherent in static timing analysis. The paths are never exercised due to functional relationships between signals. Potential false paths were identified and reviewed by the design team. Monitors were placed in the functional simulation to verify that none of the diagnostic patterns ever exercised these false paths. While this methodology does not eliminate all false paths--the approach is conservative--at least all true paths are identified and verified.

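A simulation monitor of the kind described for false paths can be pictured as a check, applied on every simulated cycle, that the signal condition needed to sensitize a declared false path never occurs. The sketch below is a minimal illustration under assumed names: the trace format, the `find_false_path_hits` function, and the example signals `sel_a` and `sel_b` are hypothetical and do not describe the team's actual monitors.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Cycle = Dict[str, int]              # signal name -> value for one clock cycle
Monitor = Callable[[Cycle], bool]   # returns True if the path is sensitized


def find_false_path_hits(
    trace: Iterable[Cycle], monitors: Dict[str, Monitor]
) -> List[Tuple[int, str]]:
    """Return (cycle number, path name) for every cycle that exercises a
    path the designers declared false."""
    hits = []
    for cycle_number, signals in enumerate(trace):
        for path_name, is_sensitized in monitors.items():
            if is_sensitized(signals):
                hits.append((cycle_number, path_name))
    return hits


if __name__ == "__main__":
    # Hypothetical false path: active only if both selects are high at once,
    # which the designers believe can never happen.
    monitors = {
        "mux_a_to_mux_b": lambda s: s["sel_a"] == 1 and s["sel_b"] == 1,
    }
    trace = [
        {"sel_a": 0, "sel_b": 1},
        {"sel_a": 1, "sel_b": 0},
        {"sel_a": 1, "sel_b": 1},  # would disprove the false-path assumption
    ]
    print(find_false_path_hits(trace, monitors))
```

An empty result over the diagnostic patterns supports excluding the path from timing analysis; any hit means the path is real and must be timed.
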
The UltraSPARC-I project team met or exceeded all the project goals. The authors would like to acknowledge the effort of the UltraSPARC-I project team members in making the design a reality. The team developed a methodology which took into account the realities of deep submicron design and today's EDA tools.

Shrenik Mehta is a hardware manager with Sun Microelectronics, a division of Sun Microsystems Inc. He was the validation manager for UltraSPARC-I and validation lead for the microSPARC-I and microSPARC-II processors at Sun. His current responsibilities include verification of UltraSPARC-I derivative products and evaluation of new verification tools and flows.

Robert Garner managed the UltraSPARC-I microprocessor project. He was responsible for the architecture, logic, and verification teams. Garner was a co-architect of the SPARC architecture and lead designer of Sun's first SPARC product, the Sun4/200. He is currently director of Java Media Processors at Sun Microelectronics.

Hemraj Hingarh is director of engineering at Sun Microelectronics. Since joining Sun in 1992, Hemraj has been responsible for design and development of the UltraSPARC microprocessor family.

Dennis Chen is a senior engineering manager at Sun Microelectronics, responsible for logic and verification teams of next-generation designs. He was the founding member of the UltraSPARC-I microprocessor design team, responsible for the load/store and memory management units as well as both the verification and hardware emulation groups.

David Greenhill is a circuit design manager at Sun Microelectronics. He joined Sun in 1992 to work on UltraSPARC. He has worked on circuit design, datapath methodology, timing analysis, and the memory management unit design.

Peter Fu is a Staff Engineer at Sun Microelectronics and was involved in the design and validation of the UltraSPARC-I processor. His current involvement includes high- and low-level testing through modeling, simulation, and emulation as well as the coordination of engineering changes and tapeout processes.

To voice an opinion on this or any Integrated System Design article, please e-mail your message to michael@asic.com.




integrated system design  June 1996





Copyright © 1996 - Integrated System Design Magazine

All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.