ASIC Technology

A 1GB/s Graphics Controller


The methodologies and trade-offs for a high-speed graphics controller.


By Billy Garrett and Thomas Chai


The Rambus Universal Graphics Back End (RUGBE) is defined as a technology demonstration chip. It contains many newly designed and untested high-performance macros and provides high-speed, high-resolution graphics control for a PC over the PCI local bus. Using a conventional ASIC design methodology, the RUGBE chip worked perfectly the first time. The following is a description of the chip and the design methodology used to create it.

The 132MBytes/sec bandwidth of the 32-bit PCI bus can become a bottleneck to high-end graphic displays. Workstation-quality displays that support both high-resolution and high- or true-color depth can use hundreds of megabytes per second in microprocessor bus bandwidth. For example, RUGBE can display 1600 x 1200 x 16 bits-per-pixel, which represents 4MBytes of frame buffer storage. This display requires moving more than 340MBytes/sec of data for refreshing the screen, not to mention overlay or other functions.
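The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below is not from the original presentation. It assumes a 33MHz PCI clock, and it assumes that the 340MBytes/sec refresh figure corresponds to a 170MHz pixel clock (the rating of the video DAC described later) at two bytes per pixel. The function names are illustrative only.

```python
# Back-of-the-envelope check of the bandwidth figures quoted above.
# Assumptions (not stated in the article): the PCI clock is 33 MHz, and the
# refresh figure is pixel clock x bytes per pixel at a 170 MHz pixel clock.

def frame_buffer_bytes(width, height, bits_per_pixel):
    """Bytes needed to hold one frame."""
    return width * height * bits_per_pixel // 8

def bus_bandwidth_mbytes(clock_mhz, bus_width_bits):
    """Peak bus bandwidth in MBytes/sec, one transfer per clock."""
    return clock_mhz * bus_width_bits // 8

def refresh_bandwidth_mbytes(pixel_clock_mhz, bits_per_pixel):
    """Peak refresh rate in MBytes/sec while pixels are being scanned out."""
    return pixel_clock_mhz * bits_per_pixel // 8

frame = frame_buffer_bytes(1600, 1200, 16)
print(frame)                              # 3840000 bytes, which fits in 4 MBytes
print(bus_bandwidth_mbytes(33, 32))       # 132
print(bus_bandwidth_mbytes(33, 64))       # 264
print(refresh_bandwidth_mbytes(170, 16))  # 340
```

Under these assumptions, refresh alone needs more than twice the peak rate of a 32-bit PCI bus, which is why the frame buffer needs its own high-bandwidth memory interface.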

A major advantage of using a Rambus-based frame buffer is that it frees up many controller pins to be used for other functions, such as the expanded PCI bus. In this case, RUGBE contains a full 64-bit PCI interface and still has a 1,000MBytes/sec interface to memory. It is implemented in a mainstream, 208-pin, PQFP package. With a wider PCI interface, more bandwidth can be allocated to the graphics display, allowing the display to change faster and objects to move faster (see Figure 1 for the basic block diagram).

Tiling

Tiling is an architectural concept designed to increase performance for graphics operations. Many graphics operations deal with horizontal scan lines. Traditional memory mapping puts the majority of "page mode" memory in consecutive pixel locations. This results in each "page" of memory being displayed on one or two lines on the screen. For some operations (like small characters, vertical lines, and small triangles), tiling produces an improvement of three to eight times over a non-tiled display.

In a tiled display, the memory in a page is arranged in a rectangular fashion. For most tiles used in RUGBE, the tile size is 256 x 8--a nice binary number. We have found that if the width of the tile is equal to the longest memory transaction (in this case, display reads), we obtain the best overall performance.
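To see why tiling helps operations such as vertical lines, the following sketch counts how many memory pages a short vertical line touches under each mapping. It is an illustration only, not the RUGBE address logic. It assumes 2,048-byte pages, tiles of 256 bytes by 8 lines stored one per page in row-major order, and a 1600 x 1200 x 16bpp display with 3,200 bytes per scan line. All function names are hypothetical.

```python
# Compare page usage for a vertical line in a linear versus a tiled frame
# buffer. Page size, tile ordering, and display mode are assumptions made
# for illustration; the article does not specify the address mapping.

PAGE_BYTES = 2048
TILE_WIDTH_BYTES = 256
TILE_HEIGHT_LINES = 8

def linear_page(x_byte, y, pitch_bytes):
    """Page index when scan lines are stored back to back."""
    return (y * pitch_bytes + x_byte) // PAGE_BYTES

def tiled_page(x_byte, y, pitch_bytes):
    """Page index when each page holds one 256-byte x 8-line tile."""
    tiles_per_row = -(-pitch_bytes // TILE_WIDTH_BYTES)  # ceiling division
    return (y // TILE_HEIGHT_LINES) * tiles_per_row + x_byte // TILE_WIDTH_BYTES

def pages_touched(page_fn, x_byte, y_start, length, pitch_bytes):
    """Number of distinct pages touched by a vertical line."""
    return len({page_fn(x_byte, y, pitch_bytes)
                for y in range(y_start, y_start + length)})

pitch = 1600 * 2  # bytes per scan line at 16 bits per pixel
print(pages_touched(linear_page, 200, 0, 8, pitch))  # 8
print(pages_touched(tiled_page, 200, 0, 8, pitch))   # 1
```

In this model an eight-line vertical stroke opens eight pages in the linear layout but only one in the tiled layout, which is the kind of saving the tiled organization is after.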

For 24-bit-per-pixel true color modes, we decided to change the size of the tile to 192 bytes by 10 scan lines. This size fits in the same 2,048 bytes as a 256 x 8 tile, with only 128 bytes remaining. At three bytes per pixel, the tile is exactly 64 pixels by 10 lines. The remaining memory, along with non-displayed memory, is reclaimed for off-screen use.

Packing and tiling are especially beneficial for 1280 x 1024 x 24 bits-per-pixel. In this case, the display fits into 3.75MBytes of memory, leaving 256Kbytes of memory for off-screen use. The 4MByte total fits exactly into two Rambus DRAMs (one RDRAM per channel).
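The numbers for the 24-bit mode follow directly from the tile dimensions. The short check below assumes the page holds 2,048 bytes (the capacity of a 256 x 8 tile) and that the two RDRAMs together provide 4MBytes. It counts packed pixel bytes only and does not model how partial tiles at the bottom of the screen are stored.

```python
# Arithmetic check of the 24-bit tile and memory figures quoted above.
PAGE_BYTES = 256 * 8        # assumed page size, taken from the 256 x 8 tile
BYTES_PER_PIXEL = 3         # packed 24-bit true color
MBYTE = 1024 * 1024

tile_bytes = 192 * 10
leftover_per_page = PAGE_BYTES - tile_bytes
pixels_per_tile_line = 192 // BYTES_PER_PIXEL

display_bytes = 1280 * 1024 * BYTES_PER_PIXEL
total_bytes = 4 * MBYTE     # 4 MBytes across two RDRAMs
off_screen_kbytes = (total_bytes - display_bytes) // 1024

print(leftover_per_page)        # 128
print(pixels_per_tile_line)     # 64
print(display_bytes / MBYTE)    # 3.75
print(off_screen_kbytes)        # 256
```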

The only major drawback to this arrangement is that display refresh requests are now limited to 192-byte transfers, which is not quite as efficient as 256-byte transfers. However, this was the tile organization we chose for RUGBE.

Software drivers

Early on, we knew that we would need software drivers to make RUGBE an effective demonstration. We decided to target Windows 95, which made writing the driver simple because it offers native support for linear frame buffers.

Figure 1. This block diagram shows the major building blocks within the RUGBE chip.

We took the Microsoft DDK (device driver kit) and modified it for RUGBE. This involved setting pointers to where the frame buffer was located in memory, writing routines to set the hardware cursor, writing routines to access the color look-up table digital-to-analog converter (CLUTDAC), and writing routines to initialize the RUGBE chip and RDRAMs. The remaining graphics primitives were implemented by Windows 95 directly.

In addition, we had to write a VXD module, which is used through the operating system to return a virtual address of the RUGBE frame buffer. Although the physical address of the RUGBE chip is known, the operating system maps the board, based on its position in the PCI bus (along with other system variables), to a specific virtual address, which changes from system to system. Although the VXD module is small, it was essential for the driver to work properly.

The lack of integrated VGA was both a help and a hindrance to the project. Although NEC has a VGA core in its coreware library, due to the design time-constraints, we decided not to put the VGA core in the RUGBE chip. By using a standard VGA, we eliminated the need to write a BIOS for our demo system, greatly simplifying our software task.

Hardware design

We started our design of the 64-bit PCI interface by using a RAVIcad model as a test bench. We designed our own state machines, making modifications for 64-bit bus support. We then used the RAVIcad model to check our design.

We discovered two basic errors. First, a couple of the lines are sustained tri-state (STS), and we forgot to drive them high before releasing them (we had left out a flip-flop). The RAVIcad model, which worked in simulation, missed this because it uses pull-up resistors. Second, we never simulated with another PCI card present. As it turned out, our design inadvertently looked at burst data and, if the data matched its address range, started a transaction. This was not supposed to happen. We found both of these errors when we debugged the board. These problems would be easy to fix inside the chip; on the board, we fixed them externally with a single PAL.
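The second error can be illustrated with a toy bus model. The sketch below is not the RUGBE state machine. The address window, cycle format, and function names are all hypothetical. It relies only on the PCI rule that the address phase is the first clock on which FRAME# is asserted, so a target should compare the AD lines against its address range on that clock only.

```python
# Toy model of the address-decode bug described above. Each bus cycle is a
# (frame_asserted, ad_value) pair. The base address and window size are
# hypothetical values chosen for the example.

BASE = 0xE0000000   # hypothetical base address of our card
SIZE = 0x00400000   # hypothetical 4 MByte window

def in_window(ad):
    return BASE <= ad < BASE + SIZE

def buggy_claims(cycles):
    """Claims whenever AD matches, even if the value is burst data."""
    return [i for i, (_, ad) in enumerate(cycles) if in_window(ad)]

def fixed_claims(cycles):
    """Claims only in an address phase: the first cycle FRAME# is asserted."""
    claims = []
    prev_frame = False
    for i, (frame, ad) in enumerate(cycles):
        if frame and not prev_frame and in_window(ad):
            claims.append(i)
        prev_frame = frame
    return claims

# A burst write to another card whose second data word happens to look like
# an address inside our window.
cycles = [
    (True, 0xD0000000),    # address phase, targets the other card
    (True, 0x12345678),    # data
    (True, 0xE0001000),    # data that resembles one of our addresses
    (False, 0x00000000),   # final data phase, FRAME# deasserted
    (False, 0x00000000),   # idle
]
print(buggy_claims(cycles))  # [2]
print(fixed_claims(cycles))  # []
```

A single-card simulation never produces the third cycle, which is consistent with the bug surfacing only once a second card was on the bus.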

The analog part of the design was broken up into three sections: PLL (phase-lock loop) design, high-speed dual port SRAM design, and 9-bit DAC (digital-to-analog converter) design. The SRAM was actually a 10-bit version--in case future CLUTDAC requirements needed 10-bit resolution. The DAC itself was designed to 9-bit resolution, allowing slightly better precision for use in gamma corrected displays. The 170MHz video DAC and 170MHz PLL macros used in the RUGBE chip were third-generation, CBIC analog macros developed by NEC Electronics. Both macros were implemented in NEC's 3LM, 0.5µm CBC8 technology.

For the analog macro design, schematics were captured using Cadence's Analog Artist, and they were then simulated using Meta Software's HSPICE simulator. We also used Cadence tools for the layout.

An environment consisting of both Cadence's Verilog and HSPICE was used for mixed-mode simulation. All top-level designs were done using Cadence's Verilog HDL design language. They were then converted to a CBC8 gate-level netlist using Synopsys' HDL Compiler. Back-end design was done using NEC's mixed A/D floorplanning tools and Cadence's Cell3 CBIC place and route CAD tool.

Mixed A/D ASIC design used the Verilog logic simulator for functional as well as timing simulation. All analog macros were written as a logic functional model or a behavioral model. In addition, many digital test modes of the RUGBE logic were added to make sure that any problems we might have would be easy to detect and fix. As it turned out, these modes were invaluable in debugging.

Finally, we used Cadence's Dracula for the layout verification and TimeMill, from EPIC Design Systems, for whole chip-level timing verification. NEC developed an internal CAD tool to automatically insert VDD/GND power rings to shield the analog macro from digital logic. In addition, NEC used Cadence's Cell3 CBIC place and route tool for mixed A/D chip back-end design. Figure 2 shows the design flow.

The on-chip pixel clock generator macro used in the RUGBE chip consists of a PLL analog hard macro and the associated M, N, and P counters, which are digital soft macros. The input reference frequency is usually 14.31818MHz, the frequency of a common TV crystal at four times the colorburst frequency. By changing the values of M, N, and P, an appropriate pixel clock is generated. Since the pixel clock exists only inside the RUGBE chip, the high-frequency signal is not routed on the board. The counters are laid out using Cadence's Cell3 CBIC P&R tool in NEC's CBC8 primitive macros.
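Choosing counter values for a given pixel clock is a small search problem. The sketch below is an assumption-laden illustration: the article does not give the relationship between M, N, P and the output frequency, nor the counter ranges. It assumes the common form f_out = f_ref x M / (N x P) with hypothetical ranges, and it ignores any limits on the VCO operating range that a real PLL would impose.

```python
# Search for counter values that approximate a target pixel clock.
# Assumption: f_out = f_ref * M / (N * P). The actual RUGBE relationship and
# counter ranges are not given in the article; those below are hypothetical.

F_REF_HZ = 14_318_180  # 4 x the 3.579545 MHz colorburst frequency

def pixel_clock_hz(m, n, p, f_ref_hz=F_REF_HZ):
    """Output frequency under the assumed M/N/P relationship."""
    return f_ref_hz * m / (n * p)

def best_settings(target_hz, m_range=range(1, 128), n_range=range(1, 32),
                  p_values=(1, 2, 4, 8)):
    """Return the (m, n, p) whose output is closest to target_hz."""
    candidates = ((m, n, p) for m in m_range for n in n_range for p in p_values)
    return min(candidates, key=lambda s: abs(pixel_clock_hz(*s) - target_hz))

m, n, p = best_settings(170e6)
f = pixel_clock_hz(m, n, p)
print(m, n, p, round(f / 1e6, 3))
```

An exhaustive search is practical here because the assumed ranges give only a few thousand combinations.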

In addition to the other analog macro cells developed for RUGBE, we needed to develop a Rambus ASIC cell (RAC) in parallel for the CB-C8 process. The RAC is the interface to the high-speed Rambus channel. It consists of two DLLs, drivers, receivers, and additional circuitry to convert the narrow Rambus Channel into a 64-bit wide internal data bus, running at 66MHz.
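A quick check shows that the 64-bit internal bus is wide enough for one channel's share of the memory bandwidth. The sketch assumes the 1,000MBytes/sec total quoted earlier is split evenly across the two Rambus channels and that the internal bus moves 64 bits on every 66MHz clock. Both are assumptions made for illustration.

```python
# Compare one channel's share of memory bandwidth with the capacity of the
# RAC's internal bus. The even split and one-transfer-per-clock behavior are
# assumptions, not figures from the article.
TOTAL_MBYTES_PER_SEC = 1000
CHANNELS = 2
INTERNAL_BUS_BITS = 64
INTERNAL_CLOCK_MHZ = 66

per_channel = TOTAL_MBYTES_PER_SEC // CHANNELS
internal_capacity = INTERNAL_BUS_BITS // 8 * INTERNAL_CLOCK_MHZ

print(per_channel)        # 500
print(internal_capacity)  # 528
```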

To be compatible with automatic place and route CAD tools, and to reduce noise coupling from digital circuits to analog circuits, NEC's approach to the mixed-analog/digital chip was to place all high-performance and noise-sensitive analog macros in the chip I/O boundary area. In this arrangement, we could easily shield the high-performance and noise-sensitive analog macros from all surrounding digital logic (see Figure 3).

Figure 2. The RUGBE design flow follows NEC's standard CBIC mixed A/D chip design flow.

In the final layout, the two Rambus channels are on opposite sides of the die. The RAMDAC is on the third side, and the 64-bit PCI bus wraps around the remaining pins of the RUGBE chip. The order of the PCI signals matches the layout of the signals on a board. The Rambus channel signals are designed to be laid out as straight traces from the controller to the end of the channel (where they are terminated).

The remaining area was used for the random logic functions. Because of the odd shape of some of the FIFOs, the layout resulted in less than optimum routing. But schedule pressure did not permit any time for iteration of the design, so we went ahead and built the part--even though it could have been significantly reduced in size.

NEC used a two-pass test method to fully test the mixed A/D ASIC. Digital testing was done using an Ando digital tester, and analog macro testing was done using a Hewlett-Packard analog tester. During testing for digital logic, all digital signals had to be propagated to the primary I/O pins, including digital signal connects between digital blocks and analog blocks. All embedded RAM had to be able to mux to the primary I/O pins in order to facilitate testing. During testing for analog macros, all analog macros under test must also be configured as independent blocks (de-coupled from digital logic). This usually requires that input or output pins be multiplexed in test modes to bring only the signals of the individual analog macro (under test) out to the tester pins.

Figure 3. In the chip plot above, the PLL analog hard macro shows the layout as an I/O cell connecting to bond pads.

Check-out

Once the chip was back from the fab, we spent a few days in the lab to get our first video output. Our debugging strategy was to generate as much good news from the lab as quickly as possible. This meant fixing and getting around problems and then going back to fully understand them. Initial problems ranged from PCI BIOS and configuration issues to finding the virtual address of the card, all while debugging our software in parallel with the hardware.

Primarily, we used the HP16500B Logic Analysis system and the HP16517A 4GHz Logic Analyzer add-in board. This board provides 16 channels that can run phase-locked to a system at up to 2GHz. Since the Rambus channel consists of only 11 signals plus clock, this board is ideal for checking out the operation of the channel. It also provides an invaluable service through its over-sampling capability.

Since the Rambus channel runs at twice the rate of the clock, by selecting 2x over-sampling, all of the data can be collected. On the down side, the trigger mechanism works on one edge of the clock only, making it difficult to trigger on events that occur on over-sampled points. Also, in this mode the device is limited to four trigger levels, which makes triggering difficult.
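The effect of over-sampling can be shown with a simple model in which the channel delivers one value on every clock edge. The function and data below are made up for illustration and say nothing about how the analyzer itself is implemented.

```python
# Model: the channel delivers one value on every clock edge (rising, then
# falling). Sampling once per clock sees every other value; sampling twice
# per clock (2x over-sampling) sees them all.

def capture(edge_values, samples_per_clock):
    """Return the values seen when sampling 1 or 2 times per clock."""
    if samples_per_clock not in (1, 2):
        raise ValueError("this model supports only 1x or 2x sampling")
    step = 2 // samples_per_clock
    return edge_values[::step]

channel = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15]  # three clocks of made-up data
print(capture(channel, 1))  # [16, 18, 20]
print(capture(channel, 2))  # [16, 17, 18, 19, 20, 21]
```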

Figure 4. Key features of the RUGBE chip plot. The RAC cells are located on opposite sides of the chip, the video DAC and PLL analog macros are located on one side and next to each other, the LUT RAMs are located close to the video DAC macro, and the eight display FIFOs are grouped together.

Within a week we were at the point of drawing rectangles on the screen. From there, a basic Windows 95 driver was up and running about a week later. Once the driver was up and fairly stable, we did more work on the hardware to check out various display modes and performance.

Overall, the entire debugging process took about one month from the time the chips came back until we were ready to start showing demos externally.

During the back-end design of the RUGBE chip, we experienced some major problems:

* NEC used fixed-pad rings for the CBIC ASIC design. Due to this constraint, RAC, DAC, and PLL macros did not pad-pitch match with the fixed-pad ring data. We had to push RAC, DAC, and PLL macros further into the core area to reserve sufficient die area to connect the macro pins to the bond pads.

* Due to the 64-bit data bus width, NEC's compiled FIFO macro was too wide and blocked too many metal-2 routing tracks.

As a result of these problems and schedule considerations, the RUGBE die size was at least one to two millimeters larger than necessary. RUGBE is currently being ported to the new VX process and is being used as a base in other designs, in which this area penalty will be reclaimed.

RUGBE specifications
The Rambus Universal Graphics Back End (RUGBE) technology was jointly designed by engineers at NEC Electronics and Rambus, Inc. RUGBE uses the CB-C8, 0.5µm process technology, addressing display resolutions of up to 1600 x 1200 x 16 bits-per-pixel (bpp) or 1280 x 1024 with 24bpp true color support. The RUGBE ASIC and two Rambus DRAMs support frame buffer bandwidths of up to one gigabyte per second (1000Mbytes per second). RUGBE supports both tiled and non-tiled displays.

The RUGBE chip supports all standard pixel formats: 8, 16, 24, and 32bpp. RUGBE's address control unit contains the logic to pack 24-bit pixels efficiently into memory, effectively storing a pixel every three bytes. This allows 1280 x 1024 x 24 to fit well within the 4 MBytes of memory required to hold this display.
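Packing matters because a 24-bit pixel padded to 32 bits would not fit in 4MBytes at this resolution. The sketch below shows packed storage and the resulting sizes. It assumes a simple linear (non-tiled) layout and an R, G, B byte order, neither of which is specified in the article, and the function names are hypothetical.

```python
# Packed 24-bit pixel storage: three bytes per pixel with no padding.
# Byte order and the linear layout are assumptions made for illustration.

MBYTE = 1024 * 1024

def pack_rgb(pixels):
    """Pack (r, g, b) tuples into bytes, three bytes per pixel."""
    out = bytearray()
    for r, g, b in pixels:
        out += bytes((r, g, b))
    return bytes(out)

def pixel_offset(x, y, width, bytes_per_pixel=3):
    """Byte offset of pixel (x, y) in a linear packed frame buffer."""
    return (y * width + x) * bytes_per_pixel

packed = pack_rgb([(255, 0, 0), (0, 255, 0), (0, 0, 255)])
print(len(packed))               # 9
print(pixel_offset(2, 0, 1280))  # 6
print(1280 * 1024 * 3 / MBYTE)   # 3.75 when packed
print(1280 * 1024 * 4 / MBYTE)   # 5.0 if each pixel were padded to 32 bits
```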

The RUGBE component combines a 64-bit PCI interface, dual Rambus channels, and an integrated color look-up-table, digital-to-analog converter (CLUTDAC). It is the first graphics component to combine these functions within a single, low-cost, 208-pin plastic package while delivering one gigabyte-per-second bandwidth to memory.

RUGBE does not contain any hardware acceleration features such as BitBlit engines, line drawing engines, or other pixel-drawing hardware. However, the existing circuitry uses only about 25K gates. It can be provided in HDL form to system designers who want to customize RUGBE for a particular application.

RUGBE contains circuitry for a hardware cursor. This feature provides improved quality to the display because the cursor is always visible--even during display update. RUGBE can support up to a 64 x 64 cursor size, which is easily visible--even on a 1600 x 1200 display.

Thomas Chai is the senior manager of Research and Development at NEC Electronics, Inc. (Mountain View, CA).

Billy Garrett is the manager of Graphics Development at Rambus, Inc. (Mountain View, CA).

This article is an excerpt from a presentation the authors made at Design SuperCon '96.

To voice an opinion on this or any Integrated System Design article, please e-mail your message to michael@asic.com


integrated system design  March 1996



Copyright © 1996 - Integrated System Design Magazine
