SAN MATEO, Calif. Nintendo Co. Ltd. and its partners unveiled more details Wednesday (May 16) about the long-awaited GameCube, the successor to the company's Nintendo 64 that will go head-to-head with Sony's PlayStation and Microsoft's Xbox.
Designed exclusively for game play, GameCube will be launched in North America on November 5, three days ahead of Microsoft's expected launch of the Xbox.
While the processing power of the hardware has plenty of sex appeal, Nintendo and its partners say that was not their first objective. Rather, when the project was started three years ago, Nintendo urged its silicon partners to keep in mind where the money is: the games themselves.
"What we tried to do was create a games console that was powerful enough and easy to develop games for," said Mike West, multimedia architect for IBM Microelectronics, which is providing a special PowerPC processor for the GameCube. "For that, we had to get the hardware out of the way of game developers' creativity."
Not surprisingly, the directive followed complaints from developers about hardware bottlenecks and problems getting images to interact on Nintendo's previous game platform. "With Nintendo 64 there was some difficulty with the programming and getting the performance from a basic architecture point of view," said Greg Buchner, vice president of engineering for the original ArtX team (now owned by ATI Technologies Inc.) that designed the graphics and I/O chip for GameCube.
With these broad design goals in place, the companies worked closely to create a tightly woven system with fast data movement between the components and internal functional units. To this end, they tried to keep memory latencies low and predictable while maximizing the bandwidth available between them.
But you won't see Nintendo and its partners touting extreme processor speeds or ultrahigh polygon-crunching power. The main processor is based on an existing PowerPC RISC processor, with some tweaks to the caches and floating-point unit. The graphics processor churns out between 6 million and 12 million polygons per second, far lower than the purported 75 million peak of the PlayStation 2.
The chip companies make no apologies. "It's more a data flow problem than a megahertz problem," Buchner said. "There's a peak number and what you actually achieve. Six to 12 million (polygons per second) is a number achievable by mere mortals. In the end what matters is how the content looks."
Even so, Nintendo and its partners did employ some brute force to maximize performance. One of the most conspicuous features of the Flipper graphics processor is its liberal use of on-chip memory to ensure low-latency accesses. The 162-MHz Flipper contains 3 Mbytes of on-chip RAM, divided between 16 Mbits of frame and z-buffer memory and 8 Mbits for textures. All told, about half of the chip's 51 million transistors are just for RAM.
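The on-chip RAM figures above are internally consistent; a quick arithmetic check:

```python
# Sanity check of Flipper's on-chip RAM budget as reported:
# 16 Mbits of frame/z-buffer memory plus 8 Mbits of texture cache.
frame_z_mbits = 16
texture_mbits = 8
total_mbytes = (frame_z_mbits + texture_mbits) / 8  # 8 bits per byte
print(total_mbytes)  # 3.0 -> matches the quoted 3 Mbytes of on-chip RAM
```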
While the use of embedded RAM is not unusual for graphics controllers, ATI opted to use the single-transistor SRAM from Mosys Inc. instead of conventional DRAM. Like a DRAM cell, the 1T-SRAM cell includes a capacitor, but the array is divided into multiple banks with shorter rows and columns for faster accesses. While 1T-SRAM is less dense than DRAM, its density is better than traditional six-transistor SRAM while attaining SRAM-like access speeds. Effective memory capacity was also increased by compressing the data that sits in memory.
"With any DRAM structure from a connection point of view, you end up with lots of addressing restrictions and a whole bunch of rules on when you can and can't get to the memory," Buchner said. "With 1T-SRAM we get something for which those restrictions are hidden."
Both internal memory buffers have a sustained latency of under 5 nanoseconds. The frame and z-buffer memory is capable of 9.6 Gbytes/second of bandwidth. The texture buffer boasts an even faster bandwidth of 12.8 Gbytes/s because it's divided into 32 independent macros, each 16 bits wide for a total I/O of 512 bits. This gives each macro its own address bus, so that all 32 macros can be accessed simultaneously, said Mark-Eric Jones, vice president of marketing for Mosys.
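The macro arrangement implies an aggregate bus width, and working backward from the quoted bandwidth gives the effective transfer rate. The per-macro rate below is inferred from the article's numbers, not stated by Mosys:

```python
# Back-of-the-envelope check of the texture buffer figures. The
# per-macro transfer rate is inferred, not quoted in the article.
total_io_bits = 32 * 16          # 32 macros, each 16 bits wide
assert total_io_bits == 512      # matches the stated 512-bit total I/O
bus_bytes = total_io_bits // 8   # 64 bytes moved per parallel access

texture_bw = 12.8e9              # quoted texture bandwidth, bytes/s
transfer_rate = texture_bw / bus_bytes
print(transfer_rate / 1e6)       # 200.0 -> implies a ~200-MHz effective rate
```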
Another difference between GameCube and PlayStation 2 is that it is the graphics device, not the CPU, that has the direct link to main memory. The idea was to keep the graphics processor crunching as many bits as it could continuously. "Graphics is fundamentally a memory bandwidth problem. The CPU is the next largest consumer of memory," Buchner said.
Here again the GameCube uses 1T-SRAM, this time as 24 Mbytes of external memory. Operating at a 405-MHz clock speed, the memory moves data at 3.2 Gbytes/s, with a sustained latency of 10 ns. But unlike the PlayStation 2 or the forthcoming Xbox, GameCube's memory subsystem does not rely on a Rambus or double-data-rate (DDR) interface to boost the bandwidth. Instead, Mosys developed a proprietary active-termination I/O that resides near the pads and eliminates the need for placing a bank of resistors on the board, saving area and cost.
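Dividing the quoted bandwidth by the clock rate suggests how wide the external interface is. The bus width below is an inference from the article's figures, not a stated specification:

```python
# The article gives 24 Mbytes of external 1T-SRAM at 405 MHz moving
# 3.2 Gbytes/s. The interface width is inferred here, not quoted.
bandwidth = 3.2e9      # bytes/s
clock = 405e6          # Hz
bytes_per_clock = bandwidth / clock
print(round(bytes_per_clock))  # 8 -> consistent with a 64-bit interface
```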
The graphics processor engine itself was designed to perform 95 percent of the 3-D rendering tasks, including lighting and geometry. Buchner declined to provide further details on the graphics engine.
"What we've done is to move into dedicated silicon the things that are repetitive and clean," Buchner said. "We've made it so that it offloads the CPU for mundane tasks."
The graphics controller also acts as the I/O hub, with interfaces to dedicated external DRAM for audio, flash cards, serial controllers and the Gekko CPU. Based on 0.18-micron process technology, the graphics and I/O device is being produced by NEC Corp.
The Gekko chip talks to the Flipper controller in the same way that an x86 processor interfaces to a north bridge. It's this device that is supposed to help developers work their magic, processing special instructions for artificial intelligence and physics features that help programmers set their games apart. "It's where the personality of the game is coming from," Buchner said.
Unlike Sony's PlayStation 2 Emotion Engine, the Gekko MPU was not built from the ground up. It's a derivative of the PowerPC 750 RISC processor and includes some 50 new instructions. Based on 0.18-micron copper-interconnect process technology, the device runs at 405 MHz and has an external bus to the Flipper device with a peak bandwidth of 1.6 Gbytes/s. The chip has a performance rating of 925 DMips (Dhrystone 2.1).
The GameCube team chose to build off the existing PowerPC design to leverage the available tool chain, such as compilers and optimizers. IBM claims this has given developers a jump on creating new games. "Developers have been making software for the GameCube a long time before people knew we were doing the silicon," said IBM's West. "If you were to take code written for a PowerPC you could essentially run it on this device. We didn't deviate from what is a well-understood architecture by a large amount."
One of the modifications it made was to cut the 64-bit floating point unit in half, allowing it to do two 32-bit floating point operations every cycle. "Conventional wisdom is that four-way is actually better, but this is not necessarily true," West said. "Two-way is actually pretty much as powerful as four-way, plus it takes up less silicon and it's easier to make it go fast. We're going to try to complete two instructions every cycle."
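The split FPU described above effectively handles two independent 32-bit values side by side. The toy model below illustrates the idea only; the function names are invented for this sketch and are not Gekko's actual instructions:

```python
# Toy model of a 64-bit FPU split into two 32-bit lanes: each
# notional cycle applies one operation to a pair of single-precision
# values. Names are illustrative, not the real Gekko instruction set.
def ps_add(a_pair, b_pair):
    """Add two (hi, lo) pairs of floats in one notional cycle."""
    return (a_pair[0] + b_pair[0], a_pair[1] + b_pair[1])

def ps_mul(a_pair, b_pair):
    """Multiply two (hi, lo) pairs of floats in one notional cycle."""
    return (a_pair[0] * b_pair[0], a_pair[1] * b_pair[1])

# e.g. scaling and offsetting an (x, y) coordinate, one pair per op
pos = (1.5, -2.0)
scale = (2.0, 2.0)
print(ps_add(ps_mul(pos, scale), (0.5, 0.5)))  # (3.5, -3.5)
```

A two-wide split like this keeps the datapath small while still doubling single-precision throughput, which matches West's argument about silicon area and clock speed.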
To improve the internal data flow, IBM tried to eliminate "cache trashing," or wasting cache space on transient data. The 256-Kbyte Level-2 cache can be locked down so that it retains only the data that needs to be reused. There's also an internal direct memory access engine that moves data into and out of the cache while the processor works on a different set of data. This mechanism helps mitigate the incremental latency associated with compressing and decompressing the data.
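The DMA-while-processing scheme is essentially a double-buffering pattern: one buffer is being filled while the other is being consumed. The sketch below models the idea only; it is not Nintendo's or IBM's actual interface:

```python
# Conceptual sketch of the locked-cache DMA pattern described above:
# while the processor works on one buffer, a DMA engine streams the
# next chunk into the other, hiding transfer latency.
def process(chunk):
    return sum(chunk)  # stand-in for real work on the cached data

def stream_process(data, chunk_size=4):
    results = []
    buffers = [data[0:chunk_size], None]  # buffer 0 "DMA'd in" first
    i, active = chunk_size, 0
    while buffers[active]:
        # "DMA": fetch the next chunk into the inactive buffer...
        buffers[1 - active] = data[i:i + chunk_size]
        i += chunk_size
        # ...while the CPU processes the active (locked) buffer.
        results.append(process(buffers[active]))
        active = 1 - active
    return results

print(stream_process(list(range(8))))  # [6, 22]
```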
"You often get into a mode of cache trashing and filling it up with useless data," said PowerPC architect Peter Sandon. "We tried to optimize the data movement so that we don't see the cache misses you would otherwise see."
By emphasizing low latency over raw performance, Nintendo hopes to give game developers some respite from the costs associated with getting the most out of the hardware.
"The price of developing a game is not quite as bad as making a movie, but there's no doubt it's getting costly," Buchner said. "This is going to allow developers to focus on the artificial intelligence part of the game, character interactions and game play rather than trying to get basic performance out of the box."