SUNNYVALE, Calif. Intel Corp. and Sun Microsystems Inc. are each laying plans to deliver multiprocessing computers on a chip. Both plan to put two or more processors on a die with simultaneous multithreading (SMT), a design approach that lets a processor handle two or more threads of an application simultaneously.
Intel is working on a unique implementation of SMT to address the memory access issues that crop up in multiprocessors. Sun will go further, designing one or more new Sparc processor cores that will be optimized for multiprocessing chips with four or more cores on a die.
IBM Corp. and a number of network processor companies have detailed multicore processors, but Intel's and Sun's plans mark the first time these mainstream computer companies have discussed such an approach for their high-volume markets.
Two trends drive the move to multiprocessing on a chip: engineers are finding it increasingly difficult to wring more performance out of single-processor approaches; and multicore architectures promise to simplify the increasingly complex design and validation work of processors, thereby lowering chip costs and slashing time-to-market.
Neither Intel nor Sun would specify when they will bring multicore processors to market, but Intel said it expects to deliver them at the 70-nanometer (0.07-micron) process generation, according to John Shen, director of microarchitecture research at Intel Labs. That product would be a follow-up to Intel's upcoming IA-32 Xeon processor, a single-core device that will use SMT to handle two application threads. Intel said in August that it plans to deliver dual-threaded Xeons in the first half of 2002.
Shen's research suggests that running two application threads per processor and applying a fresh twist on SMT may be the best approach for multiprocessor chips. "Is it two virtual processors running four threads or four processors running two threads? These are the kinds of tradeoffs we are looking at now," Shen said. "We're experimenting with this on a number of machines using Hyper Threading [Intel's name for SMT] here in the lab."
That work has led Intel to define something it calls speculative pre-computation, a form of SMT where a second thread attempts to guess the data required by the first and fetches it from memory.
"That's a very new idea," Shen said. "For applications with a lot of pointer chasing, it's better to run a pre-fetching thread than to run two [application] threads."
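The tradeoff Shen describes can be illustrated with a toy cycle-count model. The sketch below is not Intel's mechanism. It assumes a linked-list traversal in which every load misses the cache unless a helper thread has already fetched that node, and all names and latency figures in it (miss_cycles, hit_cycles, work_cycles) are hypothetical.

```python
def cycles_without_helper(n_nodes, miss_cycles, work_cycles):
    """Main thread alone: every node costs a cache miss plus the real work."""
    return n_nodes * (miss_cycles + work_cycles)


def cycles_with_helper(n_nodes, miss_cycles, hit_cycles, work_cycles):
    """A helper thread runs ahead doing only the loads (assumed model).

    The helper still chases pointers one at a time, so it takes each miss
    serially, but it skips the work. The main thread stalls only until the
    helper has brought the node into the cache, then pays a cheap hit.
    """
    helper_clock = 0
    main_clock = 0
    for _ in range(n_nodes):
        helper_clock += miss_cycles                 # helper takes the miss
        main_clock = max(main_clock, helper_clock)  # main waits for the data
        main_clock += hit_cycles + work_cycles      # cache hit, then real work
    return main_clock


# Hypothetical figures: 100-cycle miss, 2-cycle hit, 50 cycles of work per node.
baseline = cycles_without_helper(1000, 100, 50)
assisted = cycles_with_helper(1000, 100, 2, 50)
print(baseline, assisted)
```

In this model the helper hides the work behind the memory latency, so the gain is largest when the two are of similar size and shrinks when either one dominates.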
Shen thinks Intel will squeeze 10 to 30 percent more performance out of single-core Xeons using dual-threaded SMT. Bigger gains depend on applications being rewritten for multithreading, Shen said.
Side by side
For its part, Sun's initial multicore device will be a dual-core UltraSparc 3 with SMT. That chip will target Sun's medium-performance servers and workstations used for general-purpose computing, as well as middle-tier servers used for applications processing.
"The first thing to do is to take one core and put another right next to it on the same die and then balance the memory system that speaks to that device," said Michael Splain, chief technologist in Sun's processor products group.
Separately, Sun plans to design one or more new Sparc cores geared for multiprocessor chips featuring more than two CPU cores. Those chips would be aimed primarily at Web servers.
"I think RISC is just fine" for such cores, Splain said. "You will have to add some things for chip multiprocessing like coherency and reliability constraints. You have to add some features to the instruction set for multicore and error recovery. But it's an extended architecture, not a new one."
The aim of Sun's aggressive multiprocessor chip is to boost the raw throughput of Web servers. In such an environment, the raw speed of handling one inquiry on a site such as eBay or Google, for example, is not as critical as how many of those single transactions a Web server can process.
"There's a whole new class of apps on the Web that don't care about latency as much as throughput," Splain said.
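The distinction between latency and throughput can be made concrete with simple arithmetic. The sketch below assumes each core handles one request at a time and that requests are independent. The function name and the latency figures are hypothetical and do not describe any Sun or Intel part.

```python
def requests_per_second(cores, latency_ms):
    """Aggregate throughput when each core serves one request at a time."""
    return cores * 1000 / latency_ms


# One fast core: each request takes 10 ms (assumed figure).
fast_single = requests_per_second(cores=1, latency_ms=10)

# Four simpler cores: each request takes 25 ms (assumed figure).
slow_quad = requests_per_second(cores=4, latency_ms=25)

print(fast_single, slow_quad)
```

Under these assumptions every individual request is slower on the four-core chip, yet the chip as a whole completes more requests per second, which is the trade a throughput-oriented Web server is willing to make.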
Sun's multiprocessors are likely to strip out or scale back some Sparc features such as floating-point units while adding others. "You might want five times the bandwidth, bigger caches and better SpecWeb and Java performance," said Splain.
Academic improvements
Researchers and market watchers say the trend to multicore designs has been firmly established, especially among network processor makers and academics. Researchers at Stanford University, MIT and UC Berkeley are trying to put 16 or more simple computing elements on one die to greatly improve processor performance.
One of the secrets to multiprocessor success is creating "an assembly line on chip" where one die handles as much work as possible, said Bill Dally, a computer science and electrical engineering professor at Stanford. "The embedded processors have moved ahead on this front because they have fewer software restrictions [than computer processors]," Dally said.
Indeed, Intel's IXP 1200 network processor has six cores on a die, and the company's next-generation device is said to have as many as 16 cores, according to analyst Linley Gwennap of The Linley Group (Mountain View, Calif.). NPU startups such as EZChip Technologies and Internet Machines use as many as 64 cores on a die, Gwennap said.
Good memory access is key to squeezing more performance from such architectures. "If you have 100 cores, but don't address [chip-to-memory] bandwidth, you substitute one problem for another," Gwennap said.
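Gwennap's point can be sketched as a simple bound: cores beyond what the memory interface can feed add nothing. The model below assumes every core needs a fixed share of chip-to-memory bandwidth to stay busy. The function name and the bandwidth figures are hypothetical.

```python
def cores_kept_busy(cores, per_core_gb_per_s, memory_gb_per_s):
    """Number of cores the memory interface can keep fed (assumed model)."""
    return min(cores, memory_gb_per_s / per_core_gb_per_s)


# 100 cores that each need 1 GB/s, behind a 10 GB/s memory interface.
print(cores_kept_busy(100, 1.0, 10.0))

# 4 cores behind the same interface are limited by core count instead.
print(cores_kept_busy(4, 1.0, 10.0))
```

In the first case 90 of the 100 cores would sit idle waiting on memory, so the bottleneck has moved from compute to bandwidth rather than disappeared.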
Broadcom Corp. has taken a more modest approach to multiprocessing chips with its BCM1250 network processor, which uses two MIPS cores on a die. "We've tried to take a practical approach to what we can do for today's customers," said Dan Dobberpuhl, general manager of broadband processors at Broadcom. While more aggressive approaches at Stanford and MIT "are research projects," Dobberpuhl said, "we have to sell chips."
The company has just released a single-core version of its processor, but Dobberpuhl also sees potential in four-core versions using 0.13-micron process technology and six-core versions when 0.10-micron processes are available. Dobberpuhl is also evaluating the possible use of SMT in future Broadcom processors.
However, eight- and 16-core die may be impractical, he said. "General purpose software and the user base just isn't there for that level of parallelism," Dobberpuhl said.