SAN MATEO, Calif. The first IA-64 processors based on Intel's "McKinley" architecture can't come soon enough for Hewlett-Packard's system technology division, which has been refining a homegrown chip set that it hopes will spark immediate demand for high-end HP servers and workstations starting this year.
So far, however, demand for systems based on the Intel Itanium IA-64 processor line has been a major disappointment. HP estimates that 2,500 Itanium-based systems were sold, about half of them by HP.
High cost, unimpressive performance and frequent rollout delays were among the issues that kept these systems from reaching volume production.
But Intel Corp. and Hewlett-Packard Co. expect McKinley-based systems to do away with many of the problems of the first Itanium systems. McKinley features a spruced-up microarchitecture and faster chip-to-chip I/O, which Intel promises will increase performance by 50 percent to 100 percent.
Moreover, Intel and its partners have been testing the silicon for some time. HP, for example, was able to get an early version of the McKinley processor to run on its ZX1 chip set a year ago. That should lead to a smooth rollout when Intel starts shipping McKinley processors by midyear, as many expect, to replace the Merced.
If things go as planned, McKinley-based systems will quickly eclipse sales of the first Itanium. "I think we'll surpass [first-generation] Itanium sales in the first seven days," said Barry Crume, business manager for Itanium systems at HP. "The performance slope of Itanium and the IA-64 is doubling every year. It's such a steep slope that it can't be ignored."
McKinley's improvements will come from a 128-bit-wide system bus that runs three times faster than before, a large on-chip L3 cache, a higher core frequency of 1 GHz, and additional issue ports and execution units.
But the processor is only as good as the platform on which it runs. For that reason HP has developed a proprietary chip set that aims to give high-performance users a reason to move immediately to IA-64.
Though HP wants to sell as many IA-64 servers and workstations as it can, the ZX1 doesn't pretend to be all things to all users. Features that would normally be found in a general-purpose chip set, such as memory mirroring or coherency for more than one front-side bus, have intentionally been left out to reduce the number of pins and logic gates needed to support them.
Tailored for performance
Rather, the chip set was designed to maximize performance at minimum cost, which should appeal to performance-hungry users like those in the scientific community. "People in the atmospheric research area demand a lot of new technology and don't mind if it's not pretty. One of the ways to get these systems running fast is to build your own infrastructure and that's what this chip set is about," Crume said.
From the start, the chip set was designed to run four processors in parallel, though it can also be scaled down to two-way and single-processor configurations. In this way, it differs from other four-way chip sets that were knockoffs of other products. "It doesn't look like you've sent a two-processor chip set up or an eight-processor chip set down to solve the problem," Crume said.
One of the chip set's hallmarks is the amount of main memory it enables. The chip set is designed to handle more than 4 gigabytes of double-data-rate DRAM running at 266 MHz, and can theoretically take on as much as 256 gigabytes.
To enable memory subsystems in excess of 4 Gbytes for four-way systems, the ZX1 connects a pair of memory expansion chips to its main I/O hub. Each channel to the memory expansion chips runs at 6.4 gigabytes/second.
The memory expander chip adds another 25 nanoseconds of access delay, but this latency is minuscule when considering the benefits of having more local memory. "For typical four-way systems, the performance limiter is the amount of memory capacity and having to go off to the disk," said Erin Handgen, senior chip set architect at HP.
The bandwidth between the I/O hub and each memory expansion chip is equivalent to the data transfer speed between the I/O hub and the processors. For systems that need fewer dual-in-line memory modules, such as those that require one or two processors, the I/O hub can interface directly to two banks of DRAM at 4.3 Gbytes/s.
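The memory figures quoted above can be checked with some quick arithmetic. The following is a rough sketch, not an HP specification: it assumes one 6.4-Gbytes/s channel per expansion chip, and it assumes a 16-byte-wide (128-bit) memory data path to show how 266-MHz double-data-rate DRAM could arrive at roughly 4.3 Gbytes/s. The article does not state the bus width, nor whether the 4.3-Gbytes/s figure is per bank or combined.

```python
# Figures quoted in the article.
EXPANDER_CHANNEL_GBYTES_PER_S = 6.4   # per channel to a memory expansion chip
EXPANDER_CHIPS = 2                    # "a pair of memory expansion chips"
DDR_TRANSFERS_PER_S = 266e6           # DDR DRAM "running at 266 MHz"

# Assumption (not stated in the article): a 128-bit, i.e. 16-byte, data path.
ASSUMED_BUS_WIDTH_BYTES = 16

# Peak bandwidth with both expansion chips, assuming one channel per chip.
expanded_peak = EXPANDER_CHIPS * EXPANDER_CHANNEL_GBYTES_PER_S

# Peak bandwidth of directly attached DRAM under the assumed bus width.
derived_direct = DDR_TRANSFERS_PER_S * ASSUMED_BUS_WIDTH_BYTES / 1e9

print(f"expanded peak:  {expanded_peak:.1f} Gbytes/s")
print(f"derived direct: {derived_direct:.3f} Gbytes/s")
```

Under those assumptions the directly attached case works out to about 4.26 Gbytes/s, which rounds to the 4.3 Gbytes/s quoted.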
The I/O hub also supports PCI, PCI-X and AGP buses. This is made possible through a series of I/O adapter devices distributed by the main I/O hub. Depending on the total bandwidth load, the I/O hub supports up to eight of these separate I/O adapter chips.
A four-way CPU system, for example, could have four adapter chips, each one supporting two lanes of PCI-X 66 running at 512 Mbytes/s, and two more adapter chips that each support a single lane of PCI-X 133 running at 1 Gbyte/s. The adapter I/O chips are also geared to support AGP 4x so that the systems can use commercially available graphics cards.
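To put the example configuration in perspective, the peak I/O bandwidth can be tallied from the quoted per-lane rates. This is only an illustrative sketch: the helper name is hypothetical, "1 Gbyte/s" is taken as 1,000 Mbytes/s, and the total is a theoretical peak rather than a measured or HP-stated figure.

```python
# Per-lane rates as quoted in the article.
PCIX66_MBYTES_PER_S = 512
PCIX133_MBYTES_PER_S = 1000   # assumption: "1 Gbyte/s" read as 1000 Mbytes/s
MAX_ADAPTER_CHIPS = 8         # the I/O hub supports up to eight adapter chips


def peak_io_mbytes_per_s(adapters):
    """Sum peak bandwidth over (chip_count, lanes_per_chip, mbytes_per_lane) tuples."""
    return sum(count * lanes * rate for count, lanes, rate in adapters)


# The four-way example: four chips with two PCI-X 66 lanes each,
# plus two chips with one PCI-X 133 lane each.
example_config = [
    (4, 2, PCIX66_MBYTES_PER_S),
    (2, 1, PCIX133_MBYTES_PER_S),
]

adapter_count = sum(count for count, _, _ in example_config)
total = peak_io_mbytes_per_s(example_config)

print(f"{adapter_count} of {MAX_ADAPTER_CHIPS} adapter chips, "
      f"{total} Mbytes/s peak")
```

The example uses six of the eight possible adapter chips, leaving room for others such as an AGP 4x adapter.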
Crume said he expects 80 to 90 percent of the applications developed for HP's McKinley systems to take advantage of the 64-bit environment. Operating systems that have been optimized for IA-64, such as HP-UX, should run faster than those that haven't been optimized yet, such as Linux. One option is for 32-bit apps to use a "wrapper" so that they can operate more effectively in a 64-bit environment, he said.