Intel varies VLIW for IA-64
SAN JOSE, Calif. -- Forging a new direction for VLIW technology, Intel Corp. tomorrow will lift the lid off its IA-64 instruction-set architecture in a presentation at the Microprocessor Forum, here. But some experts believe Intel will also open up a Pandora's box of technical challenges as it works to implement the instruction set in its upcoming 64-bit Merced microprocessor.
Toughest will be extracting maximum performance from IA-64 while ensuring that Merced stays compatible with existing X86 software. In addition, Intel's engineers will face tight design constraints as they attempt to support numerous, complex IA-64 instructions while maintaining the shortest critical paths in silicon.
To date, Intel has revealed almost nothing about IA-64 other than to trumpet it as the first "post-RISC" architecture. Moreover, Intel firmly insists it has not implemented a very-long-instruction-word (VLIW) approach--something that's been widely assumed. "It's going beyond VLIW--it's a new technology," said an Intel spokeswoman. "There are no VLIW concepts in IA-64."
Nevertheless, sources familiar with IA-64 report that it does in fact draw heavily on the tenets of VLIW to take maximum advantage of instruction-level parallelism and advanced instruction pre-decoding techniques. "It's a very different kind of instruction set," said a source close to Intel who requested anonymity. "It smells a lot like a VLIW architecture, but it's dynamic. The compiler suggests instructions that can be put in buckets--that can be done in parallel, but don't have to be--by the execution architecture."
According to the source, IA-64 in some ways calls to mind the groundbreaking C6X digital signal processor introduced by Texas Instruments Inc. earlier this year. "The Texas Instruments DSP dynamically fills in instructions, using what looks like a VLIW architecture," said the source. "That's also the approach Intel has taken."
(TI will unveil a new floating-point version of C6X at the Microprocessor Forum.)
The hallmark of VLIW is the ability to execute numerous instructions simultaneously; the TI architecture can handle up to eight per clock cycle. That sounds impressive, but to achieve it, a smart compiler must find enough parallelism in the applications software to dole out those eight instructions for execution. That's tough, since there are often tricky interdependencies between instructions and data; for example, one instruction might be reading data from memory while a second is waiting for that same data.
Like TI's C6X, most leading-edge architectures seek to take maximum advantage of parallelism. The latest incarnation of Digital Equipment Corp.'s Alpha architecture--the 21264 CPU--packs four independent integer units, which let it execute up to six instructions per cycle. Parallelism also pumps up the IA-64.
Still, Intel is going to great lengths to ensure that the term "VLIW" isn't used in conjunction with its new architecture. "For marketing reasons and for technical reasons, Intel doesn't want to be associated with the failed VLIW implementations of the late '80s," said Linley Gwennap, editor-in-chief of the Microprocessor Report newsletter, which is hosting the forum.
During that period, several VLIW efforts--most notably, now-defunct MultiFlow Computer Inc.--briefly flourished and then sputtered. Interestingly, Joseph Fisher, the key architect at MultiFlow, is a member of the Hewlett-Packard Co. team that has worked jointly with Intel to develop the 64-bit instruction set.
"Intel is going to release a new term [for IA-64] at the Microprocessor Forum that they're going to use--not RISC, not VLIW, but something else," said Gwennap, who declined to reveal it. Gwennap sees IA-64 as taking advantage of VLIW concepts, but in a very different way from the pure VLIW approaches of a decade ago.
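The compiler's task described above, finding up to eight instructions that can safely issue in the same clock cycle, can be illustrated with a minimal sketch. It is not Intel's or TI's scheduler; the `Instr` type, the `pack_bundles` function, and the register names are hypothetical, and the policy is deliberately simple: walk the program in order and start a new bundle whenever the next instruction depends on one already in the current bundle or the bundle is full.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical instruction: a label, one destination, and its sources."""
    name: str
    dest: str
    srcs: tuple


def pack_bundles(instrs, width=8):
    """Greedily pack instructions, in program order, into bundles.

    A bundle holds at most `width` instructions (eight, as in the TI
    example). An instruction may not join a bundle if it reads or writes
    a location written earlier in that bundle, or writes a location read
    earlier in that bundle (a conservative rule chosen for this sketch).
    """
    bundles = []
    current = []        # names of instructions in the bundle being built
    written = set()     # locations written by the current bundle
    read = set()        # locations read by the current bundle
    for ins in instrs:
        reads_pending_result = any(s in written for s in ins.srcs)
        overwrites_result = ins.dest in written
        overwrites_input = ins.dest in read
        conflict = reads_pending_result or overwrites_result or overwrites_input
        if current and (len(current) >= width or conflict):
            bundles.append(current)
            current, written, read = [], set(), set()
        current.append(ins.name)
        written.add(ins.dest)
        read.update(ins.srcs)
    if current:
        bundles.append(current)
    return bundles


# The article's example: a load followed by an instruction waiting on its data.
program = [
    Instr("load r1", "r1", ("mem",)),
    Instr("add r2", "r2", ("r1",)),        # must wait for the load
    Instr("mul r3", "r3", ("r4", "r5")),   # independent
    Instr("sub r6", "r6", ("r7",)),        # independent
]
print(pack_bundles(program))
```

In this toy program the add cannot share a cycle with the load it depends on, so the four instructions need two bundles rather than one. A production compiler would also reorder instructions to fill slots, which this in-order sketch does not attempt.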
"There have been some recent chips announced using VLIW, like the Trimedia [multimedia processor] from Philips, which have taken VLIW of the 1980s and moved it into the '90s," he said. "That's certainly the direction Intel is going with IA-64, but I think they go beyond what Trimedia does as well." Indeed, for a general-purpose architecture, IA-64 will sport an unusually rich complement of multimedia instructions. These will include 64-bit analogs of the existing MMX instruction-set extensions, the next-generation MMX2 multimedia instructions coming to the 32-bit X86 architecture next year (see Sept. 8, page 1), as well as many entirely new instructions. Intel is also equipping Merced with heavy-duty floating-point performance, the source close to Intel said. That's long been a perceived weakness of Intel chips as compared with their RISC competitors, and should enable Merced to stake a claim to computationally intensive applications in areas like scientific visualization and electronic-design automation. However, at least one Intel competitor thinks that IA-64 may be more sizzle than steak. "It seems that Intel is getting a lot of marketing mileage, but I don't see them doing anything revolutionary," said Paul Rubinfeld, a microprocessor architect who has led the development of several Alpha CPUs at Digital. "To me, there's more than technology involved. They're going to have all kinds of compatibility issues." In taking its new instruction set into the real world, Intel engineers will have to work hard to ensure that Merced delivers maximum performance with IA-64 while staying compatible with existing X86 software. Indeed, Intel has publicly stated that an imperative for Merced is "object-code compatibility," so that the new processor can execute the X86 instructions. But it's not clear how this will be done.
Legacy conversion
"What you're left with is a decision on how to ensure X86 compatibility without impacting the performance of the non-X86 engine," said Rubinfeld. "As soon as you put stuff in hardware, you impact performance big time. The question is, are they going to impact the native [IA-64] machine to ensure a higher degree of compatibility, or are they going to do it in software? I'm not sure what they're planning."
Along with the technical implications are marketing factors. "Intel has clone vendors breathing down their necks that would love to see them impact X86 performance," said Rubinfeld.
Though noting that Intel has pledged compatibility, Gwennap of the Microprocessor Report said the company won't address the subject in its IA-64 presentation this week. "The question is, will they do some sort of conversion of X86 instructions into native instructions and, if so, how efficient is it going to be?" he asked. "Certainly, the performance in X86 mode is not going to be as good as the performance in native mode. But I think if it's done right, they could get performance that's somewhere around one-third to one-half that of native performance."
Critical paths
"A well-known critical path on a chip is a 64-bit arithmetic logic unit," said Digital's Rubinfeld. "In designing Alpha, we tuned it and then we said that we weren't going to add any instructions to the instruction set that would negatively impact cycle time."
But Rubinfeld believes Intel may not have the same freedom because of legacy issues. "Their critical loops are there because they're in the X86. If you look at how many gates are involved in implementing a cycle, you'll see that X86 has a lot [of] gate delay."
Gwennap sees some truth in the legacy factor. "For example, even though the Pentium Pro is designed for higher speeds than earlier Intel processors, it still doesn't match the speed of some of the faster RISC processors. There are some issues inherent in the X86 design, such as the complex address calculations that are involved, which the Pentium Pro has to do in a single clock cycle."
One stumbling block is dealing with misaligned accesses to the cache, an issue that led Intel to put an extra multiplexer in the cache-access line. "It's little things here and there," Gwennap said. "Any one of them isn't going to take you from 600 MHz down to 300 MHz, but taken together it's sort of death by a thousand cuts."
Experts say such problems may not be an issue for IA-64, since Intel will be starting with a clean sheet of paper. However, the historical baggage of the Intel architecture may play a role in clock speed, an arena where Intel CPUs have played leapfrog with their RISC competitors--and often lost. For example, Sun Microsystems Inc. recently announced a planned 600-MHz UltraSparc-III. Production versions of Intel's Pentium II top out at 300 MHz. (In a technology demo, Intel has previewed a Pentium Pro running at 400 MHz.)
Another factor that can drag down performance is delays introduced by the individual gates. "For any given task, you get a gate delay per logic level," said Rubinfeld.
"Intel has very good process technology and very good gate-delay times. This raises the question of why it doesn't have better clock speeds. The answer is, because they have a lot more levels of logic and hence more delays because they have to get a lot more work done." According to Gwennap, "Clock speed is only half of the performance equation. There's also the issue of how much work you get done per clock cycle. Inherently, the IA-64 architecture is designed for increased parallelism, and thus more performance per clock cycle. So whether or not Intel can match Digital's clock speed isn't really the point. The point is whether the overall performance is as good or better than Digital's."
All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC. All rights reserved.