Countering recent claims by a microprocessor research firm, Intel Corp. said at this week's Intel Developer Forum in San Jose that the upcoming Pentium 4 suffers no performance penalty from its deep pipeline.
Intel executives here at IDF detailed the Pentium 4's NetBurst technology, which they said significantly increases performance over other processors, while nearly doubling the number of processor pipeline stages.
Jeff Austin, Intel's IA-32 architect launch manager, told EBN that the Pentium 4's 20-stage pipeline suffers no penalty for pre-fetch misprediction because of its use of the NetBurst technology. Misprediction may sound like an arcane technical question, but it is a key performance factor. To increase the speed of operations and data rates, modern processors try to guess in advance what data will be needed. If the processor guesses wrong, a deep 20-stage pipeline such as the Pentium 4's can take up to 13 clock cycles to be purged and refilled, slowing operations.
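The cost of a wrong guess can be put in rough numbers. The following is a minimal sketch of a common back-of-the-envelope model in which each misprediction adds a fixed number of stall cycles. Only the 13-cycle penalty comes from the figures above; the base cycles per instruction, the share of instructions that depend on a guess, and the misprediction rates are assumptions chosen purely for illustration.

```python
def effective_cpi(base_cpi, guess_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once misprediction stalls are included."""
    return base_cpi + guess_fraction * mispredict_rate * penalty_cycles


# Illustrative inputs only; apart from the penalty, none of these
# figures come from Intel or InQuest.
BASE_CPI = 1.0         # assumed cycles per instruction with perfect guessing
GUESS_FRACTION = 0.2   # assumed share of instructions that depend on a guess
PENALTY_CYCLES = 13    # worst-case purge-and-refill cost quoted above

for rate in (0.02, 0.05, 0.10):
    cpi = effective_cpi(BASE_CPI, GUESS_FRACTION, rate, PENALTY_CYCLES)
    print(f"mispredict rate {rate:.0%}: {cpi:.2f} cycles per instruction")
```

Under this simple model the penalty scales directly with the misprediction rate, which is why the two sides of the dispute focus on how often the processor guesses wrong.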
Bert McComas, an analyst at InQuest Research Inc., Gilbert, Ariz., claimed recently that the pre-fetch misprediction problem causes the 1.4-GHz Pentium 4 to operate at the same performance level as the 1.13-GHz Pentium III.
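McComas' claim can be restated as a statement about work done per clock cycle. The sketch below assumes that overall performance is simply clock speed multiplied by average work per clock, a simplification not drawn from either company; the function name is hypothetical.

```python
P4_CLOCK_GHZ = 1.4    # Pentium 4 clock speed cited in the claim
P3_CLOCK_GHZ = 1.13   # Pentium III clock speed cited in the claim


def relative_work_per_clock(fast_clock_ghz, slow_clock_ghz):
    """Work per clock of the faster-clocked chip relative to the slower one,
    assuming both deliver the same overall performance."""
    return slow_clock_ghz / fast_clock_ghz


ratio = relative_work_per_clock(P4_CLOCK_GHZ, P3_CLOCK_GHZ)
print(f"Pentium 4 work per clock relative to Pentium III: {ratio:.1%}")
```

If the claim held, the Pentium 4 would be completing roughly a fifth less work per clock than the Pentium III, with the higher clock speed making up the difference.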
Intel's Austin, however, said NetBurst corrects most of the misprediction problem, with the Pentium 4 performing at the highest level of any Intel processor to date. Allowing the deep Pentium 4 pipeline to meet performance targets is only one of NetBurst's goals, as the device also aims to provide much faster integer and floating-point instruction operations.
NetBurst includes Advanced Dynamic Execution, a speculative engine that greatly improves memory pre-fetch prediction rates, according to Intel. The technique works with three times as many instructions in pre-fetch as the Pentium III and includes more sophisticated algorithms that look at many prior executions before predicting which data will be accessed, Austin said.
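Intel did not describe the algorithms in detail. To illustrate the general idea of looking at prior executions before making a prediction, here is a minimal sketch of a generic textbook scheme: a table of two-bit saturating counters indexed by a short history of recent outcomes. It is not the Pentium 4's mechanism, and the class name, history lengths, and test pattern are all hypothetical.

```python
class HistoryPredictor:
    """Textbook predictor: 2-bit saturating counters indexed by recent outcomes."""

    def __init__(self, history_bits=4):
        self.mask = (1 << history_bits) - 1
        self.history = 0                           # recent outcomes, newest in bit 0
        self.counters = [1] * (1 << history_bits)  # 0-1 predict "no", 2-3 predict "yes"

    def predict(self):
        return self.counters[self.history] >= 2

    def update(self, outcome):
        # Nudge the counter for the current history toward the actual outcome.
        count = self.counters[self.history]
        self.counters[self.history] = min(3, count + 1) if outcome else max(0, count - 1)
        # Shift the actual outcome into the history register.
        self.history = ((self.history << 1) | int(outcome)) & self.mask


def accuracy(predictor, outcomes):
    """Fraction of outcomes guessed correctly, learning as it goes."""
    correct = 0
    for outcome in outcomes:
        if predictor.predict() == outcome:
            correct += 1
        predictor.update(outcome)
    return correct / len(outcomes)


# Hypothetical repeating pattern: "yes" three times, then "no".
pattern = [True, True, True, False] * 250
long_history = accuracy(HistoryPredictor(4), pattern)
short_history = accuracy(HistoryPredictor(1), pattern)
print(f"4 bits of history: {long_history:.1%}")
print(f"1 bit of history:  {short_history:.1%}")
```

On this pattern the predictor with the longer history learns the full cycle, while the one that sees only the most recent outcome keeps missing the final step, which is the kind of gain that consulting more prior executions is meant to deliver.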
The Pentium 4 also features a Level 1 on-chip cache that stores already decoded instructions, thus eliminating the delay of decoding them again. The Pentium III, in comparison, must decode instructions from its L1 cache each time they are issued, slowing the speed at which they are fed to the processor.
NetBurst's Rapid Execution Engine is another feature. It includes an arithmetic logic unit (ALU) integer processor running at 2.8 GHz, twice the main-processor clock speed, which provides extremely rapid processing of integer instructions, according to Austin.
A new Streaming SIMD-2 Extension in NetBurst also speeds processing by performing integer arithmetic operations on 128 bits every clock cycle, twice as fast as the Pentium III. Additionally, Intel said, NetBurst adds a 128-bit double-precision floating-point operation not found in the Pentium III.
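What a 128-bit operation means in practice depends on the size of the data elements packed into it. The short sketch below counts how many elements of common widths fit into a single 128-bit operand; the helper name and the list of element sizes are illustrative rather than taken from Intel's presentation.

```python
REGISTER_BITS = 128  # operand width cited by Intel


def lanes(element_bits, register_bits=REGISTER_BITS):
    """Number of equal-width elements that fit in one operand."""
    if register_bits % element_bits:
        raise ValueError("element width must divide the operand width")
    return register_bits // element_bits


for name, bits in [("8-bit integer", 8), ("16-bit integer", 16),
                   ("32-bit integer", 32), ("64-bit integer", 64),
                   ("double-precision float", 64)]:
    print(f"{name}: {lanes(bits)} per 128-bit operation")
```

Calling `lanes(32, register_bits=64)` gives the equivalent count for a 64-bit operand, which is half as many elements per operation for any given width.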