News & Analysis

Pentium IV Pushes Clocks To 1.4 GHz and Beyond

Ray Weiss

8/31/2000 12:00 AM EDT

The Pentium still kicks butt. The Pentium IV, Intel's latest version built on the NetBurst microarchitecture, is designed to deliver high-level performance at next-generation clock rates. Interestingly, the Pentium IV does not represent a performance boost over the current Pentium III. In fact, performance levels are the same. What the Pentium IV does is move the Pentium architecture to higher-resolution fabs and multi-gigahertz clock rates. The first Pentium IV is due out next year running at 1.4 GHz, with higher clock rates to follow. A 2.0 GHz version is already up and running.

Higher-resolution fabs and faster clocks require different chip design techniques. Line delays, capacitance, and power all have to be carefully handled, especially if an architecture is to scale up to the next process generations and clock rates. The Pentium IV was designed to do just that. It is a complete redesign of the Pentium CPU, including 144 additional Streaming SIMD Extension instructions (SSE2) for multimedia applications. SSE2 supports:

  • 128-bit floating-point
  • Video capture
  • Speech
  • Imaging
  • 3D graphics rendering.

The Pentium IV is designed for a 0.13 µm CMOS process, not Intel's current 0.18 µm process. Higher densities allow the use of a larger cache memory and larger intermediate storage for cached instructions. For example, it has a 256 KB unified L2 cache and can store (or queue) up to 12 K micro-operations (uOps), the mini-RISC wide words (almost like microcode) that program the CPU logic. This will be a hefty chip with 42 M transistors, compared to 28 M for the earlier Pentium III.


Aiming For Clock Rate Performance
The Pentium IV implementation is tuned for high-clock-rate execution. Design features include:

  • 400 MHz Front-Side Bus
  • 256 KB Unified L2 cache
  • 12 K cached decoded uOps
  • Queued uOps (instead of reservation stations) for execution units
  • Support for RDRAM (Rambus memory).

The new front-side bus (FSB) provides a higher bandwidth connection between the processor and the off-chip memory controller and main memory. This is a sophisticated split-transaction bus (commands are separated from the actual operation on the bus), pipelined for higher efficiencies. The FSB supports 128-byte line transfers with 64-byte accesses and can deliver a 3.2 Gbyte/sec transfer rate.
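The 3.2 Gbyte/sec figure can be reproduced with simple arithmetic. The sketch below is illustrative only: it assumes an 8-byte (64-bit) data path, which the article does not state, and the function and constant names are made up for the example.

```python
# Peak bus bandwidth = transfers per second x bytes moved per transfer.

def peak_bandwidth_bytes_per_sec(transfers_per_sec: float, width_bytes: int) -> float:
    """Return the theoretical peak bandwidth of a bus in bytes per second."""
    return transfers_per_sec * width_bytes

def transfers_needed(payload_bytes: int, width_bytes: int) -> int:
    """Return how many bus transfers it takes to move payload_bytes (rounded up)."""
    return -(-payload_bytes // width_bytes)

FSB_RATE_HZ = 400e6      # 400 MHz, from the article
FSB_WIDTH_BYTES = 8      # ASSUMPTION: 64-bit data path, not stated in the article
LINE_BYTES = 128         # line transfer size, from the article
ACCESS_BYTES = 64        # access size, from the article

peak_gb_per_sec = peak_bandwidth_bytes_per_sec(FSB_RATE_HZ, FSB_WIDTH_BYTES) / 1e9
print(f"peak bandwidth: {peak_gb_per_sec:.1f} Gbyte/sec")
print(f"transfers per 64-byte access: {transfers_needed(ACCESS_BYTES, FSB_WIDTH_BYTES)}")
print(f"transfers per 128-byte line: {transfers_needed(LINE_BYTES, FSB_WIDTH_BYTES)}")
```

This is a peak number; sustained throughput on a real bus would be lower once command traffic and turnaround time are counted.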

To accommodate higher clock rates, the Pentium IV was designed with a very long pipeline: 20 stages, compared to the Pentium III's 10. The long pipeline allows designers to split operations into multiple stages, each short enough to fit into the narrow clock period. To keep performance up with the longer pipeline, Intel engineers reworked the Pentium's branch prediction logic to get a higher hit rate. The branch hit rate is reputed to be in the low 90 percent range.
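Why the branch hit rate matters more as the pipeline gets deeper can be seen with a rough first-order model. Everything in this sketch beyond the stage counts and the 1.4 GHz clock is an assumption made for illustration: the base CPI, the fraction of instructions that are branches, the 92 percent hit rate (one reading of "low 90 percent"), and the simplification that a misprediction costs a full pipeline's worth of cycles.

```python
def clock_period_ns(freq_ghz: float) -> float:
    """Return the clock period in nanoseconds for a frequency in GHz."""
    return 1.0 / freq_ghz

def effective_cpi(base_cpi: float, branch_fraction: float,
                  hit_rate: float, flush_penalty_cycles: int) -> float:
    """First-order model: each mispredicted branch adds a fixed flush penalty."""
    miss_rate = 1.0 - hit_rate
    return base_cpi + branch_fraction * miss_rate * flush_penalty_cycles

BASE_CPI = 1.0           # ASSUMPTION: ideal cycles per instruction
BRANCH_FRACTION = 0.20   # ASSUMPTION: one instruction in five is a branch
HIT_RATE = 0.92          # ASSUMPTION: one reading of "low 90 percent"

print(f"clock period at 1.4 GHz: {clock_period_ns(1.4):.3f} ns")
for stages in (10, 20):
    # ASSUMPTION: the flush penalty equals the pipeline depth.
    cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, HIT_RATE, stages)
    print(f"{stages}-stage pipeline: effective CPI {cpi:.2f}")
```

Under these assumptions, doubling the pipeline depth doubles the cycles lost to each wrong guess, which is why the longer pipeline goes hand in hand with better prediction.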

The key to the Pentium IV's high-frequency performance is its Execution Trace Cache. Here, unlike most RISC processors, the Pentium IV caches decoded instructions. That eliminates the complex decoding stages from the execution path, except on the first pass through a stretch of code.
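The idea of caching decoded instructions can be sketched in a few lines. This is a toy model, not Intel's design: it keys entries by instruction address rather than storing traces along the executed path, and the class name, the decoder, and the three-instruction program are all hypothetical.

```python
from typing import Callable, Dict, List

class DecodedUopCache:
    """Toy cache of decoded uOps, keyed by instruction address."""

    def __init__(self, decode: Callable[[str], List[str]],
                 capacity_uops: int = 12 * 1024):  # 12 K uOps, from the article
        self._decode = decode
        self._capacity = capacity_uops
        self._entries: Dict[int, List[str]] = {}
        self._stored_uops = 0
        self.decode_count = 0  # how often the (expensive) decoder ran

    def fetch(self, address: int, instruction: str) -> List[str]:
        cached = self._entries.get(address)
        if cached is not None:
            return cached                     # hit: the decoder is bypassed
        uops = self._decode(instruction)      # miss: pay the decode cost once
        self.decode_count += 1
        if self._stored_uops + len(uops) <= self._capacity:
            self._entries[address] = uops
            self._stored_uops += len(uops)
        return uops

def toy_decode(instruction: str) -> List[str]:
    """Hypothetical decoder: a memory-operand add becomes load/add/store uOps."""
    if instruction.startswith("add ["):
        return ["load tmp", "add tmp", "store tmp"]
    return [instruction]

# A hypothetical three-instruction loop body, executed 1000 times.
program = [(0x100, "add [counter], 1"), (0x104, "dec ecx"), (0x105, "jnz 0x100")]
cache = DecodedUopCache(toy_decode)
for _ in range(1000):
    for address, instruction in program:
        cache.fetch(address, instruction)
print(f"decoder invoked {cache.decode_count} times for 3000 fetches")
```

The loop runs a thousand times, but the decoder only works on the first pass; every later fetch is served from the cache.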

The Pentium IV CPU gets more bang from its clocks by running part of the core at twice the main CPU frequency: the integer execution units execute at twice the main clock rate. As a result, the CPU can execute 4 integer operations per clock. The execution units also include a Load Unit, a Store Unit, an FPU Move/Store Unit, and an FPU MUL/ADD/MMX Unit.
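The "4 integer operations per clock" figure follows from the doubled clock if the chip has two of these fast integer units. The unit count is an assumption here, since the article does not give it, and the names below are invented for the example.

```python
def peak_int_ops_per_clock(num_alus: int, alu_clock_multiplier: int) -> int:
    """Peak integer operations completed per main clock cycle."""
    return num_alus * alu_clock_multiplier

NUM_FAST_ALUS = 2          # ASSUMPTION: not stated in the article
ALU_CLOCK_MULTIPLIER = 2   # integer units run at twice the main clock (article)
MAIN_CLOCK_GHZ = 1.4       # first announced clock rate (article)

ops_per_clock = peak_int_ops_per_clock(NUM_FAST_ALUS, ALU_CLOCK_MULTIPLIER)
peak_gops = ops_per_clock * MAIN_CLOCK_GHZ
print(f"{ops_per_clock} integer ops per clock, "
      f"{peak_gops:.1f} billion per second peak at {MAIN_CLOCK_GHZ} GHz")
```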


Superscalar Operation
Like earlier Pentiums, the Pentium IV is a superscalar CPU and can issue multiple instructions per clock cycle, up to 3 uOps from the Trace Cache per clock. The uOps are decoded instructions that are then passed through the Rename/Allocation logic, which maps the operations onto available registers (the Pentium IV has 128 128-bit registers). The logic then assigns them to one or more execution units and passes them to the uOp Queues, where the operations wait for the resources they need to execute. During the next stage, the Schedulers schedule the operations for execution in the addressed execution units.
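Register renaming, the job of the Rename/Allocation logic, can be illustrated with a small mapping table and a free list. This is a minimal sketch under stated assumptions: the class and its methods are hypothetical, the pool size of 128 is borrowed from the article's register count, and freeing registers at retirement is left out.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

class RenameTable:
    """Toy rename stage: maps architectural registers to physical ones."""

    def __init__(self, arch_regs: List[str], num_physical: int = 128):
        if num_physical < len(arch_regs):
            raise ValueError("need one physical register per architectural register")
        self.free: Deque[int] = deque(range(num_physical))
        # Every architectural register starts out mapped to a physical one.
        self.mapping: Dict[str, int] = {r: self.free.popleft() for r in arch_regs}

    def rename(self, dest: str, sources: List[str]) -> Tuple[int, List[int]]:
        # Sources read whichever physical register currently holds the value.
        src_phys = [self.mapping[s] for s in sources]
        if not self.free:
            raise RuntimeError("no free physical register: allocation must stall")
        # Each new result gets a fresh physical register, so a later write to
        # the same architectural register does not clobber the earlier value.
        dest_phys = self.free.popleft()
        self.mapping[dest] = dest_phys
        return dest_phys, src_phys

table = RenameTable(["eax", "ebx", "ecx"])
first_eax, _ = table.rename("eax", ["ebx"])         # eax = f(ebx)
_, reads = table.rename("ecx", ["eax"])             # ecx = g(eax), reads first_eax
second_eax, _ = table.rename("eax", ["ebx"])        # eax written again
print(first_eax, reads, second_eax)
```

Because the two writes to eax land in different physical registers, the second can execute without waiting for the instruction that still reads the first.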

The CPU supports out-of-order, speculative execution. Instruction operations can be executed out of instruction order (if there are no resource conflicts or dependencies), and the logic will choose a path from a branch (speculate on the winning branch) for execution. If the branch choice is wrong, the logic can roll back the trace execution and "replay" by going down the correct execution path. This speculative execution is made possible by the Branch Target Buffer (4 K addresses), which buffers past execution choices, and the Trace Cache, which caches the decoded instructions that were executed.
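A branch target buffer can be modeled as a small table that remembers where each branch went last time. The sketch below is a deliberately simple stand-in, not the Pentium IV's predictor: the direct-mapped layout, the "predict the last target" policy, and the fixed 4-byte fall-through are all assumptions, and only the 4 K entry count comes from the article.

```python
from typing import Dict, List, Optional, Tuple

FALLTHROUGH_BYTES = 4  # ASSUMPTION: fixed instruction size, for the toy model only

class BranchTargetBuffer:
    """Toy direct-mapped BTB that remembers the last taken target per branch."""

    def __init__(self, entries: int = 4096):  # 4 K addresses, from the article
        self.entries = entries
        self.table: Dict[int, Tuple[int, int]] = {}  # slot -> (branch pc, target)

    def predict(self, pc: int) -> Optional[int]:
        slot = self.table.get(pc % self.entries)
        if slot is not None and slot[0] == pc:
            return slot[1]
        return None  # no history: caller assumes fall-through

    def update(self, pc: int, taken_target: Optional[int]) -> None:
        if taken_target is None:
            self.table.pop(pc % self.entries, None)  # branch was not taken
        else:
            self.table[pc % self.entries] = (pc, taken_target)

def count_mispredicts(btb: BranchTargetBuffer, trace: List[Tuple[int, int]]) -> int:
    """Run a trace of (branch pc, actual next pc) pairs; count wrong guesses."""
    mispredicts = 0
    for pc, actual_next in trace:
        fallthrough = pc + FALLTHROUGH_BYTES
        predicted = btb.predict(pc)
        if predicted is None:
            predicted = fallthrough
        if predicted != actual_next:
            mispredicts += 1  # here the hardware would roll back and replay
        btb.update(pc, actual_next if actual_next != fallthrough else None)
    return mispredicts

# A loop branch at 0x100 that jumps back to 0x80 nine times, then exits.
loop_trace = [(0x100, 0x80)] * 9 + [(0x100, 0x104)]
misses = count_mispredicts(BranchTargetBuffer(), loop_trace)
print(f"{misses} mispredictions in {len(loop_trace)} branches")
```

Even this crude policy only guesses wrong on the loop's first and last iterations, which shows how recorded history lets the CPU speculate down the likely path.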

The Pentium IV supports very "deep" speculative execution. It has the resources to keep 126 instructions "in flight," i.e., in execution mode, three times more than do the older Pentiums based on the P6 microarchitecture. Of those instructions, the Pentium IV can juggle 48 loads and 24 stores.
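The three limits quoted above interact: a new instruction can enter the window only if the overall count and, for memory operations, the load or store count all have room. The following sketch is a hypothetical bookkeeping model of that rule, using only the 126/48/24 figures from the article; the class and method names are invented.

```python
class InFlightWindow:
    """Toy tracker for the in-flight instruction limits quoted in the article."""

    MAX_TOTAL = 126
    MAX_LOADS = 48
    MAX_STORES = 24

    def __init__(self) -> None:
        self.total = 0
        self.loads = 0
        self.stores = 0

    def try_allocate(self, kind: str) -> bool:
        """Admit one instruction ('load', 'store', or anything else) if it fits."""
        if self.total >= self.MAX_TOTAL:
            return False
        if kind == "load" and self.loads >= self.MAX_LOADS:
            return False
        if kind == "store" and self.stores >= self.MAX_STORES:
            return False
        self.total += 1
        if kind == "load":
            self.loads += 1
        elif kind == "store":
            self.stores += 1
        return True

    def retire(self, kind: str) -> None:
        """Release the resources held by one completed instruction."""
        self.total -= 1
        if kind == "load":
            self.loads -= 1
        elif kind == "store":
            self.stores -= 1

window = InFlightWindow()
accepted_loads = sum(window.try_allocate("load") for _ in range(60))
accepted_stores = sum(window.try_allocate("store") for _ in range(60))
accepted_other = sum(window.try_allocate("alu") for _ in range(60))
print(accepted_loads, accepted_stores, accepted_other, window.total)
```

When any limit is hit, the front end has to stall until an older instruction retires and frees its slot.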

