Mountain View, Calif. -- MIPS Technologies will introduce a next-generation core this week with a "virtual CPU" architecture that MIPS believes can forestall the need to move to multicore designs for multimedia gear and network applications.
The 90-nanometer, 500-MHz, 32-bit MIPS34K--essentially a superset, with DSP extensions, of the earlier MIPS24K--uses what MIPS calls symmetric multithreading. The core incorporates several hardware virtual processing elements and an optional quality-of-service logic block for real-time deterministic operation.
According to Vivek Sardana, MIPS34K product-marketing manager, the combination should enable up to a twofold performance improvement over the MIPS24K in embedded consumer apps requiring a mix of DSP and RISC operations and the use of more than one operating system.
In internal tests, the new core, running several EEMBC benchmarks in parallel, ran 60 percent faster than an earlier, 625-MHz core running the benchmarks sequentially, said MIPS34K engineering director Darren Jones. That speedup was achieved with just two threads and little impact on the caches. The cost in silicon was only 14 percent of the 72-square-millimeter die.
Kenton Williston, DSP analyst at Berkeley Design Technology Inc. (BDTI), said the MIPS design minimizes the effect of the one major bottleneck plaguing almost every microprocessor design: the inherent inefficiency of the pipeline with regard to thread misses caused by memory latencies, pipeline stalls and other factors.
"A fact of life for most microprocessor architects is that instructions are not normally issued for each and every cycle," said John Carbone, vice president of marketing at RTOS vendor ExpressLogic. "In the real world, a lot of time is wasted on cycles that execute with no data available because a cache line is loading or the CPU is fixing a cache miss."
To recapture those missed cycles with a minimum of extra silicon, the MIPS multithreading architecture maintains multiple contexts in hardware. When one thread stalls and would leave a cycle unused, the processor can switch to another context and fill the otherwise empty slot in the pipeline.
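The effect of keeping several contexts ready in hardware can be illustrated with a toy model. The sketch below is not MIPS code and does not describe the 34K's actual scheduling policy; the function name pipeline_utilization and the parameters run_length and stall_cycles are hypothetical, chosen only to show how a second context can fill issue slots that a single stalled thread would leave empty. It assumes zero-cost context switches and ignores cache contention between threads.

```python
def pipeline_utilization(num_contexts, run_length=4, stall_cycles=4,
                         total_cycles=8000):
    """Fraction of cycles in which some hardware context issues an instruction.

    Toy assumptions (not from MIPS documentation): every context issues
    `run_length` instructions back to back, then waits `stall_cycles`
    cycles (for example, on a cache miss). Switching contexts is free.
    """
    ready_at = [0] * num_contexts        # first cycle each context may issue
    issued_in_run = [0] * num_contexts   # instructions issued since last stall
    issued = 0

    for cycle in range(total_cycles):
        # Give the issue slot to the first context that is not stalled.
        for tc in range(num_contexts):
            if ready_at[tc] <= cycle:
                issued += 1
                issued_in_run[tc] += 1
                if issued_in_run[tc] == run_length:
                    # This context now stalls; another may use the next slots.
                    issued_in_run[tc] = 0
                    ready_at[tc] = cycle + 1 + stall_cycles
                break
        # If no context was ready, the slot is simply lost.

    return issued / total_cycles


if __name__ == "__main__":
    for n in (1, 2):
        print(n, "context(s):", pipeline_utilization(n))
```

With these idealized numbers a single context keeps the pipeline busy half the time, and a second context fills every remaining slot. Real workloads stall less regularly and compete for the shared cache, so measured gains would be expected to fall short of this ideal.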
The MIPS34K core uses two virtual processing elements (VPE0 and VPE1), containing a total of five thread-context (TC) blocks. Jones described a VPE as an instantiation of the OS-visible state of the MIPS32 architecture and a TC as a replication in hardware of MIPS32's user-state application programming model. "To the application and the OSes, each VPE or TC looks like a fully featured CPU, which allows us to run multiple OSes, processes and threads concurrently," Jones said. And since the VPEs share a cache, the multithreading design is inherently cache-coherent.
"The key advantage to the 34K is that it provides hardware to reduce the cost of switching between tasks to essentially zero, not counting the costs of any cache misses that might occur," said Williston of BDTI.
The VPE/TC structure also seems able to recover cycles that would otherwise be lost to cache misses and pipeline stalls. "Generally, a good half of the cycles in almost any microprocessor design are lost to inefficiencies in the pipeline or to memory access latencies," said Carbone. "If the MIPS VPE/TC structures can capture a good portion of those wasted cycles, you double the performance of the processor, with no additional cores, pipelines or higher clock rates, and at considerably lower power consumption than other approaches."
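Carbone's arithmetic can be made explicit. The short sketch below is an illustration only; the function name throughput_gain and its parameters are hypothetical, and it assumes that throughput scales directly with the fraction of cycles doing useful work. If half the cycles are lost, recovering all of them doubles throughput, and recovering only part of them yields a proportionally smaller gain.

```python
def throughput_gain(lost_fraction, recovered_fraction):
    """Relative throughput after recovering part of the wasted cycles.

    lost_fraction:      share of cycles wasted before multithreading (0..1)
    recovered_fraction: share of those wasted cycles put back to work (0..1)
    """
    useful_before = 1.0 - lost_fraction
    useful_after = useful_before + lost_fraction * recovered_fraction
    return useful_after / useful_before


if __name__ == "__main__":
    # Half the cycles lost, all of them recovered: throughput doubles.
    print(throughput_gain(0.5, 1.0))
    # Half the cycles lost, only some recovered: a smaller gain.
    print(throughput_gain(0.5, 0.6))
```

The model says nothing about how many wasted cycles a given workload can actually recover; that depends on how often the other threads are themselves ready to run.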
Williston does not believe that the MIPS34K will be the right solution for all multimedia- and network-intensive apps, however. "In some cases the extra complexity won't be worth the performance gain; in others, you won't get much of a performance gain. A key question is how easy it will be to use the multithreading virtual processing element architecture."
Express Logic's Carbone has questions about the approach's extensibility: Can the number of TCs and VPEs be increased without substantially increasing the die area of the core? And can the approach benefit designs that incorporate multiple VPE/TC-enabled 34K cores?