SAN JOSE (ChipWire) Motorola Inc. has rolled a floating-point unit and other significant enhancements into the synthesizable Coldfire version 4e processor core, creating a modular design that delivers up to 350 Dhrystone millions of instructions per second (MIPS) in a 0.18-micron process.
Motorola is gradually weaning its customers away from the venerable 68000 family, which continues to ship in very high volumes. New designs are being done with the "more efficient" Coldfire cores, which run the 68000 instruction set and which can be designed within Motorola's SoCDT (system-on-chip design technology) methodology, said Wendell Smith, a senior manager in the company's standard embedded solutions group.
At last week's Microprocessor Forum here, Joe Circello, chief architect of the Coldfire processor, described the Coldfire 4e as a modular design that will offer a floating-point unit, previously lacking in the Coldfire architecture.
"One thing we have been criticized for is floating point, with Motorola saying 'if you want strong floating point, go to a 68000 product.' That is no longer true. With the 4e, customers get all the functions of the 68000 family and the enhanced MMU (memory management unit), floating point and MAC cores," Circello said.
The raw performance of the Coldfire cores is improving as Motorola moves its older Coldfire cores onto its 180-nm (0.18-micron) process technology. The 4e will ship in the second quarter of 2001 in designs running at 225 MHz, delivering 350 Dhrystone 2.1 MIPS. Motorola will bring up its 0.13-micron process technology starting in July 2001, which will set the stage for the Coldfire 4e core to run at 333 MHz, delivering 500 MIPS. The 4e core was built with 160,000 gates and occupies about 4 square millimeters in the 180-nm (0.18-micron) process, which shrinks to 2.1 square millimeters in the 130-nm (0.13-micron) process.
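The figures quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch in C, not anything published by Motorola: the function names are hypothetical, and the area estimate assumes an ideal shrink in which die area scales with the square of the feature size.

```c
/* Dhrystone MIPS delivered per MHz of clock frequency. */
double dmips_per_mhz(double dmips, double mhz)
{
    return dmips / mhz;
}

/* Area after an ideal linear shrink (assumption: area scales with the
 * square of the feature size; real layouts rarely shrink perfectly). */
double ideal_shrunk_area(double area_mm2, double from_um, double to_um)
{
    double ratio = to_um / from_um;
    return area_mm2 * ratio * ratio;
}
```

With the article's numbers, 350 MIPS at 225 MHz works out to roughly 1.56 Dhrystone MIPS per MHz, and 500 MIPS at 333 MHz to roughly 1.50. An ideal shrink of 4 square millimeters from 0.18 to 0.13 micron gives about 2.09 square millimeters, in line with the 2.1 quoted.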
Two years ago, the Coldfire v4 core introduced a Harvard architecture to the Coldfire family, and the 4e core also features independent, decoupled pipelines. The instruction fetch pipeline (IFP) includes four stages fed by an instruction cache, while the operand execution pipeline (OEP) includes five stages. Motorola claims that the Coldfire 4e core offers the most instructions per cycle in the industry, with a cycles per instruction (CPI) of 1.35.
With many embedded applications dealing with streaming multimedia data, floating point and an enhanced MAC unit were clear choices.
"The availability of a floating-point unit means that, as processors move above a certain performance point, suddenly design engineers can change how they view existing problems. And with high level languages that support FPUs, engineers can address those problems in ways that are affordable," Circello said.
The FPU is a 64-bit double-precision implementation of the MC68060 floating-point instruction set, and complies with the IEEE-754 standard with a software assist. The FPU module supports concurrent execution between the operand execution pipeline on the 4e core and the FPU.
Operands can be byte, word or long-word integers, or single- or double-precision floating-point values, though all internal calculations are done in double-precision arithmetic.
The FPU requires about 80,000 gates. Motorola claims the performance advantages are considerable, with image processing improved by 1.4 to 1.9 times, depending on the size and nature of the image being processed.
The Coldfire architecture has included a MAC module for several years, and the redesigned MAC for the version 4e is more efficient, particularly for complex fast Fourier transform (FFT) algorithms. More Coldfire-based designs are handling DSP functions, and the enhanced MAC (Coldfire eMAC) unit is expected to be used on many of the designs based on the 4e processor core, Smith said. At 22,000 gates, "the MAC is fairly small and so many customers go ahead and ask for it," he said.
Motorola claims that Coldfire 4e designs with the eMAC will outperform ARM 9E-based solutions. "Taken in total, performance is approximately double that of the 9E," Circello claimed.
The eMAC is based on a four-stage execution pipeline that is optimized for 32 by 32 multiply-accumulate operations. The accumulator results are stored in the integer register file.
The design supports an expanded programming model with load/store/copy accumulator instructions and programmable control of saturation arithmetic.
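What a multiply-accumulate with programmable saturation means can be illustrated in plain C. This is a rough sketch only, not the eMAC's instruction semantics: the names mac_step and dot_product are hypothetical, and the 32-bit accumulator width and the clamp-to-32-bit behavior are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* One multiply-accumulate step: acc + a * b.
 * Assumption: 32-bit accumulator; with saturate set, out-of-range
 * results clamp to the 32-bit limits, otherwise they wrap. */
int32_t mac_step(int32_t acc, int32_t a, int32_t b, bool saturate)
{
    /* A 32x32 product fits in 64 bits, and adding a 32-bit
     * accumulator to it cannot overflow int64_t. */
    int64_t sum = (int64_t)acc + (int64_t)a * (int64_t)b;

    if (saturate) {
        if (sum > INT32_MAX) return INT32_MAX;
        if (sum < INT32_MIN) return INT32_MIN;
        return (int32_t)sum;
    }

    /* Wrap: keep the low 32 bits, read as two's complement. */
    uint32_t low = (uint32_t)sum;
    return (low <= (uint32_t)INT32_MAX)
               ? (int32_t)low
               : (int32_t)(low - 0x80000000u) + INT32_MIN;
}

/* Dot product built from repeated MAC steps, the core loop of
 * many DSP kernels such as FIR filters. */
int32_t dot_product(const int32_t *x, const int32_t *y, int n, bool saturate)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc = mac_step(acc, x[i], y[i], saturate);
    }
    return acc;
}
```

Saturation matters in signal processing because a clamped sample distorts far less than one that wraps from a large positive value to a large negative one.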
The Coldfire 4e also supports dual-ported RAMs, which allow DMA (direct memory access) transfers directly into RAM and are optimized for double-buffer schemes.
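A double-buffer (or ping-pong) scheme can be sketched as follows. This is a software simulation under stated assumptions, not Coldfire driver code: the type ping_pong, the functions pp_dma_complete and block_sum, and the block size BUF_WORDS are all hypothetical, and a memcpy stands in for the DMA transfer, which on real hardware would run concurrently with the processor.

```c
#include <stdint.h>
#include <string.h>

#define BUF_WORDS 4 /* hypothetical block size */

typedef struct {
    int32_t buf[2][BUF_WORDS]; /* the two halves of the double buffer */
    int fill;                  /* index of the half the DMA fills next */
} ping_pong;

/* Simulated DMA completion: copy one block into the fill half, swap
 * roles, and return the half that is now ready for the processor. */
const int32_t *pp_dma_complete(ping_pong *pp, const int32_t *src)
{
    int32_t *filled = pp->buf[pp->fill];
    memcpy(filled, src, sizeof pp->buf[0]);
    pp->fill ^= 1; /* the next transfer targets the other half */
    return filled;
}

/* Example processing step: sum one block. */
int64_t block_sum(const int32_t *block)
{
    int64_t sum = 0;
    for (int i = 0; i < BUF_WORDS; i++) {
        sum += block[i];
    }
    return sum;
}
```

The point of the scheme is that the processor works on one half while the next transfer lands in the other, so neither side waits for the other to finish with a shared buffer.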
Enhanced for multiprocessing
For system-on-chip designs that include the MPU, FPU, and other cores, the Coldfire architecture has been enhanced to support basic multiprocessing requirements. CPU run and halt control, interrupt steering, debug control and software control of memory coherency are supported.
The memory management unit will enable designers to better serve networking applications. "Three or four years ago there was no need for this, but with the increasing number of real-time operating systems, we have to support better isolation among the processes," Circello said.
The version 4e memory management unit handles address translation inside the core complex, process partitioning and expanded debug capabilities. It works with the dual 32-entry, fully associative TLBs (translation lookaside buffers) that are supported in the Coldfire's Harvard architecture.
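In a fully associative TLB, any entry may hold the translation for any page, so a lookup compares the virtual page number against every entry. The sketch below models that behavior in C. Only the entry count of 32 comes from the article; the 4 KB page size, the entry layout and the names tlb_entry and tlb_translate are assumptions for illustration, and hardware performs the comparisons in parallel rather than in a loop.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 32 /* per the article */
#define PAGE_SHIFT  12 /* assumption: 4 KB pages */

typedef struct {
    bool     valid;
    uint32_t vpn; /* virtual page number */
    uint32_t pfn; /* physical frame number */
} tlb_entry;

/* Look up vaddr; on a hit, write the physical address and return true.
 * On a miss, return false (a real system would then refill the TLB). */
bool tlb_translate(const tlb_entry tlb[TLB_ENTRIES], uint32_t vaddr,
                   uint32_t *paddr)
{
    uint32_t vpn    = vaddr >> PAGE_SHIFT;
    uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1u);

    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *paddr = (tlb[i].pfn << PAGE_SHIFT) | offset;
            return true;
        }
    }
    return false;
}
```

Because the core is a Harvard design, one such structure would serve instruction fetches and another data accesses, which is why the article describes the TLBs as dual.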