Recently, NXP unveiled more details on its new licensable core, the CoolFlux BSP, which targets low-power communications baseband processing.
The core is based on the similarly named CoolFlux DSP, which was designed for use in low-power audio applications and introduced in 2004. Relative to the older core, NXP says that the CoolFlux BSP has been enhanced to increase its performance in baseband processing while retaining a small footprint and low power.
According to NXP, the CoolFlux BSP core will run at 290 MHz in a 65-nm process, occupy about 65K gates, and consume 20 mW at 1.2 volts. It is designed to be used as a co-processor or in standalone mode, and can also be used as part of a multi-core system.
The core uses a 24-bit data path, which is somewhat unusual for a baseband processor. Most competitors (such as the Tensilica Xtensa LX, Ceva XC, and Ceva TeakLite families) use either 16- or 32-bit data paths, and those with 32-bit data paths often split them to perform dual 16-bit operations.
The CoolFlux BSP's 24-bit width is a holdover from the CoolFlux DSP. Audio-oriented processors often use 24-bit data paths as a compromise: 24-bit data eases implementation of high-fidelity audio algorithms relative to a 16-bit machine, while reducing area and power relative to a 32-bit design. As described below, the CoolFlux BSP can split its 24-bit data path into two 12-bit paths to increase computational throughput.
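To make the idea of a split data path concrete, here is a minimal Python sketch of two signed 12-bit lanes sharing one 24-bit word. The function names (`pack`, `unpack`, `simd_add`) are hypothetical, and the sketch assumes two's-complement lanes that wrap on overflow; NXP has not said here whether the core wraps or saturates, so treat that behavior as an assumption rather than a description of the hardware.

```python
LANE_BITS = 12
LANE_MASK = (1 << LANE_BITS) - 1  # 0xFFF


def to_signed12(x):
    """Interpret the low 12 bits of x as a two's-complement value."""
    x &= LANE_MASK
    return x - (1 << LANE_BITS) if x & (1 << (LANE_BITS - 1)) else x


def pack(hi, lo):
    """Pack two signed 12-bit values into one 24-bit word."""
    return ((hi & LANE_MASK) << LANE_BITS) | (lo & LANE_MASK)


def unpack(word):
    """Split a 24-bit word back into its two signed 12-bit lanes."""
    return to_signed12(word >> LANE_BITS), to_signed12(word)


def simd_add(a, b):
    """Lane-wise add of two packed words.

    Assumption: each lane wraps independently on overflow, because
    pack() masks each sum back to 12 bits.
    """
    a_hi, a_lo = unpack(a)
    b_hi, b_lo = unpack(b)
    return pack(a_hi + b_hi, a_lo + b_lo)


word = simd_add(pack(100, -200), pack(-50, 75))
print(unpack(word))  # (50, -125)
```

One 24-bit operation thus carries two independent 12-bit results, which is where the throughput gain comes from.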
As shown in the Figure below, the CoolFlux BSP, like its predecessor, includes two 24 x 24-bit multiply-accumulate (MAC) units; two 24/56-bit ALUs and one 24-bit ALU; two 24-bit data memories and a 32-bit program memory; and two address-generation units.
The key difference is that the new core supports three modes of operation rather than just one: scalar (used in the CoolFlux DSP core), SIMD and complex. The mode is determined based on the class of instruction used and a status bit.
Figure: The NXP CoolFlux BSP architecture can split its 24-bit data path into two 12-bit paths to increase computational throughput.
In SIMD mode, each MAC unit (or ALU) is split such that it executes two 12-bit operations (the 56-bit ALUs can also perform dual 28-bit operations). In complex mode, the processor treats input data as a complex number with real and imaginary components, which can be 12 or 24 bits wide. The core can perform, for example, 12-bit complex multiplication (i.e., four real multiplications and two additions, as shown below) in two cycles, with single-cycle throughput:
Imaginary part: (Ar x Bi) + (Ai x Br); real part: (Ar x Br) - (Ai x Bi)
The core also explicitly supports complex addition and subtraction (Ai +/- Bi, Ar +/- Br), and can execute 24-bit complex calculations (with lower throughput). The BSP supports a range of specialized instructions for SIMD arithmetic, complex arithmetic, FFTs, Viterbi processing, and the CORDIC algorithm.
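The complex multiplication described above can be sketched in a few lines of Python. This is an illustration of the arithmetic only (four real multiplications and two additions), not of how the core schedules it; the function name `complex_mul_12bit` is hypothetical.

```python
def complex_mul_12bit(ar, ai, br, bi):
    """Multiply (ar + j*ai) by (br + j*bi) for signed 12-bit operands.

    Uses four real multiplications and two additions/subtractions.
    Returns (real, imag) at full precision.
    """
    for v in (ar, ai, br, bi):
        if not -2048 <= v <= 2047:
            raise ValueError("operand does not fit in signed 12 bits")
    real = ar * br - ai * bi
    imag = ar * bi + ai * br
    return real, imag


print(complex_mul_12bit(3, 4, 5, -6))  # (39, 2)

# Worst case: every operand at the most negative 12-bit value.
print(complex_mul_12bit(-2048, -2048, -2048, -2048))  # (0, 8388608)
```

The worst-case component is 2^23, which needs 25 bits as a signed value. It therefore overflows a 24-bit result but fits comfortably in the 28-bit intermediate width mentioned in the article.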
According to NXP, the new SIMD and complex math capabilities enable the core to calculate two taps per cycle for a 12-bit complex FIR filter, for example, or execute a 12-bit (with 28-bit intermediate results) radix-4 256-point complex FFT in 2,480 cycles. (By way of comparison, the CoolFlux DSP core requires 8,930 cycles for a 24-bit radix-2 FFT with 56-bit intermediate results.) Overall, the SIMD and complex modes provide a significant speedup across a range of algorithms, but because they are 12-bit operations, the speedup comes at the cost of precision and dynamic range.
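A quick back-of-the-envelope calculation puts the FFT figures in perspective. The cycle counts and the 290 MHz clock are NXP's numbers as quoted in this article; the helper `fft_stages` is hypothetical, and the butterfly counts are the standard ones for radix-2 and radix-4 FFTs. Note that the comparison is not like-for-like (12-bit radix-4 versus 24-bit radix-2), and the older core's clock rate is not given here, so only the BSP's execution time is computed.

```python
CLOCK_HZ = 290e6      # CoolFlux BSP clock in 65 nm, per NXP
BSP_CYCLES = 2480     # 12-bit radix-4 256-point complex FFT
DSP_CYCLES = 8930     # 24-bit radix-2 FFT on the older CoolFlux DSP
N = 256


def fft_stages(n, radix):
    """Number of stages in a radix-`radix` FFT of length n."""
    stages = 0
    while n > 1:
        if n % radix:
            raise ValueError("n must be a power of the radix")
        n //= radix
        stages += 1
    return stages


radix4_butterflies = fft_stages(N, 4) * (N // 4)  # 4 stages * 64 = 256
radix2_butterflies = fft_stages(N, 2) * (N // 2)  # 8 stages * 128 = 1024

speedup = DSP_CYCLES / BSP_CYCLES
bsp_time_us = BSP_CYCLES / CLOCK_HZ * 1e6

print(f"cycle-count ratio: {speedup:.2f}x")          # 3.60x
print(f"BSP FFT time at 290 MHz: {bsp_time_us:.2f} us")  # 8.55 us
print(f"cycles per radix-4 butterfly: {BSP_CYCLES / radix4_butterflies:.2f}")
```

In other words, the quoted figures amount to roughly a 3.6x reduction in cycles, with the whole 256-point transform finishing in under 9 microseconds at the stated clock rate.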
The CoolFlux BSP will compete with licensable cores from Tensilica, VeriSilicon, and Ceva, among others. Benchmark results for several of the competitor cores are available at BDTI core_scores. (BDTI has not yet benchmarked the CoolFlux BSP.) According to NXP, the CoolFlux DSP core has been licensed by a number of (undisclosed) customers, both inside and outside NXP, and the BSP core will be used by a lead customer in a WiMAX baseband application.
The BSP has some unusual features, particularly its complex, 12-bit computational capabilities. A key question is whether 12 bits is enough for many baseband operations. Based on its own analysis, NXP believes that it is, particularly because the core can use larger intermediate results to maintain precision. And when it isn't, users can switch to the (slower) 24-bit mode. How well customers are able to make use of the core's maximum throughput (using 12-bit data) will have a big effect on the performance they're able to squeeze out of the core.
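The precision trade-off can be roughed out numerically. The sketch below uses the standard rule of thumb of about 6 dB of dynamic range per bit, plus a deliberately pessimistic bound on how many full-scale products fit in an accumulator before it can overflow. The 28-bit and 56-bit intermediate widths come from the article; the function names and the worst-case model (every product at full scale with the same sign) are assumptions for illustration, and real baseband signals will usually leave far more headroom.

```python
import math


def dynamic_range_db(bits):
    """Ideal dynamic range of a fixed-point word: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)


def worst_case_macs(operand_bits, acc_bits):
    """Full-scale signed products that can be summed without overflow.

    Assumes every product is (-2**(n-1))**2, the largest possible
    magnitude, and that all products have the same sign.
    """
    max_product = (1 << (operand_bits - 1)) ** 2
    acc_max = (1 << (acc_bits - 1)) - 1
    return acc_max // max_product


print(f"12-bit data: {dynamic_range_db(12):.1f} dB")  # 72.2 dB
print(f"24-bit data: {dynamic_range_db(24):.1f} dB")  # 144.5 dB
print(worst_case_macs(12, 28))  # 31
print(worst_case_macs(24, 56))  # 511
```

Under these assumptions, dropping from 24 to 12 bits halves the available dynamic range in decibels, while the 28-bit intermediate width keeps a few guard bits in hand for accumulation.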
This article is excerpted from BDTI's full article on the CoolFlux BSP core. For the complete article, please visit InsideDSP.com.