SANTA CLARA, Calif. - Bops Inc., a startup devoted to high-performance DSP cores, will roll out its next generation of software development tools at DesignCon this week. Version 2.0 of the Bops software development kit, which includes compilers for both C and MatLab, attacks the problem of developing highly tuned code for a new architecture.
Bops (Palo Alto, Calif.) is targeting high-performance applications, such as video signal processing, 3-D graphics, xDSL and wireless, said Rick Kepple, vice president of sales and marketing.
For instance, modulating signals (what Kepple calls "antenna operations") for third-generation cellular basestations will require tens of billions of operations per second. In order to keep a densely packed system from overheating, power consumption must be held to about 10 milliwatts per 100 Mips. Total power consumption for a digital camcorder that performs MPEG encoding and decoding must be less than 600 mW, said Kepple.
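As a rough check on that power target, the sketch below scales the 10 mW per 100 Mips figure to a sustained operation rate. It assumes, purely for illustration, that one Mips can be treated as one million operations per second and that power scales linearly with throughput. The function name and the 10-billion-operation example are hypothetical and do not come from Bops.

```python
# Budget quoted by Kepple: about 10 mW for every 100 Mips.
MW_PER_100_MIPS = 10.0

def power_budget_mw(ops_per_second: float) -> float:
    """Scale the quoted budget linearly to a given operation rate.

    Assumption: 1 Mips is treated as 1 million operations per second.
    """
    mips = ops_per_second / 1e6
    return mips / 100.0 * MW_PER_100_MIPS

# Hypothetical workload: 10 billion operations per second.
budget_mw = power_budget_mw(10e9)
print(f"{budget_mw:.0f} mW")
```

Under those assumptions, a workload at the low end of "tens of billions" of operations per second would already need to fit in roughly a watt.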
Core story
The Bops cores are intended to offer scalable performance and low power consumption for those applications. Each core will execute 3.2 billion 16-bit operations per second with a 200-MHz clock. The architecture supports both fixed- and floating-point math with 8-, 16- or 32-bit operands. With 32-bit operands, performance is on the order of 1 billion floating-point operations per second.
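The throughput figures above imply a per-cycle operation count, which the following sketch works out. It is only arithmetic on the quoted numbers. The function name is hypothetical, and the floating-point figure is treated as exactly 1 billion even though the article gives it as "on the order of" that value.

```python
def ops_per_cycle(ops_per_second: float, clock_hz: float) -> float:
    """Average number of operations completed per clock cycle."""
    return ops_per_second / clock_hz

CLOCK_HZ = 200e6  # 200-MHz clock quoted in the article

# 3.2 billion 16-bit operations per second.
ops16_per_cycle = ops_per_cycle(3.2e9, CLOCK_HZ)

# Roughly 1 billion 32-bit floating-point operations per second (approximate).
flops32_per_cycle = ops_per_cycle(1.0e9, CLOCK_HZ)

print(f"16-bit ops per cycle: {ops16_per_cycle:.0f}")
print(f"32-bit flops per cycle: {flops32_per_cycle:.0f}")
```

On these numbers the core would sustain about 16 16-bit operations, or about five 32-bit floating-point operations, per clock.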
The intellectual-property (IP) cores are meant to fit with I/O peripherals and memory and even other processors on special-purpose system-on-chip (SoC) designs. With approximately $400 million in sales in 1999, the SoC market is growing 50 percent a year, said Kepple.
For its part, the Bops core does not run control code and is meant to serve as a loosely coupled coprocessor to the ARM or MIPS processor cores. The core consists of two elements: a sequence processor (SP) for control and sequential functions, and a slave processing element (PE) for parallel tasks. Though it talks to a single MAC and ALU, the single-instruction, multiple-data-like SP embodies three levels of parallelism: parallel data (via data memory), parallel instructions (via a VLIW instruction memory) and parallel processors (via an instruction address unit), said Kepple.
Different versions of the Bops core gang together SPs and PEs in parallel and serial combinations (a matrix, in fact) for various performance levels. The Bops2010, for example, includes one SP and one PE in a 1 x 1 matrix. The Bops2020 includes one SP and two PEs in a 1 x 2 matrix; and the Bops2040 includes one SP and four PEs in a 2 x 2 matrix.
A cluster switch performs DMA-controller transfers in the background, and balances the activity in the matrix. Thus, at 100 MHz, a Bops2040 core will perform a 256-point fast Fourier transform in 2.2 microseconds (213 cycles), according to Kepple. The TI C6X will take 13.3 microseconds (2,660 cycles at 200 MHz) on the same operation, Kepple said. Compared with the C6X, the Bops2040 is one-fourth the die size, uses one-fourth the power and one-fifth as much memory and bus bandwidth, he said.
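The FFT comparison can be checked by converting cycle counts to time, as in the sketch below. The function name is hypothetical, and only the cycle counts and clock rates come from the figures Kepple quoted. Note that 213 cycles at 100 MHz works out to about 2.13 microseconds, slightly under the 2.2 microseconds cited.

```python
def exec_time_us(cycles: int, clock_mhz: float) -> float:
    """Execution time in microseconds: cycles divided by clock rate in MHz."""
    return cycles / clock_mhz

# 256-point FFT figures as quoted by Kepple.
bops2040_us = exec_time_us(213, 100.0)   # Bops2040 at 100 MHz
c6x_us = exec_time_us(2660, 200.0)       # TI C6X at 200 MHz

time_ratio = c6x_us / bops2040_us        # wall-clock advantage
cycle_ratio = 2660 / 213                 # advantage in cycles, ignoring clock rate

print(f"Bops2040: {bops2040_us:.2f} us, C6X: {c6x_us:.2f} us")
print(f"time ratio: {time_ratio:.1f}x, cycle ratio: {cycle_ratio:.1f}x")
```

Because the C6X figure is quoted at twice the clock rate, the claimed advantage is about 6x in elapsed time but about 12x in cycles.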
Programming issues
As with any parallel processor, programming remains an issue. The new software development tools to be introduced this week are meant to enhance what Kepple calls "the fourth P" (after performance, power consumption and process) of DSP criteria: programmability. In addition to a basic DSP library, the Bops tool set includes a system simulator, an instruction-set simulator, a GNU-C compiler, a GNU assembler and linker, a VLIW instruction packer and register allocator, and a compiler and vector library for MatLab.
There are only about 60,000 DSP programmers in the world capable of working in assembly language, compared with some 6 million capable of working in C, said Kepple. The goal of any programming tool is to utilize the base of C-language programmers. VLIW processors, like Texas Instruments' C6X, use C compilers. Their problem is that the assembly code they produce is never as efficient as that produced by hand coding, and the subsequent wasted machine cycles could never be tolerated on a battery-powered IP core or SoC. Bops claims its C compiler is one of the most efficient on the market. In addition to instruction-level compilation, its compiler will handle packed data and multiprocessor systems.
Compiler efficiency minimizes programmer tweaking. A certain amount of code tweaking is necessary to minimize program branch penalties, however, and to balance the load among parallel processors, Kepple said.
Bops previously introduced the Xemulator, which allows designers to try out a Bops-based design in FPGA-based hardware. A new marketing program, also to be announced at DesignCon, allows those potential Bops IP customers seriously interested in the architecture to complete an SoC design and simulation by receiving from Bops everything but the register-transfer-level code. Thus, SoC design work with the Bops IP can be completed concurrently with the license negotiations, Kepple said.