Part 2 examines the suitability of the C6000 and PowerPC for multi-processor systems. It will be published Thursday, December 27.
This series focuses on mainstream, established technologies. It does not cover the latest technologies such as the C64x+ architecture and Serial RapidIO.
With Motorola's (now Freescale's) introduction of AltiVec technology, and specifically the fourth-generation MPC74xx PowerPCs, digital signal processing applications are beginning to migrate from a traditional DSP environment to a RISC environment. While Freescale has been advancing the processing power of the PowerPC, Texas Instruments has been introducing new parts in its C6000 family, adding speed and flexibility to an already impressive portfolio of DSPs. This paper compares and contrasts the hardware, software, development tools, design philosophy, and overall application development environment of these two processor families. Although either processor can deliver similar performance in many applications, how that performance is achieved differs considerably between the two, and choosing which processor to use remains a non-trivial decision.
Current DSP and PowerPC Offerings:
This paper focuses on processors used to satisfy high-performance application requirements. As you might expect, these include compute-intensive applications, but they also include applications that require high data throughput, where the actual processing may be moderate but the ability to process a data stream without losing a single sample is critical. Table 1 lists some of the processors available from Texas Instruments and Freescale that are used to satisfy these requirements.
Core Technology: VelociTI vs. AltiVec:
Two distinct technologies, TI's VelociTI and Freescale's AltiVec, provide the processing power that lets these processors handle the demanding requirements of many real-time, high-throughput, and calculation-intensive applications. The C6000 family comprises the highest-performance processors currently available from Texas Instruments. All C6000 DSPs feature VelociTI, a highly parallel architecture that allows up to eight instructions to execute on eight functional units in every clock cycle. Figure 1 shows a simplified diagram of the C6000 core. Each of the eight units takes a 32-bit instruction, so a 256-bit very long instruction word (VLIW) is fetched on every clock cycle.
With the help of an optimizing C compiler and assembler, a user's C language application code can be "parallelized" and pipelined to make the most efficient use of the functional units. The eight units are actually two sets of four (L, S, M, D). While each unit is specialized for a specific set of instructions, having two of each gives the compiler flexibility when it needs to execute two instructions that require the same type of functional unit at the same time: the two instructions can execute in a single cycle rather than in two consecutive cycles. While all instructions can execute on at least two units, many can execute on four or six. The orthogonal nature of the VelociTI architecture and instruction set gives the compiler the flexibility to produce better (more optimized) code. Table 2 shows a list of some of the instructions that can execute on multiple functional units.
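As a rough illustration of the kind of C code that gives a VLIW compiler room to work, the sketch below writes a 16-bit dot product with two independent accumulators. The function name `dot_product_i16` and the choice of data types are hypothetical and not taken from any TI library. The idea is that the two multiply-accumulate chains do not depend on each other, so a compiler targeting two sets of functional units could, in principle, schedule them side by side. Whether a given compiler actually does so depends on the tool and its options.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: dot product of two 16-bit arrays.
 * Two accumulators form two independent dependency chains, so the
 * even and odd products can be scheduled in parallel.
 * Assumes the running sums fit in 32 bits for the given inputs. */
int32_t dot_product_i16(const int16_t *restrict a,
                        const int16_t *restrict b,
                        size_t n)
{
    int32_t sum_even = 0;
    int32_t sum_odd = 0;
    size_t i;

    /* Process two elements per iteration. */
    for (i = 0; i + 1 < n; i += 2) {
        sum_even += (int32_t)a[i] * b[i];
        sum_odd  += (int32_t)a[i + 1] * b[i + 1];
    }

    /* Handle the last element when n is odd. */
    if (i < n) {
        sum_even += (int32_t)a[i] * b[i];
    }

    return sum_even + sum_odd;
}
```

The `restrict` qualifiers tell the compiler that the two input arrays do not overlap, which removes one common obstacle to reordering and pipelining the loads.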
Freescale's fourth generation (G4) PowerPC architecture combines the existing RISC design found in the third generation family with the new AltiVec vector parallel processing engine. Figure 2 shows a simplified diagram of the main processor blocks.
An AltiVec instruction operates on multiple data elements at once, an approach often referred to as SIMD (single instruction, multiple data) parallel processing. The unit operates on 128-bit data and supports the following data types:
- 16-way parallel operations for 8-bit signed and unsigned integers and characters
- 8-way parallel operations for 16-bit signed and unsigned integers
- 4-way parallel operations for 32-bit signed and unsigned integers and IEEE floating-point numbers
Each AltiVec instruction specifies up to three source operands and a single destination operand. Figure 3 shows a 16-way, 8-bit operation.
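To make the 16-way case concrete, the following portable C sketch emulates what a single 16-way, 8-bit add does to a 128-bit operand. It does not use the AltiVec programming interface: the type `vec_u8x16` and the function `add_u8x16` are hypothetical names chosen for illustration, and the loop only models the result that the hardware would produce with one instruction. Wrap-around (modulo 256) arithmetic is assumed here. A saturating variant would clamp each lane at 255 instead.

```c
#include <stdint.h>

enum { LANES_8BIT = 128 / 8 };  /* 16 lanes in a 128-bit vector */

/* Hypothetical stand-in for a 128-bit vector of unsigned bytes. */
typedef struct {
    uint8_t lane[LANES_8BIT];
} vec_u8x16;

/* Emulates a 16-way, 8-bit unsigned add with wrap-around.
 * On SIMD hardware all 16 lane additions happen together;
 * here they are modeled one lane at a time. */
vec_u8x16 add_u8x16(vec_u8x16 a, vec_u8x16 b)
{
    vec_u8x16 result;
    for (int i = 0; i < LANES_8BIT; ++i) {
        result.lane[i] = (uint8_t)(a.lane[i] + b.lane[i]);
    }
    return result;
}
```

The same pattern applies to the other data types listed above: with 16-bit elements the loop would cover 8 lanes, and with 32-bit elements it would cover 4.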
In addition to parallel arithmetic operations, the AltiVec unit can perform numerous non-arithmetic operations to manipulate data. These include bit manipulation such as shift and rotate, logical operations, and data reordering. Figure 4 shows an inter-element permute operation that rearranges 16 8-bit elements into a single 128-bit word. This operation executes in a single cycle.
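The permute can also be sketched in portable C. The model below assumes the commonly described form of a byte permute: two 16-byte source vectors are treated as one 32-byte table, and each byte of a control vector selects which table entry lands in the corresponding result lane. The names `bytes16` and `permute_bytes16` are hypothetical, and the index convention (low five bits of each control byte) is an assumption of this sketch, not something stated in the text above.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector of 16 bytes. */
typedef struct {
    uint8_t lane[16];
} bytes16;

/* Emulates an inter-element byte permute.
 * Each control byte picks one of the 32 bytes in the pair (a, b):
 * indices 0..15 select from a, indices 16..31 select from b.
 * Only the low five bits of each control byte are used. */
bytes16 permute_bytes16(bytes16 a, bytes16 b, bytes16 control)
{
    bytes16 result;
    for (int i = 0; i < 16; ++i) {
        unsigned index = control.lane[i] & 0x1Fu;
        result.lane[i] = (index < 16) ? a.lane[index]
                                      : b.lane[index - 16];
    }
    return result;
}
```

For example, a control vector holding 15, 14, ..., 0 reverses the byte order of the first source, which is the sort of data reordering that would otherwise take a loop of individual byte moves.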
To support the AltiVec technology, 162 new instructions have been added to the PowerPC's instruction set. These new instructions fully utilize the single instruction, multiple data and vector manipulation design.