Editor's Note: This is the first of two articles discussing how Massively Parallel Processing Arrays (MPPAs) can be used to accelerate high-performance embedded system applications. In this article, we discuss the requirements of high-performance applications and how MPPAs compare with other architectures. The second article, scheduled to appear on July 18, will explore the effort involved in programming an MPPA architecture to implement a JPEG image compression application.
High-performance embedded video and imaging applications are characterized by complex algorithms that must process data at a high throughput. For example, the video market segment contains several high-end applications, such as video codecs, medical imaging algorithms, and intelligent imaging, which involve tens to hundreds of operations per pixel and target large image resolutions. Similarly, many wireless applications apply complex transforms to incoming streams of high-throughput data. Many of these high-end applications rely on state-of-the-art standards and continuously changing proprietary algorithms.
As a result, targeting high-performance DSP applications requires that developers create implementations that:
- are fast enough to meet demanding processing requirements
- are developed quickly enough to reach the market on time
- can easily be upgraded to provide different functionality
There are three types of architectures commonly used today to meet these requirements: Application-specific Integrated Circuits (ASICs), Field-programmable Gate Arrays (FPGAs), and high-end Digital Signal Processors (DSPs).
ASICs provide excellent performance but lack any kind of programmability. This type of architecture is therefore commonly found in very high-volume applications where any savings on cost or power consumption can justify the lack of programmability. Lower volume applications do not justify the high cost, risk, and complexity of the initial development associated with ASICs.
FPGAs are a step down from ASICs in terms of performance and a step up in terms of ease of use. Since they rely on reprogrammable logic, FPGAs are inherently less silicon-efficient than ASICs. What is lost in efficiency is gained in flexibility, since FPGAs, unlike ASICs, are programmable. FPGAs are commonly programmed at the Register Transfer Level (RTL), using hardware description languages such as Verilog or VHDL that describe a design in terms of very basic operations, such as Boolean equations and control logic applied to registers. FPGAs are versatile solutions that are well suited for interfacing devices (thanks to their extensive and programmable I/Os), prototyping early silicon, or running low-volume, high-performance applications.
High-end DSPs are at the opposite end of the spectrum from ASICs. DSPs are processors programmed in a mix of high-level language and assembly language, a property that is appealing to software developers. These processors have been tuned to handle DSP applications efficiently.
For example, most DSPs include hardware multipliers for efficient multiply and multiply-accumulate operations, and support two memory accesses per cycle so that algorithms such as filters, which operate on two arrays at once, run more efficiently. High-end DSPs have been further optimized to meet more demanding processing requirements. These processors include:
- specialized instructions that implement operations commonly found in DSP applications (e.g., Sum-of-Absolute-Differences, or SAD)
- an increased number of pipeline stages to reach higher clock speeds
- an increased number of processing units and more complex sets of parallel instructions to increase the number of operations that can run in parallel during each processor cycle.
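To make the features above concrete, here is a minimal C sketch of the two loop kernels mentioned: a multiply-accumulate (MAC) FIR filter inner loop, which reads from two arrays per iteration (the pattern behind dual memory access), and a Sum-of-Absolute-Differences loop, which many high-end DSPs collapse into a single instruction. The function names and data sizes are illustrative, not taken from any particular DSP vendor's library.

```c
#include <assert.h>
#include <stdlib.h>

/* FIR filter inner loop: one multiply-accumulate per tap, reading one
 * element from the sample array and one from the coefficient array each
 * iteration -- the access pattern DSP dual memory ports are built for. */
static int fir_mac(const int *x, const int *h, int ntaps)
{
    int acc = 0;
    for (int k = 0; k < ntaps; k++)
        acc += x[k] * h[k];   /* a hardware multiplier makes this one cycle */
    return acc;
}

/* Sum of Absolute Differences over two pixel blocks, the kernel of video
 * motion estimation; a specialized SAD instruction executes this loop
 * body in far fewer cycles than the equivalent scalar code. */
static int sad(const unsigned char *a, const unsigned char *b, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += abs((int)a[i] - (int)b[i]);
    return total;
}
```

On a general-purpose processor each loop iteration costs several instructions; the DSP features listed above (hardware MAC units, dual data buses, SAD-style instructions) exist precisely to shrink these kernels to roughly one operation per cycle.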
Next: Limitations of ASIC, DSP, FPGA and multicore