Design Article

Parallel processing for multi-core DSPs

Bruce Schulman and Zafer Zamboglu

1/29/2007 9:00 AM EST

Modern video-processing systems, like the multichannel digital video recorders/analyzers used in security systems, run multiple applications such as image processing, compression and content analysis concurrently on many processors. These systems are driving demand for more digital signal-processing horsepower, well beyond the ability of most vendors to scale up the performance of their DSP chips. System designers are forced to use multiple DSP chips, FPGAs and a system controller. But that creates difficulties for applications programmers and system software engineers, since chip-level software tools don't address the system integration issues.

To address this new level of performance and to improve integration, vendors are introducing parallel-processor chips. But successfully porting applications from a low-performance uniprocessor to a high-performance parallel processor takes planning and some clever tools to get the desired parallel speedup.

Many types of parallel processing can be leveraged in video applications to achieve higher performance at lower clock frequencies and with lower power. The main challenge is scheduling the instructions and data to keep all the parallel hardware busy.

Breakthrough performance and scalability in multichannel video encoding and video content analysis can be achieved using a dynamic scheduler and a heterogeneous multicore processor. With a dynamic scheduler, software developers can expect to write applications that can scale in performance by running on an arbitrary number of DSP processors, while using and sharing all processors efficiently.
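To make the scheduling point concrete, here is a minimal simulation sketch, not a description of any vendor's scheduler. It compares a fixed round-robin assignment with a dynamic one in which each task goes to whichever processor becomes free first. The function names and the task costs are hypothetical and chosen only for illustration.

```python
import heapq


def static_makespan(task_costs, num_workers):
    """Finish time when tasks are assigned round-robin before run time."""
    loads = [0] * num_workers
    for i, cost in enumerate(task_costs):
        loads[i % num_workers] += cost
    return max(loads)


def dynamic_makespan(task_costs, num_workers):
    """Finish time when each task goes to the earliest idle worker."""
    # Min-heap holding the time at which each worker next becomes idle.
    free_at = [0] * num_workers
    heapq.heapify(free_at)
    for cost in task_costs:
        start = heapq.heappop(free_at)         # earliest idle worker
        heapq.heappush(free_at, start + cost)  # busy until start + cost
    return max(free_at)


# Made-up, data-dependent task costs (for example, cycles per block).
costs = [8, 1, 1, 1, 8, 1, 1, 1, 8, 1, 1, 1]

for workers in (2, 4):
    print(workers,
          static_makespan(costs, workers),
          dynamic_makespan(costs, workers))
```

With these made-up costs and four workers, the fixed assignment finishes at time 24 because one worker receives every expensive task, while the dynamic assignment finishes at time 10. The same application code runs unchanged when the worker count changes, which is the scaling property the paragraph above describes.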

Applications involving video compression and object detection/tracking present a particularly difficult set of requirements that often cause system designers to add too many parallel resources in order to meet the required performance. In each algorithm, there will be both control code and video signal processing. The latter will involve fixed-length and data-dependent algorithms, as well as multiple layers of data parallelism such as the processing of pixels, macroblocks, frames and multiple asynchronous channels. Each layer presents a different opportunity for parallel speedup.
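As a rough illustration of how much work each layer exposes, the sketch below counts the work items at the channel, frame, macroblock and pixel levels. The 16x16 macroblock size, the frame dimensions and the channel count are assumptions for the example, not figures from the article, and items at the finer layers are not necessarily independent of one another in a real encoder.

```python
MB_SIZE = 16  # assumed macroblock edge in pixels


def work_units(width, height, frames, channels):
    """Count the work items exposed at each layer of data parallelism."""
    mbs_per_frame = (width // MB_SIZE) * (height // MB_SIZE)
    return {
        "channels": channels,
        "frames": channels * frames,
        "macroblocks": channels * frames * mbs_per_frame,
        "pixels": channels * frames * width * height,
    }


# Hypothetical system: four 720x480 channels, 30 frames each.
for layer, count in work_units(720, 480, 30, 4).items():
    print(f"{layer:12s} {count}")
```

The counts grow by orders of magnitude from channels down to pixels, which is why coarse layers suit independent processors and fine layers suit SIMD-style hardware.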

An ideal solution would have a balance of control and DSP processors. The software would have the ability to dynamically adapt to the time-varying conditions of resource consumption due to data-dependent algorithms and to data availability due to asynchronous inputs.

A heterogeneous multicore processor can strike the required balance of parallel control and DSP processing, leveraging the multiple layers of data parallelism and allowing each type of processor to operate efficiently on its assigned tasks. Multiple, independent single-instruction, multiple-data (SIMD) DSPs can strike a balance between asynchronous and synchronous data tasks in these multichannel applications.

A multichannel video encoder security system can have multiple, independent asynchronous control tasks in each of several applications. The overhead from real-time operating system task swapping is minimized and the control code is simplified by allocating multiple parallel low-cost RISC processors, with each taking on a discrete control application.

For pixel processing, a fine-grained parallel DSP hardware approach such as SIMD or vector processing works well. Since only one instruction is issued for multiple data, this approach assumes that all the data is available at the same time, every time the instruction is run. For a lot of straightforward pixel processing, such as color-space conversion, the assumption is true and there is a full parallel speedup.
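The color-space conversion case can be sketched as follows. This is a plain-Python model of the idea, not real SIMD code: one "instruction issue" is applied to a group of pixels in lock step. The lane width `LANES` and the function names are hypothetical, and the integer weights 77, 150 and 29 are an 8-bit fixed-point approximation of the BT.601 luma coefficients 0.299, 0.587 and 0.114.

```python
LANES = 8  # hypothetical SIMD width (pixels handled per instruction issue)


def luma_lane_op(r, g, b):
    """Per-lane operation: fixed-point luma, weights sum to 256."""
    return (77 * r + 150 * g + 29 * b) >> 8


def simd_luma(rs, gs, bs):
    """Convert RGB planes to luma, LANES pixels per modeled issue.

    Returns the luma values and the number of instruction issues used.
    """
    if not (len(rs) == len(gs) == len(bs)):
        raise ValueError("color planes must have equal length")
    out = []
    issues = 0
    for base in range(0, len(rs), LANES):
        end = base + LANES
        # One issue: the same operation is applied to every lane in the group.
        out.extend(
            luma_lane_op(r, g, b)
            for r, g, b in zip(rs[base:end], gs[base:end], bs[base:end])
        )
        issues += 1
    return out, issues


# 16 pixels: 8 white followed by 8 pure red.
reds = [255] * 16
greens = [255] * 8 + [0] * 8
blues = [255] * 8 + [0] * 8
values, issues = simd_luma(reds, greens, blues)
print(values, issues)
```

Sixteen pixels need two modeled issues instead of sixteen scalar operations, because every lane has its data ready and does the same fixed amount of work. That is the assumption under which the full parallel speedup holds.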

