Design Article

DSP performance tuning, part 1: Cache, DMA, and frameworks

Analog Devices

1/31/2008 3:00 AM EST

Part 2 applies the concepts introduced here to the Blackfin processor. It will be published Monday, February 7.

This series reviews techniques you can use to tune DSP system performance, using the Blackfin processor as an example. Ideally, readers will have a basic understanding of the Blackfin architecture, familiarity with software terminology, and some experience in embedded system development. The series focuses on video applications, but the principles we describe apply to all DSP applications.

The series starts with a look at software frameworks. After that, we will show how to make the best use of the memory available in the Blackfin family, including both internal and external memory. As part of that discussion, we'll look at memory performance benchmarks. Next, we'll explain how to manage shared resources, particularly DMA and interrupts. We will conclude with a video decoder example that ties all these concepts together.

Some background
Application development typically starts with prototype C code written for a PC or a workstation. The code is later ported to the embedded processor and optimized. Figure 1 illustrates this process. As our earlier series "Programming and optimizing C code" explains, the compiler is the first line of defense for C and C++ developers who need to optimize. This series extends optimization to the system level, covering three areas: memory management, DMA management, and interrupt management. As Figure 1 shows, these system-level optimizations are just as important as optimizations to the code itself.


Figure 1. Typical optimization process.

It is easy to explain why memory and DMA management are important for DSP systems. Most systems must move a large amount of data at high data rates. As a result, you end up using all the memory resources available to the processor, both internal and external memory. Figure 2 illustrates this point.


Figure 2. Video frame sizes and corresponding memory requirements on Blackfin. For frame sizes of CIF and below, it is possible to implement a decoder entirely in on-chip memory. For resolutions of VGA and above, off-chip memory is required.

Frameworks
Before we do anything else, we need to determine what type of "framework" to use. In this context, the framework is the software infrastructure that moves code and data within an embedded system. Frameworks affect system performance because they define how memory and other system resources are used. Frameworks can also be tailored to the specific performance, ease-of-use, and other requirements of your application.

Software frameworks fall into a few general categories:

  1. Processing on-the-fly
  2. Programming ease overrides performance
  3. Performance is paramount

The first framework category, processing on-the-fly, is ideal for safety-critical applications or for systems without external memory. In such cases, you either can't afford the time required to buffer the data, or you don't have the resources in your system to do so; for example, you lack external memory and therefore have to do everything on-chip. In this scenario, you take the data in, operate on it, make a decision, and discard the data. You do need to ensure, however, that the active frame buffer is not overwritten until processing of the current frame is complete.

Figure 3 shows a lane departure system, a safety-critical application. Here, you typically can't afford to wait 33 milliseconds (one frame period at 30 frames per second) for a full frame of data before making a decision. A better approach is to process only a small subset of the frame. For example, because the lanes can be detected from the bottom part of the frame, only the bottom of the frame needs to be brought in.


Figure 3. Processing on-the-fly framework example. In the lane departure system shown, data is processed as it arrives without being buffered.

The second framework category is typically applied where programming ease is the most important requirement. It is ideal for designs in which time-to-market is critical, or in which the need for quick prototyping and programming ease supersedes the need for performance. Such a framework also eases development for novices and experts alike.