DSP programmer's guide
Kenton Williston
11/14/2007 3:00 AM EST
C coding and compilers
Optimizing Compilers and Embedded DSP Software
This article summarizes techniques that help a compiler generate better code, as measured by cycle count, memory use, and power consumption.
Get better DSP code from your compiler
This article presents even more tips that will help you coax efficient code from your compiler.
Programming and optimizing C code
Part 1 introduces the basic principles of writing C code for a DSP processor. It also explains how to profile and optimize code.
Part 2 shows how to optimize DSP "kernels," i.e., inner loops. It also shows how to write fast floating-point and fractional code.
Part 3 explains how to access DSP features like circular addressing from portable C. It also shows how to use pragmas and inline assembly.
Part 4 explains why it is important to optimize "control code," and shows how to do so.
Part 5 shows how to optimize memory performance, and how to make speed vs. size tradeoffs.
Simulators and profilers
Measuring DSP code performance
Before you can optimize code, you must measure its performance, and this can be surprisingly challenging. Here's how to measure performance reliably using both simulators and hardware.
DSP optimization strategies using simulators and profilers
This article reveals the pros and cons of simulators and profilers. It shows how to use these tools to optimize code and to choose the right memory layout and sizing.
Optimization strategies
Tutorial: Programming High-Performance DSPs
Part 1 explains the features of high-performance DSPs, with a focus on VLIW pipelines and multi-level memory architectures. It shows how to write code for these advanced architectures. It also introduces Direct Memory Access (DMA), and explains how to use it.
Part 2 explains how to optimize code for high-performance DSPs, with a focus on loop unrolling and software pipelining. It shows how to minimize loop overhead, and how to keep a DSP's execution units busy.
Part 3 shows how you can help the compiler produce faster code. It explains the drawbacks of software pipelining. It also explains how to optimize for minimum power consumption.
Optimizing for cache performance
Part 1 explains how caches work, using the two-level cache in TI's C64x as an example. It also outlines the causes of cache misses.
Part 2 explains how to minimize cache misses by increasing data reuse, re-organizing memory layouts, and grouping functions. It includes practical examples for each technique.
Optimizing for instruction caches (NEW!)
Part 1 explains how locality affects instruction cache performance, and shows how to increase performance through code partitioning, function inlining, and other techniques.
Part 2 looks at the tradeoffs between program and data cache optimizations, and shows how to choose the best compromise.
Part 3 shows how to optimize code by modifying the placement of functions in memory.
Debugging
Testing and Debugging DSP Systems (NEW!)
Part 1 introduces the hardware used for debugging, the debugging challenges facing DSP programmers, and debugging methodologies.
Part 2 explains how JTAG (IEEE 1149.1) boundary-scan technology works. It defines the test pins and the test process associated with a JTAG port.
Part 3 explains how emulators control programs on the DSP through functions such as breakpoints and single-stepping.
Part 4 explains how to use breakpoints, event triggers, and program traces to debug code.
Part 5 introduces the concepts of real-time data collection and data visualization. It also explains how compiler options affect the debugging process.
Part 6 reviews the common bugs found in DSP applications, and outlines the different testing methods required to catch these bugs.



