United Business Media EE Times




Embedded DSP code is optimization exercise

EE Times


Many of today's digital signal processing (DSP) applications are subject to real-time constraints, and they are growing to the point where they stress the available CPU and memory resources. By understanding three main optimization strategies (DSP architecture optimization, DSP compiler optimization and DSP algorithm optimization), developers can speed up applications, free up memory and reduce power dissipation, depending on their goals.

Determining what to optimize depends on the goals of the application developer. For example, optimizing for performance means that a developer can use a slower or less expensive DSP to do the same amount of work. In some embedded systems, cost savings like this can have a significant impact on the success of the product.

The tricky part of optimizing DSP applications is understanding the trade-offs between the various performance parameters, such as throughput (execution speed), memory usage, I/O bandwidth and power dissipation.

For instance, optimizing an application for speed often reduces power consumption but increases memory usage. Optimizing for memory may also reduce power consumption, because there are fewer memory accesses, but at the cost of slower code. The various trade-offs and system goals must be understood and considered before attempting any form of application optimization.

The fundamental rule in computer design, as well as in programming real-time DSP-based systems, is "make the common case fast and favor the frequent case." This is really just Amdahl's Law, which says that the performance improvement to be gained from some faster mode of execution is limited by the fraction of the time that the faster mode can be used. In other words, developers shouldn't spend time trying to optimize a piece of code that will hardly ever run. They won't get much out of it, no matter how innovative they are. Instead, if developers can eliminate just one cycle from a loop that executes thousands of times, they will see a bigger impact on the bottom line.
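Amdahl's Law can be put into numbers with a small helper. The following is a minimal sketch in C; the function name `amdahl_speedup` and its parameters are illustrative and do not come from any library.

```c
#include <assert.h>

/* Overall speedup predicted by Amdahl's Law.
 *   fraction      - share of the original run time spent in the code
 *                   being optimized, from 0.0 to 1.0
 *   local_speedup - how many times faster that code becomes (> 0)
 * Both names are hypothetical, chosen for this illustration. */
double amdahl_speedup(double fraction, double local_speedup)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    assert(local_speedup > 0.0);

    /* The untouched part still takes (1 - fraction) of the original time;
     * the optimized part shrinks to fraction / local_speedup. */
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

Under this model, doubling the speed of a loop that accounts for 90 percent of the run time gives an overall speedup of roughly 1.8, while making code that accounts for only 1 percent of the run time 100 times faster gives an overall speedup of about 1.01.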

DSP architectures are designed to make the common case fast. Many DSP applications are composed from a standard set of DSP building blocks such as filters, Fourier transforms and convolutions. These algorithms all share a common characteristic: they perform multiplies and adds over and over again. This is generally referred to as the sum of products (SOP). DSP chip designers have developed hardware architectures that allow the efficient execution of algorithms with SOPs. This is done using specialized instructions such as single-cycle multiply and accumulate (MAC), architectures that allow multiple memory accesses in a single cycle, and special hardware that handles loop counting with very little overhead.
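The sum-of-products pattern looks like this when written in plain C. This is a minimal sketch rather than vendor code: the function name `sum_of_products` is hypothetical, and the 16-bit inputs with a wide accumulator are an assumption chosen to resemble common fixed-point DSP practice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of products: returns coeff[0]*sample[0] + ... + coeff[n-1]*sample[n-1].
 * A 64-bit accumulator is used here (an assumption for portability) so that
 * long sums of 16-bit products cannot overflow. */
int64_t sum_of_products(const int16_t *coeff, const int16_t *sample, size_t n)
{
    int64_t acc = 0;

    assert(n == 0 || (coeff != NULL && sample != NULL));

    for (size_t k = 0; k < n; ++k) {
        /* Each pass needs two memory reads, one multiply, one add and a
         * loop-counter update: the work that MAC instructions, multiple
         * memory accesses per cycle and hardware loop counters target. */
        acc += (int32_t)coeff[k] * (int32_t)sample[k];
    }
    return acc;
}
```

With coefficients {1, 2, 3, 4} and samples {5, 6, 7, 8}, the function returns 5 + 12 + 21 + 32 = 70. A FIR filter output sample is one such sum, computed over the most recent input samples.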

DSP algorithms can be made to run faster using techniques of algorithmic transformation. For example, a common algorithm used in DSP applications is the Fourier transform. The Fourier transform is a mathematical method of breaking a signal in the time domain into all of its individual frequency components. The process of examining a time signal broken down into its individual frequency components is also called spectral analysis or harmonic analysis.

There are different ways to characterize Fourier transforms:

  • The Fourier transform (FT) is a mathematical formula using integrals;
  • The Discrete Fourier transform (DFT) is a discrete numerical equivalent using sums instead of integrals, which maps well to a digital processor like a DSP; and
  • The Fast Fourier transform (FFT) is just a computationally fast way to calculate the DFT which reduces many of the redundant computations of the DFT.

How these are implemented on a DSP has a significant impact on the overall performance of the algorithm. The FFT, for example, is a fast version of the DFT. It exploits the periodicity of the sinusoids that the input is multiplied by to perform the transform, which significantly reduces the number of calculations required.

Recognizing the significant impact that efficiently implemented algorithms have on overall system performance, DSP vendors and other providers have developed libraries of efficient DSP algorithms optimized for specific DSP architectures.

Depending on the type of algorithm, these libraries can be downloaded from Web sites or bought from DSP solution providers. Be careful about obtaining free software from unknown vendors: the code may be buggy, as there is no guarantee of quality.

Just a few years ago, it was an unwritten rule that writing programs in assembly would usually result in better performance than writing in higher-level languages such as C or C++. The early "optimizing" compilers suffered from optimizations that were too general or simplistic, and the results were not as impressive as those achieved by a good assembly-language programmer. However, compilers have continued to improve, and today they perform very specific high-performance optimizations that compete well with even the best assembly-language programmers.

A general optimization strategy is to write DSP application code that the compiler can pipeline efficiently. Software pipelining is an optimization technique for scheduling loops so that the functional units are used efficiently. In the case of Texas Instruments' TMS320C62x generation of DSPs, there are eight functional units that can be used at the same time.

It's up to the compiler to figure out how to schedule instructions on all of these units for each clock cycle. Sometimes it is a subtle change in the way the C code is structured that makes all the difference. In software pipelining, multiple iterations of a loop are scheduled to execute in parallel. The loop is reorganized so that each iteration in the pipelined code is made from instruction sequences selected from different iterations in the original loop.
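The reorganization can be illustrated by writing the pipelined schedule out by hand. The following is a sketch under stated assumptions, not the output of any particular compiler: the function names are hypothetical, the loop body is split into three made-up stages (load, multiply, add and store), and the `restrict` qualifiers state the assumption that the input and output arrays do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: each iteration runs its three stages one after another. */
void scale_offset_plain(const int *restrict in, int *restrict out,
                        size_t n, int gain, int offset)
{
    for (size_t i = 0; i < n; ++i) {
        int loaded = in[i];             /* stage A: load          */
        int product = loaded * gain;    /* stage B: multiply      */
        out[i] = product + offset;      /* stage C: add and store */
    }
}

/* Hand-pipelined version of the same loop, for illustration only. */
void scale_offset_pipelined(const int *restrict in, int *restrict out,
                            size_t n, int gain, int offset)
{
    if (n < 2) {                        /* too short to pipeline */
        scale_offset_plain(in, out, n, gain, offset);
        return;
    }

    /* Prologue: start the first iterations to fill the pipeline. */
    int loaded = in[0];                 /* A(0) */
    int product = loaded * gain;        /* B(0) */
    loaded = in[1];                     /* A(1) */

    /* Kernel: each pass combines stages from three different original
     * iterations, which separate functional units could work on together. */
    for (size_t i = 2; i < n; ++i) {
        out[i - 2] = product + offset;  /* C(i-2) */
        product = loaded * gain;        /* B(i-1) */
        loaded = in[i];                 /* A(i)   */
    }

    /* Epilogue: finish the iterations still in flight. */
    out[n - 2] = product + offset;      /* C(n-2) */
    product = loaded * gain;            /* B(n-1) */
    out[n - 1] = product + offset;      /* C(n-1) */
}
```

Both functions produce identical output. In practice the developer writes the plain loop and lets the compiler derive the prologue, kernel and epilogue; telling the compiler that the arrays do not overlap, as `restrict` does here, is one example of a small source change that can make a loop easier to pipeline.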

Embedded real-time applications are an exercise in optimization. By taking into account the three optimization strategies and the trade-offs discussed throughout the article, developers can achieve noticeable improvements in the performance of their code in terms of cycle count, memory use and power consumption.

Robert Oshana is engineering manager, digital signal processing at Texas Instruments Inc. (Houston).

See related chart










    All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.