Design Article
Optimizing DSP Performance and Minimizing Risk
Jack Shandle
6/17/2004 12:00 AM EDT
Design engineers who must balance heavy-duty signal-processing performance with low power consumption and/or aggressive price points for their chips increasingly find themselves in a quandary. Their choices include but are not limited to:
- Off-the-shelf DSPs that leave a lot of potential power savings on the table because the same programmability that makes them flexible and easy to use also consumes more cycles to execute an algorithm than a hardwired solution.
- Configurable processors that come much closer to an optimized power/performance solution but typically require a significant investment in time and money. These solutions fall into the ASIC category, where only high-volume sales of the chip can justify that investment.
- Programmable logic solutions that can deliver signal processing performance and efficiency by hardwiring algorithms but generally demand a price be paid in power consumption because of routing and gate utilization issues.
If an application profile reveals that 20% of execution time is spent on control functions and 80% on signal processing, the case for configurability (that is, mapping algorithms to gates) can be compelling. This is also one of those rare instances where designers have the best of both worlds, because the custom solution uses fewer gates and, therefore, less power for more performance.
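As a rough illustration of why the 20%/80% split matters, Amdahl's law gives the whole-application speedup when only the signal-processing portion is accelerated. The sketch below is a back-of-the-envelope aid, not something from ARM's tools; the function name and the acceleration factors in the loop are hypothetical.

```python
def overall_speedup(accelerated_fraction: float, block_speedup: float) -> float:
    """Amdahl's law: whole-application speedup when only a fraction is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if block_speedup <= 0:
        raise ValueError("block_speedup must be positive")
    remaining = 1.0 - accelerated_fraction  # control code, left unchanged
    return 1.0 / (remaining + accelerated_fraction / block_speedup)


SIGNAL_FRACTION = 0.80  # the 80% signal-processing share from the profile above

# Hypothetical acceleration factors for the signal-processing portion.
for factor in (2, 6, 10, 50):
    total = overall_speedup(SIGNAL_FRACTION, factor)
    print(f"{factor:>2}x on signal processing -> {total:.2f}x overall")
```

With 20% of the time left in control code, the overall gain can never exceed 5x under this model, however fast the hardwired portion becomes.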
One way to picture the designer's quandary is to view processing as occurring in either a control plane, typically the responsibility of microprocessors and microcontrollers, or a data plane, typically the province of DSPs.
The diagram in Figure 1 shows ARM Ltd.'s perspective on the problem. Its cores occupy the control/programmability quadrant (lower left). Adding signal-processing extensions to the cores lets them handle low-bandwidth requirements for moderately complex algorithms such as MP3 for audio processing. Adding single-instruction, multiple-data (SIMD) instructions to the architecture handles applications such as video processing. ARM has also introduced application-specific accelerators for applications such as 3D graphics rendering. ARM's strategy up to now has not, however, been able to address data-plane applications in the upper right quadrant, which is why it has introduced OptimoDE technology.
Software development tools, for example, are an issue because they have to be:
- Robust
- Flexible enough to change with architectural changes and user requirements.
This is a tall order for the startup companies that are typically associated with technology breakthroughs. Another software-related issue is compatibility with legacy application code. Legacy C code can be recompiled, of course, but the new architecture's compiler has to be very efficient if it hopes to maintain the performance gains that justify the migration to a new architecture in the first place.
Still another problem is the viability of the new architecture in the marketplace. In other words, design teams must consider the vendor's long-term survival prospects.
This was the course being pursued by Adelante Technologies' AR|T DSP coprocessor division when ARM acquired it in July 2003. Since then, ARM has extended the technology, renamed it OptimoDE, and released it as a product in May 2004.
OptimoDE is a framework that includes a configurable VLIW-styled architecture, from which specific data engines are derived, a comprehensive set of hardware configuration tools, a powerful C compiler, reference examples and an AMBA interface kit.
The data engine consists of datapath units that are pieced together in drag-and-drop fashion to execute the target algorithm. These functional datapath units include generic arithmetic and logical units, storage, and interconnect. Other specialized functions include butterfly and DCT engines. Users can extend and customize datapath units to meet application-specific processing needs.
OptimoDE tools also include a C-compiler optimized for the architecture. Providing the compiler ensures that hardware and software are optimized together.
Using the template and the ARM configuration tools, the design teams configure the architecture to meet the specified performance requirements, adding other blocks as needed. OptimoDE software is capable of extracting parallelism from the algorithm and this helps the design team significantly. Low-power optimization techniques are also built into the tools. The process is iterative. The various parameters of the architecture are tweaked and the result profiled against the performance goals, which typically consist of a cycle budget and a power budget.
The software is automatically generated and can be re-generated to handle design changes and even alternative algorithms without changing the hardware. OptimoDE also automatically generates simulation models for Verilog simulation after the data engine is incorporated within a system-on-chip (SoC) design.
The resulting performance improvements can be impressive. An H.263 video codec that must handle full-duplex conversion of 30 frames per second within a 384 kbps bandwidth, for example, uses no fewer than 10 algorithms for encoding, decoding and color conversion. Table 1 compares OptimoDE with executing the algorithms in software on an ARM9E. The speedup for the individual algorithms ranged from 6X to 50X. More important, the OptimoDE implementation ran at less than 100 MHz, while the ARM9E option would theoretically have required a clock of more than 800 MHz to execute the same algorithm suite.
| H.263 CIF 352x288, 30 fps, 384 kbps (estimated performance requirements) | ARM9E (MHz) | OptimoDE (MHz) | Speed-up |
| --- | --- | --- | --- |
| Decode | | | |
| Deblock | | | |
| Dering | | | |
| IDCT | | | |
| Motion Compensation | | | |
| Encode | | | |
| FDCT | | | |
| SAD | | | |
| IDCT | | | |
| Motion Compensation | | | |
| Color Space Conversion | | | |
| YUV to RGB | | | |
| RGB to YUV | | | |
Table 1: Data engine performance compared to programmable implementation
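To put the clock figures in perspective, the cycle budget per frame and per macroblock can be worked out from the stated frame size, frame rate, and clock rates. The sketch below treats 100 MHz and 800 MHz as the bounds quoted in the text and assumes the 16x16-pixel macroblocks that H.263 uses; because the codec is full duplex, the budget covers encoding and decoding together.

```python
# Figures taken from the article: CIF frame size, frame rate, and clock bounds.
FRAME_WIDTH, FRAME_HEIGHT = 352, 288
FRAMES_PER_SECOND = 30
OPTIMODE_CLOCK_HZ = 100_000_000  # upper bound ("less than 100 MHz")
ARM9E_CLOCK_HZ = 800_000_000     # lower bound ("more than 800 MHz")

# Assumption: 16x16-pixel macroblocks, as used by H.263.
MACROBLOCK_SIZE = 16


def macroblocks_per_frame(width: int, height: int, size: int = MACROBLOCK_SIZE) -> int:
    """Number of whole macroblocks in one frame."""
    return (width // size) * (height // size)


def cycles_per_frame(clock_hz: int, fps: int = FRAMES_PER_SECOND) -> int:
    """Clock cycles available to process one frame."""
    return clock_hz // fps


blocks = macroblocks_per_frame(FRAME_WIDTH, FRAME_HEIGHT)
for name, clock in (("OptimoDE", OPTIMODE_CLOCK_HZ), ("ARM9E", ARM9E_CLOCK_HZ)):
    per_frame = cycles_per_frame(clock)
    print(f"{name}: {per_frame:,} cycles/frame, "
          f"{per_frame // blocks:,} cycles/macroblock")
print(f"Clock ratio: at least {ARM9E_CLOCK_HZ / OPTIMODE_CLOCK_HZ:.0f}x")
```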
OptimoDE also addresses most of the concerns noted above regarding the slow adoption of configurable DSP solutions. Development software tools, for example, are embedded in the application and are upgraded by ARM, which has a good track record for offering efficient tools. Since OptimoDE generates the accompanying application software, legacy software is not a problem and ARM's viability is not under question in most quarters.
Hearing instruments have all of the identifying attributes of "mobile" devices, such as size, weight, and battery life, but they take them to new levels of minimalism. This level of performance means that off-the-shelf, programmable DSPs are just not a good fit.
Power consumption, in particular, must be ultra-low. For example, Phonak's internal technology evaluation at the beginning of the project concluded that commercial DSP chips are up to one order of magnitude too power hungry for hearing instrument applications, says Hans-Ueli Roeck, manager of hearing instrument software development at Phonak.
Phonak had been using the technology developed by the AR|T DSP coprocessor division of Adelante Technologies for several years before the technology was acquired by ARM. Even in the early days, the technology had proven itself at Phonak. "It was crucial for the success of the products that integrated them," says Roeck. Following the introduction of data-engine-enabled hearing instruments, Phonak rose in rank to number three in the world.
Phonak's most recent experience has been with OptimoDE. The resulting chip is in working silicon. "The chosen DSP core by itself could not have fulfilled our targets of an ultra-low-power design," says Roeck.
Phonak's DSP system architecture analysis and its previous experience with data-engine-enabled technology had taught the company an important lesson: Only sufficiently large functional blocks, such as entire FFTs, should be moved from the main DSP core into a dedicated data engine to achieve the performance and power consumption goals. The resulting loosely coupled system adds to the system's ability to meet the power and performance goals. An optimized narrow interface (in this case a double-buffered RAM) was also needed to reach the ultra-low power consumption goals.
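A double-buffered RAM of the kind mentioned above can be pictured as a ping-pong buffer: the DSP core fills one bank while the data engine reads the other, and the two swap roles at each block boundary. The following is a minimal software sketch of that idea only; the class and method names are hypothetical, and real hardware would use two RAM banks and a bank-select signal rather than Python lists.

```python
class PingPongBuffer:
    """Two banks: one being written by the core, one being read by the engine."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self._banks = [[0] * block_size, [0] * block_size]
        self._write_bank = 0  # bank the DSP core currently fills

    def write_block(self, samples: list[int]) -> None:
        """Core side: fill the current write bank with one block of samples."""
        if len(samples) != self.block_size:
            raise ValueError("block has the wrong length")
        self._banks[self._write_bank][:] = samples

    def swap(self) -> None:
        """Hand the freshly written bank to the engine and write to the other."""
        self._write_bank ^= 1

    def read_block(self) -> list[int]:
        """Engine side: read the bank that is not being written."""
        return list(self._banks[self._write_bank ^ 1])


buf = PingPongBuffer(block_size=4)
buf.write_block([1, 2, 3, 4])
buf.swap()
buf.write_block([5, 6, 7, 8])  # the core keeps writing...
print(buf.read_block())        # ...while the engine reads the previous block
```

Because the two sides never touch the same bank at the same time, the only coupling between core and engine is the swap event, which is what keeps the interface narrow.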
OptimoDE was used in this way to attach a dedicated FFT engine (a 128-point FFT/inverse FFT in about 220 cycles) to the principal DSP core.
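The cycle count gives a feel for how much parallelism such an engine needs. Assuming a textbook radix-2 decomposition (an assumption; the article does not say how Phonak's engine is structured), a 128-point FFT takes 448 butterfly operations, so finishing in roughly 220 cycles implies about two butterflies per cycle on average.

```python
def radix2_butterflies(n_points: int) -> int:
    """Butterfly count for a radix-2 FFT: (N / 2) * log2(N)."""
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two, at least 2")
    stages = n_points.bit_length() - 1  # exact log2 for a power of two
    return (n_points // 2) * stages


FFT_POINTS = 128
REPORTED_CYCLES = 220  # approximate figure quoted in the article

butterflies = radix2_butterflies(FFT_POINTS)
print(f"{butterflies} butterflies in {REPORTED_CYCLES} cycles "
      f"= {butterflies / REPORTED_CYCLES:.2f} per cycle")
```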
OptimoDE was also called on to implement a proprietary sample-rate processing engine. A specific time-domain-core (TDC) was added to the principal DSP core. "Both HW accelerators (TDC and FFT engine) worked out fully as expected," says Roeck.
Phonak specified the engines at a functional level in terms of power consumption, gate area, functionality and quality, while ARM's engineers in Leuven, Belgium, provided the engineering services to design and implement them.
Phonak's previous experience with the technology helped quite a bit in speeding the design process. Numerous low-power optimization techniques, many of them already integrated into OptimoDE, were applied to the problem. Several rounds of power simulations over the entire DSP system were also required. A close and long-standing working relationship between Phonak, ARM's service engineering team and the design house was also crucial in reaching the partially conflicting targets of functionality, gate area and power consumption together. The key result in terms of low-power performance was an impressive 0.013 mW/MOPS, according to ARM.
"At the conclusion of this process, the power consumption, gate area, functionality and quality (that is, being right the first time) goals were within specifications and our expectations," says Roeck. "ARM's Leuven engineering service team performed a superb job."
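A figure quoted in mW/MOPS converts directly into energy per operation, which can be easier to reason about: 0.013 mW/MOPS works out to 13 picojoules per operation. The sketch below shows the conversion; the 50 MOPS workload is purely hypothetical and is not a Phonak figure.

```python
MW_PER_MOPS = 0.013  # efficiency figure quoted by ARM in the article


def energy_per_op_picojoules(mw_per_mops: float) -> float:
    """Convert mW/MOPS to pJ per operation (1 mW = 1e-3 W, 1 MOPS = 1e6 ops/s)."""
    joules_per_op = (mw_per_mops * 1e-3) / 1e6
    return joules_per_op * 1e12


def power_mw(mops: float, mw_per_mops: float = MW_PER_MOPS) -> float:
    """Power drawn at a given sustained workload, in milliwatts."""
    return mops * mw_per_mops


HYPOTHETICAL_WORKLOAD_MOPS = 50  # illustrative only

print(f"{energy_per_op_picojoules(MW_PER_MOPS):.1f} pJ per operation")
print(f"{power_mw(HYPOTHETICAL_WORKLOAD_MOPS):.2f} mW at "
      f"{HYPOTHETICAL_WORKLOAD_MOPS} MOPS")
```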
But they have one thing in common. Just as Phonak's market ranking climbed after utilizing OptimoDE technology, so did National's in a product line where data engine technology was implemented.
A DECT phone chip designed in part with Adelante's technology prior to its acquisition by ARM propelled National to second place in a market where it had not been in the top 10 before. Beyond its single product success, National also considers data-engine technology as a contributor to its overall product strategy.
Ahmad Bahai, chief technology officer of National's Wireless and Information Appliance Group, does not give all the credit to data-engine technology for lifting the SC14408 family of CMOS chips to market prominence. The radio design and other optimizations were also new. But it did play an important role in the design of a very low power processor that executes several algorithms required for echo cancellation, tone detection and equalization, he says.
Competing products typically use an off-the-shelf DSP processor for these tasks, so using a data engine optimized for a few related algorithms, but still programmable, provided a big advantage in both board space and power savings. National's project preceded ARM's acquisition of Adelante, and National, like Phonak, does not use an ARM microprocessor core in the design.
Although there is just one data-engine core in the DECT chip, Bahai sees a trend toward developing multiple data engines for National products. The primary reason for this strategy is the advantages of the data-engine core itself. But the strategy also leverages National's analog expertise. For optimal performance, each data-engine core needs its own optimal voltage, and power management is one of National's core competencies.
A key advantage of OptimoDE technology is the speed with which cores can be reconfigured using the ARM tools. In fact, Bahai refers to the data-engine cores as object-oriented cores because, after a core is developed, it can be reprogrammed using the OptimoDE compiler to handle a similar algorithm. In National's design flow, the new algorithm is programmed in MATLAB, debugged in an object-oriented language and then passed on to the compiler, which creates a binary file for the design.
Contributing writer Jack Shandle is a former chief editor of both Electronic Design magazine and ChipCenter.com. He holds a BSEE degree and has written hundreds of articles on all aspects of the electronics OEM industry. Jack is president of eContentWorks, a consultancy that creates high-value content for publishers, eOEM corporations, and industry associations. His email address is jshandle@earthlink.net.


