On top of the well-known signal-processing demands of broadband wireless devices, data-intensive 2.5G and 3G applications will require even more processing power. To achieve multimedia features such as messaging, video conferencing, 3-D interactive gaming and location-based services, designers must not only cope with greater baseband signal processing needs, but they must also prepare wireless systems for multimedia processing.
In the newly emerging market of 2.5G and 3G wireless communications, high-bandwidth wireless devices will be based on an architecture featuring a digital signal processor (DSP) for real-time processing and a general-purpose processor to support control functions. This offers system designers the flexibility of executing each task on the type of processor best suited to it. The operating system and file-management functions are most effectively handled by a general-purpose processor, while computationally intensive communications and multimedia tasks are more efficiently processed by a DSP.
This design approach is a popular option in wireless handset and PDA solutions. Partitioning multimedia tasks to the DSP while retaining command and control functions on the general-purpose processor minimizes power consumption and optimizes application processing. It also leaves more processing power for the feature-rich and intensely graphical applications of 3G. By contrast, many single-core architectures use a major portion of the processor's resources to support just the operating system.
While next-generation applications will enjoy high performance on an architecture featuring a DSP and general-purpose processor, performance can be further refined using hardware extensions specifically designed for multimedia functions.
Multimedia on wireless handheld devices is all about efficient media codecs. Large amounts of data must be moved over a relatively narrow wireless pipe, which makes data compression essential. But a handheld device has neither the space nor the power budget for dedicated codec hardware.
An analysis of the most prevalent image and video coding/decoding algorithms, such as MPEG-x, H.26x, Motion JPEG and others, reveals that relatively few functions form a major portion of most of today's multimedia codecs. In fact, four functions account for approximately 80 percent of the processing in a typical MPEG-4 codec: the discrete cosine transform (DCT), the inverse discrete cosine transform (IDCT), pixel interpolation and motion estimation. If these functional blocks can be accelerated efficiently in hardware, then the 3G goal of running new multimedia applications on mobile devices can be achieved.
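To make the four functions concrete, here is a minimal reference sketch of each in plain Python. It is meant only to show what work a hardware extension would be taking over, not how any particular codec or DSP implements it. The function names, the 8x8 block size, the half-pixel rounding rule and the full-search strategy with a sum-of-absolute-differences cost are all assumptions chosen for illustration.

```python
import math

def dct_1d(x):
    """Orthonormal 1-D DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out

def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct_2d(block):
    """Separable 2-D DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    return transpose([dct_1d(c) for c in transpose(rows)])

def idct_2d(coeffs):
    """Separable 2-D inverse DCT: rows, then columns."""
    rows = [idct_1d(r) for r in coeffs]
    return transpose([idct_1d(c) for c in transpose(rows)])

def half_pel_row(row):
    """Pixel interpolation: rounded average of horizontal neighbours."""
    return [(row[i] + row[i + 1] + 1) // 2 for i in range(len(row) - 1)]

def extract(frame, top, left, size):
    """Copy a size x size block out of a frame (list of pixel rows)."""
    return [r[left:left + size] for r in frame[top:top + size]]

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def best_motion_vector(cur, ref, top, left, size=16, search=2):
    """Full-search motion estimation for one block of the current frame.

    Returns (cost, dy, dx) for the lowest-SAD candidate within +/- search
    pixels in the reference frame, skipping candidates that fall outside it.
    """
    target = extract(cur, top, left, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > len(ref) or x + size > len(ref[0]):
                continue
            cost = sad(target, extract(ref, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best
```

Every one of these routines is a tight loop of multiply-accumulate or absolute-difference operations over small pixel blocks, which is the kind of regular arithmetic that lends itself to dedicated hardware.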
For example, performing MPEG-4 encoding in the quarter common intermediate format (QCIF) at 15 frames per second can overwhelm the resources of a RISC-based processor; a DSP, however, has the computational capabilities to easily handle such a task. An MPEG-4 codec requires about 153 million processor cycles per second (Mpcs) on a general-purpose RISC-based processor, but only 21 Mpcs on a DSP with hardware acceleration.
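A quick back-of-the-envelope calculation puts these cycle counts in perspective. The sketch below assumes that both figures refer to the QCIF, 15-frames-per-second case mentioned above, that QCIF means 176x144 pixels, and that the codec works on 16x16-pixel macroblocks. The variable and function names are illustrative only.

```python
# Cycle figures quoted in the text (processor cycles per second).
RISC_CYCLES_PER_SEC = 153_000_000
DSP_CYCLES_PER_SEC = 21_000_000

# Assumed workload: QCIF (176x144) at 15 frames per second.
QCIF_WIDTH, QCIF_HEIGHT, FRAME_RATE = 176, 144, 15
MACROBLOCK_SIZE = 16  # assumed 16x16-pixel macroblocks

def macroblocks_per_second(width, height, fps, mb_size=MACROBLOCK_SIZE):
    """Number of macroblocks the codec must process each second."""
    return (width // mb_size) * (height // mb_size) * fps

mb_rate = macroblocks_per_second(QCIF_WIDTH, QCIF_HEIGHT, FRAME_RATE)
cycle_ratio = RISC_CYCLES_PER_SEC / DSP_CYCLES_PER_SEC
risc_cycles_per_mb = RISC_CYCLES_PER_SEC / mb_rate
dsp_cycles_per_mb = DSP_CYCLES_PER_SEC / mb_rate

print(f"macroblocks per second: {mb_rate}")
print(f"RISC-to-DSP cycle ratio: {cycle_ratio:.1f}")
print(f"cycles per macroblock, RISC: {risc_cycles_per_mb:.0f}")
print(f"cycles per macroblock, DSP:  {dsp_cycles_per_mb:.0f}")
```

Under these assumptions the budget works out to roughly 103,000 cycles per macroblock on the RISC processor versus about 14,000 on the accelerated DSP, a reduction of a little more than sevenfold.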
Multimedia acceleration allows the wireless-device manufacturer to implement more high-bandwidth applications, such as streaming video and interactive gaming, on a single device. However, some general-purpose processors can offer the performance required for wireless devices that display feature-rich multimedia such as short video clips, multimedia messaging and e-mail.
With the multimedia functions that offer the biggest payback from acceleration identified, the next problem to address is which acceleration technique is most effective for 2.5G and 3G devices.
Some hardware accelerators are not completely integrated with the processing core. That is, the processing core has its own set of registers and other resources distinct from the registers and resources dedicated to the accelerator. With this type of implementation, the accelerator must rely on the processing core to move data from the core's registers to the accelerator's registers for processing. When the task has been completed, the core must again move the data from the accelerator's registers to its own. In a computationally intense codec, this high level of data movement can slow down the overall processing of the algorithm and consume significant battery power.
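The cost of this register-to-register shuffling can be sketched with a toy cycle model. This is not a description of any real accelerator. The function names, the one-cycle-per-word copy cost and the example workload figures are hypothetical values chosen only to show how the overhead scales.

```python
def loosely_coupled_cycles(n_blocks, words_per_block, compute_cycles,
                           copy_cycles_per_word=1):
    """Cycles when the core copies operands in and results back out.

    Hypothetical model: each block needs words_per_block words moved into
    the accelerator's registers and the same number moved back afterwards.
    """
    transfer = 2 * words_per_block * copy_cycles_per_word
    return n_blocks * (transfer + compute_cycles)

def tightly_coupled_cycles(n_blocks, compute_cycles):
    """Cycles when core and accelerator share registers: no copy step."""
    return n_blocks * compute_cycles

def transfer_overhead(n_blocks, words_per_block, compute_cycles,
                      copy_cycles_per_word=1):
    """Fraction of loosely coupled cycles spent purely on moving data."""
    loose = loosely_coupled_cycles(n_blocks, words_per_block,
                                   compute_cycles, copy_cycles_per_word)
    tight = tightly_coupled_cycles(n_blocks, compute_cycles)
    return (loose - tight) / loose

# Illustrative workload only: 64-word blocks, 100 compute cycles per block.
share = transfer_overhead(n_blocks=1000, words_per_block=64, compute_cycles=100)
print(f"share of cycles spent moving data: {share:.0%}")
```

In this model the transfer cost is paid once per block regardless of how fast the accelerator itself is, so the faster the accelerated function becomes, the larger the share of time and energy lost to data movement.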
A more effective alternative would fully integrate the accelerator with the processing core. As implemented in Texas Instruments' C55x core, the processor core and the accelerator unit share registers and other core resources, minimizing internal data traffic. In addition, the involvement of the processing core is limited because it is not being constantly called upon to move data in and out of registers.
Another important characteristic of this integral approach to an accelerator unit is the openness of the interface between the processor core and the accelerator. If this interface conforms to an open standard, such as the ISA Extensions Interface, other types of functions or instructions can be quickly incorporated into the accelerator in the future should the need for different functionality arise.
With an integral implementation of key multimedia functions in DSP hardware, a mobile device can easily perform video encoding/decoding at 30 frames per second at common intermediate format (CIF) resolution. This would be considered high-end performance for most handheld platforms. And the power consumed by such an application would be less than 50 mW, ensuring that the mobile device could operate for hours on a pair of AA batteries.
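These figures can be turned into a rough per-frame budget. The sketch below assumes CIF means 352x288 pixels, QCIF means 176x144 pixels, and macroblocks are 16x16 pixels. It treats the 50 mW figure as an upper bound for the codec alone, and the names are illustrative.

```python
CIF_WIDTH, CIF_HEIGHT, CIF_FPS = 352, 288, 30      # assumed CIF dimensions
QCIF_WIDTH, QCIF_HEIGHT, QCIF_FPS = 176, 144, 15   # assumed QCIF dimensions
MACROBLOCK_SIZE = 16                               # assumed macroblock size
POWER_LIMIT_WATTS = 0.050                          # "less than 50 mW" in the text

def pixel_rate(width, height, fps):
    """Luma pixels that must be processed per second."""
    return width * height * fps

# How much heavier CIF at 30 frames/s is than QCIF at 15 frames/s.
workload_ratio = (pixel_rate(CIF_WIDTH, CIF_HEIGHT, CIF_FPS)
                  / pixel_rate(QCIF_WIDTH, QCIF_HEIGHT, QCIF_FPS))

# Upper bounds on energy implied by the 50 mW power ceiling.
macroblocks_per_sec = ((CIF_WIDTH // MACROBLOCK_SIZE)
                       * (CIF_HEIGHT // MACROBLOCK_SIZE) * CIF_FPS)
max_joules_per_frame = POWER_LIMIT_WATTS / CIF_FPS
max_joules_per_macroblock = POWER_LIMIT_WATTS / macroblocks_per_sec

print(f"CIF@30 vs QCIF@15 pixel rate: {workload_ratio:.0f}x")
print(f"energy per frame: under {max_joules_per_frame * 1e3:.2f} mJ")
print(f"energy per macroblock: under {max_joules_per_macroblock * 1e6:.1f} uJ")
```

Under these assumptions, CIF at 30 frames per second carries eight times the pixel rate of QCIF at 15 frames per second, and the power ceiling corresponds to less than about 1.7 mJ per frame.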
See related chart