United Business Media EE Times



 


Optimizing digital video codecs in ARM cores








EE Times


Expect multimedia applications to increase dramatically over the next few years, stimulated by the arrival of third-generation networks and explosive growth in the number of portable devices capable of displaying digital video. The ITU H.263 and ISO MPEG-4 standards are in place to handle the coding of visual objects. Supporting those standards on portable devices requires processing video at lightning speed while conserving power.

ARM’s Move technology consists of software and hardware components that enable the optimization of digital video codecs on ARM cores. A video codec contains many complex algorithms, but most of the processing time is spent in a few time-critical algorithms. Essentially, Move optimizes those critical algorithms and can be integrated into existing codecs to improve performance.

Underlying the technology is a set of application programming interfaces (APIs) for each component, allowing designers flexibility in deciding whether a particular algorithm should be supported by software or by an equivalent hardware component. That approach lets system designers and application developers select an appropriate suite of components to meet particular requirements.

Hardware components typically require less power than software and leave processing horsepower free for other applications, but they increase the silicon area and therefore raise the cost of the ASIC. The Move technology allows the designer to make appropriate trade-offs between cost and performance. For example, a video application could be constructed to be capable of using either the software alone or a mix of software and hardware.

The Move coprocessor is designed to optimize the motion estimation stage in a digital video encoder. Motion estimation must determine a motion vector for each block, locating a similar block in a previously coded image.

A search algorithm is used to find the best fit between blocks. It consists of two parts: high-level decision-making logic and a computationally intensive block comparison.

The Move coprocessor can accelerate any search algorithm by partitioning the work between the coprocessor and software. The higher-level decision logic is implemented in software for maximum flexibility; the dedicated coprocessor accelerates the number-crunching part of motion estimation.

That partitioning is efficient for several reasons:
  • The sum of absolute differences (SAD) metric is independent of the motion estimation algorithm.

  • The ARM processor supervises the block comparison in the Move coprocessor, so the result is immediately available to steer the motion estimation algorithm.

  • Because the Move coprocessor is tightly coupled to the CPU, it can use the CPU’s data cache, speeding the motion estimation data path considerably without requiring an additional cache.

The standard way to compare two 8 x 8 blocks is the SAD metric: the absolute differences between corresponding pixels, summed over all 64 positions. The smaller the SAD result, the better the match. If SAD = 0, the two blocks are the same. Typically, the motion estimation algorithm will search for the best motion vector by repeatedly trying candidates, moving in the direction in which the SAD is decreasing.
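The comparison and the search described above can be sketched in C. This is only an illustration, not the Move API: the names block_cost_fn, sad8x8, motion_vector and greedy_search are hypothetical, both images are assumed to share one stride, and the four-neighbor descent is just one simple way of "moving in the direction in which the SAD is decreasing." The cost function is passed as a pointer to mirror the partitioning, so a hardware-backed comparison could in principle stand in for the software one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Block comparison: the part the article assigns to the coprocessor.
   Any function with this signature can be plugged into the search. */
typedef unsigned (*block_cost_fn)(const uint8_t *cur, const uint8_t *ref,
                                  int stride);

/* Software SAD over one 8 x 8 block: sum of |cur - ref| over 64 pixels. */
unsigned sad8x8(const uint8_t *cur, const uint8_t *ref, int stride)
{
    unsigned sad = 0;
    assert(stride >= 8);
    for (int row = 0; row < 8; row++) {
        for (int col = 0; col < 8; col++)
            sad += (unsigned)abs(cur[col] - ref[col]);
        cur += stride;
        ref += stride;
    }
    return sad;
}

typedef struct { int dx, dy; unsigned cost; } motion_vector;

/* Decision logic: start at (0, 0) and keep stepping to whichever of the four
   neighboring candidates lowers the cost most, until none does.
   (bx, by) is the top-left corner of the current block; candidates are
   limited to +/- range pixels and must lie inside the reference image. */
motion_vector greedy_search(const uint8_t *cur, const uint8_t *ref,
                            int width, int height, int bx, int by,
                            int range, block_cost_fn cost)
{
    static const int step[4][2] = { {1, 0}, {-1, 0}, {0, 1}, {0, -1} };
    const uint8_t *block = cur + by * width + bx;
    motion_vector best = { 0, 0, cost(block, ref + by * width + bx, width) };
    int improved = 1;

    while (improved && best.cost > 0) {
        motion_vector next = best;
        improved = 0;
        for (int i = 0; i < 4; i++) {
            int dx = best.dx + step[i][0], dy = best.dy + step[i][1];
            int x = bx + dx, y = by + dy;
            if (abs(dx) > range || abs(dy) > range)
                continue;
            if (x < 0 || y < 0 || x + 8 > width || y + 8 > height)
                continue;
            unsigned c = cost(block, ref + y * width + x, width);
            if (c < next.cost) {
                next.dx = dx; next.dy = dy; next.cost = c;
                improved = 1;
            }
        }
        best = next;
    }
    return best;
}
```

A descent of this kind can stop in a local minimum, which is one reason the higher-level search strategy is worth keeping in software where it can be changed.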

On the current ARM architecture, calculating the block SAD requires approximately 3.5 cycles per pixel if implemented using software only. More than 50 percent of the encoding time is spent performing that operation. The Move coprocessor accelerates the operation so that it requires only 0.5 cycle per pixel, using a set of instructions together with a small register bank. This is achieved using only 12k gates and less than 1k of software API.
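A rough sense of what those figures mean for the encoder as a whole comes from Amdahl's law. The estimate below is only a sketch under stated assumptions: the SAD share f of encoding time is taken as exactly 50 percent (the figure quoted above is "more than 50 percent"), the rest of the encoder is assumed unchanged, and any cost of passing data to the coprocessor is ignored. The symbols f and S are introduced here for illustration.

```latex
\[
S_{\text{SAD}} = \frac{3.5}{0.5} = 7,
\qquad
S_{\text{encoder}} = \frac{1}{(1 - f) + f / S_{\text{SAD}}}
                   = \frac{1}{0.5 + 0.5/7} = 1.75
\quad (f = 0.5)
\]
```

Under those assumptions, a sevenfold faster SAD makes the encoder about 1.75 times faster; a larger SAD share would raise that figure.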

For many components there are several standard algorithms to choose from. The designer's choice of algorithm depends on such CPU features as the size of the register bank, the relative speed of arithmetic functions, the speed of load-store operations and any features unique to the CPU.

Each Move software component uses the best implementation of the most appropriate algorithm for execution on the ARM. For example, decision logic is efficient on ARM processors, so early termination has been implemented to accelerate common operations. On ARM architectures without DSP extensions, inverse and forward discrete cosine transform algorithms have been chosen that minimize the number of multiplication operations.
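As an illustration of early termination, here is a minimal C sketch. The text does not say which Move operations terminate early; applying the idea to a software SAD is an assumption made for the example, and the name sad8x8_early_exit and its best_so_far parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* SAD over an 8 x 8 block that gives up once the running total can no longer
   beat best_so_far. The returned value is exact when it is below
   best_so_far; otherwise it is only a lower bound on the true SAD. */
unsigned sad8x8_early_exit(const uint8_t *cur, const uint8_t *ref,
                           int stride, unsigned best_so_far)
{
    unsigned sad = 0;
    assert(stride >= 8);
    for (int row = 0; row < 8; row++) {
        for (int col = 0; col < 8; col++)
            sad += (unsigned)abs(cur[col] - ref[col]);
        if (sad >= best_so_far)
            return sad;             /* candidate already lost: stop early */
        cur += stride;
        ref += stride;
    }
    return sad;
}
```

The test is made once per row rather than once per pixel, trading a slightly later exit for fewer comparisons in the inner loop.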

The current ARM architecture has no specific instructions to support single-instruction, multiple-data (SIMD) operation. But certain SIMD operations can be synthesized using a sequence of normal ARM instructions.

In this example of MPEG-4 motion compensation, the 32-bit registers x and y each contain four 8-bit pixels. The operation

dest_k = (x_k + y_k + 1) >> 1, k = 0...3

can be implemented in four cycles on any ARM processor using this instruction sequence:
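ADDS dest, x, y                ; dest = x + y, carry flag = bit 32 of the sum
EOR  temp, x, y                ; bit 0 of each byte = low bit of that pixel sum
ANDS temp, mask, temp, RRX     ; mask is a register preloaded with 0x80808080
ADC  dest, temp, dest, LSR #1  ; halve the sum and add the rounding terms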
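To see why the sequence works, it can be modeled step by step in C and compared with a lane-by-lane reference. This is a sketch, not ARM code: the function names are hypothetical, the carry flag is modeled with a 64-bit sum, and the constant 0x80808080 is assumed to sit in a register before the sequence runs.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: rounded average of four packed 8-bit pixels, one lane at a time. */
uint32_t avg4_reference(uint32_t x, uint32_t y)
{
    uint32_t dest = 0;
    for (int k = 0; k < 4; k++) {
        uint32_t xk = (x >> (8 * k)) & 0xFFu;
        uint32_t yk = (y >> (8 * k)) & 0xFFu;
        dest |= ((xk + yk + 1u) >> 1) << (8 * k);
    }
    return dest;
}

/* Step-by-step model of the four-instruction sequence. */
uint32_t avg4_packed(uint32_t x, uint32_t y)
{
    const uint32_t mask = 0x80808080u;
    uint64_t wide = (uint64_t)x + y;
    uint32_t dest = (uint32_t)wide;              /* ADDS: low 32 bits of x + y */
    uint32_t carry = (uint32_t)(wide >> 32);     /* ADDS: carry flag           */
    uint32_t temp = x ^ y;                       /* EOR                        */
    uint32_t lsb = temp & 1u;                    /* RRX shifts bit 0 to carry  */
    temp = mask & ((carry << 31) | (temp >> 1)); /* ANDS with RRX operand      */
    return temp + (dest >> 1) + lsb;             /* ADC with LSR #1 operand    */
}

/* Compare both versions on one input pair. */
void avg4_check(uint32_t x, uint32_t y)
{
    assert(avg4_packed(x, y) == avg4_reference(x, y));
}
```

The shifted sum supplies the truncated average of every lane, with the carry flag restoring the top bit of the highest lane; the masked exclusive-OR and the final carry-in add one to each lane whose pixel sum is odd, which is exactly the rounding term.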

ARM plans to implement additional Move hardware blocks to address a wider range of performance, from wireless multimedia messaging at QCIF resolution (176 x 144 pixels) and 15 frames/second to digital cameras at Quarter VGA (320 x 240) and 30 fps.

ARM Architecture v6 extends the ARM instruction set with support for SIMD arithmetic, together with enhanced multiply and saturation instructions. The architecture will accelerate a broad range of applications, including digital audio and Move video. Full technical details of the ARM v6 architecture will be announced at the Microprocessor Forum in October.

See related chart











All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.