Design Article


Video transcoding techniques and applications

Asheesh Bhardwaj
Software Chip Architecture and Specifications,
Texas Instruments

1/30/2009 12:30 AM EST

Audio-video transcoding has historically been considered a straightforward, traditional affair in which the encoded source video was decoded to produce a not-quite-perfect version of the original uncompressed content and then re-encoded into the format required for transport or viewing.

As the number of video compression algorithms has grown and more low-cost consumer systems have begun using digital video, this brute force approach of full decoding followed by full re-encoding has come under scrutiny from engineers tasked with designing low-cost systems with good video performance. Although the brute force approach produces a high-quality video result, quality decreases with each encode/decode cycle because artifacts are encoded as if they were valid data. Other drawbacks include:

  • Processor demands: As the algorithms have become more efficient, they have also become more complex and require greater processing capability, especially if the conversion is expected to be executed in real time. Even if a chip used for transcoding can handle the brute force approach, a less computationally-intensive method would allow the same chip to handle more channels and reduce overall system cost.

  • Memory resources: The decode/encode operation typically requires the decoded data to be stored in memory and extra memory increases the system bill of materials. Particularly in price-sensitive consumer devices, this can mean the difference between market success and failure.
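The memory cost of storing decoded data can be estimated with simple arithmetic. The sketch below assumes 8-bit YUV 4:2:0 frames and a hypothetical count of four buffered frames; neither figure comes from a specific codec or product.

```python
def frame_bytes_yuv420(width, height):
    """Bytes in one 8-bit YUV 4:2:0 frame (an assumed pixel format).

    The luma plane is full size; each of the two chroma planes is
    subsampled by two in both directions.
    """
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


def buffer_bytes(width, height, frames):
    """Memory needed to hold a given number of decoded frames."""
    return frames * frame_bytes_yuv420(width, height)


# Resolutions of formats named in this article; the frame count is hypothetical.
FORMATS = {"QCIF": (176, 144), "CIF": (352, 288), "1080p": (1920, 1080)}
BUFFERED_FRAMES = 4

for name, (w, h) in FORMATS.items():
    total = buffer_bytes(w, h, BUFFERED_FRAMES)
    print(f"{name}: {total} bytes for {BUFFERED_FRAMES} decoded frames")
```

Under these assumptions, a handful of decoded 1080p frames already occupies a noticeable fraction of a 64-Mbyte memory, which is why avoiding full decoded frame storage matters in price-sensitive designs.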

Video playback on mobile devices is a good example of an application in which it is desirable to reduce processor loading and save memory resources. Although transcoding takes place not in the cell phone itself but in the Video on Demand (VoD) server or video gateway, mobile TV and mobile video telephony are still restricted by five considerations beyond the VoD server and video gateway:

  • Network bandwidth
  • Processing power in the mobile phone
  • Display resolution
  • Memory size
  • Energy consumption of the mobile phone, a parameter not normally thought of in video playback

Even though mobile video devices are becoming more powerful with each generation, the processor speed of a typical device is between 300 MHz and 600 MHz and memory capacity is about 64 Mbytes. Even a cursory evaluation indicates that not all of the information encoded in HD or SD video for a large screen has to be processed for viewing on a mobile device. Conversely, HD or SD video captured on handheld devices must be processed by network servers before it can be viewed on the end equipment.



Table 1: Impact of encoding parameters.

Table 1 provides a rough guide to the impact of different video coding parameters on the resources that the decoding device must provide. The table refers specifically to block-based video codecs that use motion compensation and the discrete cosine transform (DCT) for video compression.

The results of modifying these key parameters can be significant. Decreasing detail resolution (accomplished in the encoding process by increasing the quantization factor) can reduce the energy consumption of video decoding by 75% to 85%, at the cost of a quality decrease of just 5% to 13%.
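To see why a larger quantization factor reduces decoder work, consider a minimal sketch of a one-dimensional 8-point DCT followed by uniform quantization. The sample row and the two step sizes are hypothetical, and real codecs use two-dimensional transforms with per-frequency quantization matrices; the point is only that a coarser step leaves fewer nonzero coefficients to decode.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a short list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, s in enumerate(samples)
        )
        out.append(scale * total)
    return out


def quantize(coeffs, step):
    """Uniform quantization: divide by the step and round to an integer level."""
    return [int(round(c / step)) for c in coeffs]


def nonzero_count(levels):
    """Number of levels the decoder actually has to process."""
    return sum(1 for v in levels if v != 0)


# Hypothetical row of eight luma samples.
row = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct_1d(row)
for step in (4, 32):
    levels = quantize(coeffs, step)
    print(f"step {step}: levels {levels}, nonzero {nonzero_count(levels)}")
```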

Transcoding options
Over the past few years, several techniques have been proposed to provide a transcoding process that makes more efficient use of processing and memory resources. Most of these approaches are at least partially based on the recognition that quantization and frequency domain information created during the initial encoding can be modified, discarded or utilized in more sophisticated ways than simply to reconstruct the original video content.

Another way of describing these techniques is to say that they either discard some information or they transfer frequency domain information between the source and target without going through the step of decoding it into the pixel domain.
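The idea of working on frequency domain information without returning to the pixel domain can be sketched as a re-quantization step. The function below is an illustrative assumption, not a method taken from any particular codec: it rescales already-quantized coefficient levels from a source step size to a coarser target step size, and no inverse transform is involved.

```python
def requantize(levels, src_step, dst_step):
    """Map quantized levels from src_step to dst_step in the transform domain.

    Each level is scaled back to its coefficient value and re-quantized
    with the new step. No pixels are reconstructed along the way.
    """
    out = []
    for level in levels:
        coeff = level * src_step          # dequantized coefficient value
        out.append(int(round(coeff / dst_step)))
    return out


# Hypothetical quantized levels for one block, coded with step 4.
source_levels = [44, -3, 2, 1, 0]
target_levels = requantize(source_levels, src_step=4, dst_step=10)
print(target_levels)
```

Because rounding is applied a second time, the result is not always identical to quantizing the original coefficients directly with the coarser step, which is one source of the quality loss these techniques trade against processing savings.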

To elaborate, the most important advantage of the traditional approach is that the video it delivers has a high level of fidelity with the original video content. Therefore, the engineering objective when using other transcoding techniques is to provide as high a level of video quality as possible while reducing the requirements of system processing and memory resources. This is best accomplished by matching transcoding techniques with specific applications.



Figure 1: Typical flow of the loosely coupled transcoders used by most applications.

Generally speaking, four approaches or architectures are available for transcoding. Here are short definitions of each, along with their associated tradeoffs.

  • Decoupled transcoding: This is the traditional method. In addition to providing the best quality video, it also has the greatest flexibility in terms of source-to-target formats, resolutions and bit rates.

  • Loosely coupled transcoders: The transcoding process reuses most of the motion vectors and other side information from the decoded incoming video for encoding. The re-encoding process may refine the motion vectors or perform a more efficient motion vector calculation, depending on the encoding requirements. This method reduces computational complexity relative to decoupled transcoders while matching the quality of the traditional method.

  • Tightly coupled transcoding: Re-encoding is accomplished using the existing motion vectors, without going into the pixel domain to recalculate the motion vector information. The transcoding process can take place in the transform domain as well. One important consideration is that because this method does not perform motion re-estimation at all, it does not allow for changes in resolution. Memory and processor requirements are minimized, but at the cost of picture quality. The algorithms for tightly coupled transcoding are difficult to create but can be developed based on the specific requirements.

  • Transrators: A partial decoding of the bit stream is performed in the transform domain and the bit stream is re-encoded at the bit rate that the network can support. The video format does not change during transrating. The inverse transform is not performed, and re-quantization is done in the frequency domain. This technique is generally employed to remedy a specific problem. One example would be a cable head-end where channel capacity has degraded at the end of the cable plant, but there is still an obvious need to transmit almost the same quality and resolution of video to the end devices.
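The motion vector reuse described for loosely coupled transcoders can be sketched as a small refinement search. This is a minimal illustration under stated assumptions: the block size, the search radius of one pixel and the sum-of-absolute-differences (SAD) cost are all choices made here, and the function names are hypothetical. Instead of a full motion search, the re-encoder tests only a few candidates around the vector recovered from the incoming stream.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in the current frame
    and the block displaced by (dx, dy) in the reference frame."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def refine_vector(cur, ref, bx, by, hint, size=4, radius=1):
    """Search a small window around the decoded motion vector (hint)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(hint[1] - radius, hint[1] + radius + 1):
        for dx in range(hint[0] - radius, hint[0] + radius + 1):
            # Skip candidates whose block falls outside the reference frame.
            inside_x = 0 <= bx + dx and bx + dx + size <= width
            inside_y = 0 <= by + dy and by + dy + size <= height
            if not (inside_x and inside_y):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    if best is None:
        raise ValueError("no candidate vector lies inside the reference frame")
    return best[1], best[0]


# Synthetic frames: the current frame is the reference shifted by (2, 1).
ref = [[7 * x + 13 * y for x in range(12)] for y in range(12)]
cur = [[7 * (x + 2) + 13 * (y + 1) for x in range(12)] for y in range(12)]
vector, cost = refine_vector(cur, ref, bx=4, by=4, hint=(1, 1))
print(vector, cost)
```

With a radius of one, only nine candidates are evaluated per block, compared with the hundreds a full search over a large window would require.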

Matching transcoders to applications
Decoupled and loosely coupled transcoding are typically used in applications where picture quality is important, such as set-top boxes, video conferencing, IPTV and some VoD applications. Deciding which transcoding method to use is a matter of evaluating performance expectations and memory bandwidth for a specific application.

Tightly coupled transcoding is the best choice for systems that are memory constrained or do not require high quality images. End-to-end video telephony on mobile phones is a good application example because the video will ultimately be displayed on a small screen at a relatively low resolution. There is no need to transport a high-fidelity version of the video through the entire network. In addition, a good deal of processing power, memory and bill-of-material cost can be saved in aggregate, particularly in very high volume devices such as mobile phones.

As previously mentioned, transrators are typically used for special instances in cable, IPTV and video telephony to match the transmission data rate to available system bandwidth. Both audio and video are transrated and the difference in quality is virtually imperceptible.

Platform considerations
Design engineers have known for a long time that creating a new design for every product variant is less efficient than creating a platform at the beginning of the design cycle and making the platform as flexible as possible. Set-top boxes (STBs) are a good example of a product with many variants. Transcoding plays a major role in estimating processing power and other platform parameters.

Because high-quality video is a primary consideration for STBs, it would be unlikely that anything other than loosely coupled transcoding would be considered as the primary method.

The market success of an STB design depends as much upon the platform chosen at the beginning of the design process as it does on the subsequent step-by-step implementation of the design. STBs are marketed in different price ranges, sold in every part of the world and must accommodate a wider range of video formats than products such as video conferencing systems where the formats are standardized.

Some of the higher-level decisions for STB designers are:

  • How many channels will the STB be required to handle simultaneously (main video and picture-in-picture), and how many TVs in different rooms of the home must it serve?
  • Which product-differentiating features will be included to provide a market advantage (e.g. picture-in-picture, proprietary graphics, Blu-ray disc recording, viewing video for communication, connecting to IP network, cable network)?
  • What a priori price point has the marketing department provided?

As is typical in most designs, these three criteria interact, so it becomes important to create a platform upon which all of the product variants can be based. This, in turn, implies a flexible processor complemented by development software that is compatible across designs, as well as extensive firmware libraries, algorithms and support.

Setting the performance bar
At the high end of the STB platform's performance capability, the bar must be set for HDTV 1080p. STBs must be capable of supporting this high level of throughput and rescaling display outputs in real time.

In addition, a wide range of content source and digital display formats must be supported. At the lower end reside CIF and its subdivisions such as QCIF, which is used in streaming video and provides the basis for divided screen applications on DTVs.

Depending on whether the STB will be integrated into the home computer network, computer display formats may also have to be supported, including a group of HD formats often used in entertainment systems. Format conversions between source content resolutions and the target display resolution need to cover a wide range, including scaling down HD video for low-resolution displays and scaling up low-resolution content for HD displays.
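The range of scaling involved can be worked out with a short calculation. The sketch below assumes the picture is scaled uniformly to fit inside the target display while preserving its aspect ratio, and that output dimensions are rounded down to even values (a common convenience for subsampled chroma, but an assumption here). The function name is hypothetical.

```python
from fractions import Fraction


def fit_resolution(src_w, src_h, dst_w, dst_h):
    """Largest aspect-preserving size that fits inside the target display.

    Returns (width, height, scale). Dimensions are rounded down to even
    numbers, which is an assumption rather than a requirement of any standard.
    """
    scale = min(Fraction(dst_w, src_w), Fraction(dst_h, src_h))
    out_w = int(src_w * scale) // 2 * 2
    out_h = int(src_h * scale) // 2 * 2
    return out_w, out_h, scale


# Scaling HD down to CIF, and QCIF up to HD.
for src, dst in (((1920, 1080), (352, 288)), ((176, 144), (1920, 1080))):
    w, h, scale = fit_resolution(*src, *dst)
    print(f"{src} -> {dst}: output {w}x{h}, scale {float(scale):.3f}")
```

The two cases span roughly a fivefold reduction and a sevenfold enlargement in each dimension, which gives a sense of the rescaling range the platform has to support.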

HD is well known for its voracious appetite for bandwidth, which means supporting compression algorithms, including advanced codecs such as H.264/MPEG-4 Part 10/AVC and WMV9/VC-1. Conventional MPEG-2 transport streams typically need to be transcoded to the advanced codecs.

Migration to the more advanced codecs is taking place gradually, so transcoding for backward compatibility with MPEG-2 is also a key requirement. And while video consumes the lion's share of processing power, the audio stream also has to be encoded and decoded into and from various formats.

Dolby Digital and AAC stereo are the typical audio requirements in the broadcast market when transcoding from other traditional audio formats. It is also good practice to "future proof" designs to the degree possible so new codecs can be accommodated. From a hardware perspective, this implies programmability, of course, but also multiple processors, or processor cores if the transcoding is implemented on an SoC.

Finally, in home networks, it is necessary to transcode not only in order to shift content bit rates and rescale formats, but also to convert ownership protection methods between the TV industry (various forms of conditional access) and the PC world (digital rights management, or DRM).



Transcoding in STBs
Transcoding hardware has to interact with other subsystems within the overall STB system. These include a digital tuner, demodulator and demuxer, DDR2 memory, a PCI bus, and a high-bandwidth interface to transfer audio/video data to the STB SoC.

Figure 2 illustrates a common, but not necessarily generic, architecture.



Figure 2: Typical STB system architecture.

Beginning with the audio and video decoding, encoding and transcoding, a video System-on-Chip (SoC) for an STB capable of handling two-channel encoding and decoding or single-channel transcoding would require a minimum of four cores: a GPP core handling the SoC's control operations; a DSP/GPP core handling audio transcoding operations; and two coprocessor cores, supported by the DSP core, for video processing (one for each channel).

While the DSP and video/imaging coprocessors work cooperatively, the primary function of the coprocessors is to execute encoding and decoding algorithms such as H.264, MPEG-2 and MPEG-4.

The incoming data stream is multiplexed audio and video. It is de-multiplexed, with the GPP handling audio decoding.
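As an illustration of the de-multiplexing step, the sketch below assumes the incoming multiplex is an MPEG-2 transport stream, in which every packet is 188 bytes long, starts with the sync byte 0x47 and carries a 13-bit packet identifier (PID) in its header. The PID values assigned to video and audio and the helper that builds test packets are hypothetical; a real demuxer would also resynchronize after errors and read the program tables to discover the PIDs.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def packet_pid(packet):
    """Extract the 13-bit PID from a transport stream packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def demux_by_pid(stream):
    """Group whole transport stream packets by PID."""
    by_pid = {}
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            continue  # simplification: a real demuxer would resynchronize here
        by_pid.setdefault(packet_pid(packet), []).append(packet)
    return by_pid


def make_packet(pid, fill):
    """Build a dummy payload-only packet for demonstration purposes."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes([fill]) * (TS_PACKET_SIZE - len(header))


VIDEO_PID = 0x100  # hypothetical assignment
AUDIO_PID = 0x101  # hypothetical assignment

stream = (make_packet(VIDEO_PID, 1) + make_packet(AUDIO_PID, 2)
          + make_packet(VIDEO_PID, 3))
groups = demux_by_pid(stream)
print({hex(pid): len(packets) for pid, packets in groups.items()})
```

Once separated, the audio packets could be routed to the core handling audio while the video packets go to the video coprocessors.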

Switched Central Resource
Added to the processing and system control functions are peripherals, grouped generally as connectivity, serial interfaces and program/data storage. These would be connected to the processor modules by a switch fabric plus bridges, which together are called a switched central resource (SCR): an interconnect system that provides low-latency connectivity between master peripherals and slave peripherals. An SCR is the decoding, routing and arbitration logic that enables the connection between the multiple masters and slaves attached to it.
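A toy model can make the decoding and arbitration roles concrete. Everything in the sketch below is an assumption made for illustration: the address ranges, the master names and the fixed-priority policy are hypothetical and do not describe the actual address map or arbitration scheme of any TI device.

```python
# Hypothetical address map: (first address, last address, slave name).
SLAVE_MAP = [
    (0x0000_0000, 0x0FFF_FFFF, "ddr2"),
    (0x1000_0000, 0x1000_FFFF, "pci"),
]


def decode(address):
    """Address decoding: find which slave owns an address."""
    for first, last, slave in SLAVE_MAP:
        if first <= address <= last:
            return slave
    return None


def arbitrate(requests, priority):
    """Grant each slave to one master for this cycle.

    requests is a list of (master, address) pairs. When several masters
    target the same slave, the one listed earliest in priority wins
    (an assumed fixed-priority policy).
    """
    grants = {}
    for master, address in requests:
        slave = decode(address)
        if slave is None:
            continue  # unmapped address: no slave to route to
        holder = grants.get(slave)
        if holder is None or priority.index(master) < priority.index(holder):
            grants[slave] = master
    return grants


priority = ["dsp", "gpp", "coproc"]
requests = [("gpp", 0x0000_0100), ("dsp", 0x0000_0200), ("coproc", 0x1000_0010)]
print(arbitrate(requests, priority))
```

Masters that target different slaves are granted in the same cycle, which is the property that lets a switch fabric offer lower latency than a single shared bus.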

Texas Instruments' digital media processors based on DaVinci technology implement this architecture. A simplified version of the architecture is shown in Figure 3.



Figure 3: TMS320DM6467 digital media processor based on DaVinci technology block diagram.

This STB discussion has assumed that loosely coupled transcoding would be used in STBs. While that is by far the most likely scenario, it should be noted that hardware capable of loosely coupled transcoding may also be capable of handling the other three methods. To meet multi-room and multi-channel requirements, a couple of DM6467 digital media processors can be connected through DDR2 and PCI to exchange data between the devices and scale the architecture. With clever engineering, a processor such as the DM6467 based on DaVinci technology could, for example, be pressed into service with another transcoding method if an application appeared that involved transferring personal SD or HD video (from a video camera, say) stored on the STB's hard drive to a cell phone.

About the author
Asheesh Bhardwaj is a member of Texas Instruments' technical staff working on software chip architecture and specifications, as well as applications for next-generation digital media processors based on DaVinci technology. He joined TI's Catalog Applications group in December 2007. He can be reached via support@ti.com.

