Integrating an H.264 video encoder with Stretch's processor

Joe Hanson,
Stretch Inc.

11/10/2006 3:15 AM EST

The H.264 video compression standard defines the bitstream resulting from compressing video using the tools within the standard. The standard does not describe how the tools are implemented nor does it mandate which tools must be used during the encoding process. An application developer integrating H.264 encoding into an embedded application can benefit from a flexible software development kit which encapsulates the encoding detail but provides the flexibility to customize the implementation of H.264 to meet the needs of their own application.

Target applications and trade-offs
While the goal of H.264 encoding is to deliver similar video quality at 50% of the bit rate of an MPEG-2 encoded stream, defining a single H.264 encoder configuration for all uses is difficult. The target bit rates and image quality requirements vary by application.

For low-motion video applications, such as surveillance, the requirements include image sizes ranging from CIF to D1 resolution, medium to good video quality, low latency between input and output, and transport over IP to a remote host for viewing and storage. The user application ranges from simply controlling the camera settings and position to performing pre-processing or other specialized analytics on the video images.

For high-motion video applications, such as broadcast, the requirements include PAL or NTSC image sizes and frame rates, excellent video quality, constant bit rate control to manage the channel capacity, and transport over an MPEG Transport Stream; low latency is less important. The user application may perform specialized preprocessing on the images, such as noise removal or image enhancement.

VSS H.264 Encoder for Stretch Software Development Kit (SDK)
Stretch and VSS partnered to develop an SDK to ease the integration of H.264 encoding into a custom application. The SDK provides a set of 13 APIs to simplify the integration of video encoding into the application. The encoder provides a wide range of user selectable options which adapt the encoding requirements to the user application.

Figure 1 shows the typical usage for integrating H.264 encoding into an application. The three stages are the user application, the encoder and the transport mechanism. The content of the user application is, of course, up to the user to define. The transport mechanism packages the encoded bitstream into the targeted format, whether that is RTP transport over Ethernet or an MPEG transport stream. The VSS SDK includes reference code for the transport layer. The H.264 encoder can be configured to meet the bit-rate and quality targets within the available compute budget.


Figure 1: SDK usage in an application



Parameters under user control
The easiest parameters to define are the image size and frame rate, which depend on the video source or the target delivery format. Standard definition video (D1 resolution) can also be resized to a smaller image, which in turn reduces the bit rate.



Table 1: Common video image sizes

Table 1 lists common video image sizes, all of which are supported by the SDK. The image sizes are multiples of 16 because video compression methods operate on 16x16 macroblocks. For real-time video encoding, the frame rate determines the amount of time available per frame for the user application, the encoding and the transport layer. The transport layer requires a minimum number of cycles and uses DMA to move the data in the background. The balancing act is to free cycles from the encoder for the user application while still delivering the required video quality at the target bit rate.
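The per-frame budget described above can be worked through with a short calculation. The following Python sketch counts the 16x16 macroblocks in a frame and divides the processor cycles available per frame among them. The CIF (352x288) and D1 (720x480) dimensions are the standard values; the clock rate is a hypothetical figure chosen only for illustration and is not a Stretch specification.

```python
MACROBLOCK = 16  # H.264 macroblocks cover 16x16 luma samples

def macroblocks_per_frame(width, height):
    """Number of 16x16 macroblocks needed to cover a width x height frame."""
    mbs_wide = (width + MACROBLOCK - 1) // MACROBLOCK
    mbs_high = (height + MACROBLOCK - 1) // MACROBLOCK
    return mbs_wide * mbs_high

def frame_budget(width, height, fps, clock_hz):
    """Time and cycle budget per frame for real-time operation."""
    mbs = macroblocks_per_frame(width, height)
    cycles_per_frame = clock_hz / fps
    return {
        "macroblocks": mbs,
        "frame_time_ms": 1000.0 / fps,
        # Shared by the user application, the encoder and the transport layer.
        "cycles_per_macroblock": cycles_per_frame / mbs,
    }

# Hypothetical clock rate, used only to make the arithmetic concrete.
ASSUMED_CLOCK_HZ = 300_000_000

for name, (w, h) in {"CIF": (352, 288), "D1": (720, 480)}.items():
    b = frame_budget(w, h, 30, ASSUMED_CLOCK_HZ)
    print(name, b["macroblocks"], "macroblocks,",
          round(b["frame_time_ms"], 2), "ms per frame,",
          round(b["cycles_per_macroblock"]), "cycles per macroblock")
```

Because D1 has more than three times as many macroblocks as CIF, the same frame rate leaves far fewer cycles per macroblock, which is why resizing the image is an effective way to recover cycles.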

There are three encoded picture types: I, P and B frames. An I frame is encoded without referencing another frame. A P frame may be predicted from previous I and P frames. A B frame may be bi-directionally predicted from previous or following I, P or B frames. Predicting the image from reference frames lowers the bit rate: I frames consume the most bits, followed by P frames, and B frames consume the fewest.

The SDK allows the user to define the IDR interval, the number of B frames between P frames, and the number of reference frames used for prediction. The IDR interval specifies the number of frames between I frames; the longer the interval, the more P and B frames there are and the lower the bit rate. However, should an I frame be dropped or corrupted, the decoded images that depend on it will be poor. Inserting B frames between P frames and increasing the number of reference frames both help lower the bit rate while maintaining quality, but at a significant increase in computation. For many applications, a typical IDR interval of 15 or 30 frames works well. For medium to high quality images, 2 reference frames are common at standard definition resolutions. The use of B frames requires buffering the video input frames, so the latency of the encoded bitstream increases.
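To make the relationship between the IDR interval, the number of B frames and latency concrete, here is a small Python sketch. It assumes a simplified closed GOP in which every interval starts with an I frame and ends on a P frame; the function names are hypothetical, and the SDK's actual frame-type placement may differ.

```python
def gop_pattern(idr_interval, b_frames):
    """Display-order frame types for one IDR interval (simplified closed GOP)."""
    types = []
    for i in range(idr_interval):
        if i == 0:
            types.append("I")          # interval starts with an I frame
        elif i % (b_frames + 1) == 0:
            types.append("P")          # anchor frame after each run of B frames
        else:
            types.append("B")
    # A trailing B frame would have no later anchor inside this interval,
    # so this sketch promotes the final frame to P.
    if types[-1] == "B":
        types[-1] = "P"
    return "".join(types)

def reorder_latency_ms(b_frames, fps):
    """Minimum extra input buffering: the B frames must wait for the
    following anchor frame to be captured and encoded first."""
    return b_frames * 1000.0 / fps

print(gop_pattern(7, 2))    # IBBPBBP
print(gop_pattern(15, 0))   # one I frame followed by 14 P frames
print(round(reorder_latency_ms(2, 30), 1), "ms added by 2 B frames at 30 fps")
```

With no B frames the encoder can emit each frame as soon as it arrives, which is why low-latency applications such as surveillance often run with I and P frames only.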

Motion estimation on P and B frames consumes most of the processing cycles. The SDK uses a proprietary, patented search algorithm, but the user can influence the algorithm to meet their needs. For inter-frame prediction, the user configures the size of the macroblock subdivisions used in the motion search. For the fewest computations, the user can restrict the search to 16x16 macroblocks, at a cost in both bit rate and quality. Quality can be improved by searching smaller subdivisions, i.e. 8x16, 16x8 and 8x8. The user also configures the accuracy of the search, e.g. 1-pel, ½-pel or ¼-pel.
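The SDK's search algorithm is proprietary, so the following Python sketch substitutes the textbook approach instead: an exhaustive full-pel block search that minimizes the sum of absolute differences (SAD). It is meant only to show why motion estimation dominates the cycle count and how block size and search range drive the cost. All names and the synthetic test frames are illustrative assumptions.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block in cur and one in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
    return total

def full_search(cur, ref, cx, cy, size=16, search_range=4):
    """Exhaustive full-pel search; returns ((dx, dy), cost) of the best match."""
    height, width = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_cost = sad(cur, ref, cx, cy, cx, cy, size)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost

def pixel(x, y):
    """Synthetic texture used to build the test frames."""
    return (3 * x * x + 5 * y * y + 7 * x * y) % 251

# The current frame is the reference frame displaced by 2 pixels
# horizontally and 1 pixel vertically.
ref = [[pixel(x, y) for x in range(32)] for y in range(32)]
cur = [[pixel(x + 2, y + 1) for x in range(32)] for y in range(32)]

mv, cost = full_search(cur, ref, 8, 8)
print("motion vector:", mv, "SAD:", cost)
```

Each candidate position costs `size * size` absolute differences, and a search range of ±4 already means up to 81 candidates per block. Searching smaller subdivisions or adding ½-pel and ¼-pel refinement multiplies that work, which matches the trade-off between quality and computation described above.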

Other parameters, such as quantization, the use of the deblocking filter, and intra-prediction parameters also affect the overall efficiency of the encoder. With all the possible factors, it is impossible to say which combination of parameters will meet the specific requirements. To assist, the SDK ships with a demonstration application that executes on the Stretch S55VDP platform and a PC application that allows the user to make encoder configuration changes and view the resulting compute requirements and the video quality.


Implications of parameter changes
The VSS H.264 Encoder SDK for Stretch provides a mechanism for balancing processor load against compression efficiency by switching selected compression tools on and off. As an illustration of the SDK's performance tuning capability, the performance/quality chart shown in Figure 2 was obtained by incrementally adding optional tools:

  • In-loop deblocking filter
  • Hierarchical Motion Estimation
  • ½-pel motion vectors
  • ¼-pel motion vectors
  • 4x4 Intra prediction.
The basic configuration uses an IBBPBBP GOP structure, 16x16 Intra prediction, full-pel accuracy motion estimation, 1 reference picture, and progressive encoding. This basic configuration requires about 60% of the Stretch S5530 computational cycles and provides surveillance quality video.

Adding further H.264 tools moves the encoder toward 'broadcast quality' and lower bit rates at the cost of more computational cycles. Figure 2 shows the PSNR numbers for the CCIR15 clip, a low-motion clip, with the basic configuration and the effects of adding each tool.
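For readers unfamiliar with the metric, PSNR (peak signal-to-noise ratio) compares the decoded picture against the original using the mean squared error, PSNR = 10 log10(peak² / MSE), where peak is 255 for 8-bit samples. The Python sketch below computes it for two equally sized lists of sample values; the function name and the sample data are illustrative only and do not reproduce any figure from the chart.

```python
import math

def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally sized sequences of sample values."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical pictures
    return 10 * math.log10(peak * peak / mse)

# Illustrative samples: every decoded value is off by one level.
original = [16, 128, 200, 235]
decoded = [17, 127, 201, 234]
print(round(psnr(original, decoded), 2), "dB")
```

Higher values mean the decoded picture is closer to the source, so a tool that raises PSNR at a fixed bit rate, or holds PSNR at a lower bit rate, is improving compression efficiency.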


Figure 2: CCIR15 PSNR Numbers and H.264 Tools

Integrating the API into an application
With that background, adding an encoding module to the user application is relatively straightforward. Figure 3 shows the pseudocode for integrating the encoder into the application. After initializing the system resources and the video acquisition hardware, the user calls the encoder initialization function. This function creates the buffers for the input video frames, encoder symbols, and the final bitstreams.


Figure 3: Software listing using the SDK (pseudocode)

The encoder application operates on video frames: the application retrieves a complete frame from the buffer pool. This buffer of data can be passed to the user application, which operates on the data as desired. The data is then passed to the encoder. Once encoding completes, the encoded bitstream is passed to the transport layer. This process continues until termination. The SDK includes the complete source code for the demonstration application as a template showing all of the specific variable initializations and function calls. The demonstration application also reports encoder and application statistics.
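The control flow described above can be sketched as a simple loop. The Python below is only a structural illustration: every function name is hypothetical and stands in for an SDK or application call, the "encoder" is a stub that produces placeholder bytes rather than an H.264 bitstream, and none of it reflects the actual VSS API.

```python
from collections import deque

def encoder_init(width, height, fps):
    """Hypothetical stand-in for the SDK's encoder initialization call."""
    return {"width": width, "height": height, "fps": fps, "frames_encoded": 0}

def acquire_frame(buffer_pool):
    """Retrieve the next complete frame, or None when the source is exhausted."""
    return buffer_pool.popleft() if buffer_pool else None

def user_process(frame):
    """User application stage, e.g. pre-processing or analytics (no-op here)."""
    return frame

def encode_frame(encoder, frame):
    """Stub encoder: returns placeholder bytes, not real H.264 data."""
    encoder["frames_encoded"] += 1
    return bytes([encoder["frames_encoded"] % 256])

def transport_send(sent, bitstream):
    """Stub transport layer: records what would be packetized and sent."""
    sent.append(bitstream)

def run(buffer_pool, encoder):
    """Acquire, process, encode and transmit frames until termination."""
    sent = []
    while True:
        frame = acquire_frame(buffer_pool)
        if frame is None:
            break
        frame = user_process(frame)
        bitstream = encode_frame(encoder, frame)
        transport_send(sent, bitstream)
    return sent

encoder = encoder_init(352, 288, 30)
pool = deque(["frame0", "frame1", "frame2"])  # placeholder frame buffers
packets = run(pool, encoder)
print(len(packets), "frames encoded and sent")
```

Keeping the user stage, the encoder and the transport as separate calls mirrors the three stages of Figure 1 and makes it easy to measure how many cycles each stage consumes per frame.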

Conclusion
Integrating H.264 video encoders into a custom application requires balancing quality and bitrate requirements against the computational load on the processor. The VSS H.264 encoder SDK enables the developer to easily integrate an encoder with their own custom video processing or other functions. The SDK abstracts the complexity of H.264 encoding away from the user, but provides the user the ability to influence the encoder to meet the needs of the application.

About the author
Joe Hanson is the Director of Business Development at Stretch Inc. He is responsible for developing the ecosystem of technology supporting the Stretch software-configurable processors in compute-intensive applications. Prior to Stretch, he worked at Altera, where he held a variety of positions, including director-level positions in applications and marketing for the Excalibur embedded processors and system-level design tools. Joe holds degrees in Biological Sciences and Electrical Engineering, holds three patents, and has over 20 years of design experience involving embedded systems and programmable logic. He can be reached at hanson@stretchinc.com.

