Design Article
DSP video processing via open-source APIs
Rishi Bhattacharya
10/30/2006 9:00 AM EST
There are many types of transform filters, including parsers that split raw byte streams into samples or frames, compressors and decompressors, and format converters. Renderer filters generally accept fully processed data and play it on the system's monitor, through the speakers, or possibly through some external device. This category also includes "file writer" filters, which save data to disk or other persistent storage, and network transmission filters.
Data processing takes place in the element's chain or loop function, written here as plug-in_chain() or plug-in_loop(), where "plug-in" stands for the element's own name prefix. The processing could be as simple as scaling or as complicated as full MP3 decoding. After the data is processed, it is sent out from the source pad of the GStreamer element by calling gst_pad_push(), which pushes the data to the next element in the linked pipeline.
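The chain-then-push flow described above can be sketched in plain C. This is a toy model under stated assumptions, not GStreamer code: ToyBuffer, ToyPad, toy_pad_push(), the scaler and the counter are hypothetical stand-ins invented for illustration, and toy_pad_push() only plays the role that gst_pad_push() has in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; these are NOT the GStreamer types. */
typedef struct ToyBuffer {
    uint8_t *data;
    size_t   size;
} ToyBuffer;

typedef struct ToyPad ToyPad;
typedef int (*ToyChainFn)(ToyPad *pad, ToyBuffer *buf);

struct ToyPad {
    ToyChainFn chain;   /* sink pads: called when data arrives */
    ToyPad    *peer;    /* source pads: the linked downstream sink pad */
    void      *element; /* private state of the owning element */
};

/* Plays the role gst_pad_push() has in the text: hand the buffer to the peer. */
static int toy_pad_push(ToyPad *srcpad, ToyBuffer *buf)
{
    assert(srcpad != NULL && buf != NULL);
    if (srcpad->peer == NULL || srcpad->peer->chain == NULL)
        return -1;                      /* pad is not linked */
    return srcpad->peer->chain(srcpad->peer, buf);
}

/* A trivial "scaling" element: halves every sample, then pushes downstream. */
typedef struct ToyScaler {
    ToyPad sinkpad;
    ToyPad srcpad;
} ToyScaler;

static int toy_scaler_chain(ToyPad *pad, ToyBuffer *buf)
{
    ToyScaler *self = (ToyScaler *)pad->element;
    for (size_t i = 0; i < buf->size; i++)
        buf->data[i] = (uint8_t)(buf->data[i] / 2);
    return toy_pad_push(&self->srcpad, buf);
}

/* A terminal element that only counts the bytes it receives. */
typedef struct ToyCounter {
    ToyPad sinkpad;
    size_t bytes_seen;
} ToyCounter;

static int toy_counter_chain(ToyPad *pad, ToyBuffer *buf)
{
    ToyCounter *self = (ToyCounter *)pad->element;
    self->bytes_seen += buf->size;
    return 0;
}
```

Linking the scaler's source pad to the counter's sink pad and calling the scaler's chain function once moves a buffer through both elements. The real framework adds capability negotiation, reference counting and threading on top of this basic shape.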
GStreamer buffers
Buffers are the basic unit of data transfer in GStreamer. The GstBuffer type provides all the state necessary to define a region of memory as part of a stream. Representation of data within GStreamer via GstBuffer structures follows the approach taken by several other operating systems and their respective multimedia frameworks (e.g., the media sample concept in Microsoft DirectShow). Subbuffers are also supported, allowing a smaller region of a buffer to become its own buffer, with mechanisms in place to ensure that neither memory space goes away prematurely.
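One common way to keep a parent region alive while a subbuffer still points into it is reference counting. The sketch below assumes that mechanism and uses a hypothetical ToyBuf type; it illustrates the idea only and does not reproduce the GstBuffer layout or API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy reference-counted buffer; an illustration, not the GstBuffer layout. */
typedef struct ToyBuf {
    uint8_t       *data;     /* start of the region this buffer describes */
    size_t         size;     /* length of the region in bytes */
    int            refcount;
    struct ToyBuf *parent;   /* non-NULL for a subbuffer */
} ToyBuf;

static ToyBuf *toy_buf_new(size_t size)
{
    ToyBuf *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->data = calloc(size ? size : 1, 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->size = size;
    b->refcount = 1;
    b->parent = NULL;
    return b;
}

/* A subbuffer shares the parent's memory and takes a reference on it. */
static ToyBuf *toy_buf_sub(ToyBuf *parent, size_t offset, size_t size)
{
    assert(parent != NULL);
    if (offset > parent->size || size > parent->size - offset)
        return NULL;                    /* requested region is out of range */
    ToyBuf *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->data = parent->data + offset;
    s->size = size;
    s->refcount = 1;
    s->parent = parent;
    parent->refcount++;
    return s;
}

/* Drops one reference; returns 1 if this buffer object itself was freed. */
static int toy_buf_unref(ToyBuf *b)
{
    assert(b != NULL && b->refcount > 0);
    if (--b->refcount > 0)
        return 0;
    if (b->parent != NULL)
        toy_buf_unref(b->parent);       /* release our hold on the parent */
    else
        free(b->data);                  /* only a root buffer owns memory */
    free(b);
    return 1;
}
```

With this arrangement, the creator of the parent can drop its reference as soon as it is done, and the memory is released only when the last subbuffer is released too.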
Buffers are usually created with gst_buffer_new(). After a buffer has been created, one will typically allocate memory for it and set the size of the buffer data. For example, a buffer meant to hold one video frame is sized from the frame's width, height and bits per pixel.
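A minimal sketch of that sizing step follows, written against a hypothetical ToyFrameBuffer type rather than GstBuffer so that it compiles with the standard library alone. It assumes tightly packed rows with no stride padding, which real video drivers do not always guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer object: a data pointer plus a size. */
typedef struct ToyFrameBuffer {
    uint8_t *data;
    size_t   size;
} ToyFrameBuffer;

/* Bytes needed for one packed frame; returns 0 on bad input or overflow. */
static size_t frame_size_bytes(uint32_t width, uint32_t height,
                               uint32_t bits_per_pixel)
{
    if (width == 0 || height == 0 || bits_per_pixel == 0)
        return 0;
    uint64_t pixels = (uint64_t)width * height;   /* fits in 64 bits */
    if (pixels > UINT64_MAX / bits_per_pixel)
        return 0;
    uint64_t bits  = pixels * bits_per_pixel;
    uint64_t bytes = bits / 8 + (bits % 8 != 0);  /* round up to whole bytes */
    if (bytes > SIZE_MAX)
        return 0;
    return (size_t)bytes;
}

/* Allocates the frame memory and records its size; 0 on success, -1 on error. */
static int toy_frame_buffer_init(ToyFrameBuffer *buf, uint32_t width,
                                 uint32_t height, uint32_t bits_per_pixel)
{
    assert(buf != NULL);
    size_t size = frame_size_bytes(width, height, bits_per_pixel);
    if (size == 0)
        return -1;
    buf->data = malloc(size);
    if (buf->data == NULL)
        return -1;
    buf->size = size;
    return 0;
}

/* Releases the frame memory and resets the fields. */
static void toy_frame_buffer_clear(ToyFrameBuffer *buf)
{
    assert(buf != NULL);
    free(buf->data);
    buf->data = NULL;
    buf->size = 0;
}
```

For a 720 x 480 frame at 16 bits per pixel, frame_size_bytes() yields 691,200 bytes.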
Addressing
The MMU of the ARM926 on DM644x devices based on DaVinci technology supports virtual-to-physical address translation. However, the C64x+ DSP core deals only with physical addresses. Therefore, input and output buffers provided to the DSP for processing must reside in physically contiguous memory.
Virtual-to-physical-address translations are handled by the codec engine. Physically contiguous memory can be obtained by reusing (pointers to) some driver-allocated buffers, which use techniques available in Linux like dma_alloc_coherent() to allocate this type of memory in kernel space. CMEM, a library/kernel module developed by TI, allows for allocation of physically contiguous memory from user space applications.
Consider the case where we allocate the physically contiguous "output" buffers using the aforementioned CMEM driver. The codec engine would decode the frame and put the decoded frame in the output buffer.
The pointer to the output buffer would then be passed to the fbvideosink (through the GstBuffer). The videosink would have to memcpy the decoded data into the frame buffer memory before it can be displayed. Since memcpy operations are an expensive use of the GPP, this method would load the ARM and the DDR interface heavily, increase power consumption and be extremely inefficient.
This technique may be feasible for very small buffers, but will start to degrade system performance heavily when the developer uses D1 (and higher) size buffers. A more efficient approach would be to reuse the physically contiguous buffers that are already allocated by the drivers, and pass pointers to these buffers back and forth between the codec engine and videosink plug-ins.
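To put a rough number on the copy cost, the sketch below computes how many bytes per second a per-frame memcpy moves. The figures used in the note after it are assumptions chosen for illustration, not values from the article: a D1 frame of 720 x 480 pixels, 2 bytes per pixel (a packed 4:2:2 format) and 30 frames per second.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes copied per second when every frame is memcpy'd once. */
static uint64_t copy_bytes_per_second(uint32_t width, uint32_t height,
                                      uint32_t bytes_per_pixel, uint32_t fps)
{
    /* Bounds keep the 64-bit product far from overflow. */
    assert(width <= 65535 && height <= 65535);
    assert(bytes_per_pixel <= 16 && fps <= 1000);
    return (uint64_t)width * height * bytes_per_pixel * fps;
}

/* Memory traffic: each copied byte is read once and written once. */
static uint64_t copy_traffic_bytes_per_second(uint32_t width, uint32_t height,
                                              uint32_t bytes_per_pixel,
                                              uint32_t fps)
{
    return 2 * copy_bytes_per_second(width, height, bytes_per_pixel, fps);
}
```

Under those assumptions the copy moves 20,736,000 bytes of pixel data per second, which amounts to 41,472,000 bytes per second of DDR traffic once both the read and the write are counted, all of it spent only on relocating frames that have already been decoded.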
Fortunately, GStreamer provides an API to facilitate this type of interaction: gst_pad_alloc_buffer().
This API is an alternative to using gst_buffer_new() to create a new buffer. An element calls gst_pad_alloc_buffer() once it knows which source pad it will push the data on. This allows the peer element to provide special "hardware" buffers for the calling element to work on, thus reducing the number of memcpys required in the system.
The video decoder plug-in (transform filter that leverages the DSP via codec engine APIs) will use the buffer obtained from the video renderer filter as the output buffer for the video decoder and carry out the decoding. Once the decoding is complete, the output buffer will be pushed (i.e., the pointer shall be passed) to the video renderer plug-in. Since the decoded image is already present in the video driver memory, no memcpy will be necessary, and the video-rendering filter will only have to switch the current display buffer to this specific buffer when the frame has to be displayed.
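The handoff described above can be modeled in a few lines of plain C. Everything here is a hypothetical stand-in: ToyVideoSink imitates a renderer that owns a small pool of driver buffers, toy_sink_alloc_buffer() plays the part of the peer answering a gst_pad_alloc_buffer() request, and toy_decode_into() stands in for the DSP decode call. The frame size is kept tiny so the sketch stays readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_NUM_BUFFERS 3
#define TOY_FRAME_BYTES 16   /* tiny frames keep the sketch readable */

typedef struct ToyVideoSink {
    uint8_t frames[TOY_NUM_BUFFERS][TOY_FRAME_BYTES]; /* stands in for driver memory */
    int     in_use[TOY_NUM_BUFFERS];  /* 1 while lent out or on screen */
    int     displayed;                /* index on screen, -1 if none */
} ToyVideoSink;

static void toy_sink_init(ToyVideoSink *sink)
{
    assert(sink != NULL);
    memset(sink, 0, sizeof *sink);
    sink->displayed = -1;
}

/* The renderer lends one of its own buffers to the upstream decoder. */
static uint8_t *toy_sink_alloc_buffer(ToyVideoSink *sink)
{
    for (int i = 0; i < TOY_NUM_BUFFERS; i++) {
        if (!sink->in_use[i]) {
            sink->in_use[i] = 1;
            return sink->frames[i];
        }
    }
    return NULL;                       /* pool exhausted */
}

/* Rendering only switches the displayed index; no pixel data is copied. */
static int toy_sink_render(ToyVideoSink *sink, const uint8_t *frame)
{
    for (int i = 0; i < TOY_NUM_BUFFERS; i++) {
        if (frame == sink->frames[i]) {
            if (!sink->in_use[i])
                return -1;             /* buffer was never lent out */
            if (sink->displayed >= 0 && sink->displayed != i)
                sink->in_use[sink->displayed] = 0;  /* recycle old frame */
            sink->displayed = i;
            return 0;
        }
    }
    return -1;                         /* not one of the sink's buffers */
}

/* Stand-in for the decoder: writes the frame directly into the lent buffer. */
static void toy_decode_into(uint8_t *out, uint8_t value)
{
    memset(out, value, TOY_FRAME_BYTES);
}
```

The decoder asks the sink for a buffer, decodes into it and hands the same pointer back. The only work left for the sink is to update which buffer is on screen and return the previous one to the pool.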
AV sync processing
Audio/video (AV) synchronization processing during playback generally requires three types of decisions.
• The decision to repeat a frame. This is typically made when the presentation time of the frame from the stream is more than one frame interval ahead of the time to display.
• The decision to display a frame. This is typically made when the presentation time of the frame from the stream is between a minimum and maximum threshold.
• The decision to skip a frame. This is typically done when the presentation time of the frame is at least two frames behind the time to display. Then the current frame is skipped and the next one is processed in hopes of catching up on the next frame interval. This continues until either the next frame is displayed or there are no more frames left to compare.
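The three decisions above can be sketched as a single comparison of the frame's presentation time against the clock. The exact thresholds are assumptions made for this example (repeat when the frame is more than one interval early, skip when it is two or more intervals late, display otherwise), and the function names are hypothetical; the GStreamer base sink classes implement their own logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { AV_REPEAT, AV_DISPLAY, AV_SKIP } AvAction;

/* All times share one unit (for example nanoseconds) and one common clock. */
static AvAction av_sync_decide(int64_t frame_pts, int64_t clock_now,
                               int64_t frame_interval)
{
    assert(frame_interval > 0);
    int64_t lead = frame_pts - clock_now;   /* > 0 means the frame is early */
    if (lead > frame_interval)
        return AV_REPEAT;                   /* too early: keep the old frame up */
    if (lead <= -2 * frame_interval)
        return AV_SKIP;                     /* two or more frames late: drop it */
    return AV_DISPLAY;                      /* inside the window: show it */
}

/* Skips late frames starting at index next. Returns the index of the first
 * frame that should not be skipped, or count if no frames are left. */
static size_t av_sync_catch_up(const int64_t *pts, size_t count, size_t next,
                               int64_t clock_now, int64_t frame_interval)
{
    while (next < count &&
           av_sync_decide(pts[next], clock_now, frame_interval) == AV_SKIP)
        next++;
    return next;
}
```

With a frame interval of 40 time units and the clock at 1000, a frame stamped 1100 is repeated, one stamped 1000 is displayed and one stamped 920 is skipped.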
Furthermore, a common clock should be used by all elements in the pipeline to facilitate these activities. Fortunately, all of these decisions are made by the audio and video base sink classes within the GStreamer core libraries. Thus many of the complexities of AV synchronization are abstracted away from the user.
Interface developed as plug-in
TI developed a GStreamer transform filter plug-in, which leverages the DSP for video decoding and runs on the ARM under the Linux operating system. TI also provides Linux peripheral drivers, which conform to standard open-source mechanisms in terms of driver interface, as well as a codec engine API that abstracts many of the complexities of programming DSPs. The APIs themselves, which are provided by the hardware manufacturer, are already optimized for the hardware implementation. If a switch to new hardware is made, a new driver can be substituted without changing the application code. This approach can substantially reduce the cost and lead time of video development.
The computational resources of the hardware devices are used efficiently without any assembly programming. This covers complex tasks such as optimized utilization of DSP resources and hardware-based acceleration engines; use of enhanced direct memory access peripherals in chained mode for more efficient data transfers; and packet processing in interrupt vs. tasklet modes to flexibly meet different application requirements.
Since GStreamer is a very popular and well-known framework, which has become a standard in digital video development, the ability to access the capabilities of the DSP from within this environment saves programmers from the need to learn the proprietary DSP programming language.
This approach also makes it easy to integrate the capabilities of the DSP with other requirements of the application that are typically performed on a GPP core. Decoding and encoding can be combined with other operations that are required in a digital video application by using other GStreamer plug-ins. The multimedia framework handles the integration task by stitching together the various operations that would otherwise require hand coding.
In conclusion, the new interfaces make it possible to use the GStreamer Linux multimedia framework to leverage the software infrastructure of TI's DaVinci platform of processors. This combined infrastructure provides a flexible framework that can accommodate new generations of multimedia codecs.
The software infrastructure enables design of a wide variety of video products. Leveraging this open-source framework provides video equipment designers with access to a community-supported, robust infrastructure, which can decrease time-to-market.
-Rishi Bhattacharya is systems and software architect for Texas Instruments

