Design Article
DSP video processing without DSP programming, via open source APIs
Rishi Bhattacharya
Systems and Software Architect,
Texas Instruments
11/3/2006 2:05 AM EST
Open-source multimedia frameworks, which typically run under the Linux operating system on the general-purpose processor (GPP), are an ideal target for APIs that give applications access to DSP-based video codecs. By leveraging these APIs, which abstract many of the complexities of DSP programming, developers can offload the computational burden of video codecs from the GPP to the DSP. This approach requires programmers to have only basic knowledge of the DSP, and it eliminates the need to write code that stitches together DSP functions with those that run on the GPP. These advantages, plus the ability to use the many capabilities offered by free open-source plug-ins and frameworks, can substantially reduce time to market for new video products.
Codec hardware alternatives
Developers have several alternatives in selecting hardware platforms to run the codec algorithms that compress a digital stream for transmission or storage and decompress it for viewing and editing. ASICs offer high performance and low power consumption in digital video applications because the hardware is designed specifically for the application. The disadvantage of an ASIC is that non-recurring engineering (NRE) costs are high, and it can be very expensive to implement changes, such as those needed to accommodate evolving codec standards. GPP cores, on the other hand, have comparatively low NRE and can be re-programmed fairly easily to address change, but their performance in digital video is low because they are relatively inefficient at computationally intensive signal processing. For example, a GPP without a single-cycle hardware multiplier accomplishes multiplication through a series of shift and add operations, each taking one or more clock cycles.
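To make the last point concrete, the sketch below multiplies two 16-bit values using only shifts and adds, the way software on a core without a hardware multiplier might. The function name shift_add_mul is hypothetical. The loop runs once per bit position up to the highest set bit of the multiplier, so a 16-bit operand can cost up to 16 iterations, where a DSP's single-cycle multiplier needs one instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply two unsigned 16-bit values with shifts and adds only, as
 * software on a core without a hardware multiplier might.
 * If steps is non-NULL, it receives the number of loop iterations:
 * one per bit position up to the highest set bit of b. Each iteration
 * costs a bit test, a conditional add and two shifts. */
static uint32_t shift_add_mul(uint16_t a, uint16_t b, unsigned *steps)
{
    uint32_t result = 0;
    uint32_t addend = a;          /* a, shifted left once per bit of b */
    unsigned n = 0;

    while (b != 0) {
        if (b & 1u)               /* this bit of b is set: add a * 2^n */
            result += addend;
        addend <<= 1;             /* next bit position doubles the weight */
        b >>= 1;
        n++;
    }
    if (steps != NULL)
        *steps = n;
    return result;
}
```

For example, multiplying by 5678 (highest set bit 2^12) takes 13 iterations, and multiplying by 0xFFFF takes 16.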
DSPs have the potential to provide the best of both worlds. In contrast to a GPP, a DSP is optimized for the computationally intensive signal processing found in digital video applications. DSPs have single-cycle multipliers or multiply-accumulate (MAC) units that can speed up codec execution. Higher-performance DSPs have several independent execution units that can operate in parallel, enabling them to carry out several operations per instruction cycle. Yet the DSP also provides full software programmability, including field reprogramming. This enables a manufacturer to, for example, roll out an MPEG-2 product and later upgrade it to the H.264 video codec. The primary limitation of DSPs in digital video applications is that they are typically programmed using proprietary languages and tools, and programmers who are familiar with DSPs are much less common than those who are familiar with popular GPP architectures.
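The multiply-accumulate pattern mentioned above is the inner loop of most filtering inside a codec. As an illustration (a sketch, not code from any TI library), the hypothetical function mac_dot below computes one filter output as a dot product of input samples and coefficients. With H.264's six-tap luma interpolation filter (1, -5, 20, 20, -5, 1), six MAC operations produce one half-pixel sample before rounding and scaling. A DSP with single-cycle MAC units, several of them working in parallel, is designed to run loops like this far faster than a shift-and-add implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One filter output as a multiply-accumulate (MAC) loop: the sum of
 * x[k] * h[k] over all taps. x holds the input samples under the
 * filter window and h the coefficients in matching order. A 32-bit
 * accumulator is wide enough for the short filters used here. */
static int32_t mac_dot(const int16_t *x, const int16_t *h, size_t taps)
{
    int32_t acc = 0;
    for (size_t k = 0; k < taps; k++)
        acc += (int32_t)x[k] * h[k];   /* one multiply, one accumulate */
    return acc;
}
```

On a flat input of 10s, the six-tap filter returns 320, which is 10 times the coefficient sum of 32.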

Figure 1: Overview of the pros and cons of using different architectures for real-time video systems design.
Integration challenges
Developers of digital video systems also face integration challenges. Digital video systems are made up of multiple encoders, decoders, codecs, algorithms and other software components, all of which must be integrated into an executable image long before any content can run on the system. Stitching these elements together and making sure they function cohesively can be difficult. Some systems require distinct video, imaging, speech, audio and other multimedia modules. Manually integrating each software module or algorithm distracts developers from value-added work, such as adding innovative features.
Many digital video developers have turned to open-source development. A common approach is to obtain significant parts of the software from open-source projects and apply in-house expertise to usability and hardware integration. Developers often participate in open-source projects to develop technology that fulfills specific needs, and then integrate the open-source code with internally developed code to create a product.

Figure 2: Benefits of designing with open source Linux
New API addresses these issues
To address these issues, Texas Instruments (TI) has developed an application programming interface (API) that allows DSPs to be used from open-source multimedia frameworks such as GStreamer. The API enables multimedia programmers to leverage the DSP Codec Engine from within a familiar environment.
The interface frees digital video programmers from the complexity of programming DSPs, making it easy for ARM/Linux developers to exploit DSP codec acceleration without detailed knowledge of the DSP hardware. The interface also automatically and efficiently partitions work between the ARM and the DSP, which eliminates the need to write code that interfaces functions running on the DSP with those running on the GPP core. The interface takes the form of a GStreamer plug-in, developed by TI in accordance with open-source community standards.
GStreamer is a media processing library that models media transformation as a pipeline, in which media flows in a defined direction from input to output. It has gained wide popularity in the digital video programming community because its abstraction of different media types simplifies the programming process.
GStreamer makes it possible to write a general video or music player that supports many different formats and networks. Most operations are performed not by the GStreamer core but by plug-ins. GStreamer's base functionality is primarily concerned with registering and loading plug-ins and providing base classes that define the fundamental behavior of the elements built on them.
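The plug-in architecture can be pictured with a minimal, hypothetical registry. Plug-ins register element factories by name and class, and the application looks them up by name instead of linking against them directly. This is not GStreamer's API, and the real registry also discovers and loads plug-ins from shared libraries. The names ElementFactory, register_factory and find_factory are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a framework core whose job is bookkeeping:
 * plug-ins register element factories, applications look them up. */
typedef struct {
    const char *name;    /* e.g. "filesrc" */
    const char *klass;   /* e.g. "Source", "Codec/Decoder", "Sink" */
} ElementFactory;

#define MAX_FACTORIES 16

static ElementFactory registry[MAX_FACTORIES];
static size_t registry_len;

/* Called by a plug-in when it is loaded. Returns -1 if full. */
static int register_factory(const char *name, const char *klass)
{
    if (registry_len == MAX_FACTORIES)
        return -1;
    registry[registry_len].name = name;
    registry[registry_len].klass = klass;
    registry_len++;
    return 0;
}

/* Called by the application; NULL means no plug-in provides it. */
static const ElementFactory *find_factory(const char *name)
{
    for (size_t i = 0; i < registry_len; i++)
        if (strcmp(registry[i].name, name) == 0)
            return &registry[i];
    return NULL;
}
```

The application never refers to a decoder implementation directly, so a plug-in that provides a different decoder can be registered without changing application code.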

Figure 3: Multimedia framework responsibilities and data flow in a decode-only example.
GStreamer filters
Source filters present the raw multimedia data for processing. They may read it from a file on a hard disk (as the File Source filter does), from a CD or DVD drive, or from a "live" source such as a television receiver card or a network. Some source filters simply pass the raw data on to a parser or splitter filter, while others also perform the parsing step themselves.
Transform filters accept either raw or partially processed data and process it further before passing it on. There are many types of transform filters, including parsers that split raw byte streams into samples or frames, compressors and decompressors, and format converters.
Renderer filters generally accept fully processed data and play it on the system's monitor, through the speakers or through an external device. This category also includes "file-writer" filters that save data to disk or other persistent storage, and network transmission filters.
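The parser transform filters described above split a byte stream into units. In MPEG-2 elementary streams and H.264 Annex B byte streams, units begin with the start-code prefix 00 00 01, so a minimal parser can scan for that prefix. The sketch below assumes the three-byte prefix only. A production parser would also handle the four-byte form (00 00 00 01) and units split across buffer boundaries. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the next 00 00 01 start code at or after 'from', or
 * 'len' if there is none. */
static size_t next_start_code(const uint8_t *buf, size_t len, size_t from)
{
    for (size_t i = from; i + 2 < len; i++)
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
    return len;
}

/* Split buf into units that each run from one start code up to the
 * next. Stores up to max_units (offset, size) pairs and returns the
 * number stored. Bytes before the first start code are skipped. */
static size_t split_units(const uint8_t *buf, size_t len,
                          size_t *offsets, size_t *sizes, size_t max_units)
{
    size_t count = 0;
    size_t start = next_start_code(buf, len, 0);

    while (start < len && count < max_units) {
        size_t end = next_start_code(buf, len, start + 3);
        offsets[count] = start;
        sizes[count] = end - start;
        count++;
        start = end;
    }
    return count;
}
```

Each (offset, size) pair would become one sample or frame handed to the next element, for example the decoder.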
Data processing takes place in the element's chain or loop function (plugin_chain() or plugin_loop() here). This function can be as simple as a scaling operation or as complicated as full MP3 decoding. After the data is processed, it is sent out of the element's source pad by calling gst_pad_push(), which pushes the data to the next element in the linked pipeline.
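The chain-and-push flow can be modeled in a few lines of plain C. The sketch below is a hypothetical, greatly simplified stand-in. The types Element and Buffer and the function push are invented, and real GStreamer elements use pads, caps and reference-counted GstBuffers. In the model, a source hands a buffer to a scaling transform, whose chain function processes it and pushes it on to a sink.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int data[8];
    size_t size;
} Buffer;

typedef struct Element Element;

struct Element {
    const char *name;
    void (*chain)(Element *self, Buffer *buf);  /* processes one buffer */
    Element *peer;                              /* downstream element */
    long total;                                 /* used by the sink only */
};

/* Hand a buffer to the downstream element (the role gst_pad_push()
 * plays in a real pipeline). */
static void push(Element *self, Buffer *buf)
{
    if (self->peer != NULL && self->peer->chain != NULL)
        self->peer->chain(self->peer, buf);
}

/* Transform: a trivial "scaling" element that doubles every sample,
 * then pushes the result downstream. */
static void scale_chain(Element *self, Buffer *buf)
{
    for (size_t i = 0; i < buf->size; i++)
        buf->data[i] *= 2;
    push(self, buf);
}

/* Sink stand-in: consumes the data (here it only sums it). */
static void sink_chain(Element *self, Buffer *buf)
{
    for (size_t i = 0; i < buf->size; i++)
        self->total += buf->data[i];
}
```

Linking source to scale to sink and pushing the buffer {1, 2, 3} from the source doubles each sample and delivers {2, 4, 6} to the sink. Data moves only in the downstream direction, as in a real pipeline.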
GStreamer buffers
Buffers are the basic unit of data transfer in GStreamer. The GstBuffer type provides all the state necessary to define a region of memory as part of a stream. Representing data with GstBuffer structures follows the approach taken by several other OSes and their multimedia frameworks (e.g., the Media Sample concept in Microsoft DirectShow). Sub-buffers are also supported, allowing a smaller region of a buffer to become its own buffer, with mechanisms in place to ensure that neither memory region is freed prematurely.

Figure 4: Representation of data within GStreamer via GstBuffer structures follows the approach taken by several other OSes and their respective multimedia frameworks.
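The sub-buffer idea can be sketched with reference counting. A sub-buffer is a smaller region that shares its parent's memory while keeping the parent alive. The Buf type and its functions are hypothetical and do not reflect GstBuffer's actual layout. They show one common way to guarantee that neither region is freed prematurely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Buf {
    unsigned char *data;   /* start of this buffer's region */
    size_t size;
    int refcount;
    struct Buf *parent;    /* NULL if this buffer owns its memory */
} Buf;

static Buf *buf_new(size_t size)
{
    Buf *b = calloc(1, sizeof *b);
    if (b == NULL)
        return NULL;
    b->data = malloc(size);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->size = size;
    b->refcount = 1;
    return b;
}

static void buf_ref(Buf *b) { b->refcount++; }

static void buf_unref(Buf *b)
{
    if (--b->refcount > 0)
        return;
    if (b->parent != NULL)
        buf_unref(b->parent);  /* a sub-buffer releases its parent */
    else
        free(b->data);         /* only the owner frees the memory */
    free(b);
}

/* A view of [offset, offset + size) inside parent. The sub-buffer
 * holds a reference, so the parent outlives it. */
static Buf *buf_create_sub(Buf *parent, size_t offset, size_t size)
{
    assert(offset + size <= parent->size);
    Buf *sub = calloc(1, sizeof *sub);
    if (sub == NULL)
        return NULL;
    buf_ref(parent);
    sub->data = parent->data + offset;
    sub->size = size;
    sub->refcount = 1;
    sub->parent = parent;
    return sub;
}
```

If the parent is unreferenced while a sub-buffer exists, only its count drops. The shared memory is freed when the last sub-buffer is unreferenced.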
Buffers are usually created with gst_buffer_new(). After a buffer has been created, one typically allocates memory for it and sets the size of the buffer data. The example in Figure 5 creates a buffer that can hold a video frame of a given width, height and bits per pixel.

Figure 5: How to create a buffer to hold a video frame of a given width, height, and bits per pixel.
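Figure 5's steps reduce to a size computation followed by an allocation, sketched below in self-contained form. FrameBuffer and its helpers are hypothetical stand-ins for a buffer object. In the GStreamer 0.10 series, the corresponding fields were accessed with the GST_BUFFER_DATA() and GST_BUFFER_SIZE() macros. The size is width times height times bits per pixel, divided by 8 and rounded up to whole bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical buffer object: a data pointer and a size. */
typedef struct {
    uint8_t *data;
    size_t size;
} FrameBuffer;

/* Bytes needed for one frame; rounds up when the total bit count is
 * not a multiple of 8. */
static size_t frame_size(size_t width, size_t height, size_t bpp)
{
    return (width * height * bpp + 7) / 8;
}

/* Allocate memory for one frame and record its size.
 * Returns 0 on success, -1 if the allocation fails. */
static int frame_buffer_init(FrameBuffer *fb, size_t width, size_t height,
                             size_t bpp)
{
    fb->size = frame_size(width, height, bpp);
    fb->data = malloc(fb->size);
    return fb->data != NULL ? 0 : -1;
}
```

A 720x480 frame at 16 bits per pixel needs 691,200 bytes. A QCIF frame at 12 bits per pixel (a 4:2:0 format) needs 38,016 bytes.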
Virtual to physical addressing
The MMU of the ARM926 on the DM644x devices based on DaVinci technology provides virtual-to-physical address translation. However, the C64x+ DSP core deals only with physical addresses. Therefore, input and output buffers provided to the DSP for processing must reside in physically contiguous memory.
The Codec Engine handles virtual-to-physical address translation. Physically contiguous memory can be obtained by reusing (pointers to) driver-allocated buffers; drivers allocate this type of memory in *kernel* space using Linux facilities such as dma_alloc_coherent(). CMEM, a library and kernel module developed by TI, allows physically contiguous memory to be allocated from *user*-space applications.
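A toy page table shows why the DSP needs physically contiguous memory. In the hypothetical model below, pages are 4 KB and each array entry holds the physical frame for one virtual page. This is not the ARM926 MMU's actual table format. A buffer that is contiguous in the ARM's virtual address space may span scattered physical frames, and a device that sees only physical addresses cannot follow that mapping. Allocators such as dma_alloc_coherent() and CMEM avoid the problem by handing out memory whose pages are physically consecutive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Toy page table: page_frame[v] is the physical frame number that
 * virtual page v maps to. */
static uintptr_t virt_to_phys(const uintptr_t *page_frame, uintptr_t vaddr)
{
    return page_frame[vaddr / PAGE_SIZE] * PAGE_SIZE + vaddr % PAGE_SIZE;
}

/* A device that sees only physical addresses can treat
 * [vaddr, vaddr + len) as one buffer only if each successive virtual
 * page maps to the next physical frame. */
static int is_phys_contiguous(const uintptr_t *page_frame,
                              uintptr_t vaddr, size_t len)
{
    assert(len > 0);
    uintptr_t first = vaddr / PAGE_SIZE;
    uintptr_t last = (vaddr + len - 1) / PAGE_SIZE;

    for (uintptr_t v = first; v < last; v++)
        if (page_frame[v + 1] != page_frame[v] + 1)
            return 0;
    return 1;
}
```

With the mapping {7, 3, 9, 10}, the first 8 KB of the buffer is virtually contiguous but physically scattered. Pages 2 and 3 (frames 9 and 10) could be handed to the DSP as one buffer.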
Consider the case where the physically contiguous *output* buffers are allocated using the CMEM driver. The Codec Engine decodes each frame and places the result in an output buffer.
The pointer to the output buffer would then be passed to the fbvideosink (through the GstBuffer). The videosink would have to memcpy the decoded data into frame buffer memory before it can be displayed. Because memcpy operations are expensive on the GPP, this method would heavily load the ARM and the DDR interface, increase power consumption and be highly inefficient.
This technique may be feasible for very small buffers (e.g., QCIF at 176x144), but it starts to degrade system performance heavily with D1 (720x480 or 720x576) and larger buffers. A more efficient approach is to reuse the physically contiguous buffers already allocated by the drivers, and pass pointers to these buffers back and forth between the Codec Engine and videosink plug-ins.
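The gap between QCIF and D1 can be estimated with a short calculation. The sketch below assumes 2 bytes per pixel (YUV 4:2:2), 30 frames per second and D1 as 720x480. It also counts each copied byte twice on the DDR interface, once for the read and once for the write. These are back-of-the-envelope figures under stated assumptions, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* DDR traffic, in bytes per second, added by copying every decoded
 * frame once with memcpy. Assumes 2 bytes per pixel (YUV 4:2:2) and
 * counts each copied byte twice: one read and one write. */
static uint64_t copy_traffic_bytes_per_sec(uint64_t width, uint64_t height,
                                           uint64_t fps)
{
    const uint64_t bytes_per_pixel = 2;
    uint64_t frame_bytes = width * height * bytes_per_pixel;
    return 2 * frame_bytes * fps;   /* read + write */
}
```

Under these assumptions the copy costs 3,041,280 bytes/s at QCIF and 41,472,000 bytes/s at D1. A D1 frame has about 13.6 times as many pixels, so the copy cost grows by the same factor.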
Fortunately, GStreamer provides an API to facilitate this type of interaction, shown in Figure 6.

Figure 6: An efficient approach for re-using physically contiguous buffers that are already allocated by the drivers.
This API is an alternative to using gst_buffer_new() to create a new buffer. A call to gst_pad_alloc_buffer() is made when the element knows which source pad it is going to push the data on. This allows the peer element to provide special "hardware" buffers for the calling element to work on, reducing the number of memcpy operations required in the system.
The video decoder plug-in (a transform filter that uses the DSP through the Codec Engine APIs) uses the buffer obtained from the video renderer filter as its output buffer and carries out the decoding. Once decoding is complete, the output buffer is pushed (i.e., its pointer is passed) to the video renderer plug-in. Since the decoded image is already in video driver memory, no memcpy is needed; the video renderer filter only has to switch the current display buffer to this buffer when the frame is due to be displayed.
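The buffer hand-off can be sketched as follows. VideoSink, sink_alloc_buffer, sink_render and fake_decode are hypothetical names. The sink owns a small pool of display buffers, standing in for driver-allocated, physically contiguous frame-buffer memory. The decoder asks the sink for an output buffer in the spirit of gst_pad_alloc_buffer(), decodes directly into that buffer and pushes it back. Rendering is then just a pointer switch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_DISPLAY_BUFS 3
#define FRAME_BYTES 64           /* tiny frames, for illustration only */

/* Hypothetical video sink owning driver-allocated display buffers. */
typedef struct {
    uint8_t mem[NUM_DISPLAY_BUFS][FRAME_BYTES];
    int in_use[NUM_DISPLAY_BUFS];
    uint8_t *displayed;          /* buffer currently being shown */
} VideoSink;

/* Upstream asks the sink for an output buffer.
 * Returns NULL if every buffer is busy. */
static uint8_t *sink_alloc_buffer(VideoSink *s)
{
    for (int i = 0; i < NUM_DISPLAY_BUFS; i++) {
        if (!s->in_use[i]) {
            s->in_use[i] = 1;
            return s->mem[i];
        }
    }
    return NULL;
}

/* A pushed frame is displayed by switching the display pointer; the
 * frame it replaces goes back to the pool. No pixel data is copied. */
static void sink_render(VideoSink *s, uint8_t *frame)
{
    if (frame == s->displayed)
        return;
    for (int i = 0; i < NUM_DISPLAY_BUFS; i++)
        if (s->mem[i] == s->displayed)
            s->in_use[i] = 0;
    s->displayed = frame;
}

/* Stand-in for the DSP decode call: writes its output in place. */
static void fake_decode(uint8_t *out, uint8_t value)
{
    for (size_t i = 0; i < FRAME_BYTES; i++)
        out[i] = value;
}
```

The decoder writes into the sink's own memory. On the display side, the only per-frame work is updating the displayed pointer and releasing the previous buffer for reuse.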
AV sync processing
Audio/Video (AV) synchronization processing during playback generally requires three different types of decisions.
- The decision to repeat a frame. This is typically made when the presentation time of the next frame in the stream is more than one frame interval later than the time to display, so the current frame is shown again.
- The decision to display a frame. This is typically made when the presentation time of the frame falls between a minimum and a maximum threshold around the time to display.
- The decision to skip a frame. This is typically made when the presentation time of the frame is two or more frame intervals behind the time to display. The current frame is then skipped and the next one is processed in hopes of catching up on the next frame interval. This continues until either a frame is displayed or there are no more frames left to compare.
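The three rules can be expressed as a small decision function. This sketch rests on a few assumptions. Timestamps and the frame interval share one unit: milliseconds in the example, with a 33 ms interval for roughly 30 frames/s. The display window covers everything from one interval early to just under two intervals late. The names av_sync_decide and av_next_frame are hypothetical, and real players tune thresholds to their own clocks and tolerances.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { AV_REPEAT, AV_DISPLAY, AV_SKIP } AvDecision;

/* pts: the frame's presentation time; now: the time to display
 * (e.g. from the audio clock); interval: one frame period. */
static AvDecision av_sync_decide(int64_t pts, int64_t now, int64_t interval)
{
    int64_t diff = pts - now;          /* > 0: early, < 0: late */
    if (diff > interval)
        return AV_REPEAT;              /* too early: repeat current frame */
    if (diff <= -2 * interval)
        return AV_SKIP;                /* two or more intervals late */
    return AV_DISPLAY;
}

/* Skip late frames until one can be shown. Returns the index of that
 * frame, or n if every remaining frame was too late. */
static size_t av_next_frame(const int64_t *pts, size_t n,
                            int64_t now, int64_t interval)
{
    size_t i = 0;
    while (i < n && av_sync_decide(pts[i], now, interval) == AV_SKIP)
        i++;
    return i;
}
```

With the time to display at 1000 ms, a frame stamped 1040 ms is repeated, one stamped 950 ms is displayed, and one stamped 930 ms is skipped.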
Interface developed as plug-in
TI developed a GStreamer transform filter plug-in that leverages the DSP for video decoding and runs on the ARM under the Linux operating system. TI also provides Linux peripheral drivers that conform to standard open-source driver interfaces (e.g., V4L2, FBDev and OSS), as well as a Codec Engine API that abstracts many of the complexities of programming DSPs. These APIs, provided by the hardware manufacturer, are already optimized for the hardware implementation. If a design moves to new hardware, a new driver can be substituted without changing the application code. This approach can substantially reduce the cost and lead time of video development.
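Driver substitution works because the application calls the hardware only through a fixed interface. The hypothetical sketch below illustrates the idea: the application drives the display through a table of function pointers, so moving to another board means supplying a different table. In the system described here, the fixed interfaces are standard Linux ones such as FBDev and V4L2. DisplayOps and the board_a and board_b tables are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical display-driver interface seen by the application. */
typedef struct {
    const char *name;
    int (*init)(void);
    int (*show_frame)(const void *frame);
} DisplayOps;

static int frames_a, frames_b;   /* counters standing in for hardware */

static int a_init(void) { return 0; }
static int a_show(const void *frame) { (void)frame; return ++frames_a; }
static int b_init(void) { return 0; }
static int b_show(const void *frame) { (void)frame; return ++frames_b; }

static const DisplayOps board_a = { "board-a", a_init, a_show };
static const DisplayOps board_b = { "board-b", b_init, b_show };

/* Application code: identical whichever board it runs on. */
static int play(const DisplayOps *ops, const void *frame, int count)
{
    if (ops->init() != 0)
        return -1;
    for (int i = 0; i < count; i++)
        ops->show_frame(frame);
    return 0;
}
```

Only the table passed to play() changes between boards. The application logic itself is untouched.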
The hardware's computational resources are used efficiently without any assembly programming by the application developer. This includes complex operations such as optimized use of DSP resources and hardware-based acceleration engines, use of enhanced direct memory access (EDMA) peripherals in chained mode for more efficient data transfers, and packet processing in interrupt or tasklet mode to flexibly meet different application requirements.
Since GStreamer is a widely used framework that has become a de facto standard in Linux digital video development, the ability to access the DSP from within this environment eliminates the need for programmers to learn the proprietary DSP programming language. This approach also makes it easy to integrate the DSP's capabilities with other parts of the application that typically run on the GPP core. Decoding and encoding can be combined with other operations required in a digital video application by using other GStreamer plug-ins. The multimedia framework handles the integration task by stitching together operations that would otherwise require hand coding.
In conclusion, the new interfaces make it possible to use the GStreamer Linux multimedia framework to leverage the software infrastructure of TI's DaVinci platform of processors. This combined infrastructure provides a flexible framework that can accommodate new generations of multimedia codecs and enables the design of a wide variety of video products. Leveraging this open-source framework gives video equipment designers access to robust, community-supported infrastructure, which can decrease time to market.
About the author
Rishi Bhattacharya has been a Systems/Software Architect in the TI DSP Systems unit for the past 5 years. He is responsible for developing systems and software solutions for DaVinci technology processors, and previously worked extensively on systems solutions for the OMAP family of processors. His main areas of expertise are high-level operating systems, multimedia frameworks and inter-processor communication. He received his BSEE from the University of Houston in 2000, and is currently working on his MBA and Master of Engineering at the University of Texas at Austin. He can be reached at rishi@ti.com.



