Design Article
Flexible and Scalable Movie Architecture for DSP based DSC/DM Systems
Vignesh Loganthan
10/5/2005 12:00 AM EDT
Texas Instruments Audio and Video/Imaging Series
Like any other product in the consumer electronics space, the Digital Still Camera (DSC) is evolving, driven mainly by consumer needs. The rapid evolution of DSCs has kept device manufacturers on their toes, with adaptation becoming a major challenge. The devices move from one generation to the next very fast, reducing the design cycle time to less than a year. Also, the codec requirements (both audio and video) are prone to change fast due to innovation and growth in multimedia technology.
DSC systems on the market are not for just capturing high resolution, high quality images. The requirements include a host of other features, including recording and playing back movies, Auto White Balance (AWB), Auto-exposure (AE), resizing, digital zoom, image stitching, image rotation, movie stabilization and so on.
Focusing more on the movie implementation shows the need to support different file formats such as QuickTime, MP4 and others. Supporting different storage media including SecureDigital (SD), HardDisk (HD) and CompactFlash with varied data access speed becomes important.
Camera units (with a CCD or CMOS imager), On-Screen-Display interface, USB, EMIF and other peripherals form an integral part of a DSC system. A robust design with sufficient abstraction can support an easy plug-in of the module drivers. Also, in a complex system, providing flexibility to support different operating systems can be very useful.
Extending the system capability to support features like Digital TV reception complying with the DVB-H, DMB and ISDB standards also needs to be addressed, considering the dynamic market needs. While these standards have many common elements, they also have significant differences that rule out a hardware solution.
A generic, modular, and portable software framework will always be handy in such a market, where an equipment manufacturer must quickly and easily support product requirements. This can be through a mix and match of different operating systems, multimedia codecs, file formats, and pre/post-processing features based on their needs. Figure 1 gives an overview of a DSC system defining different building blocks.
Figure 1: DSC architectural overview
Figure 2: Typical movie system
We would need to fit several codecs (both audio and video) into the system with limited performance and memory available. We must provide flexibility to perform the encoding and decoding operations in different processors (ARM/DSP) in the SoC. The audio and video encoded will be synchronized and appropriate header information has to be created depending on different file formats supported. This data has to be appropriately organized and written to the storage medium.
The playback operation reads the movie data from the memory, parses it, decodes the audio and video data, and finally plays back with AV sync. Also, the file write and read operation should consider supporting a spectrum of available storage devices.
The hardware reference system provides flexibility for equipment manufacturers to easily replace different hardware modules and quickly prototype a product.
We developed the software architecture explained in the following sections on these reference systems for a seamless integration.
The complete Remington Software Architecture (RSA) suite will run on the ARM processor of the DM device. The DSP and all other co-processors will act as slaves to the ARM processor, which will schedule tasks to them.
We designed this architecture to abstract most elements that change across camera models and to achieve the following objectives:
- Maximum Performance
Achieved with full use of the system resources and concurrent programming.
- Low Overhead
The architecture maintains system performance via low overhead.
- Stability and Robustness
Possible by clean and structured management of system states.
- Easy Function Change
Simplicity in adding new functionalities, removing existing functionalities, and modifying existing functionalities of the system according to manufacturer customization.
Layers of the Architecture
The layers are an abstract concept and the actual implementation of a layer is a module. Each layer removes the upper layers' dependency on the lower layers.
Figure 3: Layers of Remington Software Architecture
Figure 3 shows the layers of the Remington Software Architecture. The arrows indicate accessibilities of each module. For example, the UI layer can access the OS abstraction layer, the trace layer, and the API layer.
- User Interface Layer
The UI layer implements a user interface. The UI layer is independent of the device driver layer, OS, and kernel layers and the changes in the UI layer have no effects on any other layers because no other layer accesses the UI layer at this level.
- Remington Application Interface Layer
The Remington Application Interface (API) provides interface functions for the UI layer. The API layer is independent of the OS and the hardware system.
- Kernel Layer
The kernel layer implements the major functionalities of the system. The kernel layer can access the device driver layer, OS abstraction layer and algorithm abstraction layers. Therefore, the kernel layer is independent of the OS, the hardware system, and the algorithms.
- Algorithm Abstraction Layer
The algorithm abstraction layer provides the interface to run algorithms. The algorithm abstraction layer can be accessed only by the kernel layer.
- Device Driver Layer
The device driver layer provides interfaces to control the hardware systems. The device driver layer can access the OS abstraction layer and the hardware system. Therefore, the device driver layer is independent of the OS but dependent on the hardware system.
- Operating System Abstraction Layer
The OS abstraction layer provides interfaces to the OS functionalities. The OS abstraction layer can access the OS. Therefore, the OS abstraction layer is independent of the hardware system. However, it is dependent on the OS.
- Trace Layer
The trace layer is used to debug concurrent programming. The trace layer provides tracing message APIs. The display of the output can be controlled by zone and module flags in order that the programmers can see only the messages which they want to see.
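The access rules in the list above can be written down as a small lookup table. The sketch below is only an illustration: the identifiers are hypothetical, only the accesses stated in the text are encoded, and the edge from the API layer to the kernel layer is an assumption (the article does not say which layers the API layer calls).

```python
# Which layers each layer may call, as described in the list above.
# Names are illustrative; they are not identifiers from the RSA code base.
ALLOWED_ACCESS = {
    "ui": {"api", "os_abstraction", "trace"},
    "api": {"kernel"},  # assumption: not stated explicitly in the article
    "kernel": {"device_driver", "os_abstraction", "algorithm_abstraction"},
    "algorithm_abstraction": set(),
    "device_driver": {"os_abstraction", "hardware"},
    "os_abstraction": {"os"},
    "trace": set(),
}


def may_access(caller, callee):
    """Return True if the caller layer is allowed to call the callee layer."""
    return callee in ALLOWED_ACCESS.get(caller, set())


def callers_of(layer):
    """Return the sorted list of layers that are allowed to call the given layer."""
    return sorted(c for c, targets in ALLOWED_ACCESS.items() if layer in targets)
```

With this table, `callers_of("ui")` is empty, which matches the statement that no other layer accesses the UI layer, and `callers_of("algorithm_abstraction")` contains only the kernel layer.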
The basic design considerations are:
- You can use any video and audio codec for this movie record implementation. However, some video codecs (MPEG-4 and H.264) and audio codecs (G.711 and AAC) were featured in our design.
- Audio and video encoding is frame based. The encoding algorithm encodes one frame at a time.
- Audio input through the McBSP (Multichannel Buffered Serial Port) is controlled by the ARM processor.
- Audio/Video (AV) time-stamping (synchronization) is done by the ARM processor and is based on McBSP interrupts.
- You can run the algorithms either on the ARM processor or the DSP. For example, you can run a codec on the DSP while AWB, AE, and other processing can run on the ARM processor.
- Different file formats like QTFF, MP4 are supported. The Audio and Video data are arranged as interleaved data with corresponding File Format headers.
- The control task will support movie start, stop, pause, and resume.
- Implementation should support different FPS (Frames per second), resolution, and camera modules.
Task Assignment
Different stages of the movie record flow involve input data acquisition, buffer switching, data processing and output data storage. The four main tasks are:
- Movie Record Task
Handles input from the user through the UI task. It can accept start, stop, pause, and resume commands.
- Audio Encode Task
Handles audio encode of the captured audio, including any pre- and post-processing required.
- Video Encode Task
Handles video encode of the captured frame, including any pre- and post-processing required.
- File Write Task
Writes the encoded video and audio as per the file format to the memory card.
Task prioritization is important to this design and must occur in the order listed above.
The audio encode task requires a higher priority as it acts as the master driving the system.
Input Data Acquisition: Audio
PCM audio data is received by the McBSP and is stored in the ARM internal memory (AIM). The data is then transferred using DMA into the external memory and used by the audio encoder. Since the ARM internal memory is small, there is a constraint on the size and number of audio data buffers in AIM. The video frame rate commonly used is 30 FPS, meaning there would be around 33 ms delay between each video frame. Since the McBSP interrupt is used for time-stamping for both audio and video, it should have a high enough resolution.
The McBSP interrupts are generated every 8 ms giving the required time resolution. The time resolution is calculated based on the audio sampling frequency. This is because the PCM buffer size will depend upon the frame size required for audio. Each PCM buffer should be an integral divisor of the audio frame size. The relation is given below:
PCM buffer size = Audio frame size / N,
where N = 1,2,3...
The value of N is chosen such that one PCM buffer is filled in around 8 ms. Thus, N and the frame size will depend on the audio encoding algorithm. The PCM buffer size is determined based on all these conditions. The time taken to fill one PCM buffer is the basic time unit of the system. There should be at least two PCM buffers.
Time resolution = PCM buffer size / Audio sampling rate
Also
Time resolution = Audio Frame Size / (N * audio sampling rate)
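As a minimal sketch of the relations above, the helper below picks the divisor N of the audio frame size whose PCM buffer fill time is closest to a target resolution. The function name and the search strategy are assumptions made for illustration; the article only states the formulas and the 8 ms target.

```python
def pcm_buffer_plan(frame_size, sample_rate_hz, target_ms=8.0):
    """Choose N so that PCM buffer size = frame_size / N fills in about target_ms.

    Returns (N, pcm_buffer_size_in_samples, time_resolution_ms).
    """
    best = None
    for n in range(1, frame_size + 1):
        if frame_size % n:
            continue  # the PCM buffer must divide the audio frame exactly
        buffer_size = frame_size // n
        # Time resolution = PCM buffer size / audio sampling rate
        resolution_ms = 1000.0 * buffer_size / sample_rate_hz
        error = abs(resolution_ms - target_ms)
        if best is None or error < best[0]:
            best = (error, n, buffer_size, resolution_ms)
    _, n, buffer_size, resolution_ms = best
    return n, buffer_size, resolution_ms


print(pcm_buffer_plan(128, 8000))  # (2, 64, 8.0)
```

For a 128-sample frame at 8 kHz this gives N = 2 and a 64-sample PCM buffer, so an interrupt arrives every 8 ms.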
An interrupt is generated by the McBSP after filling one buffer. The PCM audio data should be moved to external memory for the audio encoding. This is required since audio codec running on DSP accesses external memory for the data.
Multiple audio input buffers are used for taking care of occasional performance issues. The McBSP interrupt service routine (ISR) will kick start a DMA for transferring data from PCM buffer to audio input buffer. The ISR also keeps a count of the number of ISRs that have occurred. This ISR also notifies the audio encode task that an audio input buffer is available for encoding using a semaphore / flag.
The timestamp is the number of McBSP interrupts multiplied by the time resolution. Notice that the timestamp for the audio is based on the audio input buffer position also. N PCM buffers are required for filling in one frame of audio input buffer. Thus the timestamp for buffer i is:
((n*r)+i) * N * time resolution
Where n is the number of audio input buffers and r is the number of times an audio buffer has been used. In other words,
Timestamp = Time resolution * number of McBSP interrupts
Figure 4 depicts the buffering mechanism. In this case N = 2, the frame size is 128 samples, and the sampling rate is 8 kHz, so the time resolution is 8 ms.
Figure 4: Audio buffering in movie record
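The two timestamp expressions given for audio can be checked against each other with a short sketch. The function names and the choice of four audio input buffers are assumptions for illustration only; N = 2 and the 8 ms resolution come from the example in the text.

```python
def interrupts_before_frame(i, r, n_buffers, pcm_per_frame):
    """McBSP interrupts that occurred before the frame in buffer i, on reuse r, began."""
    return ((n_buffers * r) + i) * pcm_per_frame


def frame_timestamp_ms(i, r, n_buffers, pcm_per_frame, resolution_ms):
    """Timestamp for buffer i: ((n*r)+i) * N * time resolution."""
    return ((n_buffers * r) + i) * pcm_per_frame * resolution_ms


# Hypothetical setup: 4 audio input buffers, N = 2, 8 ms time resolution.
print(frame_timestamp_ms(1, 0, 4, 2, 8.0))  # 16.0
print(frame_timestamp_ms(0, 1, 4, 2, 8.0))  # 64.0
```

Both forms agree: the timestamp of a frame equals the time resolution multiplied by the number of McBSP interrupts counted before that frame started.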
Input Data Acquisition: Video
The CCD sensor captures the video data and the imaging front-end blocks (CCDC and preview) process the data and store it in external memory. Multiple video input buffers are used to take care of occasional performance issues. The size of the video input buffer depends upon the frame size selected. The system supports different frame resolutions from QVGA to 720P.
Video input buffer switching logic takes care of advancing the captured video frame to the next buffer. If a video frame needs to be dropped, the next frame is captured into the same buffer. Video buffer switching is discussed in detail below.
A VD interrupt is generated (by the SoC) at a programmed interval after the frame capture is completed. The VD ISR uses a semaphore to notify the video encode task that a video frame capture is completed. The frequency of the VD ISR depends upon the video frame rate. The address of the video input buffer where the next frame should be captured is set by the VD ISR. It also checks to see if there are any buffers that need to be encoded and sets the flag accordingly.
The timestamp used for video data is the capture time timestamp. The timestamp will be calculated based on the same formula as the one used for audio data acquisition:
Timestamp = Time resolution * number of McBSP interrupts
Associated with the buffers is an array storing each buffer's timestamp. The VD ISR records the timestamp for each of the video input buffers. The video encode task and the file write task use this information for encoding and for writing the video chunk of the file format.
Initially all video input buffers except the last one are filled with video data before the encoding process is started. This is to ensure that sufficient data is available for encoding.
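The buffer switching and frame drop behavior described above can be sketched as a small ring of video input buffers. This is a simplified model under stated assumptions: the class and method names are hypothetical, and the drop policy (drop when all buffers except the one being captured into are waiting for encode) is one plausible reading of the text, not a confirmed detail of the design.

```python
class VideoInputRing:
    """Toy model of the video input buffers managed by the VD ISR."""

    def __init__(self, n_buffers):
        self.n = n_buffers
        self.capture_idx = 0            # buffer currently being captured into
        self.encode_idx = 0             # next buffer to hand to the encoder
        self.pending = 0                # captured frames waiting for encode
        self.timestamps_ms = [None] * n_buffers
        self.dropped = 0

    def on_vd_interrupt(self, mcbsp_interrupts, resolution_ms):
        """Frame capture finished. Returns True if the frame was queued."""
        if self.pending >= self.n - 1:
            # No free buffer to advance to: capture again into the same buffer.
            self.dropped += 1
            return False
        # Timestamp = time resolution * number of McBSP interrupts
        self.timestamps_ms[self.capture_idx] = mcbsp_interrupts * resolution_ms
        self.capture_idx = (self.capture_idx + 1) % self.n
        self.pending += 1
        return True

    def take_for_encode(self):
        """Return (buffer index, timestamp) of the oldest pending frame, or None."""
        if self.pending == 0:
            return None
        idx = self.encode_idx
        self.encode_idx = (self.encode_idx + 1) % self.n
        self.pending -= 1
        return idx, self.timestamps_ms[idx]
```

With three buffers, two frames can be queued before the encoder consumes anything; a third capture in that state is dropped and the capture address stays where it is.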
Data Processing
Audio and video data processing happen in parallel. They are carried out by the audio encode and video encode tasks respectively.
Audio Data Processing
The audio task waits for a semaphore from the McBSP ISR. After receiving this it checks whether an audio frame is available. If an audio frame is not available, it will wait for a semaphore from the McBSP ISR. This is required since the PCM buffers may not be of the same size as the audio frame. Another condition for audio frame encode is that the audio output buffer is available for writing. Once a frame is available for encode and a buffer is available for writing encoded audio data, it will issue a command to encode the frame.
Multiple audio output buffers are available to enable simultaneous file write with audio encode. A minimum of three buffers are required to take care of occasional slow performance. The audio data task has to take care of keeping the timestamps in proper buffers and pass it on to the file write task which will write the audio chunk as per the file format.
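The two conditions the audio encode task checks before encoding (a complete audio frame and a free output buffer) can be modeled with two counting semaphores. The sketch below is a single-threaded, non-blocking simulation for illustration; the class name, method names, and polling style are assumptions, whereas a real task would block on the semaphores under the OS abstraction layer.

```python
import threading


class AudioEncodeGate:
    """Toy model of the audio encode task's wait conditions."""

    def __init__(self, pcm_per_frame, n_output_buffers=3):
        self.pcm_ready = threading.Semaphore(0)                 # posted by the McBSP ISR
        self.out_free = threading.Semaphore(n_output_buffers)   # released by file write
        self.pcm_per_frame = pcm_per_frame                      # N PCM buffers per frame
        self.pcm_count = 0
        self.encoded = 0

    def isr_post(self):
        """McBSP ISR: one more PCM buffer has been moved to the audio input buffer."""
        self.pcm_ready.release()

    def file_write_done(self):
        """File write task: an audio output buffer has been written and freed."""
        self.out_free.release()

    def poll(self):
        """One pass of the task loop. Returns True if a frame was encoded."""
        while self.pcm_count < self.pcm_per_frame:
            if not self.pcm_ready.acquire(blocking=False):
                return False            # frame not complete yet, keep waiting
            self.pcm_count += 1
        if not self.out_free.acquire(blocking=False):
            return False                # no output buffer free yet
        self.pcm_count -= self.pcm_per_frame
        self.encoded += 1               # stands in for issuing the encode command
        return True
```

The loop on `pcm_count` reflects the point made above: a PCM buffer may be smaller than an audio frame, so one semaphore post from the ISR does not always mean a frame is ready.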
Video Data Processing
The video task waits for a semaphore from the VD ISR. Once the semaphore is received the task will check if a video output buffer is available. If video output buffer is available, it will send a command to start video encoding of a frame. If not it will wait for a video output buffer to be available.
Multiple video output buffers should be available to enable file write happening simultaneously with video encode. A minimum of three buffers are required to take care of occasional slow performance. The video data task has to take care of keeping the timestamps in proper structures and pass it on to the file write task which will write the video chunk as per the file format.
Output Data Storage
The encoded audio and video data is written to the memory card as per the file format. Multiple audio and video buffers are used between audio/video encode task and file write task. There will be only one file write task, since it will be writing audio and video data to the same file. These tasks are synchronized using semaphores.
Movie recording stops when no memory is available on the media. Before starting the movie, the available disk space is verified. While movie recording is in progress, the size of the generated audio and video data is tracked and subtracted from the available size. The constant header size for the file format and the worst-case variable header size required for the generated audio and video data are also included in this calculation. When the available size on the media falls below a threshold, movie encoding is stopped.
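The space accounting can be sketched as a running budget. All names and the example byte counts are hypothetical; the article does not give header sizes or the threshold value.

```python
class SpaceBudget:
    """Tracks the remaining media space during movie record."""

    def __init__(self, free_bytes, fixed_header_bytes, per_frame_header_bytes,
                 threshold_bytes):
        # The constant file format header is reserved up front.
        self.remaining = free_bytes - fixed_header_bytes
        self.per_frame_header_bytes = per_frame_header_bytes  # worst-case variable header
        self.threshold_bytes = threshold_bytes

    def account(self, encoded_bytes, n_frames=1):
        """Subtract newly encoded data plus its worst-case header share.

        Returns True if recording may continue, False if it should stop.
        """
        self.remaining -= encoded_bytes + n_frames * self.per_frame_header_bytes
        return self.remaining > self.threshold_bytes
```

The same budget object would be updated for both audio and video output, since both are written to the same file.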
File Format
The QuickTime and MP4 movie file formats are used in our movie implementation. Both these file formats support writing interleaved audio and video data. Audio is written as audio chunks, which carry timestamp and duration information in addition to the encoded audio data. Video is written as video chunks, which carry timestamp and duration information in addition to the encoded video data. These file formats support different audio and video codecs, making the system flexible.
Audio and Video Data Storage
The file write task synchronizes with the audio/video encode tasks regarding the availability of encoded audio/video data to be written into the file. The audio/video encode tasks also provide information regarding the timestamp for the first frame of data, the duration of each chunk, and so on. The file write task uses all this information and the encoded data to create audio and video chunks and write them into the output file. Messages are used for synchronizing this. Once the file write task completes the file write of the data, it releases the audio and video output buffers to the audio task and video task respectively. This synchronization is done through semaphores.
The amount of video data written at a time should not be so large that the audio write has to wait for a long time. It should also not be so small that the performance of the file system suffers.
Figure 5: System design for movie record
Figure 5 captures the overall system design of the movie recording. In addition to the overall data flow there are a few additional design considerations explained below.
Audio Video Synchronization
In order to achieve audio/video synchronization we must ensure that video frames are encoded at proper times in sync with audio. To do this we should use the same clock source for audio and video to determine the time at which video frames have to be encoded. Audio is encoded continuously, without any break, until recording ends. Video frames should be encoded at the proper interval depending on the video frame rate. We use the McBSP interrupt count as the timer source for both audio and video. The generation of timestamps has been explained in previous sections. Audio and video timestamps are calculated by the audio and video encode tasks respectively. The file write task adds the timestamp information to the audio and video chunks as required by the file format. Multiple frames can be part of the same audio or video chunk; in that case the timestamp used corresponds to the first frame.
Pre-processing and Post-processing
Movie stabilization is handled by the video encode task. This would be handled as pre-processing before the actual movie encode. There are two different methods used in movie stabilization. In the first case the frame captured is bigger than that required. The extra size requirement depends on the algorithm used, and the movie stabilization algorithm outputs video of the required size. In the second case the video is captured at the required size. The movie stabilization algorithm gives a reduced size output, which is then zoomed to the required size before encoding. Both scenarios will be handled by the video encode task.
The basic design considerations for movie playback include:
- Both audio and video playback is supported
- Movie play, stop, pause, resume, rewind and fast forward are supported
- Audio and video decoding is frame based. Decoding algorithm decodes one frame at a time
- Same codec and file formats as used in the movie record are used in playback
- Audio and video output is handled by the ARM processor making it easier to handle the AV sync
- A host of post-processing for audio and video are supported.
Task Assignment
The tasking for movie playback is similar and complementary to the one used in movie record. Figure 6 shows the movie playback task deployment.
- Movie Decoder Task
Handles input from the user through the UI task. It can accept start, stop, pause and resume commands.
- Data Read Task
Manages the movie file and runs the file parsing algorithm.
- Video Decode Task
Manages the input and output buffers of video decoding. It determines when video decoding should run, what data should be consumed, and which output block should be used.
- Audio Decode Task
Manages the input and output audio decoding buffers. It determines when audio decoding should run and what data should be consumed by the PCM buffer.
- Video Resize Task
Resizes the decoded output so that it can be used for display.
- Video Display Task
Displays the resized video data using the OSD. Manages the OSD interrupt and the Video Encoder (VENC). It also runs the AV sync module.
Figure 6: Movie playback task deployment
Task prioritization in the case of movie playback is as follows:
Movie Decoder > Audio Decode > Video Decode > Video Display > Video Resize > File Read
Buffer Control
A generic file format parser for QuickTime/MP4 to support different codecs and formats is implemented. The data handling and buffering is explained in the sections below.
Video Data Buffering
Compressed video data extracted by the file parser from the movie file is read by the data read task to fill a compressed video data buffer. When the data read task fills this with sufficient data, the video decode task starts to consume this data to perform the decoding.
The video decode task fills the uncompressed video data buffer. This is later consumed by either the resize task or the video output task.
Figure 7 shows the overall data flow block diagram for the movie playback operation.
Figure 7: Movie playback data flow
Video decode task is suspended automatically when the uncompressed video data buffer is full and the task is waiting for an empty block. The suspended task is woken up automatically when an empty block is available. The resize task or video output task is suspended automatically when no data is available in the uncompressed video data buffer and the task is waiting for new data. The task is also woken up automatically when new data arrives.
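The suspend and wake-up behavior around the uncompressed video data buffer is that of a bounded producer/consumer queue. The sketch below uses Python's standard `queue.Queue` and two threads as stand-ins for the video decode and video output tasks; the function name, the block count, and the `None` end-of-stream marker are assumptions made for the example.

```python
import queue
import threading


def run_pipeline(n_frames, n_blocks=3):
    """Decode n_frames through a buffer of n_blocks and return the display order."""
    uncompressed = queue.Queue(maxsize=n_blocks)
    shown = []

    def video_decode_task():
        for frame_no in range(n_frames):
            # put() blocks while the buffer is full, so the task is suspended
            # until the consumer frees a block.
            uncompressed.put(frame_no)
        uncompressed.put(None)  # end-of-stream marker (assumption)

    def video_output_task():
        while True:
            # get() blocks while the buffer is empty, so the task is suspended
            # until new data arrives.
            frame_no = uncompressed.get()
            if frame_no is None:
                break
            shown.append(frame_no)

    tasks = [threading.Thread(target=video_decode_task),
             threading.Thread(target=video_output_task)]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return shown
```

Neither task polls: each one sleeps inside `put()` or `get()` and is woken by the other side, which is the behavior the paragraph above describes.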
The resized video data buffer contains video output from the resizer peripheral. The access unit is a resize frame size.
Audio Data Buffering
The compressed audio data extracted by the file parser from the movie file is read by the data read task to fill the compressed audio data buffer. The decoding of this data is done by the audio decode task. The audio decode task has a similar main loop to the video decode task.
The PCM buffer contains the uncompressed audio from audio decoder. Data in this buffer is consumed by the McBSP device driver, which is configured appropriately for different audio output configurations.
Audio/Video Synchronization
In order to achieve Audio/Video synchronization we ensure video frames are decoded at proper times in sync with audio. The best way to do this is using the same clock source for audio and video to determine the time at which video frames have to be decoded.
The video and audio timestamps, which come along with the movie file (stored by the movie record implementation), are used for syncing AV. Audio time is calculated based on audio timestamps (if available in the movie file) or the McBSP interrupts. Video timestamps are fetched at that instant, and a decision to display the current frame is then made depending on whether it matches the calculated audio time.
If the system performance is not good enough to decode all video data, then video display task skips video frames appropriately.
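The display decision can be sketched as a comparison of the video timestamp with the current audio time. The tolerance value and the three-way outcome are assumptions for illustration: the article only says that a frame is displayed when its timestamp matches the audio time and that frames are skipped when decoding cannot keep up.

```python
def sync_action(video_ts_ms, audio_time_ms, tolerance_ms=16.0):
    """Decide what the video display task does with the current frame.

    "display": the frame matches the audio time within the tolerance.
    "skip":    the frame is late, so it is dropped to catch up with audio.
    "wait":    the frame is early, so it is held until audio catches up.
    """
    if video_ts_ms < audio_time_ms - tolerance_ms:
        return "skip"
    if video_ts_ms > audio_time_ms + tolerance_ms:
        return "wait"
    return "display"
```

Because audio acts as the master clock, only the video side is adjusted; audio playback through the McBSP continues uninterrupted.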
- Additional features and enhancements in the movie implementation require changes only in the Kernel layer.
- The audio and video codec are implemented as individual AABs. Different customized codec can be developed as AAB and all can finally be integrated into the overall system without any change in the Kernel layer implementation.
- Algorithm can be run either on ARM or DSP. Individual AAB can be implemented in any of the processor and they remain abstracted from the Kernel layer. This gives the flexibility to use the full system performance.
- The OS abstraction layer can contain any OS and Kernel can issue commands without any knowledge of the underlying OS.
- The File formats like QTFF, MP4 can have their Creator and Parser as individual AABs.
- Motion stabilization, Equalizer control, volume control and similar pre/post processing can be implemented as AABs.
- A host of hardware modules like Camera unit, memory cards can be supported by providing corresponding drivers in the driver layer. These remain abstracted from the Kernel layer.
- The RAPI and UI layer can seamlessly integrate with the Kernel layer issuing commands for various functionalities. This makes the support for different FPS (Frames per second), resolution and camera modules very easy.
- The architecture can also be easily extended to future needs and requirements. It can interface with network stacks (TCP/IP) seamlessly for playing movie streams that are transmitted through IP datacasting over DVB-H.




