[Part 2 discusses numeric formats as they relate to audio, with a focus on dynamic range, precision, and SNR. For the accompanying video tutorial, see Fundamentals of embedded video.]
Audio Processing Methods
Getting data to the processor's core
There are a number of ways to get audio data into the processor's core. For example, a foreground program can poll a serial port for new data, but this type of transfer is uncommon in embedded media processors because it makes inefficient use of the core.
Instead, a processor connected to an audio codec usually uses a DMA engine to transfer the data from the codec link (like a serial port) to some memory space available to the processor. This transfer of data occurs in the background without the core's intervention. The only overhead is in setting up the DMA sequence and handling the interrupts once the buffer of data has been received or transmitted.
Block processing versus sample processing
Sample processing and block processing are two approaches for dealing with digital audio data. In the sample-based method, the processor crunches the data as soon as it's available. Here, the processing function incurs overhead during each sample period. Many filters (like FIR and IIR, described later) are implemented this way because the effective latency is lower for sample-based processing than for block processing.
In block processing is a buffer of a specific length must be filled before passing the data to the processing function. Some filters are implemented using block processing because it is more efficient than sample processing. For one, the processing function does not need to be called for each sample, greatly reducing overhead. Also, many embedded processors contain multiple processing units such as multipliers or full ALUs that can crunch blocks of data in parallel. . What's more, some algorithms are, by nature, must be processed in blocks. A well known one is the Fourier Transform (and its practical counterpart, the Fast Fourier Transform, or FFT), which accepts blocks of temporal or spatial data and converts them into frequency domain representations.
In a block-based processing system that uses DMA to transfer data to and from the processor core, a "double buffer" must exist to arbitrate between the DMA transfers and the core. This is done so that the processor core and the core-independent DMA engine do not access the same data at the same time and cause a data coherency problem.
For example, to facilitate the processing of a buffer of length N, simply create a buffer of length 2-N. For a bi-directional system, two buffers of length 2-N must be created. As shown in Figure 1a, the core processes the in1 buffer and stores the result in the out1 buffer, while the DMA engine is filling in0 and transmitting the data from out0. It can be seen in Figure 1b that once the DMA engine is done with the left half of the double buffers, it starts transferring data into in1 and out of out1, while the core processes data from in0 and into out0. This configuration is sometimes called "ping-pong buffering," because the core alternates between processing the left and right halves of the double buffers.
Note that in real-time systems, the serial port DMA (or another peripheral's DMA tied to the audio sampling rate) dictates the timing budget. For this reason, the block processing algorithm must be optimized in such a way that its execution time is less than or equal to the time it takes the DMA to transfer data to/from one half of a double-buffer.
(Click to enlarge)
Figure 1. Double-buffering scheme for stream processing.
Two-dimensional (2D) DMA
When data is transferred across a digital link like I2S, it may contain several channels. These may all be multiplexed onto one data line going into the same serial port. In such a case, 2D DMA can be used to de-interleave the data so that each channel is linearly arranged in memory. Take a look at Figure 2 for a graphical depiction of this arrangement, where samples from the left and right channels are de-multiplexed into two separate blocks. This automatic data arrangement is extremely valuable for those systems that employ block processing.
(Click to enlarge)
Figure 2. A 2D DMA engine used to de-interleave (a) I²S stereo data into (b) separate left and right buffers.