# Massively parallel processing arrays (MPPAs) for embedded HD video and imaging (Part 2)

In the previous article, "Massively Parallel Processing Arrays (MPPAs) for Embedded HD Video and Imaging - Part 1", we provided an overview of the various types of architectures used to meet the needs of high-performance embedded applications. This analysis brought us to discuss in more detail Massively Parallel Processing Arrays (MPPAs), a type of programmable architecture with computational power similar to FPGAs and superior to high-end DSPs, and a programming model that is appealing to software developers. In that context, we introduced the Ambric Am2045 MPPA architecture and its Structural Object Programming Model (SOPM).

**Target application example: JPEG codec**

This second article illustrates how to apply this programming model to real embedded applications by walking through a JPEG codec and its implementation on the Am2045.

**Relevance of JPEG**

JPEG is a standard for image compression. Or rather, JPEG is *the* standard for image compression today. While the video compression space is cluttered with a sea of codecs like MJPEG, MPEG-2, MPEG-4, H.264, AVC Pro, Xvid, WMV -- just to name a few --
JPEG seems to dominate the image encoding space.

The JPEG compression format is also found in the video space in the MJPEG video codec, in which each frame is compressed independently as a JPEG image.

Finally, almost all video compression standards share many functional blocks with JPEG. Most video codecs operate on small blocks of pixels (8x8 or similar sizes) on which they perform similar operations such as transformation into the frequency domain (DCT or similar algorithms), quantization, run-length encoding, and so on.

JPEG is a representative, realistic example of a high-performance embedded application, yet it is simpler to describe than many other high-end applications, making it practical for the context of this article. The process for implementing JPEG on the Am2045 is the same as for other high-end applications, such as those on the existing Am2045 GT Video Reference Platform, which supports H.264, MPEG-2, JPEG 2000, and many other demanding codecs.

**Algorithm overview**

In the remainder of this article, we use the term JPEG to refer to the most common JPEG mode: baseline JPEG.

JPEG is a lossy image compression codec. The encoder transforms an image into a compressed bit stream using the following operations:

- Each image is converted to a luma-chroma color space (such as YCbCr), where the chroma components can be downsampled to provide a first opportunity for compression.
- The image is divided into 8x8 blocks of luminance (Y) and chrominance pixels, which are transformed into the frequency domain using a Discrete Cosine Transform (DCT). This transformation decorrelates the image, concentrating most of the energy into a few (low-frequency) coefficients.
- Each DCT-transformed block is then divided element-wise by a quantization matrix and the results are rounded to integers. The quantization matrix provides a "knob" to adjust the level of compression: increasing the quantization deteriorates the image quality but increases the compression ratio.
- Each quantized coefficient is then read in a zigzag order, a pattern that orders the coefficients from low to high frequencies. This introduces large groups of consecutive zeros toward the end of each block.
- The coefficients in zigzag order are then run-length encoded, a process that encodes each non-zero coefficient as a pair where the first number represents the number of preceding zeros and the second number represents the value of the non-zero coefficient. This step allows us to compress long runs of zeros.
- Finally, the run-length codes are entropy coded using Huffman tables. These tables allocate fewer bits to the most common codes, thus reducing the average size needed to represent each code.
- The resulting stream of Huffman codes is packed into a JPEG file, preceded by a header containing the information needed to decode the file, such as the quantization and Huffman tables used for encoding each color component.
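The zigzag scan and run-length steps above can be sketched in a few lines. This is an illustrative sketch, not Ambric code; the function names and the (run, value) pair convention with a (0, 0) end-of-block marker follow common JPEG descriptions.

```python
def zigzag_indices(n=8):
    """Return (row, col) pairs in zigzag order for an n x n block,
    walking the anti-diagonals and alternating direction on each."""
    order = []
    for d in range(2 * n - 1):
        diag = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        # Odd diagonals run top-right to bottom-left; even ones reverse.
        order.extend(diag if d % 2 else diag[::-1])
    return order

def run_length_encode(coeffs):
    """Encode quantized coefficients as (zero_run, value) pairs; a
    trailing run of zeros collapses to a (0, 0) end-of-block marker."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    if run:
        pairs.append((0, 0))  # end-of-block marker
    return pairs
```

For example, `run_length_encode([5, 0, 0, 3, 0, 0, 0, 0])` yields `[(0, 5), (2, 3), (0, 0)]`, compressing the trailing zeros into a single marker.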

An additional compression step consists of encoding the first coefficient of each block (the DC coefficient) as the difference from the DC coefficient of the preceding block, thus taking advantage of the similarities between neighboring blocks.
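The DC differential step can be sketched as a simple running difference, with the matching accumulation on the decoder side. This is an illustrative sketch; the predictor initialization to zero matches the usual baseline JPEG convention.

```python
def dc_differences(dc_values):
    """Encoder side: replace each DC coefficient with its difference
    from the previous block's DC (predictor starts at 0)."""
    prev, diffs = 0, []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def dc_reconstruct(diffs):
    """Decoder side: invert dc_differences by accumulating the deltas."""
    prev, dcs = 0, []
    for d in diffs:
        prev += d
        dcs.append(prev)
    return dcs
```

Because neighboring blocks tend to have similar average brightness, the differences are small numbers that Huffman-code into fewer bits than the raw DC values would.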

The JPEG decoder simply applies the transformations described above in reverse order: it first extracts and decodes the Huffman codes from the packed bit stream, then rearranges the data from zigzag order back to block order, performs inverse quantization, and finally applies the inverse DCT (IDCT).
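The quantization step and its decoder-side inverse are where the loss occurs, and the asymmetry is easy to see in a sketch. The 4-entry table below is a toy stand-in for assumption's sake; real baseline JPEG uses 8x8 tables per color component.

```python
def quantize(coeffs, qtable):
    """Encoder side: divide each DCT coefficient by its table entry
    and round to the nearest integer (this is the lossy step)."""
    return [round(c / q) for c, q in zip(coeffs, qtable)]

def dequantize(levels, qtable):
    """Decoder side: multiply each quantized level back by its table
    entry; the rounding error from quantize() is not recoverable."""
    return [lv * q for lv, q in zip(levels, qtable)]
```

A round trip such as `dequantize(quantize([100, 33, -7, 0], [16, 11, 10, 16]), [16, 11, 10, 16])` returns values close to, but not equal to, the originals, illustrating why larger quantization table entries trade image quality for compression.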

*Next: Design development*