This article assumes a basic understanding of video compression algorithms. For an introduction to video coders, see How video compression works.
Announced in the fall of 2008, On2 VP8 is On2 Technologies' eighth-generation video codec. It offers significant gains in compression performance in a bitstream that is less compute-intensive to decode than either its predecessor (VP7) or competing technologies such as H.264. VP8 inherits many innovations from its predecessors (VP7 and VP6), such as golden frames, processor-adaptive real-time encoding and a low-complexity loop filter, and adds more than fifty new techniques to achieve its goal of outstanding quality at low bitrates with very low complexity.
On2 VP8 has been designed with a wide range of machines in mind, from 60 MHz ARM9 processors to today's highly parallel multi-core processors. It encodes in real time on low-end machines and takes fewer cycles to decode than other leading algorithms. This article explores some of the innovations that make VP8 work so well.
The Constructed Reference Frame
One of the most exciting innovations in VP8 is the constructed reference frame. A constructed reference frame is a frame of image data (a reference buffer) that's encoded into the bitstream but never displayed. It serves solely to improve the encoding of subsequent frames by providing an additional and hopefully better predictor than any previously transmitted "normal frames."
The creation of a constructed reference frame is not defined by the bitstream. Instead, creating the best possible constructed reference frame is a task left to the encoder. It could, for example, be created by compositing several past and future frames, with appropriate motion compensation and temporal filtering, or it could contain arbitrary elements, such as graphic overlays, that need to be inserted in some of the subsequent frames.
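As a rough illustration of the compositing idea, the sketch below blends several frames into a single reference buffer using a weighted temporal average. It is a simplification under stated assumptions, not On2's encoder: motion compensation is omitted, frames are plain 8-bit grayscale arrays of equal size, and the function name and weights are hypothetical.

```python
def build_constructed_reference(frames, weights=None):
    """Blend equally sized grayscale frames into one reference buffer.

    frames  -- list of 2D lists of pixel values in 0..255
    weights -- optional per-frame weights (hypothetical tuning knob);
               defaults to a plain average
    Motion compensation is deliberately left out of this sketch.
    """
    if not frames:
        raise ValueError("need at least one frame")
    if weights is None:
        weights = [1.0] * len(frames)
    if len(weights) != len(frames):
        raise ValueError("one weight per frame is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    height, width = len(frames[0]), len(frames[0][0])
    reference = []
    for y in range(height):
        row = []
        for x in range(width):
            # Weighted temporal average of the co-located pixels.
            acc = sum(w * f[y][x] for f, w in zip(frames, weights))
            row.append(min(255, max(0, int(acc / total + 0.5))))
        reference.append(row)
    return reference


past = [[10, 20]]
future = [[30, 40]]
blended = build_constructed_reference([past, future])
```

A real encoder would align the frames with motion compensation before filtering, and could weight frames by how well they match the frames that will later be predicted from the buffer.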
A New Take on Loop Filtering
Loop filtering is the process of removing the blocking artifacts that result from quantizing the DCT coefficients of block transforms. VP8 brings several loop-filtering innovations (some of which have since been back-ported to VP6 and VP7) that speed up decoding by not applying any loop filter at all in some situations. VP8 also supports a form of implicit segmentation that makes it possible to select different loop filter strengths for different parts of the image, according to the prediction modes or reference frames used to encode each macroblock. For example, it would be possible to apply stronger filtering to intra-coded blocks while specifying that inter-coded blocks that use the Golden Frame as a reference and are coded with a (0,0) motion vector should use a weaker filter. The choice of loop filter strengths in a variety of situations is fully programmable on a frame-by-frame basis, so the encoder can adapt the filtering strategy to get the best possible results.
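The per-macroblock strength selection described above can be sketched as a base level plus adjustments keyed on reference frame and prediction mode. This is a minimal illustration, not the bitstream's actual syntax: the function names, the dictionary keys, the delta values and the 0 to 63 range are all assumptions made for the example.

```python
MAX_FILTER_LEVEL = 63  # assumed upper bound for this sketch


def macroblock_filter_level(base_level, ref_frame, mode, ref_deltas, mode_deltas):
    """Return the loop filter strength for one macroblock.

    base_level  -- frame-wide filter strength chosen by the encoder
    ref_frame   -- e.g. "intra", "last", "golden" (hypothetical labels)
    mode        -- e.g. "intra_16x16", "zero_mv", "new_mv" (hypothetical labels)
    ref_deltas  -- per-frame adjustments keyed by reference frame
    mode_deltas -- per-frame adjustments keyed by prediction mode
    """
    level = base_level + ref_deltas.get(ref_frame, 0) + mode_deltas.get(mode, 0)
    return max(0, min(MAX_FILTER_LEVEL, level))


def should_filter(level):
    # A level of zero means the loop filter is skipped entirely.
    return level > 0


# Stronger filtering for intra blocks, weaker for golden-frame (0,0) blocks.
ref_deltas = {"intra": 8, "golden": -4}
mode_deltas = {"zero_mv": -6}

intra_level = macroblock_filter_level(32, "intra", "intra_16x16", ref_deltas, mode_deltas)
golden_level = macroblock_filter_level(32, "golden", "zero_mv", ref_deltas, mode_deltas)
```

With these example values the intra-coded macroblock is filtered at level 40 and the static golden-frame macroblock at level 22. Because the two delta tables are supplied per frame, the encoder can retune them as the content changes.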
The More Cores the Better
On2 VP8 is built with parallelism in mind and can exploit multiple processor cores like no other codec. The data dependencies across macroblock rows that plague other codecs have been removed in VP8, so the encoder can make efficient use of multiple cores to encode several macroblock rows at the same time. In theory, VP8 can make use of as many cores as there are macroblock rows in the image, so it could take advantage of up to 68 cores when encoding 1080p content, all without sacrificing compression efficiency. Indeed, for live encoding, VP8's adaptive algorithms mean that the more cores are available, the better the quality is likely to be.
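The figure of 68 follows from dividing the frame height by the macroblock height and rounding up. The short sketch below works through that arithmetic, assuming 16x16 macroblocks; the function name is hypothetical.

```python
MACROBLOCK_SIZE = 16  # macroblock height in pixels (assumed 16x16 blocks)


def macroblock_rows(frame_height):
    """Number of macroblock rows needed to cover a frame of the given height."""
    # Ceiling division: a partial row at the bottom still needs a full row.
    return (frame_height + MACROBLOCK_SIZE - 1) // MACROBLOCK_SIZE


rows_1080p = macroblock_rows(1080)  # 1080 / 16 = 67.5, rounded up
rows_720p = macroblock_rows(720)
```

For 1080p this gives 68 rows, which is the theoretical upper limit on cores mentioned above. For 720p it gives 45.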
Even entropy encoding and decoding, a bottleneck in other codecs that is not readily amenable to parallelization, can be split across multiple cores by virtue of special VP8 options that allow for several independently coded data partitions.
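One way to picture independently coded partitions is to deal macroblock rows out to partitions in round-robin order, so that each partition can be entropy coded or decoded by its own core. The sketch below shows only that assignment step. The round-robin rule and the function name are assumptions made for illustration, and no actual entropy coding is performed.

```python
def assign_rows_to_partitions(num_rows, num_partitions):
    """Distribute macroblock row indices across partitions round-robin.

    Returns a list of lists: entry p holds the row indices whose data
    would be coded in partition p (assumed assignment rule).
    """
    if num_partitions < 1:
        raise ValueError("need at least one partition")
    partitions = [[] for _ in range(num_partitions)]
    for row in range(num_rows):
        partitions[row % num_partitions].append(row)
    return partitions


# 68 macroblock rows (1080p) spread over four partitions.
layout = assign_rows_to_partitions(68, 4)
```

With four partitions, partition 0 holds rows 0, 4, 8 and so on, and each partition carries 17 of the 68 rows, so the entropy coding work is spread evenly across four cores.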