Codecs are used to compress video to reduce the bandwidth required to transport streams, or to reduce the storage space required to archive them. The price of this compression is increased computational requirements: the higher the compression ratio, the more computational power is required.
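To give a rough sense of the scale involved, the arithmetic below compares the raw bandwidth of an uncompressed stream with the same stream after compression. The resolution, frame rate, chroma format, and 200:1 ratio are illustrative figures chosen for this sketch, not values from the article:

```python
# Back-of-envelope comparison: uncompressed vs. compressed bandwidth.
# Assumes 1080p at 30 fps with 4:2:0 chroma subsampling at 8 bits per
# sample (12 bits per pixel) and an illustrative 200:1 compression ratio.
width, height, fps = 1920, 1080, 30
bits_per_pixel = 12                                 # 4:2:0, 8-bit
raw_bps = width * height * bits_per_pixel * fps     # raw bitrate

ratio = 200                                         # example ratio
compressed_bps = raw_bps / ratio

print(f"uncompressed: {raw_bps / 1e6:.0f} Mbit/s")        # ~746 Mbit/s
print(f"at {ratio}:1:   {compressed_bps / 1e6:.2f} Mbit/s")  # ~3.73 Mbit/s
```

Even at an aggressive ratio, the compressed stream still has to fit a particular channel, which is the tradeoff the following paragraphs discuss.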
Fixing this tradeoff between bandwidth and computational requirements defines both the minimum channel bandwidth required to carry the encoded stream and the minimum specification of the decoding device. In traditional video systems such as broadcast television, the minimum specification of a decoder (in this case a set-top box) is readily defined.
Today, however, video is used in increasingly diverse applications with a correspondingly diverse set of client devices—from computers viewing Internet video to personal digital assistants (PDAs) and even the humble cell phone. The video streams for these devices are necessarily different.
To suit a specific viewing device and channel bandwidth, the video must be encoded many times with different settings. Each combination of settings must yield a stream that targets the bandwidth of the channel carrying the stream to the consumer as well as the decode capability of the viewing device. If the original uncompressed stream is unavailable, the encoded stream must first be decoded and then re-encoded with the new settings. This quickly becomes prohibitively expensive.
In an ideal scenario, the video would be encoded only once with a high efficiency codec. The resulting stream would, when decoded, yield the full resolution video. Furthermore, in this ideal scenario, if a lower resolution or bandwidth stream was needed to reach further into the network to target a lower performance device, a small portion of the encoded stream would be sent without any additional processing. This smaller stream would be easier to decode and yield lower resolution video. In this way, the encoded video stream could adapt itself to both the bandwidth of the channel it was required to travel through and to the capabilities of the target device. These are exactly the qualities of a scalable video codec.
The Scalable Video Coding extension to the H.264 standard (H.264 SVC) is designed to deliver the benefits described in the preceding ideal scenario. It is based on the H.264 Advanced Video Coding standard (H.264 AVC) and heavily leverages the tools and concepts of the original codec. The encoded stream it generates, however, is scalable: temporally, spatially, and in terms of video quality. That is, it can yield decoded video at different frame rates, resolutions, or quality levels.
The SVC extension introduces a notion not present in the original H.264 AVC codec: that of layers within the encoded stream. A base layer encodes the lowest temporal, spatial, and quality representation of the video stream. Enhancement layers encode additional information that, using the base layer as a starting point, can be used to reconstruct higher-quality, higher-resolution, or higher-frame-rate versions of the video during the decode process. By decoding the base layer and only the enhancement layers required, a decoder can produce a video stream with certain desired characteristics. Figure 1 shows the layered structure of an H.264 SVC stream. During the encode process, care is taken to encode each layer with reference only to lower layers. In this way, the encoded stream can be truncated along layer boundaries and still remain a valid, decodable stream.
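This layering is visible directly in the bitstream: Annex G of the H.264 standard defines a 3-byte NAL-unit header extension that tags each SVC NAL unit with its spatial layer (dependency_id), quality layer (quality_id), and temporal layer (temporal_id). The sketch below unpacks those fields from the extension bytes; the function name and dictionary layout are my own, but the bit positions follow the nal_unit_header_svc_extension() syntax in the standard:

```python
def parse_svc_header(ext: bytes) -> dict:
    """Unpack layer IDs from the 3-byte SVC NAL-unit header extension
    (H.264 Annex G), which follows the 1-byte NAL header of type-14
    (prefix) and type-20 (coded slice extension) NAL units."""
    assert len(ext) == 3
    return {
        "priority_id":   ext[0] & 0x3F,         # low 6 bits of byte 0
        "dependency_id": (ext[1] >> 4) & 0x07,  # spatial layer, 3 bits
        "quality_id":    ext[1] & 0x0F,         # quality layer, 4 bits
        "temporal_id":   (ext[2] >> 5) & 0x07,  # temporal layer, 3 bits
    }
```

A unit with all three IDs equal to zero belongs to the base layer; higher values identify the enhancement layers stacked on top of it.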
Figure 1. The H.264 SVC Layered Structure.
This layered approach allows the generation of an encoded stream that can be truncated to limit the bandwidth consumed or the decode computational requirements. The truncation process consists simply of extracting the required layers from the encoded video stream, with no additional processing of the stream itself. The process can even be performed "in the network": as the video stream transitions from a high-bandwidth network to a lower-bandwidth one (for example, from an Ethernet network to a handheld through a WiFi link), it can be parsed to size the stream for the bandwidth of the wireless link and the decode capabilities of the handheld decoder. Figure 2 shows such an example, in which a PC forwards a low-bandwidth instance of a stream to a mobile device.
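In essence, such an in-network extractor reduces to filtering NAL units by their layer IDs against target limits. The sketch below assumes the units have already been parsed into (dependency_id, quality_id, temporal_id, payload) tuples; that representation and the function name are hypothetical simplifications of what a real bitstream extractor would do:

```python
def extract_substream(nal_units, max_dependency_id, max_quality_id,
                      max_temporal_id):
    """Keep only NAL units whose layer IDs fall within the targets.

    nal_units: iterable of (dependency_id, quality_id, temporal_id,
    payload) tuples. Because each layer references only lower layers,
    dropping every unit above the thresholds leaves a valid,
    decodable substream."""
    return [u for u in nal_units
            if u[0] <= max_dependency_id
            and u[1] <= max_quality_id
            and u[2] <= max_temporal_id]

# Example: keep only the base spatial/quality layer at half frame rate.
units = [(0, 0, 0, b"base"),      # base layer
         (1, 0, 0, b"spatial"),   # spatial enhancement
         (0, 0, 2, b"temporal")]  # temporal enhancement
substream = extract_substream(units, max_dependency_id=0,
                              max_quality_id=0, max_temporal_id=1)
```

No transcoding is involved: the filter simply forwards or drops whole units, which is why the operation is cheap enough to perform at a network boundary.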
Figure 2. Parsing Levels to Reduce Bandwidth and Resolution.