When entertainment content is decoded and rendered on consumer electronics (CE) devices, the video portion of the signal may be rendered at a different time than the audio portion. The resulting timing differential is often called a "lip sync" error, because it is most obvious to a viewer when the content shows a person speaking. In a digital television, video processing usually takes longer than audio processing. As a result, video and audio can drift out of synchronization, creating an effect similar to a badly dubbed movie: the sound of the spoken words no longer matches the speaker's lip movements.
HDMI version 1.3 includes a Lip Sync feature that allows the audio processing time in devices to be adjusted automatically to compensate for audio/video timing errors. The initial implementations of this functionality will be in A/V receivers, but it is likely to appear in DVD players and many other CE devices in the future. Reports from manufacturers indicate that the feature is very popular and will be widely implemented.
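The automatic adjustment described above boils down to simple arithmetic: if the display reports how long it takes to process video and audio, an upstream device can delay its audio by the difference. The sketch below assumes the HDMI 1.3 convention of a one-byte latency field in the display's EDID, where a value of `(latency_ms / 2) + 1` encodes 0 to 500 ms and 0 means "not reported"; treating values above 251 as unusable is a simplification made here. The function names are hypothetical and the model ignores delays in the source device itself.

```python
from typing import Optional


def decode_latency(field: int) -> Optional[int]:
    """Convert a one-byte latency field to milliseconds (hypothetical helper).

    Assumed encoding: 0 = latency not reported, 1..251 = (field - 1) * 2 ms.
    Values above 251 (including 255, "stream not supported") are treated as
    unusable in this sketch.
    """
    if not 0 <= field <= 255:
        raise ValueError("latency field must fit in one byte")
    if field == 0 or field > 251:
        return None
    return (field - 1) * 2


def audio_delay_needed(video_field: int, audio_field: int) -> int:
    """Extra audio delay (ms) that would realign sound with picture.

    Simplified model: delay audio by (video latency - audio latency),
    never by a negative amount.
    """
    video_ms = decode_latency(video_field)
    audio_ms = decode_latency(audio_field)
    if video_ms is None or audio_ms is None:
        return 0  # nothing reliable to compensate for
    return max(0, video_ms - audio_ms)


# A display reporting 80 ms of video latency and 20 ms of audio latency.
print(audio_delay_needed(41, 11))  # → 60
```

Clamping at zero reflects the fact that a receiver can only add delay to the audio path; if the audio were the slower path, the video would have to be delayed instead, which this sketch does not model.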
The HDMI standard requires manufacturers to disclose which specific HDMI features are enabled in a product. The goal is to give consumers the descriptive information they need to understand features that exploit particular HDMI capabilities, such as Lip Sync. For each feature, the guidelines specify a minimum level of functionality a device must meet before it may use that terminology.
While HDMI LLC Authorized Testing Centers (HDMI-ATCs) test for electrical parametric and protocol compliance against the HDMI specification, this basic interface testing needs to be extended with additional performance testing programs designed to simplify consumer purchase decisions and enhance the high-definition entertainment experience. There are no HDMI-ATC system-level Lip Sync performance compliance specifications, nor any test tools designed to ensure accurate Lip Sync delivery. No "timing conformance" specification has to be demonstrated to any authority in order to build a compliant product.
There is increasing awareness in both broadcast engineering and the CE industry that audio/video synchronization errors, usually seen as lip sync problems, are occurring more frequently and often with greater magnitude. With the advent of digital processing in CE devices, the issue has become critical. Some CE manufacturers deny there is a problem, believing the audio/video asynchronies in their units to be imperceptible. Knowing how to measure audio/video delays and compensate for them is becoming increasingly important.
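One common way to measure audio/video delay is to play a test clip in which a visual flash and an audio beep are emitted at the same instant, capture what the device actually outputs, and compare when each event appears. The sketch below assumes such a hypothetical flash-and-beep clip, with per-frame brightness and audio samples already captured and normalized to 0..1; the function names and thresholds are illustrative, not a standard tool. A positive result means the audio reaches the viewer first (audio leads), which is the sign convention used by ITU-R BT.1359.

```python
from typing import Optional, Sequence


def first_crossing(values: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first value at or above threshold, or None if none is."""
    for i, v in enumerate(values):
        if v >= threshold:
            return i
    return None


def av_offset_ms(frame_luma: Sequence[float], fps: float,
                 audio_samples: Sequence[float], sample_rate: float,
                 luma_threshold: float = 0.5,
                 audio_threshold: float = 0.5) -> Optional[float]:
    """Estimate the audio/video offset of a hypothetical flash-and-beep clip.

    Returns video_onset_ms - audio_onset_ms: positive = audio leads,
    negative = audio lags, None if either event was not found.
    """
    frame = first_crossing(frame_luma, luma_threshold)
    sample = first_crossing([abs(s) for s in audio_samples], audio_threshold)
    if frame is None or sample is None:
        return None
    video_ms = frame * 1000.0 / fps
    audio_ms = sample * 1000.0 / sample_rate
    return video_ms - audio_ms


# Flash appears at frame 5 of a 25 fps capture (200 ms);
# beep appears at sample 120 of a 1 kHz capture (120 ms).
luma = [0.0] * 5 + [1.0] * 3
audio = [0.0] * 120 + [0.9] * 10
print(av_offset_ms(luma, 25, audio, 1000))  # → 80.0
```

Because the video onset is located to the nearest frame, this approach cannot resolve offsets finer than one frame period (40 ms at 25 fps); practical measurement tools use finer-grained techniques, which this sketch does not attempt.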
Is it important?
Lip sync is very important to consumers and the display industry because newer technologies have introduced a noticeable difference between the time needed to process video signals and the time needed to process audio signals. Lip sync correction features account for these processing delays so that both signals are synchronized and presented to the viewer together, which greatly improves the viewer's entertainment experience.
Lip sync errors detract from the consumer entertainment experience. The lack of lip sync correction is of particular concern in certain types of content, such as product commercials and political candidates' statements. See the report "Effects of Audio Asynchrony on Viewer's Memory, Evaluation of Content and Detection Ability" by Reeves and Voelker for more information (a non-copyrighted PDF is available at Pixel Instruments).
Studies of human sensitivity to audio/video asynchrony have shown that late audio is less annoying than early audio. In fact, even a few frames of early audio can be detected quickly by the viewer. Early work characterizing sensitivity to the alignment of sound and picture was carried out at Bell Laboratories.
The extent to which a consumer can tolerate these asynchronies depends on human perceptual limits as well as personal taste. Steinmetz and Engler conducted user studies (R. Steinmetz and C. Engler, "Human Perception of Media Synchronization," Technical Report 43.9310, IBM European Networking Center, Heidelberg) and report several figures of merit for quantifying tolerable audio/video asynchrony limits.
In 1998, ITU-R published Recommendation BT.1359, which addresses the relative timing of sound and vision for broadcasting. Studies by the ITU and others suggest that the thresholds of detectability are about +45 ms to -125 ms and the thresholds of acceptability are about +90 ms to -185 ms, where a positive value means the sound leads the picture and a negative value means it lags. In addition, the ATSC Implementation Subcommittee, in its finding IS-191, concluded that under all operational situations the sound program should never lead the video program by more than 15 ms and should never lag it by more than 45 ms ±15 ms.
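The thresholds above can be turned into a simple check that places a measured offset into a band. The sketch below uses the approximate BT.1359 figures quoted in this section (positive = sound leads, negative = sound lags) and the IS-191 limits of +15 ms / -45 ms; the ±15 ms tolerance is not modeled, and the function and constant names are hypothetical. It also converts frames to milliseconds at the NTSC-derived 29.97 fps rate, which shows why "a few frames" of early audio matters: two frames is already past the +45 ms detection threshold.

```python
from typing import Tuple

Window = Tuple[float, float]  # (most negative allowed, most positive allowed), ms

# Approximate figures quoted in the text; positive = sound leads vision.
DETECTABILITY: Window = (-125.0, 45.0)
ACCEPTABILITY: Window = (-185.0, 90.0)
IS191_LIMITS: Window = (-45.0, 15.0)  # +/-15 ms tolerance not modeled


def within(offset_ms: float, window: Window) -> bool:
    """True if the offset lies inside the window (inclusive)."""
    low, high = window
    return low <= offset_ms <= high


def frames_to_ms(frames: float, fps: float = 30000 / 1001) -> float:
    """Convert a frame count to milliseconds (default 29.97 fps)."""
    return frames * 1000.0 / fps


def classify(offset_ms: float) -> str:
    """Place an offset into the approximate BT.1359 perception bands."""
    if within(offset_ms, DETECTABILITY):
        return "below detection threshold"
    if within(offset_ms, ACCEPTABILITY):
        return "detectable but acceptable"
    return "unacceptable"


two_frames_early = frames_to_ms(2)
print(round(two_frames_early, 1), classify(two_frames_early))
# → 66.7 detectable but acceptable
```

The asymmetric windows encode the perceptual finding that early audio is noticed sooner than late audio; the IS-191 window is much tighter because it budgets for broadcast operations before the signal reaches the viewer.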
When viewers encounter problems such as lip sync errors, blocking, or black screens, they often switch to another channel. Therefore, it is imperative that television engineers find and fix network, encoding, and transmission problems before their viewers become aware of them.