Comments

re: HDMI's Lip Sync and audio-video synchronization for broadcast and home video
NickJ2008 (User Rank: Rookie)   9/5/2008 1:43:01 PM
Sorry if I came across as "pushing" a specific product. I did mention that three companies - not just mine - now produce remote-controlled digital audio delays that allow subjective lip-sync correction. A review of all three digital audio delays for lip-sync correction can be found here: http://www.audaud.com/article.php?ArticleID=3011 These products are currently the "only" solution that really corrects lip-sync. I certainly agree with "jesup" that they might be considered "bandaids," but when you are bleeding, a bandaid is certainly better than nothing. I mentioned that the industry needs to incorporate sync information starting with original content creation - information upon which "automatic lip-sync correction" could be based - but I doubt we will see that any time soon. I say that because the Reeves and Voelker study at Stanford (referenced in the original article), which proved the negative impact of lip-sync error on viewer perception (even at 41.25 ms, a level most people don't notice), was made public over 12 years ago, and little has changed since.

re: HDMI's Lip Sync and audio-video synchronization for broadcast and home video
jesup (User Rank: Rookie)   8/27/2008 7:21:44 PM
All of this ignores the fact that the HDMI design apparently was flawed to begin with. (Oddly, I commented on this here before and the comment seems to have disappeared.) NOTE: I haven't read the HDMI spec; I'm working from the info in this article.

Audio and video should all be timestamped with a presentation time that tells you for sure which video should be synced with which audio. This is what's done in all video and audio going over the net using RTP, and in pretty much any other networked A/V protocol (since you assume there can be delay or jitter). My guess is that HDMI was built on the assumption that inputs are (in some way) synced at the time of reception, and, as the other commenter mentioned, the best you can do is not make it worse on your output. There's nothing actively marking or correcting sync. The classic chains in TV stations often work this way, with any sync losses traditionally being fixed delays that can be compensated with a fixed delay. You can see how hard this is to get right for digital video from the problems stations have had in switching to HD chains.

Watermarking is NOT a good solution. Multiple watermarks (or even one) degrade the signal, and at any stage, processing of the audio or video could by chance remove them. The only reason to use watermarks is the lack of any out-of-band way to mark time or to match up presentation times.

HDMI is probably stuck now, or at least will require fancier solutions. They probably *should* have mandated that devices timestamp audio and video (and that a device receiving an unstamped stream assume it is in sync (ugh) and add timestamps). Devices should have been required to transmit timestamped A/V with no more than X ms of sync mismatch, which would require a small amount of buffering in each device - more if there's variable processing time inside it. The biggest problem would be disjoint display/playback devices - HDMI should also have provided a way for devices to synchronize clocks (a la NTP), so two screens have a chance of displaying data at the same time (or a receiver can play back audio at the same time the LCD monitor displays the video). This would require devices to broadcast their current delays, and for devices to use the longest current delay. It is inherently problematic to deal with changing delays, which are almost always caused by video. However, it's far easier to adjust video delay with skipped or duplicated frames than to adjust audio delay.

Items like the device the previous poster was pushing, and the fix proposed by the writer of the article, are bandaids on a bad initial set of design decisions based on "traditional" ways of handling analog A/V.
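
To make the timestamp idea concrete, here is a minimal C sketch of presentation-timestamp alignment of the kind RTP and MPEG systems use. It assumes a shared 90 kHz PTS clock; the structure and function names are illustrative only and are not taken from the HDMI spec or any shipping implementation.

/* Minimal sketch of timestamp-driven A/V alignment (illustrative only).
 * Each stream carries a presentation timestamp (PTS) against a shared clock;
 * the sink buffers whichever stream is early instead of guessing at fixed
 * pipeline delays. */
#include <stdio.h>
#include <stdint.h>

#define CLOCK_HZ 90000   /* 90 kHz PTS clock, as used for MPEG/RTP video */

typedef struct {
    int64_t pts;         /* presentation time of the unit now at the output */
} stream_head_t;

/* Positive result: audio is ahead and must be held back by that many ms.
 * Negative result: audio is behind; hold (or repeat) video until it catches up. */
static double audio_lead_ms(const stream_head_t *audio, const stream_head_t *video)
{
    return (double)(audio->pts - video->pts) * 1000.0 / CLOCK_HZ;
}

int main(void)
{
    /* At the same wall-clock instant, each path reports the PTS it is presenting. */
    stream_head_t video = { .pts = 900000 };         /* frame for t = 10.000 s */
    stream_head_t audio = { .pts = 900000 + 7200 };  /* sample for t = 10.080 s: audio 80 ms ahead */

    double lead = audio_lead_ms(&audio, &video);
    if (lead > 0)
        printf("delay audio by %.1f ms\n", lead);    /* prints 80.0 */
    else
        printf("delay video by %.1f ms\n", -lead);   /* e.g. repeat frames until audio catches up */
    return 0;
}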

re: HDMI's Lip Sync and audio-video synchronization for broadcast and home video
NickJoh (User Rank: Rookie)   8/20/2008 6:28:18 PM
I applaud this excellent article with comprehensive references on an important subject largely ignored by the industry. A few important points fundamental to the lip-sync problem may, however, be obscured by this excellent overview of the subject, even though they are referenced.

For example, you might get the impression that HDMI 1.3's "Automatic Lip-Sync Correction" feature actually corrects lip-sync error in the arriving signals. It does NOT. Most consumers have that misconception, when in reality all that feature does is "automatically" set a fixed audio delay to offset the display's fixed video delay. This is something you can already do with a "one time adjustment" in almost all recent A/V receivers, and if that were an adequate solution to correcting lip-sync, the three companies' products I will mention at the end of my comments would have no market. In fact, a "fixed delay" like the one even HDMI 1.3 receivers will add can make lip-sync "worse" in cases where the audio arrives already delayed, which happens more often than you would think - especially with DVDs. HDMI 1.3 allows a display to communicate to a receiver what its video delay will be during the initial EDID handshaking, but that fixed delay has no effect on lip-sync error in the broadcast or DVD signals and, as mentioned, will exacerbate the problem in cases where the display's video delay could be used to advantage in offsetting an arriving audio delay.

Unfortunately, the industry's solution to lip-sync error in the arriving signals has historically been an "open loop" control system requiring the measurement of video delay at every possible entry point and the addition of a compensating audio delay. That can only "maintain" lip-sync that was already correct, and in reality much of the lip-sync error originates with the content producer, often starting at image capture and continuing through editing and post production. Tiny errors accumulate until they are large. But even if the original content has correct lip-sync (rare), any error in this open loop scheme is also cumulative, since it cannot be automatically detected downstream.

There is a white paper on the Pixel Instruments site with a graph showing the time-varying lip-sync error present in most broadcasts. It's pretty astounding! I have observed over 90 ms differences in lip-sync error from broadcast program to program and over 50 ms variation from DVD to DVD. Lip-sync error is clearly "not" simply due to the fixed video delay caused by a display, as the HDMI 1.3 "feature" implies. There is an excellent white paper written for the SMPTE Ad Hoc Committee on lip-sync by the VP of Engineering for Liberty Broadcasting (now Raycomm) on the technical details page of www.LipFix.com in which he describes his stations' diligent efforts to correct lip-sync, but he concludes the paper saying all they can hope to accomplish is "to add no more lip-sync error" since the feeds he receives from the major networks are "already out of sync."

Tektronix had an excellent solution to the lip-sync problem - their now-discontinued AVDC100 - which would watermark audio and video that was in sync and keep it in sync, but I'm convinced its market failure was due to exactly that: there simply is nowhere you can count on the audio and video being in sync. If it is not in sync to start with, watermarking it and maintaining that incorrect sync accomplishes nothing.
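
For illustration, the following C sketch shows roughly what the HDMI 1.3 scheme described earlier in this comment amounts to: the display reports a fixed video latency over EDID and the receiver applies the same value as a fixed audio delay. The (byte - 1) * 2 ms encoding is my reading of the CEA-861/HDMI vendor-specific data block convention and should be treated as an assumption; the point is only that whatever error arrives passes straight through.

/* Sketch: what HDMI 1.3 "Automatic Lip-Sync Correction" boils down to.
 * The display reports a fixed video latency in its EDID (assumed encoding:
 * latency_ms = (byte - 1) * 2, byte 0 = not reported) and the receiver simply
 * delays audio by the same fixed amount. */
#include <stdio.h>

static int vsdb_latency_ms(unsigned char latency_byte)
{
    if (latency_byte == 0)
        return -1;                        /* display does not report its latency */
    return ((int)latency_byte - 1) * 2;
}

int main(void)
{
    unsigned char video_latency_byte = 51;            /* (51 - 1) * 2 = 100 ms display delay */
    int display_video_delay = vsdb_latency_ms(video_latency_byte);

    int arriving_audio_lead = 80;     /* ms the incoming audio already leads its video */

    /* The HDMI 1.3 receiver adds a fixed audio delay equal to the display's video delay... */
    int fixed_audio_delay = display_video_delay;

    /* ...so the skew the viewer sees is exactly the skew that arrived. */
    int viewer_audio_lead = arriving_audio_lead + display_video_delay - fixed_audio_delay;
    printf("lip-sync error at the viewer: %d ms\n", viewer_audio_lead);   /* prints 80 */
    return 0;
}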
There are other products on the market which do exactly what the AVDC100 did (maintain incorrect lip-sync) but which have "implied" they could automatically correct it - such as scan converters, MPEG encoders and decoders, etc. They all assume the source material was in sync upon arrival, internally watermark the streams, and restore sync upon output, similar to the AVDC100 - but again, maintaining whatever lip-sync error was already in the arriving streams. If you see "automatic lip-sync correction," look closer, because it is not possible: there is nothing in the video and audio signals to define when they were ever in sync! What the industry needs is a standard for watermarking the audio and video during content creation (and more effort on the creator's part to produce perfect lip-sync) so that sync could be maintained throughout the broadcast chain and DVD encoding processes based upon those watermarks. Until that happens we can forget "automatic lip-sync correction."

But I feel the first step, and the most overlooked lip-sync issue, is: "How closely synced does it need to be?" Is the objective to "mask" the error or actually correct it? All the industry standards allow lip-sync error "above" the values the Reeves and Voelker research at Stanford proved cause a negative impact on viewer perception even when not consciously noticed. Clearly, then, such lax standards can only "mask" the problem, concealing it rather than actually eliminating its negative impact on our perception!

Sound cannot occur "before" the action that creates the sound (in the real world), so when we encounter this contradiction of reality in our home theaters our subliminal reaction is to "look away" from the lips so as not to be confronted by this impossibility. This explains why most people won't notice over 40 ms of lip-sync error and why the ITU (and other standards groups) seem to consider that a reasonable target. It could also explain Stanford's discovery that viewers felt the characters were more "anxious," "less persuasive," "less successful," etc. when lip-sync error was present, since these are the same feelings we usually have about people who do not look us in the eye when talking. Ironically, it is "we" who are not looking the characters in the eye! As long as we can look askance at the characters' faces and still keep our eyes on-screen, we may not consciously notice the lip-sync error, but this subconscious avoidance still undermines our impression of the characters and story - which is the essence of cinema itself, isn't it? If our impression of the characters and story is not the main objective of cinema, what is?

Is it OK to just "mask" the lip-sync problem and reduce it to about 40 ms, where most people won't notice it? Not if you believe the Stanford research, which proved lip-sync error - even in the 40 ms range - undermines our perception. If you think you have corrected lip-sync error with your receiver (even an HDMI 1.3 receiver and display), run this test: force yourself to "look at the lips." It may take some effort, because our natural tendency is to look "away" from something our brains cannot reconcile or process. Even though most lip-sync error appears as "audio ahead of video," the opposite condition is also unnatural, since our brain uses the delay as a spatial cue and is confused when audio arrives too early or too late - that is, when someone "looks" like they are 20 feet away but sounds like they are 5 feet or 50 feet away.
That too could cause us to subliminally avoid such an impossibility, because in nature sound is delayed about a millisecond for every 13 inches it travels from its source, with very small variations due to altitude, temperature, humidity, etc., and "never" by any significant amount. If you force yourself to consciously look closely at the lips, you will overcome your avoidance mechanism and you will see the lip-sync error still present, perhaps all the way down to a few milliseconds! It may be hard to believe, but we have had customers of our first two generations of lip-sync correction products, our DD340 and DD540, ask for adjustment "below a millisecond," so in our new DD740 we added a special "fine" mode allowing 1/3 ms steps. Admittedly few will need that fine an adjustment, but many of our customers adjust down to a few ms, which most in the industry do not believe possible.

Unfortunately this article overlooks the only current solution to perfect lip-sync error correction, which is a subjective adjustment of an audio delay while watching the moving lips. The Pixel Instruments "Lip Tracking" system mentioned attempts to automate this type of correction and probably has the greatest potential for broadcasters to correct lip-sync, but home theater equipment can distort it again downstream, so the ultimate correction should be done at the endpoint - the display and the surround sound system in the home. In addition to our Felston DD740 4 Input Digital Audio Delay for lip-sync correction, two other companies now produce remote-controlled digital audio delays which allow fine-tuning lip-sync while watching an undisturbed image - the essential feature for true lip-sync correction. They are Alchemy2 and Primare. All three allow tweaking the audio delay at the touch of a + or - button on a remote control, which seems a minor inconvenience considering the alternative of allowing this contradiction of reality to continue being masked and undermining our perception.

Also, note that when any of these audio delays is used with a display's inherent video delay, you effectively gain a "negative" delay equal to the display's video delay. As an example, compare the use of a Felston DD740 correcting lip-sync on a display with a 100 ms video delay against an HDMI 1.3 display/receiver combination:

Case 1: Arriving video is delayed 80 ms behind its audio. The Felston DD740 would be adjusted to 180 ms (100 ms for the display's added delay plus 80 ms for the arriving signal's existing video delay), and at that value lip-sync would be perfect. The HDMI 1.3 display would tell the receiver to add a 100 ms delay, so the 80 ms error in the arriving signal would still be present in the program being viewed.

Case 2: Audio arrives delayed by 80 ms after the video. The Felston DD740 would be set to 20 ms, which would allow 80 ms of the display's 100 ms video delay to cancel the 80 ms audio delay in the arriving signal, and the DD740's 20 ms delay would cancel the balance of the display's delay, so lip-sync would be perfect. The HDMI 1.3 display would tell the receiver to add a 100 ms audio delay as before, so the 80 ms lip-sync error present in the arriving signal would be preserved and visible in the program being viewed. This is a case where lip-sync error would have been less if the HDMI 1.3 "feature" had been turned "off." That is, by doing nothing, only 20 ms of lip-sync error would have been displayed, since 80 ms of the display's 100 ms video delay would have cancelled the arriving lip-sync error, leaving only the 20 ms contributed by the display.
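
The arithmetic in Case 1 and Case 2 can be checked with a short C sketch. The sign convention is assumed here: a positive video lag means the arriving video trails its audio (Case 1), a negative one means the audio is the late stream (Case 2). The delay-box behavior is modeled only on the description above, not on any particular product's firmware.

/* Worked check of Case 1 and Case 2 above. */
#include <stdio.h>

/* Setting for an external audio-delay box: soak up the display's video delay
 * plus whatever lead the arriving audio already has (never less than zero,
 * since an audio delay cannot be negative). */
static int delay_box_setting(int display_video_delay, int video_lag)
{
    int d = display_video_delay + video_lag;
    return d > 0 ? d : 0;
}

int main(void)
{
    const int display_delay = 100;            /* ms of video delay inside the display */
    const int cases[2] = { +80, -80 };        /* Case 1 and Case 2 from the text */

    for (int i = 0; i < 2; i++) {
        int lag = cases[i];
        int box      = delay_box_setting(display_delay, lag);    /* 180, then 20 */
        int with_box = display_delay + lag - box;                 /* residual error: 0, 0 */
        int hdmi13   = display_delay + lag - display_delay;       /* fixed 100 ms delay: 80, -80 */
        int nothing  = display_delay + lag;                       /* feature off, no box: 180, 20 */
        printf("Case %d: box setting %d ms; residual with box %d, with HDMI 1.3 %d, with nothing %d\n",
               i + 1, box, with_box, hdmi13, nothing);
    }
    return 0;
}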

re: HDMI's Lip Sync and audio-video synchronization for broadcast and home video
jesup (User Rank: Rookie)   8/19/2008 7:25:33 PM
From the point of view of an engineer working in this space, all these solutions sound like poor bandaids on a problem that shouldn't have happened in the first place. There's a reason why RTP packets (in internet VoIP and video) have timestamps, plus packets that link those timestamps to a shared timebase, so you can synchronize audio and video. It's unimaginable to me that they designed HDMI without at least considering the possible variable delays on the two chains.

You can timestamp the data (and let the receiving device implement any needed FIFO buffers for synchronization), or you can specify that the sending device transmit the streams already synchronized (adjusting if needed to make that happen) and let the receiving device implement internal delays as needed to keep the synchronization. My assumption is that the HDMI designers did the second (assume sync at the interface) but didn't specify or test it, so neither the senders NOR the receivers implemented the necessary buffering, delays, or frame dup/skip to keep sync. Sheesh.

The proposed solutions are silly - why guess at the induced delays? The hardware/software *knows* what the delays are, and knows when they change. Each device can be responsible for resyncing before output. You can decide to implement a trickier scheme whereby you push all the delay handling into one device in the chain, either with reports ("audio is delayed 30 ms") or with timestamps (less necessary here, since the streams aren't going over lossy networks).
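
A minimal C sketch of that report-based scheme follows: each device publishes the delay it adds to each path, and the endpoint closes the loop with a single compensating audio delay. The device list, field names, and numbers are hypothetical, purely to show the bookkeeping.

/* Sketch of the report-based scheme: each device in the chain publishes the
 * delay it adds to each path, and the endpoint applies one compensating audio
 * delay. */
#include <stdio.h>

typedef struct {
    const char *name;
    int video_delay_ms;   /* delay this device adds to the video path */
    int audio_delay_ms;   /* delay this device adds to the audio path */
} chain_device_t;

int main(void)
{
    const chain_device_t chain[] = {
        { "disc player",  0,  0 },
        { "AV receiver",  5,  5 },
        { "video scaler", 33, 0 },
        { "display",      66, 0 },
    };
    int video_total = 0, audio_total = 0;

    for (unsigned i = 0; i < sizeof chain / sizeof chain[0]; i++) {
        video_total += chain[i].video_delay_ms;
        audio_total += chain[i].audio_delay_ms;
    }

    /* Endpoint delays audio by the difference so both paths arrive together. */
    int compensating_audio_delay = video_total - audio_total;
    printf("delay audio by %d ms at the endpoint\n", compensating_audio_delay);  /* prints 99 */
    return 0;
}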


