The basics of de-interlacing from good to great

David Vrhovnik, Manager, Video Algorithm Development, Gennum Corporation

6/6/2007 3:00 PM EDT

Introduction
As one of my well-respected colleagues states, "De-interlacing is not a science but rather an art!" At a high level, de-interlacing is simply doubling the amount of information, or lines, within an image, technically referred to as a field. At first this seems like a simple task, but in reality it is difficult to implement. What makes this relatively simple function so hard to implement in practice?

Note to the reader: This article is written with the assumption of having a basic understanding of video. For a quick tutorial, read the February issue of Digital TV Design Line how-to article called The basics of interlaced video and the techniques used in de-interlacing.

The Problem
Before we can answer the question at hand, we need to understand what we have to work with: interlaced video. Looking back to when Philo T. Farnsworth invented the television, one of the biggest challenges faced then, and still today, was the need to send more data than the available infrastructure could handle. As a compromise, the first video compression scheme was created in the form of interlaced video.

Amazingly, people fail to appreciate interlaced video as a form of compression due to its rather simple nature. I once lectured to a group of master's and Ph.D. candidates at a local university that is globally respected for their knowledge on compression and during my presentation I brought forward the idea of interlaced video as a form of compression. Remarkably, a number of students argued against this idea. Maybe it doesn't fit well with the modern era of compression they are currently studying, but they need to learn to see the 'forest for the trees.' Let's accept interlaced video as a form of compression and move on.

Since 1930 the video we know and love has traditionally been interlaced. Interlaced video has been shown on cathode ray tubes, commonly referred to as CRTs. Only in the past decade have we seen the availability of a non-interlaced format, otherwise known as the progressive format. The progressive format captures an entire image of video 60 times every second. This format has its origins in the PC world, where progressive displays were and still are the norm. Currently, progressive sources are not widely available, but this is changing very rapidly, especially with the recent introductions of new gaming systems and HDTV disc players such as Blu-ray and HD DVD.

The need for better image quality has driven the need for progressive video. This has resulted in tremendous growth in the popularity of progressive display technologies like plasma and LCD televisions. CRT-TVs still produce the best image quality of the available technologies and, relative to other technologies, have a longer lifetime. The main issues with CRTs are their weight and depth. In addition, it's becoming extremely difficult to find a good quality CRT-TV that supports the traditional interlaced format.

Why De-interlacing?
Whereas CRT displays use an interlaced video format, the newer plasma and LCD displays use a progressive video format. If you take a photograph of a CRT with a fast shutter speed, you only see a portion of the entire picture, since the image is continually repainted on the screen. If you take a photograph of an LCD or plasma display, you will see the entire image, since every pixel is 'on' all of the time. De-interlacing is the process that enables interlaced video formats to be properly displayed on newer progressive displays.

The following table highlights some advantages and disadvantages for each format.

Today's challenge is dealing with legacy artefacts, that is, defects not intended to be in the image, as we convert interlaced video into the progressive format. This conversion process is called de-interlacing. As long as we have legacy interlaced material, this problem will not disappear. As the title suggests, de-interlacing implementations can vary from good to great. The question that I am attempting to answer is: what can make them great?

Video De-interlacing Techniques
Before we can talk about what makes a good de-interlacer great, we need to understand the various types of de-interlacing. In addition, we also need to understand some of the characteristics associated with de-interlacing implementations. See The basics of interlaced video and the techniques used in de-interlacing for more detailed information.

De-interlacing has evolved over the past decade from simple line duplications to weaving to leading-edge motion compensation techniques. The following discussion highlights the various techniques.

The first three methods described below all use a single field of information to create the new progressive frame. These methods are also commonly called intra-field or spatial processing. In general, all three of these methods are called bobbing since the entire picture moves up and down slightly when viewed. The differences are primarily in how many lines of the field are stored. As the number of stored lines increases, the de-interlacing quality generally improves.

Line Duplication
Line duplication is the simplest form of de-interlacing. With line duplication, each line whether even or odd within a field is repeated to create a frame. The advantage of using this form of de-interlacing is it only requires the storage of one line of video and is easy to implement. The disadvantage is that it offers low quality since each new progressive frame now only contains half of the original information.
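As a rough illustration, line duplication can be sketched in a few lines of Python. The sketch assumes a field is simply a list of rows of pixel values; the function name line_duplicate is hypothetical and is not taken from any real product.

```python
def line_duplicate(field):
    """Build a frame by repeating every field line once.

    Assumption: `field` is a list of rows, each row a list of pixel values.
    """
    frame = []
    for row in field:
        frame.append(list(row))  # the original field line
        frame.append(list(row))  # its copy fills the missing line
    return frame


field = [[10, 20], [30, 40]]
frame = line_duplicate(field)
print(frame)  # [[10, 20], [10, 20], [30, 40], [30, 40]]
```

Only one line ever needs to be held at a time, which is why the storage cost is so low.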

Line Averaging
Line averaging offers better performance than simple line duplication, but it is still not good enough since it softens the image. In line averaging, you simply average the values in the upper and lower lines. This requires the storage of two lines of video.
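A minimal sketch of line averaging, under the same assumption that a field is a list of rows of pixel values. The name line_average is hypothetical, and repeating the last line at the bottom border is just one possible edge policy.

```python
def line_average(field):
    """Fill each missing line with the average of the field lines above and below.

    Assumption: at the bottom border the last field line is reused as "below".
    """
    frame = []
    for i, row in enumerate(field):
        frame.append(list(row))
        below = field[i + 1] if i + 1 < len(field) else row
        frame.append([(a + b) / 2 for a, b in zip(row, below)])
    return frame


frame = line_average([[10, 20], [30, 40]])
print(frame[1])  # [20.0, 30.0]
```

The averaged line is a blend of its neighbours, which is where the softening comes from.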

Vertical Filtering
Vertical filtering performs better than both line duplication and line averaging but is still not good enough, since this method also tends to soften the image. In vertical filtering, the values of several pixels on the lines above and below the pixel you are trying to create are combined. More emphasis is given to pixel values closer to the pixel that you are trying to create. This method requires the storage of multiple lines of data and often results in flickering within the image.
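To make the weighting idea concrete, here is a hedged sketch of a four-line vertical filter. The taps (-1, 9, 9, -1)/16 are one commonly used half-sample interpolation kernel, chosen here purely as an assumption; real products use their own, usually longer, filters. The function name vertical_filter is hypothetical.

```python
def vertical_filter(field, taps=(-1, 9, 9, -1)):
    """Interpolate each missing line from the two field lines above and two below.

    Assumptions: exactly four taps; field lines are clamped at the borders.
    The two centre taps carry the most weight because they are closest
    to the line being created.
    """
    norm = sum(taps)
    last = len(field) - 1
    frame = []
    for i, row in enumerate(field):
        frame.append(list(row))
        # Field lines i-1, i, i+1, i+2 surround the missing line.
        idx = [min(max(i + k, 0), last) for k in (-1, 0, 1, 2)]
        frame.append([
            sum(t * field[j][x] for t, j in zip(taps, idx)) / norm
            for x in range(len(row))
        ])
    return frame


ramp = [[0], [10], [20], [30]]      # a smooth vertical gradient
print(vertical_filter(ramp)[3])     # [15.0]
```

Note that four field lines must be available at once, compared with two for line averaging.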

There is a good series of articles written by Gary Merson, with the latest article appearing in the October 2006 issue of Home Theatre Magazine (http://www.hometheatermag.com/hookmeup/1106hook/). Through this series of articles, Gary highlights the issues with strictly using a bobbing technique for HD video de-interlacing. The unfortunate news is that ~54% of the 61 consumer sets he tested failed to use more sophisticated de-interlacing techniques.

The next logical question is: since video is made up of multiple fields of information, why not use this information to create a better progressive image with improved vertical resolution or sharpness? The downside is that these techniques require you to store full fields of information to complete the processing, and the implementation becomes more complex. Also keep in mind that the more full fields you store, the more processing latency the system will have, which matters for video gaming applications. The following techniques are based on this principle.

Weaving
Weaving is the ideal choice when images are static. In weaving, fields 1 and 2 are shown together during the first 60th of a second. For the next 60th of a second, field 2 and field 3 are shown, and so on, such that half of the lines are updated every successive 60th of a second. It sounds simple; however, you are displaying field 1 and field 2 at the same time when they were actually captured 1/60th of a second apart. Using this approach with moving video creates visible artefacts called "mouse teeth", also commonly referred to as "comb tearing" or "feathering". Weaving offers full vertical resolution at the expense of temporal resolution.
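The following small sketch shows both the mechanism and the artefact. It assumes two fields of equal size stored as lists of rows; weave is a hypothetical name.

```python
def weave(top_field, bottom_field):
    """Interleave the lines of two consecutive fields into one frame."""
    frame = []
    for top_row, bottom_row in zip(top_field, bottom_field):
        frame.append(list(top_row))
        frame.append(list(bottom_row))
    return frame


# A vertical edge that moved one pixel to the left between the two fields.
top = [[0, 0, 9, 9], [0, 0, 9, 9]]
bottom = [[0, 9, 9, 9], [0, 9, 9, 9]]
for row in weave(top, bottom):
    print(row)
# [0, 0, 9, 9]
# [0, 9, 9, 9]
# [0, 0, 9, 9]
# [0, 9, 9, 9]
```

The edge position alternates from line to line, which is exactly the comb pattern seen on moving objects. With a static scene both fields would agree and the woven frame would be perfect.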

Blending or Vertical-Temporal Filtering (VT Filtering)
As the name implies, blending is the addition of filters in both the vertical and temporal directions. In general this technique requires at least two fields of information along with the storage of several lines. This technique offers better performance than the field-based methods described above, since it eliminates the "mouse teeth" problem and reduces the amount of flickering. However, it introduces an artefact known as ghosting or shadowing around moving images. This results in a softer image, depending on the amount of motion within the video.
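Practical vertical-temporal filters use longer kernels than this, but the principle can be sketched by mixing a two-line vertical average from the current field with one temporal tap from the previous field. The function name vt_blend and the 50/50 weighting are assumptions made for illustration only.

```python
def vt_blend(cur_field, prev_field, temporal_weight=0.5):
    """Fill missing lines with a mix of vertical and temporal information.

    Assumptions: cur_field holds the lines above and below each missing line,
    prev_field (opposite parity) holds the missing lines one field earlier.
    """
    last = len(cur_field) - 1
    frame = []
    for i, row in enumerate(cur_field):
        frame.append(list(row))
        below = cur_field[min(i + 1, last)]
        frame.append([
            (1 - temporal_weight) * (a + b) / 2 + temporal_weight * p
            for a, b, p in zip(row, below, prev_field[i])
        ])
    return frame


static = vt_blend([[100], [100]], [[100], [100]])
moving = vt_blend([[100], [100]], [[0], [0]])
print(static[1], moving[1])  # [100.0] [50.0]
```

In the moving case the old value leaks into the new frame at half strength, which is the ghosting described above.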

So how can we improve the sharpness of the image? Why not detect edges in the image and filter along those edges? The following technique is based on this principle.

Edge Adaptation
This is an extension of the blending technique whereby for each pixel, you detect an edge and filter along the edge. Typical implementations will filter along multiple edges and determine which one of the edges is the strongest. Current solutions typically use 10-20 directions. As you increase the number of detection edges the quality increases and so does the complexity of the implementation. The proper selection of directions is critical to increase quality. It is also important to handle shallow or low angle directions but once again this is costly in terms of implementation since much longer filters are required. Edge adaptation requires the storage of several fields and lines. The result is better than blending since details are sharp with less flicker and fewer artefacts.
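A heavily simplified sketch of the idea is edge-based line averaging: for each missing pixel, try several directions through the lines above and below, pick the direction whose two end pixels match best, and average along it. This version uses only three directions and a single field, far fewer than the 10-20 directions mentioned above; the name edge_adaptive_line and the tie-breaking rule are assumptions.

```python
def edge_adaptive_line(up, down, max_shift=1):
    """Interpolate one missing line between field lines `up` and `down`.

    For each pixel, test directions d = -max_shift..max_shift and average
    along the one with the smallest difference between its end pixels.
    Assumption: ties go to the most vertical direction.
    """
    width = len(up)
    out = []
    for x in range(width):
        best = None
        for d in range(-max_shift, max_shift + 1):
            xu, xd = x + d, x - d
            if 0 <= xu < width and 0 <= xd < width:
                key = (abs(up[xu] - down[xd]), abs(d))
                if best is None or key < best[0]:
                    best = (key, (up[xu] + down[xd]) / 2)
        out.append(best[1])
    return out


up = [0, 0, 0, 9, 9, 9]      # diagonal edge: further right on the upper line
down = [0, 9, 9, 9, 9, 9]
print(edge_adaptive_line(up, down))               # [0.0, 0.0, 9.0, 9.0, 9.0, 9.0]
print(edge_adaptive_line(up, down, max_shift=0))  # [0.0, 4.5, 4.5, 9.0, 9.0, 9.0]
```

With max_shift=0 the function degenerates to plain line averaging and the diagonal edge is smeared; allowing diagonal directions keeps it sharp. Shallower edges would need a larger max_shift, which is where the cost grows.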

How complex are edge adaptive implementations? If you consider an HDTV or 1080i field, you have 1920x540 active pixels, or 1,036,800 pixels. Now, assuming that you calculate 20 different edge directions, this leads to 20,736,000 calculations per field. You need to do this 60 times each second, so the number of calculations grows to ~1.2 billion calculations per second. That is a massive number of calculations; now take it one step further and imagine doing these calculations for two channels of HDTV.
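The arithmetic is easy to reproduce. The short script below simply restates the figures from the paragraph above; the variable names are mine, and "calculation" here means one direction evaluated for one pixel.

```python
width, lines_per_field = 1920, 540   # active pixels in one 1080i field
directions = 20                      # edge directions tested per pixel
fields_per_second = 60

pixels_per_field = width * lines_per_field
per_field = pixels_per_field * directions
per_second = per_field * fields_per_second
two_channels = per_second * 2

print(f"{pixels_per_field:,}")  # 1,036,800
print(f"{per_field:,}")         # 20,736,000
print(f"{per_second:,}")        # 1,244,160,000
print(f"{two_channels:,}")      # 2,488,320,000
```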

Since each of these techniques has its strengths and weaknesses, why not switch between the various methods to try to get to the best image possible? The following technique is based on this principle.

Motion Adaptation
Motion adaptation involves the detection of motion within a field. This can be done at various levels (i.e. field, region or group of pixels or pixel-based) with pixel-based providing the best performance. Depending on the amount of motion detected, this technique will switch between the various techniques described above multiple times on a line or even on a per-pixel basis. Another benefit of motion adaptation, at least on a pixel-level, is that resolution pumping is minimized. Resolution pumping results from dramatic resolution changes on a frame-by-frame basis, since you are switching back and forth from bob to weave. Motion adaptation is complex and requires the storage of several fields and lines. The advantage to using this technique is you get good temporal resolution with sharp details, less temporal flickering and fewer artefacts.
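As a sketch only, pixel-based motion adaptation can be reduced to a hard switch between weave and bob. The motion measure below (the difference between the current field and the previous field of the same parity) and the threshold are assumptions chosen for brevity; motion_adaptive is a hypothetical name.

```python
def motion_adaptive(cur, prev, prev2, threshold=10):
    """Per-pixel switch between weave (static) and bob (moving).

    Assumptions: cur and prev2 are fields of the same parity two field
    periods apart; prev is the opposite-parity field holding the missing lines.
    """
    last = len(cur) - 1
    frame = []
    for i, row in enumerate(cur):
        frame.append(list(row))
        j = min(i + 1, last)
        new_row = []
        for x in range(len(row)):
            # Motion is measured on the lines above and below the missing pixel.
            motion = max(abs(cur[i][x] - prev2[i][x]),
                         abs(cur[j][x] - prev2[j][x]))
            if motion > threshold:
                new_row.append((cur[i][x] + cur[j][x]) / 2)  # bob: line average
            else:
                new_row.append(prev[i][x])                   # weave: previous field
        frame.append(new_row)
    return frame


cur = [[50, 200], [50, 200]]
prev = [[60, 0], [60, 0]]
prev2 = [[50, 0], [50, 0]]
print(motion_adaptive(cur, prev, prev2)[1])  # [60, 200.0]
```

The left pixel is static, so it keeps the full-resolution value from the previous field; the right pixel changed, so it is interpolated spatially instead.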

Motion adaptation seems like the ideal solution, but it's still not perfect since it relies on detected motion. Unfortunately, where a pixel was located in the past might not be a good indicator of where it is traveling in time, that is, in the temporal space. What if you were able to estimate the temporal direction of motion and filter along that direction? This is the holy grail of de-interlacing and the following technique is based on this principle.

Motion Compensation
In motion compensation, pixel direction between fields is estimated and used to predict which technique should be used and along what edge. This is the most complex technique to use for de-interlacing. Its complexity requires multiple chips to do a proper implementation; however, we are starting to see a trickle of lower-quality, single-chip solutions being introduced to the marketplace. Motion compensation offers very good temporal resolution with sharp details. Since this technique relies on making predictions, given the right conditions it will eventually fail. The key to having a successful implementation is ensuring you have a safe mode or contingency plan if your prediction fails. Below is a summary of the various de-interlacing techniques.

Film De-interlacing Techniques
The above techniques generally apply to video material and not film material. The reason for the difference is how the original content was captured. Traditionally, motion pictures that you watch in a cinema are shot with a similar style of film used in older still-photo cameras. Just as photography has moved to digital-still cameras, it's becoming more commonplace to have motion pictures shot in a digital format. Regardless of how they are captured, motion pictures are shot at a rate of 24 frames per second whereas video is typically captured and distributed at 60 fields per second. You might ask why not capture film at higher frame rates to look more like video? The main reason is the high cost to implement such a system and another is artistic impression.

In order to change the frame rate so you can view a motion picture on your television, a process called "3:2 pull-down" is implemented. Imagine an initial film sequence of four frames: A, B, C, D. A telecine machine, used to capture video from original films, samples the frames and generates interlaced field samples of each frame: AODD, AEVEN, BODD, BEVEN, etc., where AODD is the field generated using the odd numbered lines of frame A and AEVEN is the field generated using the even numbered lines of frame A. To produce the final interlaced video output, the fields are sent in the order: AODD, AEVEN, AODD, BEVEN, BODD, CEVEN, CODD, CEVEN, DODD, DEVEN. The result is 3 fields based on A, then 2 of B, then 3 of C, then 2 of D, hence the name 3:2 pull-down.
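The field order above follows from two simple rules: field parity always alternates, and frames are held for three fields and two fields in turn. A small sketch, with the hypothetical function name pulldown_32, reproduces the sequence:

```python
def pulldown_32(frames):
    """Return the 3:2 pull-down field order for a list of frame names.

    Assumption: the sequence starts on an odd field and the first frame
    is the one held for three fields.
    """
    fields = []
    odd = True
    for i, frame in enumerate(frames):
        repeat = 3 if i % 2 == 0 else 2
        for _ in range(repeat):
            fields.append(frame + ("ODD" if odd else "EVEN"))
            odd = not odd  # parity alternates on every output field
    return fields


print(", ".join(pulldown_32(["A", "B", "C", "D"])))
# AODD, AEVEN, AODD, BEVEN, BODD, CEVEN, CODD, CEVEN, DODD, DEVEN
```

Four film frames become ten video fields, which is how 24 frames per second turns into 60 fields per second.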

Typically, when de-interlacing normal video, the de-interlacing algorithm would assume that AEVEN comes 1/60th of a second after AODD. However, if the fields originated from film, they were sampled by the motion picture camera at the same time. Additionally, the time difference between the combined AODD and AEVEN pair and the next pair of fields is not 1/60th of a second but 1/24th of a second, since the video originated from film material. To create more confusion, when film is mixed with other film, it is possible that one field came from frame A of the original film and the other field came from frame B of the original film. The worst case is when video fields are mixed with fields that originated from film. As you can see, the possibility of creating some very distracting artefacts exists, and these cases should get special attention. Therefore, it is imperative that you are able to robustly detect film material and properly handle special cases.

To obtain a good quality image on an HD monitor using source material that originated as either motion picture film or video, the de-interlacer must be able to account for film sources by detecting the repeated fields of information. The detection of film can be done at a field level, at a region level (a portion of a field) or even at a pixel level. The nice thing about film de-interlacing is that either you detect the film cadence or you do not. When you do not properly detect film, and you process film-originated content using any of the video de-interlacing techniques, "feathering" will occur in the image.
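One way to sketch field-level detection of repeated fields: in a 3:2 sequence, a field is identical to the field two positions earlier once every five fields. The functions below look for that pattern. They are an illustrative assumption, not a description of any shipping detector; real detectors must cope with noise, so a threshold of zero is only suitable for this toy data.

```python
def field_difference(a, b):
    """Sum of absolute pixel differences between two same-parity fields."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def repeated_field_flags(fields, threshold=0):
    """Flag each field (from the third onward) that repeats the one two earlier."""
    return [field_difference(fields[n], fields[n - 2]) <= threshold
            for n in range(2, len(fields))]


def looks_like_32(flags):
    """True if repeats occur at least twice and exactly five fields apart."""
    hits = [i for i, flagged in enumerate(flags) if flagged]
    return len(hits) >= 2 and all(b - a == 5 for a, b in zip(hits, hits[1:]))


# Toy one-pixel fields in 3:2 order: A A A B B C C C D D (odd/even values differ).
film = [[[v]] for v in (10, 11, 10, 21, 20, 31, 30, 31, 40, 41)]
video = [[[v]] for v in range(1, 11)]   # every field is new
print(looks_like_32(repeated_field_flags(film)))   # True
print(looks_like_32(repeated_field_flags(video)))  # False
```

A completely static scene would flag every field as a repeat and is therefore rejected by this check, since the cadence cannot be determined from it.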

The majority of film content uses either a 3:2 or a 2:2 cadence. Other cadences exist but are extremely rare. It is possible to have video and film material in the same field that you are trying to de-interlace, which is called mixed mode processing. The key is to avoid "feathering".

Here is the Art!
Now that you have an understanding of the problem and the various techniques at your disposal you can start to understand why solving this problem is not straightforward. Consider the de-interlacing techniques that I described as brushes in your de-interlacing toolbox and the video signal is your canvas. How you use those brushes and under what conditions is the art behind de-interlacing!

The typical decision logic will consist of deciding if the content is film or video. If it is film, you are set, as long as you can robustly detect and decode the film cadence. You need to make sure that you have the right threshold between deciding what was originated on video and what originated on film. If you determined that it is video content, you have a number of choices. State of the art de-interlacing algorithms will typically try to measure the amount of motion at the pixel level. Based on the motion measurements, the algorithm will decide which techniques to use. For little or no motion you want to stick to weaving or temporal methods. They offer the best vertical resolution and sharpness. For areas with fast motion, you want to stick to bobbing or spatial methods which provide the best temporal resolution. But what do you do if the motion falls between these two extremes? De-interlacing video images is all about making trade-offs between vertical image resolution versus temporal image resolution.
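One possible answer to the in-between case, offered purely as a sketch, is to fade between the two results instead of switching. The thresholds low and high and the function names are hypothetical tuning choices, which is precisely where the "art" comes in.

```python
def blend_weight(motion, low=4, high=24):
    """Map a motion measure to a weight: 0.0 = pure weave, 1.0 = pure bob.

    Assumption: a linear ramp between two hypothetical thresholds.
    """
    if motion <= low:
        return 0.0
    if motion >= high:
        return 1.0
    return (motion - low) / (high - low)


def mix_pixel(weave_value, bob_value, motion, low=4, high=24):
    """Fade between the temporal (weave) and spatial (bob) candidates."""
    w = blend_weight(motion, low, high)
    return (1 - w) * weave_value + w * bob_value


print(mix_pixel(100, 200, motion=0))   # 100.0
print(mix_pixel(100, 200, motion=14))  # 150.0
print(mix_pixel(100, 200, motion=40))  # 200.0
```

A soft transition like this also avoids abrupt changes in sharpness when the motion measure hovers around a single threshold.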

On top of these methods, you may decide to use edge adaptive processing. However, you need to decide if you can afford the complexity of doing billions of calculations per second.

De-interlacing can be easy but doing it right is an art! This art is refined and tweaked over many generations to offer optimized performance when it comes to picture quality. However, not all de-interlacers are the same. Some perform better with SD sources or HD sources, while others produce a sharper or softer image and attempt to hide other possible mistakes. Certain de-interlacers perform better with film, especially when it comes to PAL (versus NTSC) sources. Some perform better on edges while others perform better with fast motion. No single de-interlacer is going to be perfect since de-interlacing involves making educated guesses. There will always be a sequence that will 'break' a de-interlacer. The important thing to ask is how this de-interlacer performs for the type of content that you typically watch.

Gennum's De-interlacing Technology
Gennum's VXP technology has an extensive toolkit when it comes to de-interlacing. Its leading-edge algorithms, which were originally created to serve the broadcast and professional markets, consist of large vertical filters, vertical-temporal filters, strict temporal filters, multi-edge adaptive processing, pixel-based motion adaptive processing and robust film mode detection algorithms. In addition to professional broadcast equipment, VXP technology is now being used in HDTVs and home theatre equipment, bringing a new level of quality to the consumer market. And not only does Gennum employ this level of processing for SD content, it also applies the same techniques to HD content. The latest generation of Gennum ICs simultaneously de-interlace and process two full HD signals, an increasingly important capability for HDTV manufacturers today.

Conclusion
With the emergence of high definition television formats along with progressive displays, the need for higher quality processing becomes paramount as screen sizes increase. Designers can develop displays that will deliver the optimum performance for both SD and HD content but must maximize the different de-interlacing techniques available. Hopefully this paper has given you an appreciation and realization that not all de-interlacers are created equal, since de-interlacing really is an art!

I'd like to thank Gheorghe Berbecel and Vince Harradine for their valued contributions to the creation of this article.

About the Author
David Vrhovnik is the Manager of Video Algorithm Development at Gennum Corporation in Burlington, Ontario, Canada. David obtained his Computer Engineering degree from McMaster University (Hamilton, Ontario) and his MBA from the DeGroote School of Business (Hamilton, Ontario). If you have any comments or questions about the article, you can contact him at dave_v@gennum.com

