Design Article

Advanced video frame rate conversion via object-based motion-compensated interpolation

Demin Wang, Senior Scientist and Team Leader, Video Coding and Processing, and
André Vincent, Manager of the Advanced Video Systems Group, Communications Research Centre, Canada

2/23/2007 3:35 AM EST

The convergence of broadcasting, telecommunications, and the Internet has led to a proliferation of video formats and increased the need for high quality frame rate conversion. Frame rate is the number of frames (images) per second of video or film material. For example, 35mm film has a frame rate of 24 frames per second (fps), television has a frame rate of 30 fps in North America and 25 fps in Europe, while computer monitors work at 60, 75, or even higher frame rates. With this convergence, users require that a single display device be able to display high quality video at various frame rates coming from different sources.

Thanks to advances in IC technology, video display devices can include a variety of image quality enhancement and format conversion capabilities. Digital noise reducers, coding artefact reducers, and de-interlacers have been built into high-end display devices and significantly improve the video quality. Frame rate conversion may be the next technology to be integrated into modern video displays.

High quality frame rate conversion is, however, one of the most challenging processes in image and video processing. Simple techniques, such as frame repetition and temporal filtering, often result in motion judder or blurred images.

A more complicated technology, called motion-compensated frame interpolation (MCFI), estimates motion trajectories and interpolates new images along the motion trajectories. This may yield high quality conversion if the true motion trajectories are accurately estimated and the occlusion areas caused by motion are properly processed. Unfortunately, it is very difficult to accurately estimate the true motion and to properly process motion-occluded areas.

We have developed an advanced frame rate converter, called CRC-FRC, for high quality video and film frame rate conversion. At the core of this converter are an object-based motion-compensated frame interpolation (object-based MCFI) method and a mechanism to measure motion field reliability. This converter is able to accurately estimate true motion trajectories, handle motion occlusion well, and generate high quality images and smooth motion.

In this article, we first briefly describe the CRC-FRC algorithms. Then we report the performance evaluation of CRC-FRC, compared with four existing frame rate converters: ReTimer from REALVIZ, Twixtor from Re:Vision, frame repetition, and frame averaging. The comparison was conducted in terms of both objective and subjective quality of the video generated by these converters.

The advanced frame rate converter
The advanced frame rate converter, CRC-FRC, generates new images based on an object-based MCFI method and a measure of motion field reliability. The block diagram of this converter is shown in Figure 1, in which I1 and I2 denote the two successive input images, I denotes an image to be interpolated between the input images, T1 denotes the time interval between I1 and I, and T2 denotes that between I and I2.
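The intervals T1 and T2 follow directly from the input and output frame rates. The sketch below is only an illustration of that arithmetic, not part of CRC-FRC: the function name and the choice to express the intervals in units of one input frame period (so that T1 + T2 = 1) are assumptions made here.

```python
from fractions import Fraction

def interpolation_intervals(n_out, fps_in, fps_out):
    """Locate output frame n_out between two input frames.

    Returns (index of I1, T1, T2), with T1 and T2 measured in units of
    one input frame period (an assumption of this sketch).
    """
    # Position of the output frame on the input time axis, in input frames.
    pos = Fraction(n_out * fps_in, fps_out)
    i1 = pos.numerator // pos.denominator  # preceding input frame I1
    t1 = pos - i1                          # interval between I1 and I
    t2 = 1 - t1                            # interval between I and I2
    return i1, t1, t2

# Example: converting 24 fps film to a 60 fps display.
for n in range(5):
    print(n, interpolation_intervals(n, 24, 60))
```

In this example the third output frame (n = 3) falls between input frames 1 and 2, with T1 = 1/5 and T2 = 4/5 of an input frame period.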

From Figure 1, if T1 (or T2) is smaller than the predetermined minimum time interval Tmin, the new image I is just a copy of input image I1 (or I2). Otherwise, the forward motion field (motion from image I1 to I2) is estimated and the reliability R of the estimated motion field is measured. The measure of motion field reliability is based on the a posteriori probability of motion vectors, the length of motion vectors, and the local smoothness of image intensity. If there is a scene or brightness change, a very fast motion, objects with no texture, or motion too complicated to estimate, the reliability of the estimated motion field will be low. The new image is a copy of the input image that is closest to the new image on the time axis if the reliability R is smaller than a predetermined minimum reliability Rmin. Otherwise, the new image is generated using the object-based MCFI method.
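The decision flow described above can be summarised in a few lines. This is a minimal sketch under stated assumptions: the function name and return labels are hypothetical, the reliability R is passed in as a number rather than measured from a motion field, and the tie-break when T1 equals T2 is a choice made here, since the text does not specify one.

```python
def choose_strategy(t1, t2, reliability, t_min, r_min):
    """Return 'copy_I1', 'copy_I2', or 'mcfi' for one output image.

    t1, t2      : time intervals from I1 to I and from I to I2
    reliability : motion field reliability R (computed elsewhere)
    t_min, r_min: predetermined thresholds Tmin and Rmin
    """
    # The new image is very close in time to an input image: copy it.
    if t1 < t_min:
        return "copy_I1"
    if t2 < t_min:
        return "copy_I2"
    # In the real converter, motion estimation and the reliability
    # measurement happen at this point; here R is simply an argument.
    if reliability < r_min:
        # Fall back to the temporally nearest input image.
        # Preferring I1 on a tie is an assumption of this sketch.
        return "copy_I1" if t1 <= t2 else "copy_I2"
    return "mcfi"

print(choose_strategy(0.4, 0.6, reliability=0.9, t_min=0.05, r_min=0.5))
```

The threshold values in the call are placeholders for illustration only.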


Figure 1: Block diagram for generating an image I between two successive input images I1 and I2

As shown in Figure 1, the object-based MCFI method is implemented with two parallel processes. Each of the processes involves the same algorithms, including image segmentation, object-based motion field processing, determination of object depth-order and covered areas, object-based interpolation, and graceful degradation. The first process uses the forward motion MVf and the segmentation of image I1, while the second uses the backward motion MVb (motion from image I2 to I1) and the segmentation of image I2. The estimates of these two motion fields are usually different because of motion occlusion. These processes result in two interpolated images, If and Ib, respectively, at the time instant of the new image. Finally, these two images If and Ib are combined, according to their interpolation errors, to produce the new image I.
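The article does not give the rule used to combine If and Ib. One plausible way to combine two images "according to their interpolation errors" is a per-pixel weighted average in which the image with the smaller error gets the larger weight. The sketch below assumes exactly that; the inverse-error weighting, the function name, and the per-pixel error maps are all assumptions, not the CRC-FRC method.

```python
def combine(i_f, i_b, err_f, err_b, eps=1e-6):
    """Blend two interpolated images using inverse-error weights.

    i_f, i_b     : images as lists of rows of pixel values
    err_f, err_b : per-pixel interpolation error estimates (hypothetical)
    eps          : small constant that avoids division by zero
    """
    out = []
    for row_f, row_b, row_ef, row_eb in zip(i_f, i_b, err_f, err_b):
        out_row = []
        for pf, pb, ef, eb in zip(row_f, row_b, row_ef, row_eb):
            wf = 1.0 / (ef + eps)  # low error gives a high weight
            wb = 1.0 / (eb + eps)
            out_row.append((wf * pf + wb * pb) / (wf + wb))
        out.append(out_row)
    return out

# Equal errors give a plain average; a zero error makes that image dominate.
print(combine([[100, 100]], [[200, 200]], [[1, 0]], [[1, 10]]))
```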

The algorithms used in the object-based MCFI method are briefly described as follows. The motion estimation is performed using an adaptive hierarchical block-matching algorithm. This algorithm is fast and produces smooth motion fields that are close to the true motion fields. Image segmentation is carried out using a multi-scale gradient algorithm followed by the watershed algorithm and a region-merging step. After segmentation, an image is divided into regions of arbitrary shape, called objects.
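For readers unfamiliar with block matching, the sketch below shows the basic idea in its simplest form: an exhaustive, single-scale search that minimises the sum of absolute differences (SAD). It is deliberately not the adaptive hierarchical algorithm used in CRC-FRC, whose details are not given here; the function names, the SAD criterion, and the search pattern are assumptions chosen for illustration.

```python
def sad(frame_a, frame_b, ya, xa, yb, xb, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ya + dy][xa + dx] - frame_b[yb + dy][xb + dx])
    return total

def match_block(frame1, frame2, y, x, size, search):
    """Find the displacement (dy, dx) of a frame1 block within frame2."""
    h, w = len(frame2), len(frame2[0])
    best = (0, 0)
    best_cost = sad(frame1, frame2, y, x, y, x, size)  # zero-motion candidate
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            # Skip candidates that fall outside the image.
            if yy < 0 or xx < 0 or yy + size > h or xx + size > w:
                continue
            cost = sad(frame1, frame2, y, x, yy, xx, size)
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best, best_cost

# Synthetic test: frame2 is frame1 shifted two pixels to the right.
frame1 = [[8 * r + c for c in range(8)] for r in range(8)]
frame2 = [[row[c - 2] if c >= 2 else 0 for c in range(8)] for row in frame1]
vector, cost = match_block(frame1, frame2, y=2, x=2, size=3, search=2)
print(vector, cost)  # (0, 2) 0
```

An exhaustive search like this finds the best SAD match, which is not necessarily the true motion. That gap is why the converter uses a method designed to produce smooth motion fields close to the true ones.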

The estimated motion fields are then processed on the basis of objects. The object-based motion field processing detects erroneous motion vectors and smooths the motion field within each object. The depth-order of objects refers to the relative positions of the objects in the scene captured by a camera. When objects are moving, it is the depth-order and covered areas that determine the appearance and disappearance of those objects in the captured images. The object-based interpolation step interpolates images If and Ib using the object information, the smoothed motion field within each object, the object depth-order, and the covered areas. The graceful degradation, whose design is based on characteristics of the human visual system, reduces the visibility of any artefacts that remain.

Performance evaluation
The performance of the advanced frame rate converter, CRC-FRC, has been evaluated against the following four frame rate converters:

  • ReTimer V1.1 by REALVIZ, a commercial software engine using pixel by pixel displacement vector mapping (RET),
  • Twixtor V4.0 by Re:Vision, a commercial plug-in product using pixel by pixel tracking between frames (TWX),
  • Frame Averaging, average all pixels in successive pairs of source frames (AVE),
  • Frame Repetition, repeat all frames once (RPT).
Twixtor and ReTimer are also based on motion-compensated frame interpolation and commercially available as plug-ins for Adobe After Effects. The plug-in versions were used in the performance evaluation.
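The two simple baselines in the list above, frame repetition (RPT) and frame averaging (AVE), are easy to state in code. The sketch below assumes a frame is a list of rows of pixel values; the function names are chosen here for illustration.

```python
def frame_repetition(prev_frame, next_frame):
    """RPT: the new frame is a copy of the preceding source frame."""
    return [row[:] for row in prev_frame]

def frame_averaging(prev_frame, next_frame):
    """AVE: the new frame is the pixel-wise mean of the two source frames."""
    return [
        [(a + b) / 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(prev_frame, next_frame)
    ]

prev_frame = [[0, 100], [50, 50]]
next_frame = [[50, 200], [50, 150]]
print(frame_repetition(prev_frame, next_frame))  # [[0, 100], [50, 50]]
print(frame_averaging(prev_frame, next_frame))   # [[25.0, 150.0], [50.0, 100.0]]
```

Neither baseline follows the motion in the scene. Repetition holds moving objects in place, which produces judder, and averaging superimposes two positions of each moving object, which produces blur.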

Test sequence preparation
The video material used in the evaluation consists of nine widely available, eight-second-long, standard definition (720x480 pixels at 30 fps) video sequences in progressive format. These sequences are Bicycle, Birches, Cheerleaders, Coast Guard, Ferris Wheel, Flower Garden, Football, Mobile & Calendar, and Table Tennis, shown in Figure 2. They were selected for their high motion content and their widespread use in video coding development (MPEG) and video quality testing (VQEG).


Figure 2: Video sequences used in the performance evaluation

The source video sequences were temporally down-sampled to 15 fps by removing all of the even numbered frames. The five frame rate converters were then applied to recreate the removed frames. This generated a total of 45 temporally converted (15 to 30 fps) sequences that could be compared to the 30 fps source sequences. For the three motion compensated frame rate converters, care was taken to use options that produce the best output according to the manufacturers' instructions. CRC-FRC was configured to interpolate all frames by setting the reliability threshold Rmin to one percent. For ReTimer, its slow-down tutorial for Adobe After Effects was followed to double the frames in the sequences. All default options were selected for Twixtor.
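The test protocol amounts to dropping every second frame and asking each converter to put it back. The sketch below illustrates that loop under two assumptions: frames are numbered from 1 (so the even numbered frames sit at odd list indices), and a converter is represented by a hypothetical function `interpolate(prev, nxt)` that returns one in-between frame.

```python
def drop_even_frames(frames):
    """Keep frames 1, 3, 5, ... (numbering from 1), halving the frame rate."""
    return frames[0::2]

def upconvert_2x(frames, interpolate):
    """Double the frame rate by inserting one new frame between each pair."""
    out = []
    for prev, nxt in zip(frames, frames[1:]):
        out.append(prev)
        out.append(interpolate(prev, nxt))  # recreated frame
    out.append(frames[-1])
    return out

# Toy example in which each "frame" is a single number.
source = list(range(1, 9))       # frames 1..8 at the source rate
reduced = drop_even_frames(source)
restored = upconvert_2x(reduced, lambda a, b: (a + b) / 2)
print(reduced)   # [1, 3, 5, 7]
print(restored)  # [1, 2.0, 3, 4.0, 5, 6.0, 7]
```

In this sketch the final even numbered frame cannot be recreated because no kept frame follows it. How the evaluated converters treated that last frame is not stated.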

Objective comparison
The PSNR between the generated frames and the corresponding source frames was calculated as the measure of objective quality. A higher PSNR means a better video quality and a better converter. The average PSNR for each sequence and for each converter is recorded in Table 1.
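For reference, PSNR is commonly computed as 10·log10(peak² / MSE), where MSE is the mean squared error between the two frames. The sketch below assumes 8-bit samples (peak value 255) and a single channel per frame; the article does not state which channel or bit depth was used, so both are assumptions.

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    total = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)

source_frame = [[100, 110], [120, 130]]
generated_frame = [[101, 109], [120, 132]]
print(round(psnr(source_frame, generated_frame), 2))
```

The per-sequence figures would then be the average of this value over all generated frames of a sequence.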

The CRC-FRC converter had a higher PSNR than all other methods for all sequences except Mobile & Calendar, for which Twixtor was 0.38 dB higher than CRC-FRC. Twixtor ranked as the second best converter. It was nearly equal to CRC-FRC for the Birches and Cheerleaders sequences.

When averaged over all sequences, the CRC-FRC converter outperformed all of the other converters in terms of PSNR. It was 1.56 dB above Twixtor and more than 5.56 dB above the others.


Table 1: Objective Quality measured in PSNR (dB)

Subjective quality assessment
We conducted a subjective video quality assessment experiment to confirm that CRC-FRC also results in a higher perceived video quality. Specifically, we asked a group of non-expert viewers to rate the quality of six test video sequences that had been converted with the five frame rate converters.

Twenty-two viewers, with a mean age of 39.5 years, participated in this experiment. All viewers had normal visual acuity and normal colour vision. In addition, viewers had limited or no expertise in video imaging and quality assessment, and no knowledge of the purpose of the experiment.

The six test sequences are Bicycle, Cheerleaders, Ferris Wheel, Flower Garden, Football, and Table Tennis. For each video sequence, six 8-second samples were used in the experiment. One sample was the source sequence, while the other five samples were the converted sequences. The video sequences were presented on Sony GDM-F500 CRT monitors and were viewed by the participants from a distance equal to four times the height of the images on the screen.

The viewers rated the subjective quality of the test sequences using a double-stimulus continuous-quality scale (DSCQS) method. They were presented with a sequence, announced verbally as "A", followed, after a brief grey display, by a different version of the same sequence, announced verbally as "B". To complete a trial, this pair of presentations was repeated a second time, i.e., AB AB. In each trial, either "A" or "B" was a source sequence; the other was a converted sequence obtained from the corresponding source sequence. The order of presentation of the source and converted sequences was randomized across trials. At the end of each trial, viewers rated the subjective quality of both A and B presentations using a judgement scale. The ratings were coded from a value of 0 (corresponding to a "Bad" quality) to a value of 100 (corresponding to an "Excellent" quality).

Subjective DMOS Comparisons

The subjective quality is expressed as the difference mean opinion score (DMOS). Difference opinion scores are here defined as the arithmetic difference between the rating of the converted sequence and that of the source sequence obtained on each trial. Therefore, a DMOS of zero implies that the converted and source sequences were judged as having the same perceived quality, whereas a negative DMOS measures the loss of perceived quality due to the frame rate conversion.
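The DMOS definition above reduces to a mean of per-trial differences. The sketch below uses invented ratings purely to show the arithmetic; the numbers are not data from the experiment, and the function name is chosen here.

```python
from statistics import mean

def dmos(trials):
    """Difference mean opinion score.

    trials: list of (converted_rating, source_rating) pairs on the
    0 to 100 scale, one pair per trial.
    """
    return mean(converted - source for converted, source in trials)

# Hypothetical ratings from three trials of one converter.
example_trials = [(60, 80), (70, 75), (50, 85)]
print(dmos(example_trials))  # -20
```

A value of -20 would mean that, on average, the converted sequences were rated 20 points below their sources.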

Figure 3 shows the average subjective quality over all viewers and all sequences for the five frame rate converters. CRC-FRC clearly provided a much higher level of video quality than the other frame rate converters. In fact, the detailed assessment results showed that, for each of the test sequences, CRC-FRC provided better subjective quality than all the other frame rate converters.


Figure 3: Subjective video quality, expressed as DMOS, for six testing sequences. The error bars indicate the 95% CI.

Summary and conclusions
In this article, we presented an advanced frame rate converter, called CRC-FRC, for high quality video and film format conversion. This advanced frame rate converter is based on image segmentation, object-based motion field processing, motion-compensated frame interpolation, and a measure of motion reliability.

The performance of CRC-FRC was evaluated in terms of both objective and subjective quality of converted video sequences and compared with four other frame rate converters: ReTimer V1.1 by REALVIZ, Twixtor V4.0 by Re:Vision, Frame Averaging, and Frame Repetition. For the objective assessment, we calculated the peak signal-to-noise ratio (PSNR) for nine video sequences. For the subjective assessment, we performed a subjective video quality experiment in which a group of non-expert viewers rated the perceived video quality of six video sequences using a standard DSCQS methodology.

The results of both objective and subjective assessments indicate that the CRC-FRC converter provides significantly better video quality than the other frame rate converters tested.

About the authors
Demin Wang is currently Senior Scientist and Team Leader of Video Coding and Processing at the Communications Research Centre, Canada. He received his B. S. and M. S. degrees in electrical engineering from Shandong University of Technology, China, in 1982 and 1985, respectively, and his Ph. D. degree from the Institut National des Sciences Appliquées (INSA) de Rennes, France, in 1992. In 1985, he joined Shandong University of Technology where he was a professor of electrical and computer engineering from 1992 to 1993. He was a visiting researcher at the University of Sherbrooke, Canada, from 1993 to 1994, and at the Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Rennes, France, from 1994 to 1995. Since 1996, he has been with the Communications Research Centre, Canada. His research interests include image processing and coding, video processing and coding, 3-D video, digital TV, and broadcasting. Dr. Wang has published over 80 journal and conference papers and holds two U.S. patents. He is an Associate Editor for IEEE Trans. on Broadcasting. He can be reached at demin.wang@crc.ca.

André Vincent is currently Manager of the Advanced Video Systems Group at the Communications Research Centre, Canada. He received a B.Sc. in electrical engineering from the École Polytechnique, Montréal, Canada, in 1975. From 1975 to 1977, he worked at the Department of National Defence, in the design of maritime communications systems. From 1977 to 1979, he worked at Canadian Marconi in the design and development of mobile radio communications systems. He joined the Communications Research Centre in Ottawa, Canada, in 1979, where he conducted research in the areas of Teletext, data transmission, television channel characterisation and digital mobile radio. Since 1986, he has been involved in research in HDTV, video processing, video compression and 3D video. He has been involved in several standardization bodies such as the ACATS, ATSC, MPEG and ITU-R. He can be reached at andre.vincent@crc.ca.

