Design Article


Streaming video with "TimeSlice" multicore-friendly processing eliminates dropped frames

David Workman
Kulabyte

10/12/2007 3:00 AM EDT

Technology for streaming video on the Internet is now twelve years old. Video compression has been around longer, of course, but 1995 was the year that launched companies like Vivo, VDO.net, and VXtreme. Though long forgotten, these companies pioneered streaming technology with low bit-rate codecs that streamed postage-stamp sized video over a 28.8Kbps modem connection at a whopping 10 frames per second.

Fast forward to 2007. What has changed? Well, what hasn't? Video quality has improved dramatically and broadband is nearly ubiquitous, but basic streaming video concepts have improved little since 1995. Streaming technology still has some inherent limitations that have kept it from reaching the potential of becoming a truly mainstream vehicle for delivery of video to consumers worldwide.

Streaming limitations
Many of the problems with streaming video stem from the two methods available for encoding: Constant Bit Rate (CBR) and Variable Bit Rate (VBR). Each method has its advantages, but neither is ideal. CBR works well for the delivery of streaming video over a network and can produce acceptable quality. VBR, on the other hand, produces higher-quality video, but does not stream as well over a network, because the bit rate is, well, variable. VBR encodes complex scenes better than CBR, but it requires more bandwidth to stream -- sometimes a lot more. VBR-encoded video is much more likely to contribute to network congestion, especially when streamed over bandwidth-constricted networks, like those used in ADSL and cable systems.

Another problem is that encoding video requires a lot of computer processing power, regardless of whether you use a lightweight or heavyweight video codec (a lightweight codec is one that takes less processing power to encode; a heavyweight codec is one that takes more). Any computer can encode on-demand video from a file, because processing does not occur in real time; a heavyweight codec simply takes longer to encode than a lightweight codec. However, if you need to encode a live stream, in which video is ingested from a capture card, processed, and immediately streamed to the network, a lightweight codec is your only choice.

Encoding problems
Kula Media Group has solved these problems with KulaByte: an encoding process that uses the best of both CBR and VBR to encode video in real time, using almost any codec. To understand how the KulaByte process works, we need to look at the problem in a little more detail.

Video, by its very nature, is dynamic. Video codecs reduce bit rate by reducing redundant information within a frame, and from one frame to the next. If there is a lot of redundancy between frames (low motion), the video can be encoded without requiring excessive bandwidth. For example, it would be fairly easy to encode a scene with two young lovers calmly walking in a park. They sit on a park bench, with water slowly lapping in a pond behind them, and maybe a bird passing by as the only motion. However, if an out-of-control car careens past them in a fiery explosion of flames and flying metal, the video would suddenly become a great deal more difficult to encode. Because of the high amount of motion, there would be a lot of change and little redundancy from frame to frame, resulting in a much higher bit rate.


Figure 1: Shows how the bit rate increases during the scene.

VBR encoding
If the scene was encoded with the VBR method, the codec would simply increase the bit rate as the action increased. When configuring VBR, you select a quality level, known as a Q value, which is typically based on a scale of zero to 100. Every frame of video is encoded at that quality level, and because the amount of motion determines the bit rate of a sequence of frames, the resulting overall bit rate maps to the complexity of a scene.

In other words, selecting a higher or lower Q shifts the bit rate of a stream up or down accordingly, as shown in Figure 2.


Figure 2: Higher or lower Q shifts of the bit rate

In most cases, frames will not be dropped with VBR encoding; every single frame of video is guaranteed to be encoded, based on the specified Q level. However, frames and data might be dropped when the VBR-encoded video is streamed on a network. Unless a user is streaming over a very reliable, extremely high-bandwidth connection, the high-bit-rate peaks in a VBR stream can result in dropped packets, which then result in dropped frames and jerky video.


Figure 3: Shows what happens when the available bandwidth is not high enough to handle the peaks in a VBR-encoded stream.

Eleven years of experience has shown that the best way to encode video for streaming over the Internet is with CBR. So how does CBR encoding handle the scenario above?

Where VBR varies the bit rate to maintain constant quality, CBR varies quality to maintain a constant bit rate.

CBR encoding
CBR uses a buffer to help the codec smooth out variations in the complexity of the video. A larger buffer results in better potential for high quality video. Typically, a five-second buffer is adequate for streaming over a network, but buffer time can be as short as a half-second or as long as 30 seconds for other delivery methods, such as DVD. A smaller buffer is required for situations in which the wait time must be minimized when switching between streams, such as channel-switching in an IP Television (IPTV) scenario. When configuring CBR encoding, a bit rate is specified along with a buffer size (in seconds). During easy-to-encode, low-motion scenes, the buffer may remain fairly empty. However, during complex sequences when the buffer fills up, the codec must make difficult decisions about how to maintain the bit rate. Typically, the codec starts by lowering the quality of the video. However, when video complexity reaches the point where the buffer is at its upper size limit, the codec must begin to drop frames. Most codecs provide a slider that enables the user to find the best compromise between frame quality and dropped frames.


Figure 4: Shows how buffer size varies while encoding the above scenario using CBR.
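
The buffer behavior described above can be sketched with a toy model. This is only an illustration under stated assumptions, not how any particular codec manages its buffer: the function name and parameters are hypothetical, the buffer is measured in bits, and the step in which a real codec first lowers quality is left out, so the sketch shows only the final resort of dropping frames.

```python
def simulate_cbr_buffer(frame_bits, bit_rate_bps, fps, buffer_seconds):
    """Track encoder buffer fullness (in bits) after each frame.

    Toy model (an assumption, not a real codec): every encoded frame is
    added to the buffer, and the channel drains bit_rate_bps / fps bits
    per frame interval. A frame that would overflow the buffer is dropped.
    """
    capacity = bit_rate_bps * buffer_seconds   # buffer size expressed in bits
    drain_per_frame = bit_rate_bps / fps       # bits sent per frame interval
    fullness = 0.0
    history = []
    dropped = 0
    for bits in frame_bits:
        if fullness + bits > capacity:
            dropped += 1                       # no room left: drop the frame
        else:
            fullness += bits
        fullness = max(0.0, fullness - drain_per_frame)
        history.append(fullness)
    return history, dropped


# Ten low-motion frames followed by three very complex frames,
# encoded at 700 Kbps and 10 frames per second with a one-second buffer.
history, dropped = simulate_cbr_buffer(
    [20_000] * 10 + [500_000] * 3, 700_000, 10, 1
)
print(dropped)
```

With these made-up frame sizes the buffer stays empty during the quiet scene and overflows on the second and third complex frames. A larger value of buffer_seconds absorbs the same burst without dropping anything.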

Before delving into the KulaByte process, let's look at a few more encoding modes that have been developed over the years. Two-pass CBR and two-pass VBR are available with most codecs; peak-constrained, two-pass VBR is available on only a few.

Two-pass CBR (2PCBR)
With 2PCBR, the encoder goes through a video twice. On the first pass, the codec analyzes the video frame by frame and calculates all of the motion vectors. On the second pass, the video is encoded using the motion vector data from the first pass.

In theory, the two-pass approach produces higher video quality, because motion vector calculations for a frame can include information about previous, as well as subsequent frames. In practice, however, the quality improvement is negligible and encoding takes nearly twice as long. Also, this process cannot be performed in real time, so it is not an option for live streaming.

Two-pass VBR (2PVBR)
With 2PVBR encoding, an average bit rate is specified. Unlike 2PCBR, the 2PVBR mode does not use any of the data from the first pass during the second pass. Instead, the encoder encodes the video once and guesses at what the required Q value should be. If, after the first pass, the file size is larger than it would have been if it had been encoded using CBR at the specified bit rate, the encoder decreases the Q value accordingly; if the file size is smaller than it would have been, the encoder increases the Q value.

The file is then re-encoded with a new Q value during the second pass to match (as closely as possible) the file size of what a CBR encode would produce. The formula to calculate the file size is easy enough:

File size (bytes) = Bit rate (bps) / 8 x Time (sec)

If, after the first pass, the encoder guesses correctly and the size of a VBR file is the same as that of a file encoded with CBR, then the average bit rate is the same for the two files. However, you are still left with the problem that the VBR file cannot be streamed over a network.
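
As a quick check of the file size formula, here is a minimal sketch. The function names are hypothetical, and the 700 Kbps, 60-second clip is a made-up example.

```python
def cbr_file_size_bytes(bit_rate_bps, duration_sec):
    """File size in bytes for a stream at a constant bit rate."""
    return bit_rate_bps / 8 * duration_sec


def average_bit_rate_bps(file_size_bytes, duration_sec):
    """Average bit rate implied by a file of a given size and duration."""
    return file_size_bytes * 8 / duration_sec


size = cbr_file_size_bytes(700_000, 60)     # 700 Kbps for one minute
print(size)                                 # 5250000.0 bytes
print(average_bit_rate_bps(size, 60))       # 700000.0 bps
```

The second function is the same formula rearranged. It shows why a VBR file that matches the CBR file size also matches its average bit rate.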

Peak-constrained, two-pass VBR
This mode is an attempt to solve the peak bit rate problem of VBR. It is a hybrid of CBR and VBR: based on 2PVBR, but using a buffer like CBR. You specify an average bit rate, a peak bit rate, and a buffer size. During complex sequences, the codec increases the bit rate, as it would with VBR. When the bit rate reaches the peak rate, however, the codec begins to fill the buffer, as it would if it was encoding in CBR mode. As the buffer fills, the codec decreases video quality or drops frames to maintain the bit rate below the peak rate.

In practice, peak-constrained two-pass VBR does not work as well as one may think. Extensive field testing has shown that if the peak bit rate value is not at least three times higher than the average bit rate value, the codec will actually drop more frames than if CBR had been used. This means that, on average, this method uses only about one-third of the available bandwidth, because the peak bit rate cannot be higher than the available bandwidth, and the average bit rate must be one-third of that.

The peak-constrained, two-pass VBR mode may be used on fixed media, such as DVD. In these cases, a lower average bit rate is required in order to fit content on a disc, but the peak transfer rate of data from the disc is substantially higher, though limited. However, this mode is not suitable for streaming.

An all new encoding mode: KulaByte
KulaByte, a patent pending process developed by Kula Media Group, solves these issues and allows almost any codec to stream high-quality video over a fixed bandwidth network without any dropped frames. Being codec independent means that KulaByte works with codecs such as WM9 (VC-1), Flash, ON2, MPEG2, MPEG4, H.263, and H.264. KulaByte even enables live encoding using codecs that were not engineered for real-time use.

KulaByte works by dividing a file into short TimeSlices, and independently encoding each TimeSlice with two-pass VBR. A TimeSlice can last anywhere from less than a second up to the size of the targeted client-side buffer, but is typically around 5 seconds.

With traditional one- or two-pass VBR encoding, the Q value is the same throughout an entire video stream. With KulaByte, a specific 2PVBR bit rate is applied individually to each TimeSlice, and then the Q of each TimeSlice is normalized to meet the specified output bit rate. The result is a CBR compatible stream that is guaranteed to not drop any frames!


Figure 5 shows the bit rate normalization of each TimeSlice.
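
The per-TimeSlice normalization can be illustrated with a small sketch. It is not the KulaByte implementation, which the article does not disclose. It assumes a toy model in which a TimeSlice's bit rate scales linearly with Q, so a first pass at a trial Q is enough to compute the Q that hits the target on the second pass. The function name and all of the numbers are hypothetical.

```python
def normalize_timeslice_q(first_pass_rates_bps, trial_q, target_bps):
    """Return the second-pass Q for each TimeSlice.

    Assumption (toy model): bit rate is proportional to Q, so scaling Q
    by target / measured rate brings every TimeSlice to the target rate.
    Real codecs are not linear in Q and limit Q to a fixed range.
    """
    return [trial_q * target_bps / rate for rate in first_pass_rates_bps]


# First pass at Q = 50 produced these rates for three TimeSlices:
# a quiet scene, a high-motion scene, and an average scene.
print(normalize_timeslice_q([350_000, 1_400_000, 700_000], 50, 700_000))
# [100.0, 25.0, 50.0]
```

In this model the quiet TimeSlice gets a higher Q and the high-motion TimeSlice a lower one, so every TimeSlice lands on the same output bit rate.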

KulaByte with a Client Buffer
There are times when some variability in the stream is acceptable, or even desired. Because CBR and VBR are both used in this hybrid model, the KulaByte process is very flexible.

In most server/client streaming products, there is a client side buffer specifically meant to smooth out network fluctuations. This buffer can be intelligently utilized by the KulaByte process. The calculation below, which is part of the patent pending KulaByte process, can be used to generally determine how to fill the client side buffer, and can be used to ensure that the variable bit encoding process does not exceed the overall predetermined bit rate.


Figure 6: Varying bit rates according to TimeSlices

A variable encoding bit rate for a particular TimeSlice to be delivered to the client may be calculated. In Figure 6, the TimeSlices labeled 1, 2, 3, and 4 have been previously encoded at varying bit rates of 700 Kbits/second, 300 Kbits/second, 200 Kbits/second, and 1800 Kbits/second, respectively. In order to find an encoding bit rate for TimeSlice 5, the following equation may be used:


where x is the variable encoding bit rate for a current TimeSlice that needs to be compressed (e.g., segmented media file 5 of Figure 6), y is equal to the client buffer size (here, measured in units of seconds), z is the number of previous segments that can fit in the client buffer with the segment that needs to be compressed (here, 4, since 4 previous segments plus the segment being compressed will fit in the client buffer), and the targeted client bandwidth is the value discussed above, which can be chosen based on what the desired CBR bit rate would be.

In the example of Figure 6, if we assume the targeted client bandwidth is 700 Kbits/second, the above equation yields: [(700 + 300 + 200 + 1800) + x] / 5 <= 700. Solving for x, one finds that the fifth segmented media file can be encoded at a bit rate of up to 500 Kbits/second without exceeding the client buffer.
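
The worked example can be reproduced with a few lines of code. The sketch follows the form used in the example, averaging the previous TimeSlices and the current one against the targeted client bandwidth. It assumes equal-length TimeSlices, and the function name is hypothetical.

```python
def max_next_timeslice_rate(previous_rates_kbps, target_kbps):
    """Highest bit rate for the next TimeSlice that keeps the average of
    the previous TimeSlices plus the new one at or below the target.

    Assumes all TimeSlices in the client buffer have the same duration.
    """
    slices_in_window = len(previous_rates_kbps) + 1
    budget = target_kbps * slices_in_window      # total allowed across the window
    return max(0, budget - sum(previous_rates_kbps))


# TimeSlices 1-4 from Figure 6, with a 700 Kbits/second target.
print(max_next_timeslice_rate([700, 300, 200, 1800], 700))   # 500
```

If the earlier TimeSlices had already used the whole budget, the function returns 0 rather than a negative rate.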

Note that in this example, the variable encoded bit rate for segmented media file 4 is much greater than the targeted bandwidth of the client. This is possible due to a re-allocation of bit rate bandwidth not used during the encoding of prior TimeSlices.

How does this affect streaming quality over a network? In general, a traditional CBR stream is most efficient only when the desired bit rate of the video is very close to the available bandwidth. You would think that any variability in the stream could push the bit rate over the top, resulting in dropped packets, and a bad experience for the viewer. However, the KulaByte process is both network and viewer friendly. Field testing has shown that the higher bandwidth TimeSlices do get delivered more slowly to the client, but the client is playing out of its buffer during this time anyway. The overall average bit rate is maintained by the equation above, the client buffer does its job in smoothing out the network peaks and valleys, and the visual result is high quality video images with no dropped frames.

Bandwidth savings
The KulaByte process creates a CBR-like stream. However, because the encoding is based on VBR, video quality is roughly 20 percent better than if the video had been encoded with just CBR. Using identical codecs and settings, internal tests have shown that the PSNR (Peak Signal to Noise Ratio) of an 800Kbps stream encoded with the KulaByte process is equal to that of a standard CBR stream encoded at 1Mbps.

Using KulaByte with multiple processors
In addition to improving the quality of encoded video, the KulaByte process provides a better way to configure your encoding computer to use multiple processors.

Typically, when a codec is used in a multi-processor environment, each video frame is split into horizontal tiles, which are then encoded with different processors. (Figure 7 shows how this works with a dual-processor or dual-core encoder.) The problem with this approach is that motion vectors cannot span the junction between the tiles, and as a result the encoding process is not as efficient. This results in a slight reduction in PSNR when using two processors. The problem worsens with four processors; instead of one horizontal line across the picture, there are three (see Figure 8). As you can see, this introduces a problem of scalability; as you add more processors to reduce encoding time, you lower the quality of your video.


Figure 7: With a dual-processor or dual-core encoder.


Figure 8: With four processors.

Another problem is the amount of development time it takes to build an encoder that works properly with multiple processors. The encoder has to be engineered specifically to divide the picture into tiles, and then send the right data to each processor. For this reason, many current codec implementations are only single-threaded. This is fine for a lightweight codec, but it means that many heavyweight codecs cannot be used for real-time streaming, because there is no single processor that can handle the load.

KulaByte uses a different approach to multi-processor encoding. Each processor runs a different instance of the encoder, and successive TimeSlices are sent to each encoder in turn. There is no limit to the number of processors that can be used, so scalability is not an issue. Kula Media Group engineers have shown that live encoders can run effectively in a multi-processing environment with no modification to the underlying single-threaded codec software at all.

Even though this approach requires that a new I-frame be generated at the beginning of each TimeSlice, there is only a slight impact on PSNR. The total impact is less than with tiling, because the TimeSlice method affects the motion vector calculation only for the frame of video at the beginning of each TimeSlice, compared to every frame when tiling. As a result, the PSNR value does not decrease as more processors are added. Regardless, the problem is easily mitigated by starting each TimeSlice on a frame that would be encoded as an I-frame anyway.
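
Sending successive TimeSlices to each encoder in turn amounts to a round-robin assignment, sketched below. The function name is hypothetical, and the sketch only shows which encoder instance receives which TimeSlice. It does not start real encoder processes.

```python
def assign_round_robin(num_timeslices, num_encoders):
    """Map each encoder index to the TimeSlice indices it receives.

    TimeSlice s goes to encoder s % num_encoders, so consecutive
    TimeSlices land on different encoder instances.
    """
    assignment = {encoder: [] for encoder in range(num_encoders)}
    for timeslice in range(num_timeslices):
        assignment[timeslice % num_encoders].append(timeslice)
    return assignment


print(assign_round_robin(8, 4))
# {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```

Because each TimeSlice is encoded independently and starts with its own I-frame, the encoded TimeSlices only need to be put back in index order to form the output stream.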

Live streaming with KulaByte
As you can see, a real-time video encoder is easily implemented with KulaByte by using a computer with two or more processors, or even a single dual-core system. But how do multi-processing and the TimeSlice process affect stream latency and channel-switch time?

Because encoding is based on 2PVBR, the stream itself does not use an encoder buffer. Therefore, an end-user can switch from one live stream to another without any increase in channel-switch time. This is very important to most end-users.

However, even though channel-switch latency is very low, the actual latency of the stream does increase with each additional processor. Figure 9 shows how live video is encoded with a four-processor computer.


Figure 9: Live video encoded with four processors.

As you can see, each processor adds the length of one TimeSlice to the overall stream latency, which is defined as the time beginning when a live video frame enters a capture card and ending when the frame displays on a client's monitor. For example, assuming a TimeSlice of 5 seconds, a streaming broadcast of a TV show would be 10 seconds behind true "live" with a dual-processor or dual-core encoder and 20 seconds behind with a quad-processor encoder. Typically, latency is not an issue in streaming applications and, as mentioned above, once the stream has commenced, channel-switch time is the same as with CBR encoding.
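
The latency rule stated above is simple enough to express directly. The function name is hypothetical, and the sketch covers only the encoding delay described in the article, not network or player buffering.

```python
def stream_latency_sec(num_processors, timeslice_sec):
    """Encoding latency when each processor adds one TimeSlice of delay."""
    return num_processors * timeslice_sec


print(stream_latency_sec(2, 5))   # 10 seconds with a dual-processor encoder
print(stream_latency_sec(4, 5))   # 20 seconds with a quad-processor encoder
```

Shorter TimeSlices reduce this delay proportionally, at the cost of more frequent I-frames.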

Live video applications using static file-like environment
The KulaByte encoding model opens up a whole new world of possibilities when handling live video. Because KulaByte treats a live stream as individual TimeSlices, anything that you could do to a static movie file, you can now apply to a live stream.

So virtually any effect, preprocessor, or codec can now be spread across multiple processors and encoded live when the KulaByte process is used. What this means for live encoding is more efficient streaming and network utilization, along with better processor utilization.

High speed encoding
Because KulaByte was engineered to take full advantage of multiple processors, multi-core processors, and multiple DSPs, drastic reductions in encoding times are seen for static video files. For instance, a 120-minute 720p video can be encoded in 370 minutes instead of 500 minutes on a four-core server. That's a 26 percent reduction in encoding time, plus the improved PSNR, when using KulaByte. The ratio increases with every core added. Any HD-DVD or Blu-ray authoring station equipped with KulaByte receives clear gains.

About the author
David Workman is the Chief Technology Consultant at Kulabyte. He was a Product Manager at Thomson Grass Valley for 10 years. Most recently, David was the QA Manager for Video Codecs at Microsoft Corp., working on Windows Media Technologies for 8 years. He holds an associate's degree in science and electronics from Sierra College in Auburn, California, and is now semi-retired and living in Ecuador. He can be reached at David@ecuadorhomesonline.com.

