Surveillance IP camera design with intelligent encoding for reduced bandwidth, higher quality
Mark Oliver, Director Product Marketing, Stretch, Inc. and
John Monti, Vice President, Marketing and Business Development, Pixim, Inc.
12/21/2007 3:30 PM EST
One of the fundamental challenges facing anyone working with video surveillance has always been the size of the video itself. Captured video produces large amounts of data, which is costly both to transport from one place to another and to store. For years, analog video engineers have honed techniques to reduce the size of the captured data without introducing any appreciable loss in quality. Analog systems, however, generally lack the intelligence to tailor these techniques in real time and so only scratch the surface of what is possible.
Fortunately, with advances in digital video technology, a new set of real-time tools can be brought to bear on the video stream. Software-configurable processors targeted at video applications allow closer coupling between the intelligence of the video system and the compression stages. The result will be a quantum leap in the quality and features of digital video surveillance systems.
When most of us think of digital video compression techniques, we think of the CODECs standardized by the Moving Picture Experts Group (MPEG). These include MPEG-2 of DVD and digital TV fame, as well as MPEG-4 Part 10 (H.264), generally considered the natural successor to the aging MPEG-2 standard. Rather than being rigid in their application, these CODECs are best thought of as a tool bag of techniques that can be applied to a video stream to achieve the desired compression. Which tools are used, and how they are applied, can significantly affect the quality and size of the compressed video stream.
In video surveillance applications, the type of video "footage" captured is generally very different from that captured for television or movies. As a result, the tool selection made by a surveillance encoder can be very different from that made by a broadcast encoder. A surveillance camera might, for example, be monitoring a hallway in an office building. The hallway might be deserted from six in the evening until eight the following morning, and similarly quiet during the weekend. The encoder can therefore use different criteria to select appropriate compression tools. Tools and techniques that would be infeasible for other video applications might yield perfectly acceptable results for the quiet scene observed 80 percent of the time. Enter the Intelligent Encoder.
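The hallway schedule can be checked with simple arithmetic. The sketch below assumes a strict 8 a.m. to 6 p.m. working day, Monday through Friday, and counts every other hour as quiet. On that assumption the after-hours schedule alone covers about 70 percent of the week, so the remainder of the 80 percent figure would have to come from lulls during the working day.

```python
# Assumed schedule (illustrative): staffed 8 a.m. to 6 p.m. on weekdays, empty otherwise.
HOURS_PER_WEEK = 7 * 24                          # 168
staffed_per_weekday = 18 - 8                     # 10 hours
weekday_quiet = 5 * (24 - staffed_per_weekday)   # 5 x 14 = 70 hours
weekend_quiet = 2 * 24                           # 48 hours
quiet_hours = weekday_quiet + weekend_quiet

fraction = quiet_hours / HOURS_PER_WEEK
print(f"{quiet_hours} of {HOURS_PER_WEEK} hours quiet ({fraction:.0%})")  # 118 of 168 hours quiet (70%)
```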
The Intelligent Encoder
The Intelligent Encoder consists of tightly coupled analytics and compression engines. The analytics engine examines the scene and determines whether any pre-selected criteria are met. Criteria might include the presence of motion, the absence of motion, sudden changes in light level, or rapid scene changes. The results of the analysis are used to configure the encoder engine for optimum quality and compression levels based on the dynamics of the scene. The ability to radically change encoder parameters based on scene dynamics results in higher average compression ratios, which lowers bit rates and makes more efficient use of storage or transmission bandwidth. Figure 1 shows a block diagram of a typical intelligent encoder.

Figure 1: Block Diagram of an Intelligent Encoder
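The coupling between the analytics engine and the encoder can be sketched in a few lines. This is a minimal illustration, not Stretch's implementation: frame differencing and a mean-luma check stand in for real analytics, frames are flat lists of 0 to 255 luma samples so the example needs no imaging library, and `EncoderConfig`, `analyze`, `configure`, the thresholds, and the two presets are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical encoder settings; a real H.264 encoder exposes many more."""
    width: int
    height: int
    fps: int
    qp: int  # quantization parameter: lower means higher quality and more bits

QUIET = EncoderConfig(352, 288, 2, 32)    # illustrative values only
ACTIVE = EncoderConfig(720, 480, 30, 26)

def analyze(prev, curr, pixel_thresh=20, motion_frac=0.01, light_jump=40):
    """Return the set of criteria met between two luma frames.

    Frames are flat lists of 0-255 samples. Frame differencing and a mean-luma
    check stand in for real analytics; all thresholds are assumptions.
    """
    events = set()
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_thresh)
    if changed / len(curr) > motion_frac:
        events.add("motion")
    if abs(mean(curr) - mean(prev)) > light_jump:
        events.add("light_change")
    return events

def configure(events):
    """Map analytics results onto encoder settings: any event selects ACTIVE."""
    return ACTIVE if events else QUIET

still = [100] * 1000
moving = [100] * 900 + [200] * 100        # 10% of the pixels change
print(analyze(still, still), configure(analyze(still, still)).fps)    # set() 2
print(analyze(still, moving), configure(analyze(still, moving)).fps)  # {'motion'} 30
```

In this sketch a global brightness jump also trips the motion test; a production analytics engine would need to separate the two before choosing encoder settings.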
The maximum efficiency of an Intelligent Encoder is obtained by bounding encode parameters in terms of quality, bit rate, resolution, or frame rate, and defining a time period over which the defined bounds are to be applied. In this way, the encoder itself is able to optimize the consumption of the "bit budget" based on the scene dynamics and the level of interest that the observer is likely to have in the encoded stream.
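One way to picture bounded parameters applied over a time period is a bit budget: a per-frame ceiling plus a total allowance that refills each window. The sketch below is an assumption-laden illustration; `Bounds`, `BitBudget`, and the fixed (rather than sliding) window are hypothetical choices, and the numbers simply express a 1.5 Mb/s average over 10 seconds at 30 fps with a 3 Mb/s instantaneous ceiling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bounds:
    """Hypothetical bounds on the encoder's bit consumption."""
    max_bits_per_frame: int   # ceiling on any single frame
    budget_bits: int          # total bits allowed per window
    window_frames: int        # length of the window, in frames

class BitBudget:
    """Grant bits frame by frame without exceeding either bound."""

    def __init__(self, bounds: Bounds):
        self.bounds = bounds
        self.remaining = bounds.budget_bits
        self.frame_in_window = 0

    def grant(self, requested: int) -> int:
        if self.frame_in_window == self.bounds.window_frames:
            # A new window starts: refill the budget.
            self.remaining = self.bounds.budget_bits
            self.frame_in_window = 0
        granted = max(0, min(requested, self.bounds.max_bits_per_frame, self.remaining))
        self.remaining -= granted
        self.frame_in_window += 1
        return granted

# 1.5 Mb/s averaged over a 10 s window at 30 fps, with a 3 Mb/s instantaneous ceiling.
budget = BitBudget(Bounds(max_bits_per_frame=100_000, budget_bits=15_000_000, window_frames=300))
print(budget.grant(10_000), budget.grant(200_000), budget.remaining)  # 10000 100000 14890000
```

A fixed window keeps the sketch short; a real rate controller would more likely use a sliding window or a leaky-bucket model so that the bound holds over every interval, not just aligned ones.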
Setting Bounds on the Encode Parameters
When setting bounds on the encode parameters of an Intelligent Encoder, advantage can be taken of the volatility in bit rates. Conventional encoders create bit streams of a reasonably constant rate and require provisions to ensure that the network does not become saturated. This normally involves setting low quality levels and dedicating network bandwidth to video traffic. The constrained bit rates of this approach work well for low-motion scenes, but when high motion is encountered (typically the scenes of interest in a surveillance application), quality suffers as the encoder struggles to represent the rapidly changing scene within its bit budget.
With an Intelligent Encoder, the normal operating bit rate falls to a low level while the scene is quiet. When the analytics engine detects a triggering event, the encoder uses the bits it needs (up to its prescribed bound) to represent the motion at the highest possible quality. After a short period, typically just a few frames, the bit rate returns to the normal operating level.
The result is an encoded stream that frugally uses bandwidth in quiescent operation, and is still able to capture trigger events with maximum quality. This ability brings up the concept of constant quality. In a constant quality encoder, it is the desired quality that is specified rather than the bit rate. The intelligence within the encoder can be used to adjust encode parameters to deliver the required quality level.
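The trigger behavior described above can be sketched as a per-frame allocation loop. The function name, the per-frame "needed" bits, and the fixed hold length are assumptions for illustration; the article says only that the elevated rate typically lasts a few frames.

```python
def allocate(needed, triggered, quiescent_bits, max_bits, hold_frames):
    """Per-frame bit allocation: frugal when quiet, up to max_bits after a trigger.

    needed: bits each frame would need for full quality (hypothetical input)
    triggered: per-frame flags from the analytics engine
    """
    out, hold = [], 0
    for need, trig in zip(needed, triggered):
        if trig:
            hold = hold_frames                     # (re)start the high-rate window
        if hold > 0:
            out.append(min(need, max_bits))        # spend what is needed, up to the bound
            hold -= 1
        else:
            out.append(min(need, quiescent_bits))  # normal, frugal operation
    return out

needed = [6_000, 6_000, 150_000, 120_000, 7_000, 6_000]
triggered = [False, False, True, False, False, False]
print(allocate(needed, triggered, quiescent_bits=8_000, max_bits=100_000, hold_frames=2))
# [6000, 6000, 100000, 100000, 7000, 6000]
```

The hold length is the main design choice here: too short and a frame that still needs many bits after the window closes is clamped to the quiescent level; too long and bits are spent on frames that no longer need them.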
Constant Quality Bit Rates
In surveillance applications, a constant quality encoder maintains quality regardless of scene dynamics, delivering good motion performance with low bit rates during quiescent periods.
Figure 2: Bit Rates Produced by Different Encoding Schemes
Figure 2 shows the bit rate over time of a video clip encoded with three different encoding techniques. The red line represents a constant bit rate encoder typically used when a known amount of bandwidth has been allocated to the video. The encoder strives to fill the available bandwidth by changing the encoded quality.
The blue line represents a typical variable bit rate encoder where a target bit rate is defined, but the actual encoded bit rate is allowed to vary on either side of the target as a result of the encoding process.
The orange line represents the output of an intelligent encoder set for constant quality. Here, the encode parameters are changed in real time to account for scene dynamics. Lower quiescent period bit rates are achieved during periods of low activity and high quality is maintained when the scene rapidly changes. The result is a dramatic reduction in the average bit rate generated by the encoder.
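The difference between the constant bit rate and constant quality curves can be reproduced qualitatively with a toy rate model. The sketch relies on one real property of H.264, that the quantizer step size doubles every 6 QP (with a step of 1.0 at QP 4), and on a loose assumption that bits scale inversely with step size. The one-minute "complexity" trace, the fixed QP of 28, and the helper names are hypothetical; the VBR case from Figure 2 is omitted for brevity.

```python
import math

def qstep(qp: float) -> float:
    """H.264 quantizer step size: doubles every 6 QP, equal to 1.0 at QP 4."""
    return 2 ** ((qp - 4) / 6)

def qp_for(bits: float, complexity: float) -> float:
    """Invert the toy rate model bits = complexity / qstep(qp)."""
    return 4 + 6 * math.log2(complexity / bits)

# Hypothetical one-minute trace: 57 quiet seconds followed by a 3-second event.
# "Complexity" is the bits a second of video would need at Qstep 1.0 (toy units).
trace = [4e6] * 57 + [48e6] * 3

CBR_TARGET = 1.5e6   # bits per second, as in the article's CBR example
CQ_QP = 28           # fixed quality level chosen for constant-quality mode

cbr_bits = [CBR_TARGET] * len(trace)          # rate fixed, quality floats
cbr_qp = [qp_for(CBR_TARGET, c) for c in trace]
cq_bits = [c / qstep(CQ_QP) for c in trace]   # quality fixed, rate floats

print(f"CBR: avg {sum(cbr_bits) / len(trace) / 1e6:.2f} Mb/s, "
      f"QP {min(cbr_qp):.1f} to {max(cbr_qp):.1f}")
print(f"CQ:  avg {sum(cq_bits) / len(trace) / 1e6:.2f} Mb/s, "
      f"QP {CQ_QP} throughout, peak {max(cq_bits) / 1e6:.1f} Mb/s")
```

With these made-up numbers the CBR stream holds 1.5 Mb/s, but its QP swings from about 12 in the quiet seconds to 34 during the event, so quality drops exactly when it matters. The constant quality stream peaks at twice the CBR rate yet averages just under 0.4 Mb/s. A real CBR encoder would also clamp QP rather than spend bits on near-lossless quiet frames.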
Sample Image Comparison
Figures 3 and 4 are single frames taken from a video sequence encoded first with a constant bit rate and then with constant quality. In the video clip, a large white object moves in front of the camera, temporarily blinding it. Both images show the same frame, taken just as the object moves away and the background is revealed. This simulates a vehicle or person moving in front of the camera. In both frames, the instantaneous bit rate appears in the bottom left of the image after the "BR" label.
The constant bit rate encoder is set to 1.5 Mb/s and varies little around that figure. As a result, when large parts of the image change, the quality of the encoded stream must be reduced, and video quality is lost during the very event of interest. The intelligent encoder, on the other hand, has the flexibility to temporarily increase the bit rate just enough to encode the scene at a constant level of quality. Here we see an instantaneous bit rate of about twice that of the constant bit rate stream, which is sufficient to maintain excellent video quality. The increased rate lasts only for the period of highest motion, while the quiet portions of the clip are encoded at lower quiescent rates, so the overall number of bits required to encode the scene is reduced.

Figure 3: Frame of High Motion Video Encoded With Constant Bit Rate

Figure 4: Frame of High Motion Video Encoded with Constant Quality
Use models
When considering the specific application of video surveillance, further advantage can be taken of the use model of the captured video. In our original hallway example, normal office traffic might be captured and compressed to 1.5 Mb/s at 30 fps. During evenings or weekends, the absence of motion within the scene might be used to trigger the encoder to reduce the resolution to CIF (352 × 288, roughly one-quarter of the D1 resolution) and to drop the frame rate to two or three frames per second. This would be sufficient resolution and frame rate to assure a night watchman that all was well in the building. The corresponding data rate, however, would drop from 1.5 Mb/s to 25 kb/s, increasing the effective storage capacity of the system by a factor of 60. Further configuring the CODEC to increase quantization parameters, or to select different tools from its MPEG tool bag, might reduce the bit rate by another 30 to 50 percent.
Trigger criteria might also be defined as a function of time as well as scene dynamics. In our example, motion occurring during weekend hours might be considered more suspicious, causing the encoder to select the full resolution of the image sensor and perhaps capture high-definition video at 30 fps to preserve the best possible evidence.
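The factor of 60 can be reproduced with a first-order estimate that scales the daytime bit rate by the drop in pixel rate. The assumption that bit rate is proportional to pixels per second is loose (real encoders only roughly follow it), and the variable names are illustrative. At 2 fps the estimate gives the 25 kb/s quoted above; at 3 fps it gives about 37.5 kb/s, a factor of 40.

```python
# First-order estimate, assuming bit rate scales linearly with pixel rate
# (pixels per frame x frames per second). Real encoders only roughly follow this.
day_bps = 1.5e6            # D1 at 30 fps during office hours
resolution_scale = 1 / 4   # CIF carries roughly a quarter of D1's pixels
frame_scale = 2 / 30       # 30 fps reduced to 2 fps

night_bps = day_bps * resolution_scale * frame_scale
print(f"night rate about {night_bps / 1e3:.0f} kb/s, "
      f"storage gain about {day_bps / night_bps:.0f}x")

# The further 30 to 50 percent saving from coarser quantization or other tools
for saving in (0.30, 0.50):
    print(f"{saving:.0%} saving -> {night_bps * (1 - saving) / 1e3:.1f} kb/s")
```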
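A time-aware trigger policy like this one can be sketched as a small selection function. The profile tuples, the 1920 × 1080 stand-in for "full resolution of the image sensor", and the choice to restore normal capture for weekday after-hours motion are all assumptions; the article specifies only the quiet-night and weekend-motion cases.

```python
from datetime import datetime

# Hypothetical capture profiles: (width, height, frames per second).
OFFICE = (720, 480, 30)        # normal daytime capture (D1, 30 fps)
NIGHT_QUIET = (352, 288, 2)    # CIF at 2 fps when nothing is happening
EVIDENCE = (1920, 1080, 30)    # placeholder for the sensor's full HD resolution

def select_profile(when: datetime, motion: bool):
    """Pick a capture profile from the time of day and the motion flag."""
    weekend = when.weekday() >= 5              # Monday is 0, Saturday 5, Sunday 6
    office_hours = not weekend and 8 <= when.hour < 18
    if office_hours:
        return OFFICE
    if motion:
        # Weekend motion is treated as most suspicious; weekday after-hours
        # motion simply restores normal capture (an assumption).
        return EVIDENCE if weekend else OFFICE
    return NIGHT_QUIET

print(select_profile(datetime(2007, 12, 21, 10, 0), motion=False))  # Friday morning
print(select_profile(datetime(2007, 12, 21, 22, 0), motion=False))  # Friday night, quiet
print(select_profile(datetime(2007, 12, 22, 14, 0), motion=True))   # Saturday, motion
```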
The preceding examples show how giving an IP camera sufficient processing power allows analytic algorithms and the video CODEC to be tightly coupled. The resulting intelligent encoder can set its own bit rate within a wide window, producing a highly efficient camera implementation. In surveillance applications, it produces long periods of very low bit rates, which more than compensate for the short periods of higher bit rates required by scene dynamics. The reduced bandwidth consumption and lower storage requirements of the resulting stream yield savings in both installation costs and operating expenses.
In future generations of the Intelligent Encoder, enhanced analytic functions might be used to identify regions of added interest such as faces or license plates. The encoder might be configured to increase the level of constant quality in just these regions as they are tracked through the scene. The result would be the optimum tradeoff of quality and bit rate set not by the dynamics of the scene but by the specifics of its content.
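Region-based quality of this kind is commonly expressed as a per-macroblock QP map, with a lower QP inside the regions being tracked. The sketch assumes H.264's 16 × 16 macroblocks and its 0 to 51 QP range; the function name, the default offset of -6 (which halves the quantizer step size), and the bounding boxes, imagined as coming from a face or license plate tracker, are hypothetical. Whether a given encoder accepts such a map is not something the article states.

```python
def roi_qp_map(width, height, base_qp, rois, qp_delta=-6, mb=16):
    """Build a per-macroblock QP map with finer quantization inside regions of interest.

    rois: (x, y, w, h) pixel boxes, e.g. from a hypothetical face or plate tracker.
    A negative qp_delta lowers QP (higher quality) inside each box.
    """
    cols = (width + mb - 1) // mb
    rows = (height + mb - 1) // mb
    roi_qp = max(0, min(51, base_qp + qp_delta))   # H.264 QP range is 0..51
    qp = [[base_qp] * cols for _ in range(rows)]
    for x, y, w, h in rois:
        # Mark every macroblock the box overlaps, clipped to the frame.
        for r in range(y // mb, min(rows, (y + h - 1) // mb + 1)):
            for c in range(x // mb, min(cols, (x + w - 1) // mb + 1)):
                qp[r][c] = roi_qp
    return qp

for row in roi_qp_map(64, 48, 30, [(16, 16, 20, 10)]):
    print(row)
# [30, 30, 30, 30]
# [30, 24, 24, 30]
# [30, 30, 30, 30]
```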
Part Two: The importance of image quality when applying video analytics
Video analytics and intelligent encoders promise to propel IP video cameras to new levels of functionality, bit-rate efficiency and overall effectiveness across a wide range of surveillance applications. Advanced software-configurable processors are crucial for bridging the analytics with the signal processing and compression functions of the video surveillance chain.
No matter how "smart" encoders become, however, they always depend on a high-quality video stream from the camera sensor. The quality of the video feed is critical both to the accuracy of the analytics algorithms and, ultimately, to the efficiency of the encoding process.
In video surveillance applications, especially when intelligent analytics will be applied, "high quality" means more than simply in-focus footage. It means the ability to capture detailed, actionable images no matter what kind of lighting or environmental conditions are present, so the analytics algorithms can operate optimally.
This enables surveillance cameras to fulfill their ultimate purpose, whether it's to alert security personnel so they can avert potential problems, monitor faces of shoppers or vehicle license plates for marketing or transportation planning purposes, or document details of the people, objects and events of a crime scene with sufficient accuracy and clarity to enable the apprehension and/or prosecution of those responsible.
In more technical terms, video surveillance cameras must be equipped with image processing chipsets that provide:
- Ultra-wide dynamic range (WDR). WDR, measured in decibels (dB), refers to a camera's ability to capture image details simultaneously in both the lightest and darkest portions of a high-contrast scene.
- High total image resolution. High resolution is necessary to distinguish image features and details, including at high magnification or zoom.
- Realistic color rendering. Colors must be accurately displayed even in difficult lighting conditions such as glare, reflections, extremely high contrast, low light, or fluorescent, neon or argon lighting.
- Minimal image artifacts. Video images cannot be obscured or distorted by common problems such as pixel blooming, vertical smear, color aliasing or interlace artifacts.
- High image compression. More efficient compression delivers better image quality at smaller file sizes, so DVRs can record at a higher frame rate, a higher resolution, or both, while maintaining the same recording time. Lower compressed bit rates also reduce network traffic.
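The decibel figure for WDR and the DVR trade-off in the last bullet are both simple conversions. The sketch assumes the 20*log10 convention usually used for sensor dynamic range and decimal gigabytes; the function names are illustrative, and no particular camera's rating is implied.

```python
import math

def dynamic_range_db(brightest: float, darkest: float) -> float:
    """Dynamic range in dB, using the 20*log10 convention common for image sensors."""
    return 20 * math.log10(brightest / darkest)

def contrast_ratio(db: float) -> float:
    """Inverse conversion: brightest-to-darkest ratio for a given dB figure."""
    return 10 ** (db / 20)

def recording_hours(storage_gb: float, avg_bitrate_bps: float) -> float:
    """Hours of video a DVR can hold at a given average bit rate (1 GB = 1e9 bytes)."""
    return storage_gb * 8e9 / avg_bitrate_bps / 3600

print(dynamic_range_db(1_000_000, 1))   # a 1,000,000:1 scene spans 120 dB
print(contrast_ratio(60))               # 60 dB corresponds to a 1,000:1 ratio
# Recording time depends only on the average rate: halve it and time doubles.
print(recording_hours(500, 1.5e6), recording_hours(500, 0.75e6))
```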
Pixim and Stretch have partnered to develop solutions for the video security industry. Stretch's IP Camera Reference Design Kit (RDK), an advanced-feature, multi-standard intelligent network camera, seamlessly supports Pixim's image sensors. Powered by the Stretch S6105 software-configurable processor, the IP Camera RDK can perform H.264 encoding of 30 frames per second from the Pixim sensor and still have 75 percent of its processing cycles available for integrating third-party video analytics or adding other advanced features. Pixim's patented Digital Pixel System (DPS) silicon and software technology, designed into over 100 cameras worldwide, delivers superior image quality even in variable, difficult lighting conditions.
About the authors
Mark Oliver is the Director of Product Marketing at Stretch. A native of the UK, Oliver gained a degree in Electrical and Electronic Engineering from the University of Leeds. During a ten-year tenure with Hewlett-Packard, Oliver managed Engineering and Manufacturing functions in HP divisions in both Europe and the US before heading up Product Marketing and Applications activities at a series of video-related startups. Prior to joining Stretch, Oliver managed Marketing for Video and Imaging within the DSP Division of Xilinx. He can be reached at moliver@stretchinc.com.
John Monti is a founding executive of Pixim, Inc. and VP of Sales and Marketing, and holds two U.S. patents. He has a bachelor's degree in electrical engineering from Yale University and a master's degree in engineering management from Santa Clara University. He can be reached at monti@pixim.com.




