Use FPGAs for stunning surveillance camera images
Judd E. Heape, Altera Corporation
12/16/2010 2:27 PM EST
Unless you've been living under a rock somewhere, you've probably noticed that more and more surveillance cameras are popping up all over the place. These cameras are now commonplace in government buildings, banks, transportation terminals and at traffic intersections around the world. Today's law enforcement agencies, governments and private business owners are capitalizing on the benefits of state-of-the-art surveillance camera products that can provide them with stunning images along with valuable information.
The "stunning images" are made possible by utilizing the latest in high definition CMOS image sensor technology that provides breakthrough dynamic range. The "valuable information" is provided by adding intelligence in the camera that can look for the bad guys by analyzing the video in real time, hence a feature called "analytics". Since both of these features require massive, flexible parallel processing in a small, power-efficient footprint, FPGAs are excellent devices to enable cameras with these breakthrough features.
Figure 1 shows the block diagram of an Internet Protocol (IP) video surveillance camera that includes an FPGA to handle all the image processing algorithms required in the system: the image sensor pipeline, video analytics and video encoding.

First let's examine how we get stunning images. This focuses on the "Sensor Control and WDR pipeline" portion of the block diagram shown in Figure 1.
Have you ever noticed that some photographs look better than real life? That's because they usually are. The best looking photos have been "doctored" (or more accurately post-processed) by experienced graphic artists using advanced image software like Adobe® Photoshop®. Commonly, images are modified to correct for exposure issues that stem from limitations in the camera itself. Artists can "bump up" exposures electronically in parts of an image that are too dark while leaving the properly exposed portions of the image alone (see Figure 2 for an example). This is very easy to do in a tool like Photoshop, but almost impossible to do in "real time" in the field using a conventional digital camera. (This is why most of a professional photographer's time in the field is spent adjusting and optimizing lighting conditions.) Post-processed images (if done correctly) actually can look more "natural" than a raw image, and that's because if exposures are manipulated properly, images can be dynamically compressed to approach a result more like what the human visual system would see. But, more on that later...

The exposure limitation in a typical camera is based on the image sensor - unlike the human eye, a typical image sensor can only reproduce about 72dB of dynamic range. Today's latest image sensors, called wide dynamic range (WDR) or high dynamic range (HDR) sensors, use adaptive non-linear exposure techniques that can exceed 115dB of dynamic range. Depending on the scene, the human eye can perceive up to 160dB of dynamic range. Sensors still have a long way to go to reach the performance of the human eye, but 115dB is much better than 72dB!
Simply put, larger dynamic range numbers are better. 115dB equates to a system that has >19 bits per word while a 72dB dynamic range equates to almost 12 bits per word (each bit contributes roughly 6dB). Thus, the dynamic range of a system in dB (neglecting noise) is proportional to the number of bits used to represent the information of interest. In digital imaging, more bits means that more levels of brightness can be reproduced, and the difference between two adjacent brightness levels is minimized, resulting in superior reproduction of fine detail.
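The bit counts quoted above follow from the usual rule that an ideal, noise-free word of N bits spans 20·log10(2^N) dB, or about 6.02dB per bit. The short Python sketch below reproduces that arithmetic; the function names are illustrative only and do not come from any sensor vendor's tools.

```python
import math

# An ideal N-bit word spans 20*log10(2**N) dB, i.e. about 6.02 dB per bit.
DB_PER_BIT = 20 * math.log10(2)


def bits_for_db(dynamic_range_db: float) -> float:
    """Number of bits needed to represent a given dynamic range (noise ignored)."""
    return dynamic_range_db / DB_PER_BIT


def db_for_bits(bits: int) -> float:
    """Dynamic range in dB spanned by an ideal word of the given width."""
    return bits * DB_PER_BIT


if __name__ == "__main__":
    for db in (72, 96, 115):
        print(f"{db} dB -> {bits_for_db(db):.2f} bits")
```

Run as a script, this prints 11.96 bits for 72dB, 15.95 bits for 96dB and 19.10 bits for 115dB, which matches the "almost 12 bits" and ">19 bits" figures in the text.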
So, let's equate this to something that most of us can relate to. Remember how amazed you were to hear an audio CD for the first time after being used to, for years, the limitations of cassette tapes or LPs? (Okay, I'm dating myself here.) Well, today's image sensors are poised to do to video what CDs did to audio back in the early 80s. The key is dynamic range. CDs boosted dynamic range to 96dB while cassettes could achieve at best around 72dB. Sounding "good" in terms of dynamic range really means that you can reproduce the quietest of sounds along with the loudest of sounds, or even the two summed together. The same is true in the video world - with more dynamic range in the image, the details in the darkest of areas along with the brightest of areas can be reproduced without compromising either extreme. See Figure 3 for an example of an image with large dynamic range.

When CDs came out in the 80s, most everyone could hear an immediate difference - that's because the recording medium (e.g. tapes and LPs) was the limiting factor. The rest of the audio system (pre-amplifiers, amplifiers and speakers) was already able to reproduce dynamic ranges much higher than the source medium. Replace the source medium with one of higher quality, and voila! - the entire system goes up in dynamic range.
Unfortunately, the same isn't true in display technology. Thus, even if you increase the dynamic range of the camera (or more accurately the image sensor) to 19 or 20 bits, the displayed output is going to be limited to typical video formats or displays capable of only 8-10 bits of brightness (luminance) information per pixel. So, a camera with superior dynamic range capability will not necessarily look better on a display unless the large number of bits per word is properly compressed and scaled to match the limitations of today's displays. Enter the FPGA!
The FPGA is used to perform a dynamic range compression algorithm very similar to what the graphic artist can do in Photoshop. The algorithm, called iridix® (http://www.apical-imaging.com/iridix), is described as a "space varying dynamic range compression algorithm". Translated, this means that the algorithm looks at every pixel in a given image (or a given video stream) and determines, based upon the surrounding local content of the scene, whether the pixel should be increased or decreased in brightness (in image sensor speak, this equates to pixel exposure). At the same time, the entire image is transformed from a high dynamic range source (e.g. 20 bits of luminance information per pixel) to an image of standard dynamic range for today's video displays (e.g. 8-10 bits of luminance information per pixel).
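To make the idea of a space-varying adjustment concrete, here is a minimal Python sketch. It is not the iridix algorithm, which is proprietary and not described in detail here; it swaps in a much simpler technique (a gain derived from the local mean brightness in a small square window) purely for illustration. The function names, the window radius, the mid-grey target of 0.5 and the strength exponent are all assumptions made for this example.

```python
def local_mean(img, row, col, radius):
    """Average of the pixels in a square window centred on (row, col).

    The window is clipped at the image edges.
    """
    rows, cols = len(img), len(img[0])
    r0, r1 = max(0, row - radius), min(rows, row + radius + 1)
    c0, c1 = max(0, col - radius), min(cols, col + radius + 1)
    total = sum(img[r][c] for r in range(r0, r1) for c in range(c0, c1))
    return total / ((r1 - r0) * (c1 - c0))


def compress_dynamic_range(img, in_bits=20, out_bits=8,
                           radius=1, target=0.5, strength=0.5):
    """Illustrative local tone mapping (NOT iridix).

    img is a list of rows of integer luminance values in [0, 2**in_bits - 1].
    Each pixel gets its own gain: > 1 where the neighbourhood is darker than
    the (assumed) mid-grey target, < 1 where it is brighter, about 1 otherwise.
    """
    in_max = (1 << in_bits) - 1
    out_max = (1 << out_bits) - 1
    # Normalise the high dynamic range input to the range 0.0 .. 1.0.
    norm = [[p / in_max for p in row] for row in img]
    out = []
    for r, row in enumerate(norm):
        out_row = []
        for c, p in enumerate(row):
            mean = local_mean(norm, r, c, radius)
            gain = (target / max(mean, 1e-6)) ** strength
            # Apply the gain, clip, and requantise to the display bit depth.
            out_row.append(round(min(1.0, p * gain) * out_max))
        out.append(out_row)
    return out
```

With these assumed parameters, a uniformly dark 20-bit patch comes out brighter than a plain linear rescale to 8 bits would make it, and a uniformly bright patch comes out darker, while a mid-grey patch is left essentially unchanged. A real pipeline would use a far more sophisticated neighbourhood analysis and correction curve.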

Figure 4 shows how this is achieved for 3 selected pixels in a given image - the red dot is the pixel in question and the red circle is the surrounding local content used to determine the brightness transformation or "correction curve" applied to the pixel. As shown, iridix may make vastly different adjustments to pixels depending on the scene content. In the image on the left, the local content in the circle is properly exposed, so the pixels near the red dot will receive a gain adjustment of roughly 1. In the middle image, the local content in the circle is mostly under-exposed, so the pixels near the red dot will receive a gain adjustment of >1, thus increasing the brightness and perceived detail in this region. Finally, in the image on the right, the local content in the circle is mostly over-exposed, so the pixels near the red dot will receive a gain adjustment of <1, thus decreasing the brightness but increasing the detail in this region. Now imagine doing this for every pixel in a 2 megapixel image, 60 times per second - that's what's required for 1080p60 video, and that's exactly why an FPGA is required!
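A quick back-of-the-envelope calculation shows the scale of that workload. The Python sketch below counts only active pixels (blanking intervals are ignored), and the 15 x 15 neighbourhood used for the last figure is a hypothetical window size chosen for illustration, not a published iridix parameter.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Active pixels that must be processed per second."""
    return width * height * fps


def per_pixel_budget_ns(width: int, height: int, fps: int) -> float:
    """Average time available per pixel, in nanoseconds."""
    return 1e9 / pixel_rate(width, height, fps)


def window_reads_per_second(width: int, height: int, fps: int,
                            window_side: int) -> int:
    """Pixel reads per second if every output pixel examines a square window.

    window_side is a hypothetical neighbourhood size for illustration.
    """
    return pixel_rate(width, height, fps) * window_side ** 2


if __name__ == "__main__":
    w, h, fps = 1920, 1080, 60
    print(f"pixels per frame : {w * h:,}")
    print(f"pixels per second: {pixel_rate(w, h, fps):,}")
    print(f"budget per pixel : {per_pixel_budget_ns(w, h, fps):.2f} ns")
    print(f"15x15 window reads per second: "
          f"{window_reads_per_second(w, h, fps, 15):,}")
```

For 1080p60 this gives 2,073,600 pixels per frame, 124,416,000 pixels per second and roughly 8ns per pixel. A processor that handles pixels one at a time has very little headroom at that rate, whereas a hardware pipeline can examine a whole neighbourhood in parallel on every clock cycle.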
Your eye (or your visual system) is actually doing something very similar to this, but you're not actually aware of it. You can see details in bright sunlight and shadows simultaneously, as your visual system is integrating the scene and different parts of the scene based upon multiple exposures as you quickly move your eyes to focus on different objects. So, this is why images post-processed in this manner can actually look very natural ... if done correctly.
Are the capabilities of today's image sensors, advanced algorithms and FPGAs at a point where the most talented graphic artists will be put out of business? Time will tell. Remember though, that in an IP surveillance camera, there's no room for a graphic artist. And, there's no way that you can easily, in software, write an algorithm that runs on a DSP or CPU that can perform dynamic range compression on 2 megapixel images 60 times per second. For that, you absolutely need an FPGA.
Next time: Using FPGAs for analytics...
About the Author
Judd E. Heape: As Senior Strategic Marketing Manager in Altera's Industrial Business Unit, Mr. Heape is responsible for defining and developing Altera's industrial architectures and solutions as well as the industrial market development in North America. Mr. Heape joined Altera in July 2008. Prior to joining Altera, he spent eight years at QuickLogic Corporation in roles ranging from FAE for the south-central US region to Senior Director of Systems Engineering. Additionally, Mr. Heape spent nine years at Texas Instruments engaging in FPGA design for DLP products and DSP-based printing engines including two years in Tsukuba, Japan engaging in multi-media research projects (using FPGAs) for the 'C6x DSP. Mr. Heape has five issued US patents and holds a bachelor of engineering degree in electrical engineering from the Georgia Institute of Technology.



thinkski
12/19/2010 4:22 PM EST
Cool article! I wonder, in systems where an FPGA is not available, can the same be done using a GPU in CUDA or OpenCL? How many pixels can the largest FPGA simultaneously compute?
Judd.Heape_#1
1/14/2011 6:49 PM EST
Hi thinkski,
Yes, similar pipelines could be implemented in GPUs or DSPs. However, achieving full HD resolution at full frame rates (1080p60) will be a challenge in any software implementation. FPGAs/ASICs offer the ability to develop a specific gate-level representation of the algorithm, which provides the most efficient solution: the best performance while consuming the least amount of power.
Regards,
Judd
Dr DSP
12/20/2010 7:24 PM EST
It would be nice if the follow-up article described some of the image sensor options. Is there a limitation on bandwidth from the sensor and, if so, how can this be overcome? Can we get RAW data from the sensor at a reasonable bandwidth? How about images from the sensor with variable 'aperture' settings?
Judd.Heape_#1
1/14/2011 7:04 PM EST
Hi Dr DSP,
The two image sensors that the Apical WDR pipeline supports are:
- Aptina's MT9M033 720p60 WDR CMOS image sensor
- AltaSens's A3372 1080p60 WDR CMOS image sensor
WDR sensors are also available from Omnivision (e.g. Omnivision's OV10630 720p60 WDR CMOS image sensor), but this image sensor has the ISP built into the device itself (the sensor is a true SoC).
All of these sensors have either parallel outputs or high-speed serial outputs, all of which can be interfaced directly to today's FPGAs. Both the Aptina and the AltaSens sensors output RAW data via one or both of these interfaces.
Your question about showing images at various aperture settings is interesting - this has to be done carefully since AE algorithms in the pipeline and analog/digital gain settings on the sensor can counteract the intent of making these measurements. Both AltaSens and Aptina have hardware available that you can use to capture static images before or after the ISP (in RAW or RGB format respectively). Taking measurements in this manner with various aperture / lens settings can truly reveal the performance of the sensor itself based upon various optical configurations.
Regards,
Judd
namin1243
12/22/2010 8:24 PM EST
Judd,
Great Article...Our British friends at Apical, London will like this too....Nilesh
Judd.Heape_#1
1/14/2011 7:05 PM EST
Hi Nilesh,
Agreed!
Regards,
Judd
ArekZ
12/23/2010 3:41 AM EST
From this: "For that, you absolutely need an FPGA." we can see this was written by an FPGA guy! The others would write: "For that, nowadays you can also use an FPGA."
At the moment a lot of ASICs inside digital cameras do similar or even more complicated computations on 20-40Mpix raw images (although they are not reconfigurable). So this article does not show anything new. But I like the use of FPGAs in real time image processing and I use them for this in my work (Actel).
Judd.Heape_#1
1/14/2011 7:10 PM EST
Hi ArekZ,
Yep, I am an “FPGA guy”, as you can see from the company name listed after my name above. And, you are absolutely correct that ASICs could be used for all of these functions as well. What's nice about the FPGA, though, is it gives camera makers a low-cost, low-risk reconfigurable platform that allows them to add their own proprietary features or functions. This flexibility along with the fast time to market attributes of an FPGA make them very popular in surveillance camera platforms, specifically.
Regards,
Judd
Rich Krajewski
12/23/2010 3:59 AM EST
I'm looking forward to seeing video equipment with a knob that says "Stunning Level" on it.
Judd.Heape_#1
1/14/2011 7:06 PM EST
Hi Rich,
Soon! However, the "knob" may be labeled "iridix strength".
Regards,
Judd
Evgeni
12/26/2010 3:03 AM EST
Hi,
I'm wondering if you've seen what Refocus Imaging is doing ( http://www.refocusimaging.com ). Is it something similar to what is described in this article?
Thanks,
Evgeni
Judd.Heape_#1
1/14/2011 7:18 PM EST
Hi Evgeni,
I took a look at Refocus Imaging's website. Looks like cool stuff! The Apical pipeline doesn't perform this same function, however. It appears that Refocus has a very interesting way of capturing a full depth of field in one snapshot, and the resultant images can be rendered using advanced image processing. If I understand their messaging correctly, their technology could be a very promising way to reduce the cost associated with including high quality optics in camera platforms.
Regards,
Judd
hm
12/26/2010 3:25 AM EST
Will you please provide a link to the reference design with this Altera FPGA? How does it compare with an ASIC implementation? How about USB 2.0 and USB 3.0 for the interface? Does it achieve a 1080p60 or 1080p30 data rate?
What are the different IP cores we need and how much do they cost? Is this UVC class, and do you provide better application software for tuning the system?
Judd.Heape_#1
1/14/2011 7:28 PM EST
Hi hm,
Please see this link for more information:
http://www.altera.com/surveillance/
There is a link to the reference design at the bottom of this page.
There is no ASIC implementation for this pipeline today; however, Apical can offer this to qualified licensees.
This IP partner of Altera also offers USB 2.0 and USB 3.0 IP cores:
http://www.altera.com/products/ip/ampp/sls/sls.html
The pipeline described in the article above does achieve 1080p60 frame rates (using the AltaSens A3372 sensor) in the Cyclone III or Cyclone IV FPGA.
The reference designs that Altera offers for this pipeline output video over DVI or via 10/100 Ethernet after H.264 compression. We don't currently have a reference design that outputs video over USB.
The Apical ISP does come with full documentation and tools to tune the performance of the pipeline to meet specific customer requirements. If you're interested in obtaining pricing on this pipeline, please contact Apical Ltd.:
http://www.apical-imaging.com/
Regards,
Judd
ferceaglio
1/19/2011 8:21 PM EST
Excellent article!! Greetings from Argentina
tsouh
2/10/2011 10:07 PM EST
If we implement this with CCD, is there any difference?