Design Article
Video encoding with low-cost FPGAs for multi-channel H.264 surveillance
Suhel Dhanani, Senior Manager, Altera
and Vincenzo Liguori, Director, Ocean Logic Pty Ltd.
11/28/2008 4:15 PM EST
This article first lays out the architecture advantages of FPGAs for low-cost, yet high-performance video processing applications and then shows how these advantages translate into a real-world application in the rapidly expanding field of video surveillance systems.
Advantages of implementing video processing in low-cost FPGAs
Today's low-cost FPGAs offer a host of silicon features that enable high-performance signal processing -- abundant multipliers, fast fabric performance, and large amounts of on-chip memory (see Table 1). This makes low-cost FPGAs an ideal platform for implementing an emerging class of cost-sensitive, yet high-quality image processing applications.
A good example is the Cyclone III family of FPGAs, which are fabricated on advanced 65nm process technology and have abundant multiplier, memory, and logic resources that enable them to implement algorithm-intensive applications such as video and image processing.

Table 1: Resources for Video and Image Processing in Cyclone III FPGAs
Power is an increasingly important consideration for many system designers. Built on the TSMC 65-nm low-power process technology, these FPGAs have additional silicon and software optimizations that yield extremely low power consumption.
Figure 1: The Typical Power Consumed by Cyclone III FPGAs for a Range of Density and Performance
While power consumption is very design-dependent, the typical power consumption of a mid-range device (with over 50,000 logic elements) is lower than 1W. The static power consumption is less than 1/10th of this number (i.e., less than 100mW, as shown in Figure 1).
This kind of low power consumption is critical in low-cost systems such as surveillance systems where cooling systems add to the cost of the system.
Abundant DSP resources on a low-cost, low-power fabric, coupled with industry-standard design flows, allow high-performance video applications to be implemented cost-effectively on such FPGA platforms.
FPGAs vs. DSPs
New surveillance systems are rapidly moving from D1 quality (standard definition) video to high definition video -- resulting in greatly increased complexity and encoding performance requirements.
While inexpensive digital signal processors (DSPs) can handle the processing and encoding for a single channel of D1 video, the processing complexity involved with multiple channels, or even with a single channel of HD video, requires FPGAs. This is where the parallel signal processing resources of an FPGA are critical. The encoder designed for these systems must be flexible enough to handle different resolutions and different numbers of channels.
Not only can the FPGA fabric implement the video encoders at the performance level needed to encode multiple channels and different image resolutions, but other functions such as an Ethernet MAC, camera interface, and custom memory controllers can also be integrated easily, lowering total system cost. Custom functions such as these cannot be integrated into a standard DSP.
Building a high-performance encoder for the surveillance marketplace
Keeping in mind the unique requirements of the video market, Ocean Logic has designed a powerful, yet ultra-compact H.264 encoder solution that can process varying numbers of channels at varying image resolutions.
The Ocean Logic H.264 encoder core is optimized for the FPGA fabric and provides unmatched device utilization. This video encoding core includes useful features such as bitrate control and it does not require a CPU for encoding. The H.264 encoder core is highly parametrizable - one core can process two D1 video channels @ 30 FPS in parallel or a single HD video channel at 720p @ 24 FPS. The design is highly efficient which allows it to target cost-effective FPGA devices.
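As a rough check on the two operating points quoted above, the following sketch compares their raw pixel rates. The frame sizes are assumptions, since the article does not state them: 720x480 for D1 at 30 FPS and 1280x720 for 720p.

```python
def pixel_rate(width, height, fps, channels=1):
    """Luma pixels per second for `channels` streams of the given format."""
    return width * height * fps * channels

# Assumed frame sizes (not given in the article): NTSC D1 and 720p.
two_d1 = pixel_rate(720, 480, 30, channels=2)
one_720p = pixel_rate(1280, 720, 24)

print(two_d1)    # 20736000
print(one_720p)  # 22118400
```

Both configurations come to roughly 21 to 22 million pixels per second, which is consistent with a single core having a fixed pixel-rate budget that can be spent on either more channels or a larger image.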

Table 2: A Complete Resource Estimate for One Core Running on a Cyclone III Fabric
The design efficiency allows the system designer to integrate additional logic, such as the camera interface and Ethernet stack typically required in these systems, within a mid-sized FPGA device.
Configurable Design
The design is also highly configurable, allowing system designers to support multiple resolutions using the same design.

Table 3: The Maximum Number of Time-Multiplexed Video Channels Supported by a Single Core Cyclone III FPGA
This design flexibility allows customers to configure the solution to meet the resolution requirements of their system. The core can encode video channels of different resolutions on a frame-multiplexed basis, as long as the combined pixel rate is less than or equal to the maximum pixel rate listed.
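The pixel-rate rule can be sketched as a simple feasibility check. The function names are hypothetical, and the budget used here is only a stand-in (the 720p @ 24 FPS rate quoted earlier for one core); the actual limit is the maximum pixel rate listed in Table 3. The channel sizes in the example are likewise assumed.

```python
def combined_pixel_rate(channels):
    """Sum of width * height * fps over all time-multiplexed channels."""
    return sum(w * h * fps for (w, h, fps) in channels)

def fits_on_core(channels, max_pixel_rate):
    """True if the channel mix stays within the core's pixel-rate limit."""
    return combined_pixel_rate(channels) <= max_pixel_rate

# Stand-in budget (assumption): 720p @ 24 FPS. The real limit is in Table 3.
MAX_RATE = 1280 * 720 * 24

# Assumed mix: one 720x480 channel and two 352x240 channels, all at 30 FPS.
mix = [(720, 480, 30), (352, 240, 30), (352, 240, 30)]
print(combined_pixel_rate(mix))      # 15436800
print(fits_on_core(mix, MAX_RATE))   # True
```

With the same stand-in budget, three 720x480 channels at 30 FPS would exceed the limit and need a second core.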
Ocean Logic's H.264 cores can also work in parallel, either to double the number of video channels encoded simultaneously or to work together on the same video channel in order to support a higher frame rate or image size. For example, two cores in the same FPGA can support real-time encoding of either four channels of D1 video or, with both cores working on the same channel, one channel of 1080p video.
Encoding D1 Video Channels
A typical application of this design is the simultaneous encoding of two or four D1 video channels coming from different cameras and then broadcasting the encoded video over Ethernet.
Another important application is for unmanned aircraft where the input from different video cameras is used to both remotely control the aircraft and collect intelligence. The encoded bit-stream can then be sent to home base wirelessly.
Figure 2: A Single Encoder Core Encoding a 720p Video Channel at 24 FPS
As shown in Figure 2, a single H.264 core can be used to encode a single channel 720p video stream.
The internal memory of the FPGA is sufficient to implement the raster-to-block conversion logic, which is needed because the core takes its input as YUV 4:2:0 16x16 macroblocks. Note that only a single 16-bit data bus is required for the external memory if DDR/DDR2 memory is used.
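To illustrate what raster-to-block conversion involves, here is a minimal software sketch, not the hardware design itself. It buffers a strip of 16 raster lines and cuts it into 16x16 luma blocks, then estimates the strip buffer size for 4:2:0 video. The function names and the 8-bit sample depth are assumptions.

```python
MB = 16  # macroblock edge in luma samples

def strip_to_macroblocks(strip):
    """Split a strip of 16 raster lines into 16x16 luma blocks, left to right.

    `strip` is a list of 16 equal-length rows whose width is a multiple of 16.
    """
    assert len(strip) == MB and len(strip[0]) % MB == 0
    width = len(strip[0])
    for x in range(0, width, MB):
        yield [row[x:x + MB] for row in strip]

def strip_buffer_bytes(width, bytes_per_sample=1):
    """Bytes needed to hold one 16-line strip of 4:2:0 video (luma + chroma)."""
    luma = width * MB * bytes_per_sample
    chroma = 2 * (width // 2) * (MB // 2) * bytes_per_sample
    return luma + chroma

# Example: a 16-line strip of a 1280-wide (720p) frame with 8-bit samples.
strip = [[(x + y) % 256 for x in range(1280)] for y in range(MB)]
blocks = list(strip_to_macroblocks(strip))
print(len(blocks))               # 80
print(strip_buffer_bytes(1280))  # 30720
```

Even if two such strips were kept for ping-pong buffering (an assumption about the implementation), the total would be about 60 KB, which fits the article's point that on-chip memory is enough for this step.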
Encoding Multiple Video Channels
The same encoder core can be used to implement a design for encoding two D1 (30 FPS) video channels in parallel (see Figure 3).
Figure 3: A Single Encoder Core Used to Encode 2 Channels of D1 Resolution Video
The core encodes one frame at a time from each video channel. Therefore the frame from the video channel that is not being encoded must be buffered. The size of the buffer depends on whether the input from the two cameras is synchronized, but it is never more than two 4:2:0 frames per channel.
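The worst-case buffer size can be estimated with a short calculation. This sketch assumes a D1 frame of 720x480 with 8-bit samples, neither of which is stated in the article.

```python
def yuv420_frame_bytes(width, height, bytes_per_sample=1):
    """Size of one 4:2:0 frame: full-size luma plus two quarter-size chroma planes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bytes_per_sample

# Assumed D1 frame size of 720x480 with 8-bit samples.
frame = yuv420_frame_bytes(720, 480)
worst_case_per_channel = 2 * frame  # at most two frames per channel
print(frame)                   # 518400
print(worst_case_per_channel)  # 1036800
```

Under these assumptions the worst case is about 1 MB per channel, which suggests that this buffer would normally live in the external memory rather than on chip.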
Two instances of the core fit in a medium-sized FPGA and allow the encoding of four D1 video channels - an application that is widely used in many surveillance systems.
Alternatively, as shown in Figure 4, the same two encoder cores can work in parallel to support a single channel of 1080p HD video.
Figure 4: Two Cores Working in Parallel on a 1080p Video Stream
Each frame is divided into two slices that are fed to the two cores simultaneously. The incoming video must therefore be buffered so that the two slices can be read at the same time and encoded by the two cores operating in parallel.
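One plausible way to divide the frame is along macroblock-row boundaries, since H.264 slices are made up of whole macroblocks. The sketch below assumes an even split of contiguous rows between the cores; the article does not say exactly where the boundary is placed, and the function name is hypothetical.

```python
def split_macroblock_rows(height, cores=2, mb=16):
    """Assign contiguous macroblock rows to each core as (first_row, row_count)."""
    mb_rows = -(-height // mb)  # ceiling division: 1080 lines -> 68 rows
    base, extra = divmod(mb_rows, cores)
    slices, start = [], 0
    for i in range(cores):
        count = base + (1 if i < extra else 0)
        slices.append((start, count))
        start += count
    return slices

print(split_macroblock_rows(1080))  # [(0, 34), (34, 34)]
```

With this split the second slice begins at macroblock row 34, that is, at line 544 of the frame. The second core cannot start on a frame until that line has arrived from the camera, which is why the incoming video has to be buffered before both cores can read their slices at once.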
Putting it all together
An entire surveillance system has been built to demo the real-life applicability of this solution. This system includes a camera module from Micron based on the 5MP MT9P031 CMOS sensor and a Cyclone III Development Kit.
Figure 5 shows a system encoding 1080p video at 20 FPS from the camera module and streaming it through Ethernet. The Ethernet stream is then captured, decoded, and viewed using an ordinary PC.
Figure 5: A Typical Block Diagram of a Complete FPGA-based System
Since the camera sensor outputs raw video data in Bayer format, the output has to be pre-processed for Bayer interpolation, white balancing, RGB to YUV color space conversion, normalization, gamma correction, and sharpening. All of this pre-processing can be integrated very efficiently in the FPGA fabric.
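As an illustration of one of these steps, the sketch below shows RGB to YUV color space conversion for a single pixel. The article does not state which conversion matrix or range the design uses; the widely used full-range BT.601 (JFIF) coefficients are assumed here, and a hardware version would typically use fixed-point arithmetic on the embedded multipliers rather than floating point.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range BT.601 Y'CbCr (JFIF convention)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128

    def clamp(v):
        # Round to the nearest integer and limit to the 8-bit range.
        return max(0, min(255, int(round(v))))

    return clamp(y), clamp(cb), clamp(cr)

print(rgb_to_ycbcr(255, 255, 255))  # (255, 128, 128)
print(rgb_to_ycbcr(0, 0, 0))        # (0, 128, 128)
```

Each output component is a weighted sum of three inputs, so the conversion maps naturally onto a handful of multipliers per pixel.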
The processed YUV 4:2:0 data is then stored in a frame buffer. Video block data is then fed to the two cores working in parallel.
The two separate streams of network abstraction layer (NAL) units, as described in the H.264 specification, produced by the two cores are then merged and sent to the Ethernet MAC.
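The article does not describe how the merger works internally. A minimal sketch of the idea, assuming each core delivers its NAL units frame by frame and that the output uses Annex B start-code framing, might look like this. The function name and the placeholder payloads are hypothetical.

```python
START_CODE = b"\x00\x00\x00\x01"  # Annex B start code prefix (assumed framing)

def merge_frames(top_frames, bottom_frames):
    """Interleave per-frame NAL units from two cores into one byte stream.

    Each argument is a list of frames; each frame is a list of NAL unit
    payloads (bytes, without start codes). For every frame, the top-slice
    units are emitted before the bottom-slice units.
    """
    out = bytearray()
    for top, bottom in zip(top_frames, bottom_frames):
        for nal in list(top) + list(bottom):
            out += START_CODE + nal
    return bytes(out)

# Placeholder payloads, not real H.264 data.
core0 = [[b"T0"], [b"T1"]]
core1 = [[b"B0"], [b"B1"]]
stream = merge_frames(core0, core1)
print(stream.split(START_CODE)[1:])  # [b'T0', b'B0', b'T1', b'B1']
```

A real merger would also have to make sure that parameter sets are sent only once and that each slice header identifies the correct starting macroblock, details this sketch leaves out.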
Hardware Reference Design Board
The complete design, including the camera processor, dual H.264 cores, bit-stream merger, and Ethernet MAC (OpenCore) uses about 55K LEs, a little over 2 Mbits of on-chip memory and 32 embedded multipliers of the EP3C120 device -- less than half of the device.
The total measured power consumption is approximately 1.8 W (~1 W for the internal logic and ~0.8 W for the I/O) when running at 85 MHz.
Figure 6: Hardware Reference Design showcasing an H.264 Encoder on a 1080p Video Stream
Conclusion
This design approach enables the implementation of an advanced, high-performance video application on a low-cost FPGA fabric. Such implementations are critical to meeting the cost and power constraints that today's video systems generally impose. A complete working quad-channel surveillance system can be implemented using only a single mid-range FPGA, which can be a significant technology enabler for this important and growing market.
About the authors
Suhel Dhanani is a Senior Manager in the software, embedded and DSP marketing group. Mr. Dhanani is responsible for DSP product marketing. He has over 15 years of industry experience in semiconductors -- with both large companies such as Xilinx and VLSI Technology as well as with Silicon Valley startups including Anadigm and Tabula. Mr. Dhanani has completed a graduate certificate in Management Science from Stanford University and holds M.S.E.E. and M.B.A. degrees from Arizona State University.
Vincenzo Liguori is the Director of Ocean Logic Pty Ltd. Mr. Liguori graduated in electrical engineering from the University of Naples, Italy in 1989. He has worked for Marconi Italiana and as researcher for Canon Information System Research in Australia. In 1996 he obtained his Master of Research in image compression at Sydney University, Australia. That year he founded Ocean Logic Pty Ltd, a provider of encryption and multimedia cores.
Both authors can be reached by sending a message to newsroom@altera.com.









