
Design Article

Embedded vision: FPGAs’ next technology opportunity

Brian Dipert, Embedded Vision Alliance, José Alvarez, Xilinx, and Mihran Touriguian, Berkeley Design Technology, Inc.

7/2/2012 10:56 AM EDT

THE FPGA OPPORTUNITY: A CASE STUDY
A diverse range of robust embedded vision processing options exists: microprocessors and embedded controllers, application-tailored SoCs, DSPs, graphics processors, ASICs and FPGAs. An FPGA is an intriguing silicon platform for realizing embedded vision because it approximates the hardware attributes of an ASIC (high performance and low power consumption) while retaining the flexibility and time-to-market advantages of a software algorithm running on a CPU, GPU or DSP. Flexibility is particularly important at this nascent stage in embedded vision’s market development, where rapid bug fixes and feature set improvements are the norm rather than the exception, as is the desire to support a diversity of algorithm options. An FPGA’s hardware configurability also enables straightforward design adaptation to image sensors supporting various serial and parallel (and analog and digital) interfaces.

The Embedded Vision Alliance is a unified worldwide alliance of technology developers and providers chartered with transforming embedded vision’s potential into reality in a rich, rapid and efficient manner (see sidebar). Two of its founding members, BDTI (Berkeley Design Technology, Inc.) and Xilinx, partnered to co-develop a reference design that exemplifies not only embedded vision’s compelling promise but also the role that FPGAs might play in actualizing it. The goal of the project was to explore the typical architectural decisions a system designer would make when creating highly complex intelligent vision platforms containing elements requiring intensive hardware processing and complex software and algorithmic control.

BDTI and Xilinx partitioned the design so that the FPGA fabric would handle digital signal-processing-intensive operations, with a CPU performing complex control and prediction algorithms. The exploratory implementation described here connected the CPU board to the FPGA board via an Ethernet interface. The FPGA performed high-bandwidth processing, with only metadata interchanged through the network tether. This project also explored the simultaneous development of hardware and software, which required the use of accurate simulation models well ahead of the final FPGA hardware implementation.

PHASE 1: ROAD SIGN DETECTION

This portion of the project, along with the next phase, leveraged two specific PC-based functions: a simulation model of under-development Xilinx video IP blocks, and a BDTI-developed processing application (Figure 1). The input data consisted of a 720p HD resolution, 60-frame/second (fps) YUV-encoded video stream representing the images that a vehicle’s front-facing camera might capture. The goal was to identify four types of objects in the video frames as a driver-assistance scheme (though not to “read” them using optical character recognition, a capability that would be a natural extension):
• Green directional signs
• Yellow and orange hazard signs
• Blue informational signs, and
• Orange traffic barrels


Figure 1: The first two phases of BDTI and Xilinx’s video-analytics proof-of-concept reference design development project ran completely on a PC.

The Xilinx-provided IP block simulation models output metadata that identified the locations and sizes of various-colored groups of pixels in each frame, the very same metadata generated by the final hardware IP blocks. The accuracy of many embedded vision systems is affected by external factors such as noise from imaging sensors, unexpected changes in illumination and unpredictable external motion. The mandate for this project was to allow the FPGA hardware to process the images and create metadata in the presence of external disturbances with parsimonious use of hardware resources, augmented by predictive software that would allow for such disturbances without decreasing detection accuracy.
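To put the metadata-only interchange in perspective, the following back-of-the-envelope sketch compares the raw data rate of a 720p, 60 fps stream with the rate of per-frame object metadata. The pixel format (16 bits per pixel, as in YUV 4:2:2), the number of pixel groups reported per frame and the size of each metadata record are assumptions chosen for illustration; the article does not specify them.

```python
# Raw video rate vs. metadata rate for a 720p, 60 fps stream.
# ASSUMPTIONS (not from the article): 16 bits per pixel (YUV 4:2:2),
# 32 pixel groups reported per frame, 16 bytes of metadata per group.
WIDTH, HEIGHT, FPS = 1280, 720, 60
BITS_PER_PIXEL = 16    # assumed pixel format
GROUPS_PER_FRAME = 32  # assumed
BYTES_PER_GROUP = 16   # assumed (e.g. color id, x, y, width, height)

raw_bits_per_s = WIDTH * HEIGHT * FPS * BITS_PER_PIXEL
meta_bits_per_s = GROUPS_PER_FRAME * BYTES_PER_GROUP * 8 * FPS

print(f"raw video: {raw_bits_per_s / 1e6:.1f} Mbit/s")
print(f"metadata:  {meta_bits_per_s / 1e6:.3f} Mbit/s")
print(f"ratio:     {raw_bits_per_s / meta_bits_per_s:.0f}x")
```

Under these assumptions the raw stream needs several hundred megabits per second, while the metadata needs well under one, which is why a modest network link between the FPGA and CPU boards can suffice.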

BDTI optimized the IP blocks’ extensive set of configuration parameters for the particular application in question, and BDTI’s post-processing algorithms provided further refinement and prediction capabilities. In some cases, for example, the hardware was only partially able to identify the objects in one frame, but the application-layer software continued to predict the location of the object using tracking algorithms. This approach worked very well, because physical detection is often inconsistent from frame to frame. The intelligent software layer is therefore the key to providing consistent prediction.
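The idea of software carrying an object through frames in which the hardware misses it can be sketched with a simple constant-velocity predictor. This is only an illustration, not BDTI’s tracking algorithm, which the article does not describe; the `Track` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Track:
    """Hypothetical constant-velocity track of one object's center (pixels)."""
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0
    missed: int = 0  # consecutive frames without a hardware detection

    def update(self, detection: Optional[Tuple[float, float]]) -> Tuple[float, float]:
        if detection is None:
            # No detection this frame: coast along the last known velocity.
            self.x += self.vx
            self.y += self.vy
            self.missed += 1
        else:
            # Detection available: refresh velocity and position from it.
            dx, dy = detection
            self.vx, self.vy = dx - self.x, dy - self.y
            self.x, self.y = dx, dy
            self.missed = 0
        return self.x, self.y


track = Track(x=100.0, y=50.0)
for det in [(104.0, 51.0), (108.0, 52.0), None, None, (120.0, 55.0)]:
    print(track.update(det), "missed:", track.missed)
```

A practical tracker would typically also drop a track once `missed` exceeds some limit and would filter noisy measurements, for instance with a Kalman filter.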

As another example, black or white letters contained within a green highway sign might confuse the IP blocks’ generic image-analysis functions, causing them to incorrectly subdivide the sign into multiple pixel subgroups (Figure 2). The IP blocks might also incorrectly interpret other vehicles’ rear driving or brake lights as cones or signs by confusing red with orange, depending on the quality and setup of the imaging sensor used for the application.




Figure 2: Second-level, application-tailored algorithms refined the metadata coming from the FPGA’s video-analysis hardware circuits.

The BDTI-developed algorithms therefore served to further process the Xilinx-supplied metadata in an application-tailored manner. They encoded, for example, what signs were supposed to look like (size, shape, color, pattern, location within the frame and so on), and were therefore able to combine relevant pixel clusters into larger groups. Similarly, the algorithms determined when it was appropriate to discard pixel clusters that were close in color to a sign but weren’t signs, such as the aforementioned vehicle brake lights.
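A minimal sketch of this kind of second-level processing might merge nearby same-colored bounding boxes and then reject groups that fail a plausibility rule. The `Blob` record, the `gap` threshold and the size-based `plausible_sign` rule are all hypothetical; the article does not give the metadata format or the actual rules BDTI used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Blob:
    """Hypothetical metadata record: color label plus bounding box in pixels."""
    color: str
    x: int  # left edge
    y: int  # top edge
    w: int
    h: int


def close(a: Blob, b: Blob, gap: int) -> bool:
    """True if both blobs share a color and their boxes lie within `gap` pixels."""
    return (a.color == b.color
            and a.x <= b.x + b.w + gap and b.x <= a.x + a.w + gap
            and a.y <= b.y + b.h + gap and b.y <= a.y + a.h + gap)


def union(a: Blob, b: Blob) -> Blob:
    """Smallest box enclosing both blobs."""
    x0, y0 = min(a.x, b.x), min(a.y, b.y)
    x1 = max(a.x + a.w, b.x + b.w)
    y1 = max(a.y + a.h, b.y + b.h)
    return Blob(a.color, x0, y0, x1 - x0, y1 - y0)


def merge_blobs(blobs: List[Blob], gap: int = 8) -> List[Blob]:
    """Repeatedly merge any pair of close blobs until no pair remains."""
    merged = list(blobs)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if close(merged[i], merged[j], gap):
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


def plausible_sign(b: Blob, min_side: int = 24) -> bool:
    """Hypothetical rule: discard clusters too small to be a sign."""
    return b.w >= min_side and b.h >= min_side


frame = [
    Blob("green", 100, 40, 60, 20),   # top half of a sign split by lettering
    Blob("green", 100, 64, 60, 20),   # bottom half of the same sign
    Blob("orange", 300, 200, 10, 6),  # small cluster, e.g. a brake light
]
signs = [b for b in merge_blobs(frame) if plausible_sign(b)]
print(signs)
```

Here the two green fragments are combined into one box and the small orange cluster is dropped. A real system would use richer criteria, such as shape and location within the frame, as the text describes.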




mkr

7/5/2012 7:36 AM EDT

There surely are a lot of applications where embedded vision can really shine. I come from academia, where I'm working mainly on vision for mobile robotics and surveillance applications. In the field of robotics, the dominant trend is to pack the machine with a PC and let it handle all the algorithmic heavy lifting. There are however emerging applications where using a PC as we know it is a dealbreaker - think UAVs. As for surveillance - at present the dominant paradigm is centralized processing, using some server or even a server cluster. The image data from cameras has to be transferred for processing, putting a lot of pressure on the communication infrastructure. Sometimes the constraints presented by the communication infrastructure are a brick wall - a complete system redesign is necessary to get over it (or go around it). A natural solution to this problem is in-place processing.
Programmable logic really shines when it comes to processing of local image information, e.g. using the sliding window approach. Our stream processors for image filtering and feature detection and matching can crunch hundreds of VGA frames per second. Combine it with a nice, low power embedded processor and you get a system for (almost) every job. And with Zynq, you get it all in one package. The only problem is that the development is significantly more complicated than is the case with pure software designs.




Dr DSP

7/10/2012 5:44 PM EDT

The topic of multi-camera analytics perhaps deserves some additional discussion. It should be possible to combine multiple views from multiple cameras to more precisely determine acceleration, relative position, and object characteristics (a 'person' in a sign vs. a real 3D person). Any features that support these requirements?




anne-francoise.pele

7/16/2012 11:25 AM EDT

Do not hesitate to tell us about your real-world experiences, your ongoing projects, your achievements, etc. in the field of embedded vision.



