A diversity of robust embedded vision processing product options exist: microprocessors and embedded controllers, application-tailored SoCs, DSPs, graphics processors, ASICs and FPGAs. An FPGA is an intriguing silicon platform for realizing embedded vision, because it approximates the combination of the hardware attributes of an ASIC—high performance and low power consumption—with the flexibility and time-to-market advantages of the software algorithm alternative running on a CPU, GPU or DSP. Flexibility is a particularly important factor at this nascent stage in embedded vision’s market development, where both rapid bug fixes and feature set improvements are the norm rather than the exception, as is the desire to support a diversity of algorithm options. An FPGA’s hardware configurability also enables straightforward design adaptation to image sensors supporting various serial and parallel (and analog and digital) interfaces.
The Embedded Vision Alliance is a unified worldwide alliance of technology developers and providers chartered with transforming embedded vision’s potential into reality in a rich, rapid and efficient manner (see sidebar). Two of its founding members, BDTI (Berkeley Design Technology, Inc.) and Xilinx, partnered to co-develop a reference design that exemplifies not only embedded vision’s compelling promise but also the role that FPGAs might play in actualizing it. The goal of the project was to explore the typical architectural decisions a system designer would make when creating highly complex intelligent vision platforms containing elements requiring intensive hardware processing and complex software and algorithmic control.
BDTI and Xilinx partitioned the design so that the FPGA fabric would handle digital signal-processing-intensive operations, with a CPU performing complex control and prediction algorithms. The exploratory implementation described here connected the CPU board to the FPGA board via an Ethernet interface. The FPGA performed high-bandwidth processing, with only metadata interchanged through the network tether. This project also explored the simultaneous development of hardware and software, which required the use of accurate simulation models well ahead of the final FPGA hardware implementation.
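Because only metadata (not pixels) crossed the Ethernet tether, the link's payload could be a compact stream of fixed-size detection records. The sketch below illustrates the general idea with a hypothetical record layout; the actual field set and encoding used in the project are not documented here, so the format, field names and sizes are assumptions.

```python
import struct

# Hypothetical metadata record for one detected pixel group:
# frame number, bounding-box origin (x, y), width, height, color class.
RECORD_FMT = "<IHHHHB"  # little-endian: uint32, 4x uint16, uint8
RECORD_SIZE = struct.calcsize(RECORD_FMT)

def pack_record(frame, x, y, w, h, color_class):
    """Serialize one detection for transmission over the network tether."""
    return struct.pack(RECORD_FMT, frame, x, y, w, h, color_class)

def unpack_records(payload):
    """Recover the list of detection tuples from a received payload."""
    return [struct.unpack_from(RECORD_FMT, payload, off)
            for off in range(0, len(payload), RECORD_SIZE)]
```

At 13 bytes per detection, even dozens of objects per 720p60 frame amount to a few tens of kilobytes per second, which is why a modest Ethernet link sufficed where raw video would not.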
PHASE 1: ROAD SIGN DETECTION
This portion of the project, along with the next phase, leveraged two specific PC-based functions: a simulation model of under-development Xilinx video IP blocks, and a BDTI-developed processing application (Figure 1). The input data consisted of a 720p HD-resolution, 60-frame/second (fps) YUV-encoded video stream representing the images that a vehicle's front-facing camera might capture. The goal was to identify four types of objects in the video frames as a driver-assistance aid (though not to "read" them using optical character recognition, a capability that would be a natural extension):
• Green directional signs
• Yellow and orange hazard signs
• Blue informational signs, and
• Orange traffic barrels
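Since the four target classes are distinguished primarily by color and the input stream is YUV-encoded, a first-pass classification can operate directly on a pixel's chroma components. The sketch below illustrates this with purely hypothetical Cb/Cr windows; real thresholds would have to be tuned per image sensor and lighting conditions, and the project's actual IP-block parameters are not reproduced here.

```python
# Hypothetical chroma windows (Cb, Cr on a 0-255 scale) for the four
# target classes. Yellow/orange hues overlap, so ordering matters:
# the first matching window wins (dicts preserve insertion order).
COLOR_RANGES = {
    "green_sign":  ((80, 120), (70, 110)),   # (Cb range, Cr range)
    "hazard_sign": ((60, 110), (150, 200)),  # yellow and orange
    "blue_sign":   ((150, 210), (90, 130)),
    "barrel":      ((70, 115), (160, 210)),  # orange
}

def classify_chroma(cb, cr):
    """Return the first class whose Cb/Cr window contains the pixel, else None."""
    for name, ((cb_lo, cb_hi), (cr_lo, cr_hi)) in COLOR_RANGES.items():
        if cb_lo <= cb <= cb_hi and cr_lo <= cr <= cr_hi:
            return name
    return None
```

Working in chroma space rather than RGB also sidesteps a full color-space conversion in the FPGA fabric, one reason YUV-native processing is attractive for this class of design.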
Figure 1: The first two phases of BDTI and Xilinx’s video-analytics proof-of-concept
reference design development project ran completely on a PC.
The Xilinx-provided IP block simulation models output metadata that identified the locations and sizes of various-colored groups of pixels in each frame, the very same metadata generated by the final hardware IP blocks. The accuracy of many embedded vision systems is affected by external factors such as noise from imaging sensors, unexpected changes in illumination and unpredictable external motion. The mandate for this project was to allow the FPGA hardware to process the images and create metadata in the presence of external disturbances with parsimonious use of hardware resources, augmented by predictive software that would allow for such disturbances without decreasing detection accuracy.
BDTI optimized the IP blocks' extensive set of configuration parameters for the particular application in question, and BDTI's post-processing algorithms provided further refinement and prediction capabilities. In some cases, for example, the hardware was only partially able to identify an object in a given frame, but the application-layer software continued to predict the object's location using tracking algorithms. This approach worked well because physical detection is often inconsistent from frame to frame; the intelligent software layer is therefore the key to providing consistent predictions.
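The coasting behavior described above can be illustrated with a simple constant-velocity tracker that keeps extrapolating an object's position when the hardware misses it for a few frames. This is a minimal sketch of the concept, not BDTI's actual prediction algorithm; the class name, fields and miss threshold are all illustrative.

```python
class TrackedObject:
    """Constant-velocity tracker that coasts through missed detections.

    A minimal sketch of the prediction idea, not BDTI's actual algorithm;
    positions here stand in for bounding-box centers in pixels.
    """
    def __init__(self, x, y, max_misses=5):
        self.x, self.y = x, y
        self.vx = self.vy = 0
        self.misses = 0
        self.max_misses = max_misses

    def update(self, detection):
        """Feed one frame's detection as (x, y), or None if the hardware missed."""
        if detection is not None:
            nx, ny = detection
            self.vx, self.vy = nx - self.x, ny - self.y  # per-frame velocity
            self.x, self.y = nx, ny
            self.misses = 0
        else:
            self.x += self.vx   # extrapolate along the last known velocity
            self.y += self.vy
            self.misses += 1
        return (self.x, self.y)

    @property
    def lost(self):
        """Give up on the object after too many consecutive misses."""
        return self.misses > self.max_misses
```

A production system would likely use a Kalman filter or similar estimator, but even this crude predictor shows how software can bridge frames where hardware detection drops out.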
As another example, black or white letters contained within a green highway sign might confuse the IP blocks' generic image-analysis functions, causing them to incorrectly subdivide the sign into multiple pixel subgroups (Figure 2). The IP blocks might also misinterpret other vehicles' rear driving or brake lights as cones or signs by confusing red with orange, depending on the quality and setup of the imaging sensor used for the application.
Figure 2: Second-level, application-tailored algorithms refined the metadata coming from the FPGA’s video-analysis hardware circuits.
The BDTI-developed algorithms therefore served to further process the Xilinx-supplied metadata in an application-tailored manner. They knew, for example, what signs were supposed to look like (size, shape, color, pattern, location within the frame and so on), and therefore were able to combine relevant pixel clusters into larger groups. Similarly, the algorithms determined when it was appropriate to discard seemingly close-in-color pixel clusters that weren’t signs, such as the aforementioned vehicle brake lights.
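The recombination and rejection steps described above can be sketched as two small operations on bounding boxes: greedily merging same-color clusters that lie close together (rejoining a sign split by its lettering), then discarding groups whose size or aspect ratio is implausible for a sign. The function names, gap distance and geometry thresholds below are all hypothetical stand-ins for the application-tailored rules BDTI implemented.

```python
def boxes_close(a, b, gap=8):
    """True if bounding boxes (x, y, w, h) lie within `gap` pixels of each other."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return not (ax > bx + bw + gap or bx > ax + aw + gap or
                ay > by + bh + gap or by > ay + ah + gap)

def merge_clusters(boxes, gap=8):
    """Greedily union nearby same-color boxes into larger groups.

    A single pass keeps the sketch short; a real implementation would
    iterate until no further merges occur.
    """
    groups = []
    for box in boxes:
        for i, g in enumerate(groups):
            if boxes_close(box, g, gap):
                x = min(box[0], g[0])
                y = min(box[1], g[1])
                x2 = max(box[0] + box[2], g[0] + g[2])
                y2 = max(box[1] + box[3], g[1] + g[3])
                groups[i] = (x, y, x2 - x, y2 - y)  # enclosing box
                break
        else:
            groups.append(box)
    return groups

def looks_like_sign(box, min_area=400, max_aspect=4.0):
    """Reject groups too small or too elongated to be a road sign."""
    x, y, w, h = box
    return w * h >= min_area and max(w / h, h / w) <= max_aspect
```

The same `looks_like_sign`-style geometry test is what lets the software throw out a pair of brake lights: individually they are small, close-in-color clusters that never merge into anything sign-shaped.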