Design Article

Embedded vision: FPGAs’ next technology opportunity

Brian Dipert, Embedded Vision Alliance, José Alvarez, Xilinx, and Mihran Touriguian, Berkeley Design Technology, Inc.

7/2/2012 10:56 AM EDT

PHASE 2: PEDESTRIAN DETECTION AND TRACKING
In the first phase of this project, the camera was in motion but the objects being recognized (road signs) were stationary. In the second phase, which targeted security applications, the camera was stationary but the objects (people, in this case) were not. This time, moreover, the video-analytics algorithms could not rely on predetermined colors, patterns or other object characteristics; people wear a diversity of clothing, for example, come in various shapes, skin tones and hair colors and styles, and may wear head-obscuring hats, sunglasses and the like. The software was additionally challenged with not only identifying and tracking people but also generating an alert when an individual crossed a digital “trip wire” and consequently entered a particular region within the video frame (Figure 3).


Figure 3 – Pedestrian detection and tracking capabilities included a “trip wire” alarm that reported when an individual moved within a bordered portion of the video frame.
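
The “trip wire” idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that the tracking stage already delivers one bounding box per tracked person per frame, keyed by a track ID; the `Box` type, the `foot_point` heuristic and the function name are hypothetical and are not taken from the Xilinx or BDTI implementations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (hypothetical metadata format)."""
    x: int
    y: int
    w: int
    h: int

    def foot_point(self):
        # Bottom-centre of the box: a rough guess at where the person stands.
        return (self.x + self.w / 2, self.y + self.h)

    def contains(self, point):
        px, py = point
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


def trip_wire_alerts(zone, prev_boxes, curr_boxes):
    """Return IDs of tracks that are inside `zone` now but were not in the previous frame.

    A track that first appears inside the zone also raises an alert.
    """
    alerts = []
    for track_id, box in curr_boxes.items():
        inside_now = zone.contains(box.foot_point())
        prev = prev_boxes.get(track_id)
        inside_before = prev is not None and zone.contains(prev.foot_point())
        if inside_now and not inside_before:
            alerts.append(track_id)
    return alerts


zone = Box(x=200, y=150, w=100, h=80)
frame1 = {1: Box(100, 100, 20, 60), 2: Box(230, 120, 20, 60)}
frame2 = {1: Box(190, 110, 20, 60), 2: Box(235, 120, 20, 60)}
print(trip_wire_alerts(zone, {}, frame1))      # track 2 starts inside the zone
print(trip_wire_alerts(zone, frame1, frame2))  # track 1 walks into the zone
```

Alerting only on the outside-to-inside transition, rather than on every frame spent inside the region, keeps a person who lingers in the zone from raising repeated alarms.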

The phase 2 hardware configuration was identical to that of the earlier phase 1, although the software varied; a video stream fed simulation models of the video-analytics IP cores, with the generated metadata passing to a secondary algorithm suite for additional processing. Challenges this time around included:

•  Resolving the fundamental trade-off between unwanted noise and proper object segmentation
•  Varying object morphology (form and structure)
•  Varying object motion, both person-to-person and over time with a particular person
•  Vanishing metadata, for example when a person stops moving, is blocked by an intermediary object, or blends into the background pattern
•  Other objects in the scene, both stationary and in motion
•  Varying distance between each person and the camera, and
•  Individuals vs. groups, and dominant vs. contrasting motion vectors within a group
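
One common way to cope with the vanishing-metadata item in the list above is to let a track “coast” at its last known position for a limited number of frames before discarding it. The sketch below shows that general technique only; it is not described in the article as the project's method. It assumes that association between detections and tracks has already been done (detections arrive keyed by track ID), and the `max_missed` grace period is an arbitrary illustrative value.

```python
def update_tracks(tracks, detections, max_missed=15):
    """Advance the track table by one frame.

    tracks:     {track_id: {"pos": (x, y), "missed": int}}
    detections: {track_id: (x, y)} for tracks that produced metadata this frame
    """
    updated = {}
    for track_id, state in tracks.items():
        if track_id in detections:
            # Fresh metadata: take the new position and reset the miss counter.
            updated[track_id] = {"pos": detections[track_id], "missed": 0}
        elif state["missed"] + 1 <= max_missed:
            # No metadata this frame: hold the last known position ("coast").
            updated[track_id] = {"pos": state["pos"], "missed": state["missed"] + 1}
        # Otherwise the track has been silent for too long and is dropped.
    for track_id, pos in detections.items():
        if track_id not in tracks:
            updated[track_id] = {"pos": pos, "missed": 0}
    return updated


tracks = update_tracks({}, {7: (50, 80)}, max_missed=2)
for _ in range(2):
    tracks = update_tracks(tracks, {}, max_missed=2)  # person stands still
print(tracks)                                         # track 7 is still alive
tracks = update_tracks(tracks, {}, max_missed=2)
print(tracks)                                         # grace period exceeded
```

The grace period trades responsiveness for robustness: a longer value survives longer occlusions but also keeps stale tracks around after a person has actually left the scene.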

With respect to the “trip wire” implementation, four distinct video streams were particularly effective in debugging and optimizing the video-analytics algorithms:

•  “Near” pedestrians walking and reversing directions
•  “Near” pedestrians walking in two different directions
•  A “far” pedestrian with a moving truck that appeared, through a trick of perspective, to be of a comparable size, and
•  “Far” pedestrians with an approaching truck that appeared larger than they were
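
The “trick of perspective” in the last two test streams follows from basic projection geometry: under a simple pinhole-camera model, an object's height in the image is proportional to its real height divided by its distance from the camera. The numbers below (focal length, heights and distances) are illustrative assumptions, not measurements from the project's footage.

```python
def apparent_height_px(real_height_m, distance_m, focal_length_px=800.0):
    """Pinhole-camera projection: image height scales with 1/distance."""
    return focal_length_px * real_height_m / distance_m


# A pedestrian and a truck twice as tall but twice as far away.
pedestrian_px = apparent_height_px(real_height_m=1.75, distance_m=35.0)
truck_px = apparent_height_px(real_height_m=3.5, distance_m=70.0)
print(pedestrian_px, truck_px)  # both project to the same pixel height
```

Because apparent size alone cannot separate these two cases, test footage of this kind is useful for checking that a classifier does not rely on object size in pixels.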

PHASE 3: HARDWARE CONVERSIONS AND FUTURE EVOLUTIONS
The final portion of the project employed Xilinx’s actual video-analytics IP blocks (in place of the earlier simulation models), running on the Spartan®-3A 3400 Video Starter Kit. A MicroBlaze™ soft processor core embedded within the Spartan-3A FPGA, augmented by additional dedicated-function blocks, implemented the network protocol stack. That stack handled the high-bit-rate and Ethernet-packetized metadata transfer to the BDTI-developed secondary processing algorithms, now comprehending both road sign detection and pedestrian detection and tracking. And whereas these algorithms previously executed on an x86-based PC, BDTI successfully ported them to an ARM® Cortex™-A8-derived hardware platform called the BeagleBoard (Figure 4).


Figure 4 – The final phase of the project migrated from Xilinx’s simulation models to actual FPGA IP blocks. BDTI also ported the second-level algorithms from an x86 CPU to an ARM-based SoC, thereby paving the path for the single-chip Zynq Extensible Processing Platform successor.
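
The article does not describe the layout of the Ethernet-packetized metadata, so the following is only a sketch of how per-object records could be serialized into a compact binary payload and decoded on the receiving processor. The record layout (frame number, object ID, and a bounding box of x, y, width, height) is entirely hypothetical; only Python's standard `struct` module is used.

```python
import struct

# Hypothetical record: frame number (uint32), then object ID, x, y, width,
# height (uint16 each). "!" selects network byte order with no padding.
RECORD = struct.Struct("!IHHHHH")


def pack_records(frame_no, objects):
    """Serialize (object_id, x, y, w, h) tuples for one frame into bytes."""
    return b"".join(RECORD.pack(frame_no, *obj) for obj in objects)


def unpack_records(payload):
    """Decode a payload back into (frame_no, object_id, x, y, w, h) tuples."""
    return [RECORD.unpack_from(payload, offset)
            for offset in range(0, len(payload), RECORD.size)]


payload = pack_records(42, [(1, 100, 120, 20, 60), (2, 300, 90, 18, 55)])
print(len(payload))             # two fixed-size records
print(unpack_records(payload))
```

Fixed-size records in an explicit byte order let the sender and receiver run on different processor architectures, as in the x86-to-ARM port described above, without disagreeing about the data layout.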

Those of you already familiar with Xilinx’s product plans might now be thinking of the Zynq™ Extensible Processing Platform, which combines FPGA fabric and a dual-core ARM Cortex-A9 CPU on a single piece of silicon. Might it be possible to run the entire video-analytics reference design on a single Zynq device? The likely answer is yes, since the Zynq product family includes devices containing sufficient programmable logic resources, and since the BDTI algorithms put only a moderate load on the ARM CPU core.

Embedded vision is poised to become the next notable technology success story for both systems developers and their semiconductor and software suppliers. As the case study described in this article suggests, FPGAs and FPGA-plus-CPU SoCs can be compelling silicon platforms for implementing embedded vision processing algorithms.




mkr

7/5/2012 7:36 AM EDT

There surely are a lot of applications where embedded vision can really shine. I come from academia, where I'm working mainly on vision for mobile robotics and surveillance applications. In the field of robotics, the dominant trend is to pack the machine with a PC and let it handle all the algorithmic heavy lifting. There are however emerging applications where using a PC as we know it is a dealbreaker - think UAVs. As for surveillance - at present the dominant paradigm is centralized processing, using some server or even a server cluster. The image data from cameras has to be transferred for processing, putting heavy pressure on the communication infrastructure. Sometimes the constraints presented by the communication infrastructure are a brick wall - a complete system redesign is necessary to get over it (or go around it). A natural solution to this problem is in-place processing.
Programmable logic really shines when it comes to processing of local image information, e.g. using the sliding window approach. Our stream processors for image filtering and feature detection and matching can crunch hundreds of VGA frames per second. Combine it with a nice, low power embedded processor and you get a system for (almost) every job. And with Zynq, you get it all in one package. The only problem is that the development is significantly more complicated than is the case with pure software designs.




Dr DSP

7/10/2012 5:44 PM EDT

The topic of multi-camera analytics perhaps deserves some additional discussion. It should be possible to combine multiple views from multiple cameras to more precisely determine acceleration, relative position, and object characteristics (a 'person' in a sign vs. a real 3D person). Any features that support these requirements?




anne-francoise.pele

7/16/2012 11:25 AM EDT

Do not hesitate to tell us about your real-world experiences, your ongoing projects, your achievements, etc. in the field of embedded vision.
