PHASE 2: PEDESTRIAN DETECTION AND TRACKING
In the first phase of this project, the camera was in motion while the objects being recognized (that is, signs) were stationary. In the second, security-focused phase, the roles were reversed: the camera was stationary and the objects (people, in this case) were not. This time, moreover, the video-analytics algorithms could not rely on predetermined colors, patterns or other object characteristics; people wear a diversity of clothing, for example, and come in various shapes, skin tones, and hair colors and styles (not to mention that they might wear head-obscuring hats, sunglasses and the like). And the software was further challenged with not only identifying and tracking people but also generating an alert when an individual traversed a digital “trip wire” and consequently entered a particular region within the video frame (Figure 3).
Figure 3 – Pedestrian detection and tracking capabilities included a “trip wire” alarm that reported when an individual moved within a bordered portion of the video frame.
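To make the “trip wire” idea concrete, the sketch below shows one common way such a test can be implemented: comparing a tracked centroid’s previous and current positions against the wire segment using cross-product sign tests. This is a generic illustration under assumed pixel coordinates, not the actual Xilinx IP or BDTI implementation.

```python
# Minimal trip-wire check: does the segment from a tracked centroid's
# previous position to its current position cross the wire segment?
# Coordinates and the wire placement below are illustrative.

def _side(ax, ay, bx, by, px, py):
    """Sign of the cross product: which side of line AB point P lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def crossed_wire(prev, curr, wire_a, wire_b):
    """True if the path prev->curr strictly intersects the wire a->b."""
    d1 = _side(*wire_a, *wire_b, *prev)
    d2 = _side(*wire_a, *wire_b, *curr)
    d3 = _side(*prev, *curr, *wire_a)
    d4 = _side(*prev, *curr, *wire_b)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

# Example: a pedestrian centroid moving left to right across a vertical wire.
wire = ((320, 0), (320, 480))
print(crossed_wire((300, 240), (340, 244), *wire))  # True: raise the alert
print(crossed_wire((100, 240), (140, 244), *wire))  # False: no alert
```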
The phase 2 hardware configuration was identical to that of phase 1, although the software differed: a video stream fed simulation models of the video-analytics IP cores, with the generated metadata passing to a secondary algorithm suite for additional processing. Challenges this time around included:
• Resolving the fundamental trade-off between unwanted noise and proper object segmentation (see the sketch following this list)
• Varying object morphology (form and structure)
• Varying object motion, both person-to-person and over time with a particular person
• Vanishing metadata when a person stops moving, is occluded by an intervening object or blends into the background pattern, for example
• Other objects in the scene, both stationary and in motion
• Varying distance between each person and the camera, and
• Individuals vs. groups, and dominant vs. contrasting motion vectors within a group
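The first challenge above, the noise-versus-segmentation trade-off, can be pictured with a short sketch. Here OpenCV’s MOG2 background subtractor stands in for the segmentation stage of the simulated IP cores; the detection threshold, the morphology kernel size and the area gate are the knobs that trade residual noise against intact silhouettes. The file name and all parameter values are illustrative.

```python
# Sketch of the noise-vs-segmentation trade-off (OpenCV used as a stand-in).
# Lowering varThreshold keeps fuller silhouettes but admits more noise;
# the morphological opening then has to work harder, and too large a
# kernel starts eroding real objects.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500,         # frames used to model the static background
    varThreshold=16,     # sensitivity: lower = fuller masks, more noise
    detectShadows=False)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

cap = cv2.VideoCapture("pedestrians.mp4")  # hypothetical test stream
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)
    # Opening removes speckle noise; closing fills holes inside people.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x
    people = [cv2.boundingRect(c) for c in contours
              if cv2.contourArea(c) > 400]  # area gate drops residual blobs
```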
With respect to the “trip wire” implementation, four distinct video streams proved particularly effective in debugging and optimizing the video-analytics algorithms (a minimal tracking sketch follows the list):
• “Near” pedestrians walking and reversing directions
• “Near” pedestrians walking in two different directions
• A “far” pedestrian with a moving truck that appeared, through a trick of perspective, to be of a comparable size, and
• “Far” pedestrians with an approaching truck that appeared larger than they were
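For the tracking side of these tests, the sketch below shows a minimal nearest-neighbor association scheme that keeps two pedestrians walking in different directions on distinct IDs from frame to frame. It is a generic stand-in, not BDTI’s actual algorithm; the gating distance is an assumed value, and real trackers also coast through brief occlusions rather than expiring tracks immediately.

```python
# Minimal nearest-neighbor tracker: match each surviving track to the
# closest detection within a gating distance; leftovers start new tracks.
import math
from itertools import count

_ids = count(1)

def update_tracks(tracks, detections, max_dist=50.0):
    """tracks: {id: (x, y)}; detections: list of (x, y) centroids."""
    updated, unmatched = {}, list(detections)
    for tid, (tx, ty) in tracks.items():
        if not unmatched:
            break
        # Greedy match: nearest detection within the gating distance.
        best = min(unmatched, key=lambda d: math.hypot(d[0]-tx, d[1]-ty))
        if math.hypot(best[0]-tx, best[1]-ty) <= max_dist:
            updated[tid] = best
            unmatched.remove(best)
    for det in unmatched:            # unexplained detections: new tracks
        updated[next(_ids)] = det
    return updated                   # unmatched tracks expire (no coasting)

tracks = {}
tracks = update_tracks(tracks, [(100, 240), (500, 240)])  # two pedestrians
tracks = update_tracks(tracks, [(110, 240), (490, 240)])  # IDs persist
```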
PHASE 3: HARDWARE CONVERSIONS AND FUTURE EVOLUTIONS
The final portion of the project employed Xilinx’s actual video-analytics IP blocks (in place of the earlier simulation models), running on the Spartan®-3A 3400 Video Starter Kit. A MicroBlaze™ soft processor core embedded within the Spartan-3A FPGA, augmented by additional dedicated-function blocks, implemented the network protocol stack. That stack handled the high-bit-rate, Ethernet-packetized metadata transfer to the BDTI-developed secondary processing algorithms, which now encompassed both road sign detection and pedestrian detection and tracking. And whereas these algorithms previously executed on an x86-based PC, BDTI successfully ported them to an ARM® Cortex™-A8-based hardware platform called the BeagleBoard (Figure 4).
Figure 4 – The final phase of the project migrated from Xilinx’s simulation models to actual FPGA IP blocks. BDTI also ported the second-level algorithms from an x86 CPU to an ARM-based SoC, thereby paving the way for the single-chip Zynq Extensible Processing Platform successor.
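The metadata link itself can be pictured with a short receiver sketch. The actual packet format produced by the MicroBlaze network stack is not documented here, so the layout below (a frame number and record count followed by fixed-size object records) is purely hypothetical, as are the port number and field widths.

```python
# Hedged sketch of the receiving end of the metadata link. The packet
# layout is an assumption for illustration: a big-endian header of
# (frame number, record count) followed by (id, x, y, w, h) records.
import socket
import struct

HEADER = struct.Struct(">IH")    # frame number, record count (assumed)
RECORD = struct.Struct(">5H")    # id, x, y, w, h (assumed)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 5005))     # hypothetical metadata port

while True:
    packet, _addr = sock.recvfrom(2048)
    frame_no, n = HEADER.unpack_from(packet, 0)
    objects = [RECORD.unpack_from(packet, HEADER.size + i * RECORD.size)
               for i in range(n)]
    # Hand the per-frame object list to the secondary algorithm suite.
    print(frame_no, objects)
```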
Those of you already familiar with Xilinx’s product plans may by now be thinking of the Zynq™ Extensible Processing Platform, which combines FPGA programmable logic and ARM Cortex-A9 CPU cores on a single piece of silicon. Might it be possible to run the entire video-analytics reference design on a single Zynq device? The likely answer is yes, since the Zynq product family includes devices containing sufficient programmable-logic resources, and since the BDTI algorithms place only a moderate load on the ARM CPU core.
Embedded vision is poised to become the next notable technology success story for both systems developers and their semiconductor and software suppliers. As the case study described in this article suggests, FPGAs and FPGA-plus-CPU SoCs can be compelling silicon platforms for implementing embedded vision processing algorithms.