Time to Rethink Computer Vision
12/11/2017
Today, computers are fast becoming the world's largest consumers of images, and yet, this is not reflected in the way that images are captured.
It’s time to rethink how machines look at the world, using inspirations drawn from human vision to reshape computer vision and enable a new generation of vision-enhanced products and services.
What’s the issue with the way that computer vision works now, and what do we do about it? Simply put, digital cameras have worked the same way for decades – all the pixels in an array measure the light they receive at the same time, and then report their measurements to the supporting hardware. Do this once and you have a stills camera. Repeat it rapidly enough and you have a video camera – an approach that hasn’t changed much since Eadweard Muybridge accidentally created cinema while exploring animal motion in the 1880s.
This approach made sense when cameras were mainly used to take pictures of people for people. Today, computers are fast becoming the world’s largest consumers of images, and yet this is not reflected in the way that images are captured. Essentially, we’re still building selfie-cams for supercomputers.
Eye on the prize
The route to machine-friendly imaging has been mapped out for us in a discipline known as neuromorphic engineering, which uses clues derived from the architecture and processing strategies of our brains to build a better, biologically inspired approach to computer vision.
The human vision system gives us a huge evolutionary advantage, at the cost of sustaining a brain powerful enough to interpret the vast amount of data available in the visual scene. Evolution’s frugal nature led to the emergence of shortcuts to cope with this data deluge. For example, the photoreceptors in our eyes only report back to the brain when they detect change in some feature of the visual scene, such as its contrast or luminance. Evolutionarily, it is far more important to be able to concentrate on movement within a scene than to take repeated, indiscriminate inventories of its every detail.
If the strategy is good enough for humans, it should be good enough for a new generation of bio-inspired vision sensors and the related artificial intelligence (AI) algorithms that underpin computer vision. It’s now time to bring our ‘event-driven’ sensor and its supporting ecosystem to market.
Events not images
What does our sensor do differently? The most obvious difference is that its array of pixels doesn’t have a common frame rate. In fact, there are no frames at all. Instead, each pixel only outputs the intensity data it has measured once the light falling upon it has changed by a set amount. If the incident light isn’t changing (for example, in the background of a security camera’s field of view) then the pixel stays silent. If the scene is changing (for example, a car drives through it), the affected pixels report the change. If many cars pass, all the affected pixels report a sequence of changes.
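To make the per-pixel behaviour concrete, here is a minimal Python sketch of a single pixel that reports only when its reading has moved far enough from the value it last reported. The function name `pixel_events`, the use of relative (rather than absolute) change and the 20% threshold are illustrative assumptions, not a description of the actual sensor circuitry.

```python
def pixel_events(samples, threshold=0.2):
    """Return (time, intensity) reports for one pixel.

    `samples` is a sequence of (time, intensity) readings with intensity > 0.
    The pixel reports only when the reading differs from the last reported
    level by at least `threshold`, expressed as a fraction of that level
    (an assumed definition of "a set amount").
    """
    events = []
    reference = None
    for t, intensity in samples:
        if reference is None:
            reference = intensity  # the first reading only sets the reference
            continue
        if abs(intensity - reference) / reference >= threshold:
            events.append((t, intensity))
            reference = intensity  # later changes are measured from here
    return events


# A static background (100) with a brighter car (160) passing at t = 3..4.
readings = list(enumerate([100, 100, 100, 160, 160, 100, 100]))
events = pixel_events(readings)
print(events)
```

In this toy trace the pixel stays silent while the background is unchanged, and emits one report when the car arrives and another when it leaves.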
This approach has intriguing advantages. Motion blur becomes a thing of the past, because the faster the image changes the faster each affected pixel reports that change. Conversely, static parts of the image don’t keep diligently reporting their unchanging status, reducing the amount of redundant data being processed. Under- or over-exposure issues are avoided since each pixel adjusts its exposure time according to the incident lighting conditions. Images can be filtered by their contrast level by adjusting how much the intensity of each pixel has to change before the pixel fires off a report of that change.
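The contrast-filtering idea can be sketched in the same spirit. The snippet below compares a small array of pixels against their last reported levels and lists which ones would fire at two different thresholds. The helper `firing_pixels`, the 3×4 array and the threshold values are all hypothetical, chosen only to show the effect.

```python
def firing_pixels(reference, current, threshold):
    """Return (row, col) for pixels whose relative change meets `threshold`."""
    fired = []
    for r, (ref_row, cur_row) in enumerate(zip(reference, current)):
        for c, (ref, cur) in enumerate(zip(ref_row, cur_row)):
            if abs(cur - ref) / ref >= threshold:
                fired.append((r, c))
    return fired


# Last reported levels: a uniform, static background.
reference = [[100, 100, 100, 100],
             [100, 100, 100, 100],
             [100, 100, 100, 100]]
# New readings: one strong change and one faint change.
current = [[100, 100, 100, 100],
           [100, 180, 115, 100],
           [100, 100, 100, 100]]

low_threshold = firing_pixels(reference, current, 0.10)
high_threshold = firing_pixels(reference, current, 0.50)
print(low_threshold, high_threshold)
```

Of the twelve pixels, only two report at the lower threshold and one at the higher; the ten unchanged pixels contribute no data at all.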
Probably the most important aspect of our event-driven sensor, though, is the way it changes how we think about computer vision. If looking at a conventional video is like being handed a sequence of postcards by a friend and being asked to work out what is changing by flicking through them, an event-driven sensor’s output is more like looking at a single postcard while that friend uses a highlighter to mark every change in the scene as it happens – no matter the lighting conditions in the scene.
In effect, the data stream that a computer vision system needs to analyse changes from a sequence of full-frame images, delivered to the beat of a fixed sampling clock, into an unsynchronised sequence of signals fired off by each pixel that has been subject to the set amount of change. A second signal produces pulses that represent the intensity of the light being measured by each pixel at that time.
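One way to picture this change in the data stream is as a list of timestamped, per-pixel records rather than a stack of frames. The sketch below is an assumed representation for illustration only: the `PixelEvent` type, its field names and the microsecond timestamps are not taken from any real sensor interface.

```python
import heapq
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PixelEvent:
    t_us: int                          # timestamp in microseconds (assumed unit)
    x: int                             # pixel column
    y: int                             # pixel row
    intensity: Optional[float] = None  # set only on intensity pulses


def merge_by_time(*streams):
    """Interleave streams that are each already time-ordered into one stream."""
    return list(heapq.merge(*streams, key=lambda e: e.t_us))


# Change signals from two pixels, each firing on its own schedule.
pixel_a = [PixelEvent(120, 3, 4), PixelEvent(950, 3, 4)]
pixel_b = [PixelEvent(400, 7, 1)]
# The second signal: an intensity measurement following pixel_a's first change.
intensity_pulses = [PixelEvent(130, 3, 4, intensity=0.62)]

stream = merge_by_time(pixel_a, pixel_b, intensity_pulses)
for event in stream:
    print(event)
```

There is no frame clock anywhere in this structure: the only ordering is the time at which each pixel happened to fire.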
It is not a coincidence that these ‘spiking’ pulse streams resemble the signals that the human brain and visual cortex use to process temporal events – this is, after all, neuromorphic, or ‘brain-shaped’, engineering. In fact, Chronocam is building its approach to computer vision on a new mathematical framework that enables more effective analysis of such spiking signals by AI algorithms.
Making predictions
If you imagine laying one of our sensors flat in front of you and then think of time as the vertical axis above it, a video of a circling dot captured by our sensor would create a slowly rising helix in the resultant three-dimensional space, as shown.
[Figure: helix traced in space and time by a circling dot]
A dot moving in a straight line would become a rising stroke. Two moving dots would create two strokes, whose paths could be used to assess whether the dots would intersect. If the dots were actually vehicles passing through a security camera’s field of view, as shown here:
[Figure: vehicles passing through a security camera’s field of view]
then our event-driven sensor would represent their movement like this:
[Figure: the event-driven sensor’s space-time representation of the vehicles’ movement]
For many applications, this simple yet information-rich representation of a changing scene should be easier for the AI algorithms used in computer vision to interpret than conventional video. For example, detecting and tracking objects, estimating their position in real time, learning their characteristics and classifying them are all simplified by this approach.
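The idea of using two rising strokes to assess whether two dots will meet can be sketched as follows. This is a simplified illustration that assumes each dot moves at constant velocity and that its events have already been grouped into a list of (t, x, y) tuples; the function names are hypothetical and the least-squares fit is a generic technique, not a specific Chronocam algorithm.

```python
import math


def fit_line(ts, values):
    """Least-squares fit values = start + rate * t; returns (start, rate)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    rate = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, values)) / var_t
    return mean_v - rate * mean_t, rate


def fit_stroke(events):
    """Fit a straight space-time stroke to (t, x, y) events."""
    ts = [t for t, _, _ in events]
    x0, vx = fit_line(ts, [x for _, x, _ in events])
    y0, vy = fit_line(ts, [y for _, _, y in events])
    return (x0, y0), (vx, vy)


def closest_approach(stroke_a, stroke_b):
    """Return (time, gap) at which the two fitted dots are nearest.

    A negative time means the dots were closest before t = 0.
    """
    (ax, ay), (avx, avy) = stroke_a
    (bx, by), (bvx, bvy) = stroke_b
    dx, dy = bx - ax, by - ay
    dvx, dvy = bvx - avx, bvy - avy
    speed_sq = dvx ** 2 + dvy ** 2
    t = 0.0 if speed_sq == 0 else -(dx * dvx + dy * dvy) / speed_sq
    return t, math.hypot(dx + dvx * t, dy + dvy * t)


# Dot A moves right along y = 0; dot B moves up along x = 10.
dot_a = [(t, t, 0) for t in range(5)]
dot_b = [(t, 10, -10 + t) for t in range(5)]

when, gap = closest_approach(fit_stroke(dot_a), fit_stroke(dot_b))
print(when, gap)
```

From only five events per dot, the fitted strokes predict that the two paths meet at the same point at around t = 10, well after the last observed event.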
We believe this simplicity will bring advanced vision features to a much wider range of markets than currently expected. For example, today’s approach to autonomous vehicles involves driving test cars millions of miles to gather detailed scene data from multiple cameras and laser rangefinders. This ocean of data is then used to train artificial neural networks about how a vehicle should be driven. Surely an event-based sensor that only reports changes in a scene can simplify this process and so reduce the cost of achieving the very low latencies necessary for safe driving?
The efficiency of event-driven sensing and vision analysis should also make it possible for our customers to build lower-cost eye trackers for augmented reality headsets, again because we’re only sensing and processing how the eye changes.
Event-based sensing may also help ease regulatory barriers to humans and robots working together. The safe working distance between a robot and a human is defined by the speed of the robot’s movement and how long it takes to process its vision signals. Event-based sensing could speed up vision processing and so enable robots and humans to work in closer proximity.
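As a back-of-the-envelope illustration of that relationship, the sketch below treats the distance a robot travels before its vision system can react as speed multiplied by processing time. This is a deliberate simplification rather than a safety standard's formula, and the speed and latency figures are made-up values chosen only to show the scaling.

```python
def reaction_distance_m(robot_speed_m_s, vision_latency_s):
    """Distance the robot covers before its vision system can respond."""
    return robot_speed_m_s * vision_latency_s


ROBOT_SPEED_M_S = 2.0  # hypothetical arm speed

# Hypothetical latencies: one frame period at 30 frames per second,
# versus an assumed 1 ms response for an event-based pipeline.
frame_based = reaction_distance_m(ROBOT_SPEED_M_S, 1 / 30)
event_based = reaction_distance_m(ROBOT_SPEED_M_S, 0.001)

print(f"frame-based: {frame_based * 100:.1f} cm")
print(f"event-based: {event_based * 100:.1f} cm")
```

Under these assumptions the distance shrinks in direct proportion to the latency, which is why faster vision processing could allow closer working distances.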
Our technology should also benefit the security industry, by helping to turn all those unwatched CCTV camera feeds into high-quality, actionable surveillance information – by focusing attention on how the scene changes.
In the long term, event-based sensing may even help us to restore vision to people who have lost their sight. This was a revelation for me, but the techniques that underlie Chronocam’s offering are already in clinical trials for use within retinal implants.
Machine vision for the masses
Chronocam’s goal is to make event-based sensing the basis for accessible, mass-market computer vision.
We believe that by using neuromorphic engineering techniques we are enabling machines not just to see, but to truly sense the environments within which they exist and interact. Challenges of power, performance and predictability must be overcome to build reliable, practical, intelligent vision-enabled systems that can turn seeing into sensing. We believe that event-based computer vision can help meet those challenges and so unlock significant rewards for society.
New types of machine vision can revolutionize how we move through, interact within and interpret the world. In the long term, this technology may even help us overcome natural limitations or deficiencies in our own ability to see – using neuromorphic engineering to enhance, rather than just mimic, human capabilities.
-- The author is Luca Verre, CEO at Chronocam. He co-founded Chronocam in 2014, with an international team of experts in neuromorphic vision, inspired by the technology’s early application in research into restoring human sight.