More generally, diverse markets exist where gestures are useful for display control. You might recall, for example, the popular image of Tom Cruise manipulating a large transparent display in the movie Minority Report. Or consider the advertising market, where interactive digital signs could respond to viewers' gestures (not to mention identify a particular viewer's age, gender, ethnicity and other attributes) in order to optimize the displayed image and better engage the viewer. Even in industrial markets, appliances such as ceiling-mounted HVAC sensors could be conveniently controlled via gestures. As sensor technologies, gesture algorithms and vision processors continue to improve, what might appear today to be a unique form of interactivity will become commonplace across a range of applications and markets.
Implementations vary by application
The meaning of the term "gesture recognition" has broadened over time, as it is used to describe an increasing range of implementation variants. These solutions may be designed and optimized, for example, for close- or long-range interaction, for fine-resolution gestures or robust full-body movements, and for continuous tracking or brief-duration gestures.
Gesture recognition technology encompasses a wide variety of touch-free interaction capabilities, each serving a different type of user interface scenario.
Close-range gesture detection is typically used in handheld devices such as smartphones and tablets, where interaction occurs in close proximity to the device's camera. In contrast, long-range gesture control is commonly employed with devices such as TVs, set-top boxes and digital signage, where the user may be several feet or more from the device.
While user interface convenience is central to gesture control in both scenarios, the algorithms used, specifically the methods by which gestures are performed and detected, are fundamentally different. At close range, the camera "sees" a hand gesture in a completely different way than it "sees" the same hand and gesture at long range.
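A rough pinhole-camera calculation illustrates the difference. The sketch below assumes an illustrative 60-degree horizontal field of view, a 640-pixel-wide image, and a 20 cm hand (all values chosen purely for the sake of the example):

```python
import math

def hand_span_px(distance_m, hand_m=0.20, hfov_deg=60.0, width_px=640):
    """Approximate pixels spanned by a hand at a given distance,
    using a simple pinhole-camera model (assumed example parameters)."""
    focal_px = (width_px / 2) / math.tan(math.radians(hfov_deg) / 2)
    return hand_m * focal_px / distance_m

print(f"close range (0.3 m): {hand_span_px(0.3):.0f} px")  # ~370 px
print(f"long range  (3.0 m): {hand_span_px(3.0):.0f} px")  # ~37 px
```

At close range the hand dominates the frame, so an algorithm can resolve individual fingers; at long range the same hand occupies only a few dozen pixels, forcing the algorithm to rely on coarser motion and body-level cues.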
Additionally, a distinction exists between different gesture "languages." For example, when using gestures to navigate through the detailed menus of a "smart" TV, the user will find it intuitive to use fine-resolution, small gestures to select menu items. However, when using the device to play games based on full-body detection, robust gestures are required to deliver the appropriate experience.
Moreover, differences exist between rapid-completion gestures and those that involve continuous hand tracking. A distinctive hand motion from right to left or left to right can be used, for example, to flip eBook pages or change songs in a music playback application. Such discrete gestures contrast with continuous hand tracking, which is relevant for controlling menus and other detailed user interface elements, such as the Windows 8 UI or a smart TV's screen.
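As a concrete illustration, a discrete swipe can be classified from just a short window of tracked hand positions. The sketch below assumes hand x-coordinates normalized to the frame width; the thresholds are hypothetical, chosen only for illustration:

```python
def detect_swipe(x_positions, min_travel=0.25, max_frames=15):
    """Classify a brief horizontal hand motion as a left or right swipe.

    x_positions: normalized hand x-coordinates (0.0-1.0), one per frame,
    covering the candidate gesture window. Returns "left", "right", or None.
    """
    if len(x_positions) > max_frames:
        return None  # too slow: treat as continuous tracking, not a swipe
    travel = x_positions[-1] - x_positions[0]
    if travel >= min_travel:
        return "right"
    if travel <= -min_travel:
        return "left"
    return None

# A hand moving steadily rightward across a quarter of the frame width
print(detect_swipe([0.30, 0.38, 0.47, 0.58]))  # -> "right"
```

A continuous-tracking interface, by contrast, would consume every tracked position as a cursor update rather than waiting for a completed motion.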
Other implementation challenges
Any gesture control product contains several key hardware and software components, all of which must be tightly integrated in order to provide a compelling user experience. First is the camera, which captures the raw data that represents the user's actions. Generally, this raw data is then processed, in order to reduce noise in the signal, for example, or (in the case of 3-D cameras) to compute the depth map.
Specialized algorithms subsequently interpret the processed data, translating the user’s movements into "actionable" commands that a computer can understand. And finally, an application integrates these actionable commands with user feedback in a way that must be both natural and engaging. Adding to the overall complexity of the solution, the algorithms and applications are increasingly implemented on embedded systems with limited processing, storage and other resources.
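The sketch below outlines this four-stage pipeline in Python, using OpenCV for capture and noise reduction. The gesture-recognition stage is left as a stub, since that is where the proprietary algorithms discussed above would reside, and the command mapping is a hypothetical example:

```python
import cv2

def preprocess(frame):
    """Stage 2: reduce sensor noise before gesture analysis."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.GaussianBlur(gray, (5, 5), 0)

def recognize_gesture(frame):
    """Stage 3 (stub): a real implementation would segment the hand
    and classify its motion, returning a gesture name or None."""
    ...

# Stage 4: the application maps recognized gestures to actions
COMMANDS = {
    "swipe_left":  lambda: print("previous page"),
    "swipe_right": lambda: print("next page"),
}

def run(camera_index=0):
    cap = cv2.VideoCapture(camera_index)  # stage 1: capture raw frames
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        gesture = recognize_gesture(preprocess(frame))
        if gesture in COMMANDS:
            COMMANDS[gesture]()
    cap.release()
```

On an embedded target, each of these stages would typically be mapped onto whatever dedicated hardware is available, with the image signal processor handling stage 2 and a vision coprocessor or DSP handling stage 3.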
Tightly integrating these components to deliver a compelling gesture control experience is not a simple task, and the complexity is further magnified by the demands of gesture control applications. In particular, gesture control systems must be highly interactive, able to process large amounts of data with imperceptible latency. Depending on the application, incoming video streams commonly have frame resolutions ranging from QVGA to 1080p HD, at frame rates of 24 to 60 fps.
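Some quick arithmetic on the figures above shows why latency is hard to hide; even before any per-pixel processing is counted, the raw pixel rate spans close to a 70x range across these formats:

```python
# Pixel throughput across the range of formats cited above
formats = {
    "QVGA @ 24 fps":  (320, 240, 24),
    "1080p @ 60 fps": (1920, 1080, 60),
}
for name, (w, h, fps) in formats.items():
    mpix = w * h * fps / 1e6
    print(f"{name}: {mpix:.1f} Mpixels/s")
# QVGA @ 24 fps: 1.8 Mpixels/s
# 1080p @ 60 fps: 124.4 Mpixels/s
```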
Bringing gesture control products to market therefore requires a unified effort among the different members of the technology supplier ecosystem: sensor and camera manufacturers, processor companies, algorithm providers, and application developers. Optimizing the different components to work together smoothly is critical to providing an engaging user experience. Vision functions, which are at the core of gesture algorithms, are often complex to implement and may require substantial additional work to optimize for the specific features of a particular image processor. However, a substantial set of these functions finds common and repeated use across various applications and products. A strong case can therefore be made for cross-platform libraries that provide common low-level vision functions.
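OpenCV is one widely used example of such a cross-platform library. The snippet below, which runs against a synthetic frame so that it is self-contained, shows the kind of low-level building blocks (noise reduction, edge detection, contour extraction) that recur across vision applications:

```python
import cv2
import numpy as np

# A synthetic frame stands in for camera input: a bright block on black
frame = np.zeros((240, 320), dtype=np.uint8)
cv2.rectangle(frame, (100, 80), (220, 160), 255, -1)

# Typical low-level building blocks shared across gesture pipelines
blurred = cv2.GaussianBlur(frame, (5, 5), 0)   # noise reduction
edges = cv2.Canny(blurred, 50, 150)            # edge detection
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
print(f"detected {len(contours)} contour(s)")
```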
In a market as young as gesture control, there is also still little to no standardization across the ecosystem. Multiple camera technologies are used to generate 3-D data, and each technique produces its own characteristic artifacts. Each 3-D camera also comes with its own proprietary interface. And gesture dictionaries are not standardized; a motion that means one thing on one system implementation may mean something completely different (or nothing at all) on another. Standardization is both inevitable and necessary for the industry to grow and mature.