Stereoscopic 3-D is quickly emerging as a prime technology across various markets, adding a further dimension of reality to existing 2-D videos, games, movies and images. With 3-D TVs having hit store shelves, consumers now are getting acquainted with large-screen, realistic S3-D effects in home entertainment. Today, S3-D experiences are migrating from the large screen to mobile devices, providing realistic—and glasses-free—personalized viewing experiences on the go.
Overall, S3-D video and imaging use cases can be categorized in two ways: S3-D content creation and S3-D viewing. Each poses a unique set of challenges in mobile design and development. This article offers solutions to some of the challenges and shares perspectives on how to enable successful S3-D experiences on mobile platforms.
It’s important first to understand how S3-D experiences are created.
S3-D essentially adds an extra dimension to a viewing scene using left- and right-image pairs from two cameras. In games, for example, the S3-D effect is rendered by positioning two virtual cameras in the scene, while for S3-D video and images, content is captured using two sensors that are physically spaced apart.
The human brain perceives depth when the two views (left and right, one seen by each eye) are presented together. Objects farther away in a scene appear at a distance, while nearer objects appear closer to the viewer.
With the correct level of depth adjustments, pairs of stereo images provide the most realistic and natural user experience. Farther objects are given positive disparity, and nearer objects are given negative disparity. Accurately providing such disparity requires a reference object on which to focus; this is called a convergence plane.
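The sign convention described above can be captured in a small formula. The sketch below assumes an idealized parallel camera pair with focal length expressed in pixels; the specific parameter values are illustrative, not taken from any particular device.

```python
def screen_disparity(depth_m, convergence_m, baseline_m, focal_px):
    """On-screen disparity in pixels for a point at depth_m, after the
    stereo pair has been shifted so the convergence plane sits at zero
    disparity. Positive values place the point behind the screen
    (farther away); negative values place it in front (nearer)."""
    return focal_px * baseline_m * (1.0 / convergence_m - 1.0 / depth_m)
```

A point exactly on the convergence plane yields zero disparity; points beyond it come out positive and points in front of it negative, matching the convention in the text.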
In addition, the human eyes' field of view (FOV) varies dynamically with where the eyes are looking, which makes natural S3-D viewing highly flexible.
In order to produce such an S3-D effect, content creation needs to be done with two different camera sensors, and the left- and right-image pair needs to be processed at 60 frames/second (left and right at 30 frames/s independently).
Stereo camera pairs can be positioned in one of two ways when creating S3-D images, either toed in toward each other or parallel, to achieve the correct FOV. Based on the sensor characteristics, resolution and focal length, a designer can decide on the best recording distance for the stereo pair. Positioning of the stereo pair is crucial for getting the right convergence plane. The pair can be spaced 65 mm apart (like human eyes) to yield a large recording distance. In designing a smartphone or other device of similar size, the designer can instead use a 35-mm spacing to achieve a personalized recording distance in the 1- to 3-meter range.
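The trade-off between baseline and recording distance can be illustrated with a simple pinhole-camera sketch. The 1,000-pixel focal length below is an assumed value for illustration only.

```python
def raw_disparity_px(baseline_m, focal_px, depth_m):
    # Pinhole approximation for a parallel stereo pair: disparity grows
    # with the baseline and shrinks with subject distance.
    return focal_px * baseline_m / depth_m

# 65-mm vs. 35-mm baseline for a subject 1 meter away
wide = raw_disparity_px(0.065, 1000, 1.0)
narrow = raw_disparity_px(0.035, 1000, 1.0)
```

The narrower 35-mm baseline nearly halves the disparity at close range, which is why it suits the 1- to 3-meter personalized viewing distances typical of handheld use.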
Camera pairs placed on such gadgets are rarely in perfect mechanical alignment in the translational and rotational directions.
There can be minor misalignments, on the order of millimeters, when the sensor modules are placed on the form-factor device. Such small variations in physical placement in the translational and rotational directions can create large misalignments in the image plane. This poses a substantial challenge: the misalignment must be calibrated up front and then corrected on a per-frame basis while the content is created. Furthermore, a device's mechanical characteristics, temperature variations and even the occasional drop of the gadget can introduce misalignment between the stereo pair of sensors. It therefore becomes vital to correct such variations in real time.
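A per-frame correction of this kind amounts to remapping one image through a calibrated transform. Real pipelines typically apply a full homography in hardware; the sketch below is a simplified 2-D rigid correction (small rotation plus translation) applied to a single pixel coordinate, with hypothetical parameter values.

```python
import math

def correct_point(x, y, theta_rad, dx_px, dy_px):
    """Map a right-image pixel through a calibrated 2-D rigid correction:
    a small rotation about the image origin plus a translation.
    theta_rad, dx_px and dy_px come from an offline calibration step."""
    xc = x * math.cos(theta_rad) - y * math.sin(theta_rad) + dx_px
    yc = x * math.sin(theta_rad) + y * math.cos(theta_rad) + dy_px
    return xc, yc
```

In practice the same transform is applied to every pixel of every frame, which is why hardware acceleration is essential at HD resolutions.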
Once content is created, it is important to ensure it is viewable on the target devices. System software running in the gadget should be capable of doing the following to provide successful S3-D content viewing experiences:
• Combine the stereo image pair and process using the image signal processing (ISP) unit for the correct resolution, distortion corrections, image quality tuning and more.
• Decide the convergence plane at run-time using efficient algorithms, and create disparity vectors for the stereo pair at run-time to provide pleasing viewer experiences.
• Correct the misalignments in the translational and rotational directions between the stereo image pair at run-time, and apply the correction offsets per frame.
• Synchronize the 3A (auto-exposure, auto-white balance, autofocus) between the sensor modules, and fine-tune the image tuning parameters.
These operations require very sophisticated hardware accelerators that can run and process the stereo pair of high-resolution images. Such accelerators are fundamental to next-generation application processors.
Once convergence and misalignment corrections are applied, the processed image pairs are passed to the application processor's video accelerators to encode the data in 3-D formats. Today's H.264 codec offers an extension for carrying S3-D information via supplemental enhancement information (SEI) messages, which describe the format and layout of the encoded S3-D scene.
Emerging standards such as Multiview Video Coding (MVC) let designers encode more than two views for true S3-D effects. MVC codecs correlate the left- and right-view pair for spatial prediction and motion estimation, yielding effective bit-rate savings while encoding. Exploiting the redundancy between the left and right views can reduce system data usage during an S3-D video conference, for example, since users in such instances are limited by network bandwidth.
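The idea behind inter-view prediction can be illustrated in a deliberately oversimplified form: rather than sending a second full image, send only the residual between the views. Real MVC codecs use block-based, disparity-compensated prediction, not the per-pixel subtraction sketched here.

```python
def interview_residual(left_row, right_row):
    # Transmit only the difference between views; because the views are
    # highly correlated, the residuals are small and compress far better
    # than a second full image would.
    return [r - l for l, r in zip(left_row, right_row)]
```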
Video encoders and decoders have S3-D awareness based on the content layout. The left and right images can be formatted in multiple ways (side by side, top/bottom, interleaved [column/row] and more). Based on this layout information, the decoded data is provided to the display subsystem for rendering in stereoscopic fashion.
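The layouts just described can be sketched with toy row-based frames; a real ISP or display engine performs the equivalent packing in hardware at full resolution.

```python
def pack_side_by_side(left, right):
    # Each frame is a list of rows; concatenate matching rows horizontally.
    return [l + r for l, r in zip(left, right)]

def interleave_rows(left, right):
    # Row-interleaved (top/bottom-style) packing:
    # L row 0, R row 0, L row 1, R row 1, ...
    out = []
    for l, r in zip(left, right):
        out.extend([l, r])
    return out
```

The decoder only needs to know which packing was used (signaled, for example, in the H.264 SEI message) to route the right pixels to each eye.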
Generating S3-D experiences
Stereoscopic viewing experiences can be generated in multiple forms. Two of the most popular ways to view S3-D are through LCD shutter glasses and on autostereoscopic LCD panels. Shutter glasses achieve the S3-D effect by devoting 50 percent of the rendered pictures to the left eye and the other 50 percent to the right eye. A technique called time-sequential multiplexing then alternately displays the left- and right-eye images every time the computer refreshes (draws) the screen.
Switching the shutters over the left and right lenses of the glasses using sync signals generated by the TV creates the S3-D effect for users. The synchronization must happen very fast, faster than can be perceived, so that the user sees true S3-D. That requires immense processing power from the display subsystem of the application processor, especially when dealing with high-definition video.
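Time-sequential multiplexing itself is conceptually simple, as the sketch below shows; the difficulty the article describes lies in doing this, with shutter synchronization, at display refresh rates for HD content.

```python
def time_sequential(left_frames, right_frames):
    """Display order for shutter glasses: L0, R0, L1, R1, ...
    Each displayed frame is paired with a sync pulse that opens
    the matching eye's shutter."""
    seq = []
    for l, r in zip(left_frames, right_frames):
        seq.extend([l, r])
    return seq
```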
For glasses-free 3-D, autostereoscopic LCD panels display multiple views on the LCD panel. Examples of autostereoscopic displays include parallax barrier, lenticular and time-sequential LCD panels.
The parallax barrier, placed in front of the LCD, consists of a layer of material with a series of precision slits, allowing each eye to see a different set of pixels and thereby creating a sense of depth through parallax. The viewing angle of a parallax barrier LCD is limited, and the pixel resolution is halved in the horizontal direction: half the pixel count is seen by the left eye and half by the right eye.
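The halved horizontal resolution follows directly from how the columns are shared between the eyes, as this toy sketch of a column-packed frame shows.

```python
def column_views(frame):
    """Split a column-interleaved frame into the two half-horizontal-
    resolution views a parallax barrier directs to each eye:
    even columns to one eye, odd columns to the other."""
    left = [row[0::2] for row in frame]
    right = [row[1::2] for row in frame]
    return left, right
```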
Lenticular displays use two-dimensional arrays of lenslets designed so that when the arrays are viewed from slightly different angles, an S3-D effect is created. Time-sequential LCD panels use an S3-D film (creating an angular view of light flow through the film) in front of the LCD, controlling the backlights placed on either side of the LCD at a 120-Hz refresh rate to create a 3-D viewing experience for the users. Unlike parallax barrier LCD panels, 3-D film-based time-sequential panels produce a full-resolution S3-D experience.
Autostereoscopic panels are becoming popular in mobile devices. The panels need extensive display-processing capability at the pixel level to format and create an S3-D viewing experience in real time. The display processing has to handle column, row and pixel interleaving for HD-resolution stereo pairs at 60 frames/s.
S3-D viewing quality poses many challenges, and it varies with the size of the LCD screen and the angle at which the user views the content. Created S3-D content must address convergence issues and misalignment corrections, and must apply the appropriate level of disparity in the video. Done poorly, the viewing experience can cause eye strain.
Research continues with respect to disparity corrections, depth grading and scene ramping (changing disparity based on the scene pattern changes) to provide positive viewing experiences.
The computational power needed to run such content-creation algorithms and pixel-level display processing requires application processors to evolve to meet the needs of S3-D HD systems. Devices with sufficient processing power can provide pleasing, natural viewing experiences, adding the dimension for which S3-D will be known.
Keep an eye out this year for S3-D-enabled mobile devices.
Veera Manikandan Raju is engineering manager for Texas Instruments' Natural User Interface group, which is part of TI's Wireless business unit. He studied at the Regional Engineering College, Tiruchirappalli, India.