ASIC Design
In an effort to keep pace with rapidly evolving digital cameras, Sierra Imaging set out to create an ASIC that combines the performance needed for real-time image processing with the flexibility and programmability needed to support a wide variety of digital imaging products. To achieve the first aim, we designed a fast multiplier-accumulator (MAC) and a DRAM controller that uses double-buffered DMA to deliver a peak transfer rate of 120 Mbytes/second. For the second aim, we turned to three RAM-based state machines that can handle a virtually endless variety of algorithms. Besides the hardware requirements, the design process presented its own demands. First, the design flow had to be highly automated, as we had a relatively small team. Second, since the specification contained so much flexibility, we needed a verification methodology that would allow us to exercise the ASIC as it would be used in a system configuration. The most powerful aspect of the methodology was a system-level virtual prototype that employed HDL simulation models. We also developed a set of regression tests to automatically verify the functionality of each major block of the design as it was refined. As a result, our first-pass silicon was indeed fully functional.
Figure 1. Acquiring pixels from a CCD sensor is a complex analog process whose success has a major influence on a digital camera's image quality. The image on the left was acquired by following the sensor manufacturer's basic guidelines. The one on the right came from the same sensor but with an optimized acquisition sequence.

A major goal for digital cameras is to bring the price down into a range comparable to that of good film cameras while improving image quality and increasing versatility. On the surface, an obvious solution to the versatility challenge might have been a fast CPU teamed up with a general-purpose DSP; however, we were unaware of any DSP solution that was optimized for embedded, real-time image processing. Additionally, digital cameras require specialized hardware such as video drivers and a timing generator to control the CCD or CMOS sensor and video encoders.

Breaking it down

To appreciate why achieving low cost and versatility is so challenging, it helps to understand how digital cameras work. All digital cameras contain the following components: a lens, a CCD or CMOS sensor, A/D converters, a strobe, a CPU, program memory, image memory, and a small ASIC or two. The way you use the various components in a digital camera makes all the difference. To differentiate your camera in terms of image quality and price, you need to get optimal performance from inexpensive image capture and display subsystems. Even such a seemingly basic operation as acquiring pixels from a CCD or CMOS sensor is more difficult than it looks. A sensor is a complex analog component, and the way you coax the data from it determines the initial image quality (see Figure 1).
Figure 2. The Raptor combines special-purpose digital camera capabilities with general-purpose functionality that can serve a wide variety of imaging applications. Examples of the latter include three RAM-based state machines in the image transform processor that can compute an unlimited range of imaging algorithms at an extremely high speed.

Once you capture the pixels from the sensor, you can apply a variety of corrections to the data, depending on the characteristics of the lens, the sensor, the scene being photographed, the type of compression used, the display device, and the desired file size. The camera needs to capture information about the image, such as color, temperature, exposure ranges, and strobe light duration. Based on this information, the processing algorithms can enhance the image quality by adding subtle shading, highlights, color saturation, and sharpness.
Additional considerations for a digital camera include power management, small size, ruggedness, I/O ports for offloading images, and an easy-to-use interface. Some digital cameras accommodate removable flash memory cards for storing images, and some incorporate a small LCD that serves as both a viewfinder and a way to display captured images.
Setting the specs

After designing the firmware for a digital camera, we saw an opportunity to develop a commercial solution that would go far beyond the needs of any one camera manufacturer. Building on the experience of the first design, we had some ideas for the new solution. Coming up with a set of exact specifications, however, remained a challenge.

In our first design, about 100 million operations per picture were needed to enhance the image quality, compress the image, and perform overhead tasks. These operations took eight seconds on a 20-MHz SPARClite processor for a 640 × 480-pixel image (VGA resolution). After taking one picture, the photographer had to wait eight seconds before the camera could take another picture. Compared with the typical motor-drive SLR, which can take photos almost as fast as the user can push the shutter release, the digital camera was far too slow.
Figure 3. A fast MAC serves as the heart of the Raptor's image-processing capabilities. By organizing four single-cycle multipliers in parallel and feeding them at maximum speed, the Raptor can perform 240 million 12 × 16-bit multiply-accumulate operations per second.

We knew that the 100 million operations performed for each picture would have to expand to roughly 1 billion operations per picture. To achieve such a great increase in processing speed, the new design had to cycle through the operations much faster. The goal was to reduce the photo cycle time from eight seconds to two seconds. Even with an upgrade to a 60-MHz SPARClite chip, the processor wouldn't come close to completing the operations in eight seconds, much less two.

How to close this processing gap without driving up the cost was unclear, as were many other aspects of the design. For example, beyond the approximate number of operations per picture, we couldn't say exactly which of the hundreds of possible image-processing algorithms might be necessary for quality images for a wide range of camera types. Other questions abounded: What image resolution should the camera use? What type of sensor? What type of display? What type of removable storage medium? What types of I/O ports? What type of image compression?

Additionally, we recognized that a good image-processing solution could form the foundation for a variety of digital imaging applications. Along with general-purpose digital cameras, the solution could support integrated imaging and reporting devices for insurance or law enforcement or imaging and measurement devices for surveying. Other examples include printers, scanners, home and industrial security systems, and video conferencing and telephony. The ability to handle those applications hinged on developing a solution that was exceptionally fast and flexible but inexpensive enough for mass-market digital cameras.
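The size of the processing gap can be checked with a little arithmetic. The sketch below uses only the figures quoted in the text, plus one simplifying assumption that the article does not state: that a SPARClite's throughput scales linearly with its clock rate. The variable names are illustrative.

```python
# Back-of-the-envelope check of the processing gap described in the text.
# ASSUMPTION (not from the article): throughput scales linearly with clock rate.

OPS_FIRST_DESIGN = 100e6    # operations per picture in the first design
TIME_FIRST_DESIGN_S = 8.0   # seconds per picture on the 20-MHz SPARClite
OPS_NEW_DESIGN = 1e9        # roughly 1 billion operations per picture
TARGET_TIME_S = 2.0         # desired photo cycle time

# Sustained rate of the first design: operations per second at 20 MHz.
rate_20mhz = OPS_FIRST_DESIGN / TIME_FIRST_DESIGN_S

# Estimated rate after upgrading to a 60-MHz part (linear scaling assumed).
rate_60mhz = rate_20mhz * (60 / 20)

# Rate the new design must sustain to finish in the target time.
required_rate = OPS_NEW_DESIGN / TARGET_TIME_S

# How long the 60-MHz processor alone would need, and how far short it falls.
time_on_60mhz_s = OPS_NEW_DESIGN / rate_60mhz
shortfall_factor = required_rate / rate_60mhz

print(f"20-MHz rate:      {rate_20mhz / 1e6:.1f} M ops/s")
print(f"60-MHz estimate:  {rate_60mhz / 1e6:.1f} M ops/s")
print(f"Required rate:    {required_rate / 1e6:.1f} M ops/s")
print(f"Time at 60 MHz:   {time_on_60mhz_s:.1f} s per picture")
print(f"Shortfall factor: {shortfall_factor:.1f}x")
```

Under that assumption, the faster processor alone would still need well over eight seconds per picture, which is consistent with the statement that it would not come close to the two-second goal.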
The dedicated programmable solution

To accommodate the wide range of sensors, displays, storage media, I/O ports, compression formats, and image-processing algorithms that might prove desirable in future cameras, we decided to create an ASIC with the broadest capabilities that the design team could include while staying within a budget for mass-market cameras. The ASIC had to combine the desired flexibility and programmability with the performance needed for real-time image processing.

For our design, we used a 0.35-µm, 150,000-gate array with embedded RAM and a phase-locked loop and four multipliers from the ASIC vendor's design library. This 60-MHz ASIC, named Raptor, plus a 60-MHz SPARClite processor (to run our camera operating system) and a microcontroller (for power management and I/O tasks) would keep the total cost of the processing components well under $50. In addition to image processing, the ASIC would provide:
To handle those and many other tasks, the Raptor design dedicates 100,000 gates to image processing, most of which are used for a four-way-parallel MAC that performs 240 million 12 × 16-bit multiply-accumulate operations a second, the three RAM-based state machines, a DRAM controller with a peak transfer rate of 120 Mbytes/second (faster than the one provided by the RISC processor), and several types of controllers (see Figure 2). The Raptor meets the requirement to perform approximately 1 billion operations in two seconds, and the microsequenced state machines deliver the necessary flexibility and programmability.

The fast MAC in the Raptor's image transform processor is a major contributor to the performance because most image-processing tasks demand a lot of multiply-accumulate operations. It allows the chip to convert from an RGB-sensor color space to a 4:2:2 YUV color space in three clock cycles per pixel, all in a single pass from DRAM.

How to make a fast MAC

The MAC uses four efficient single-cycle multipliers in parallel. A 72-bit value from a control store specifies the element to be multiplied from the contents of the barrel shifter at the top of the unit (see Figure 3). The control store also provides coefficients for multiplying. These values feed into the parallel multipliers, whose outputs continue through the three adders, the accumulator (actually comprising seven accumulators), and the descaler. The final result is a 32-bit value. The MAC can perform four multiplies and descales per clock cycle.

This design is well suited for common algorithms used in image-processing operations such as 2-D filtering (on 4 × 4-pixel blocks) and discrete cosine transforms (on 8 × 8-pixel blocks). The parallel MAC can provide one 2-D DCT coefficient per four clocks. (The DCT converts images from the spatial domain into the frequency domain, which is useful for identifying an image's high-frequency components.
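The cycle counts quoted for the four-way MAC can be reproduced with a simple model. The sketch below is an illustration under stated assumptions, not a description of the Raptor's actual microcode: it assumes a full 3 × 3 matrix is applied to every pixel for color conversion, uses hypothetical 8-bit fixed-point BT.601-style coefficients (the article gives none), and accounts for the 2-D DCT as a separable row-column transform on 8 × 8 blocks. All function and constant names are hypothetical.

```python
import math

CLOCK_HZ = 60_000_000   # Raptor clock rate quoted in the text
MULTIPLIERS = 4         # single-cycle multipliers operating in parallel

def mac_clocks(num_macs):
    """Clocks needed to issue num_macs multiply-accumulates, four per clock."""
    return math.ceil(num_macs / MULTIPLIERS)

# ASSUMPTION: BT.601-style coefficients scaled by 256 (not from the article).
RGB_TO_YUV = [
    [77, 150, 29],      # Y
    [-43, -85, 128],    # U
    [128, -107, -21],   # V
]
DESCALE_SHIFT = 8       # divide by 256 after accumulation (the "descale" step)

def rgb_to_yuv(r, g, b):
    """Convert one pixel with explicit multiply-accumulates.

    Returns the (Y, U, V) tuple and the clock count implied by the MAC total.
    """
    pixel = (r, g, b)
    result = []
    macs = 0
    for row in RGB_TO_YUV:
        acc = 0
        for coeff, value in zip(row, pixel):
            acc += coeff * value    # one multiply-accumulate
            macs += 1
        result.append(acc >> DESCALE_SHIFT)
    return tuple(result), mac_clocks(macs)

# Peak rate: four multiply-accumulates on every clock.
peak_macs_per_second = CLOCK_HZ * MULTIPLIERS

# Mid-gray input: the luma should equal the input, the chroma should be zero.
yuv, color_clocks = rgb_to_yuv(128, 128, 128)

# ASSUMPTION: separable 8x8 DCT, so each coefficient costs 8 MACs in the
# row pass plus 8 MACs in the column pass when averaged over the block.
dct_clocks_per_coeff = mac_clocks(8 + 8)

print(peak_macs_per_second, yuv, color_clocks, dct_clocks_per_coeff)
```

With these assumptions the model gives 240 million multiply-accumulates per second, three clocks per pixel for color conversion (nine multiplies rounded up to whole clocks), and four clocks per DCT coefficient, matching the figures in the text. Other accountings could produce the same totals.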
Since the human eye doesn't clearly perceive high-frequency information, compression algorithms can delete some of this information with little effect on image quality. This approach works only if the DCT processing is performed before compression, that is, in the camera.)

Optimized multiply-accumulate operations and data manipulation for image processing are extremely useful. A fast MAC structure should be combined with proper ordering of input data and output results to achieve high performance in most of the image-processing operations needed for digital cameras. The Raptor ASIC meets those requirements by using datapath design approaches for intelligent DMA and internal data ordering. The ability to minimize data movement to and from DRAM is crucial for power conservation as well.

The Raptor's pixel processor is fed and emptied using double-buffered DMA. Designed to work closely with the MAC so that the latter receives data at the highest speed possible, the DMA allows most pixel operations to run at approximately 40 million to 58 million pixels per second. It also provides automatic interleaving/de-interleaving of color channels and similar data sorting during processing, so the RISC processor doesn't waste time and bandwidth reformatting the data.

A design flow to match the design requirements

Because we had a relatively small team, the design flow had to be highly automated (see Figure 4). In addition, since the specification for the ASIC contained so much flexibility, a method for verifying the design was needed to exercise the ASIC as it would be used in a system configuration. The most powerful aspect of the design flow was a system-level virtual prototype that employed simulation models in Verilog and VHDL. The mixed Verilog-VHDL environment was necessary because both languages were represented in the available component models.
(Significantly, a cycle-accurate Verilog model of the SPARClite processor was available.) We used FPGAs to emulate interfaces to devices such as the sensor. With all of the image-processing hardware modeled in software, the simulation executed the camera's actual C program to take photographs, process them, and display them on an NTSC monitor. The entire process took about 48 hours, but it gave us visual feedback for verifying the Raptor architecture.

We also developed a set of regression tests to check each major block of the design as it was refined. The tests were set up to verify functionality automatically so that we didn't have to interpret waveforms to confirm correct operation. Every night, half of the regression tests would run, and the software would send an e-mail to each designer indicating whether the design had passed or failed each test. Both C and shell programming were important aspects of our automated verification. Without an expert in those disciplines, the regression testing would have been much more difficult and time-consuming.

With this verification methodology, we were able to address new requirements in the ASIC two weeks before tape-out and still verify fully correct operation of all functions. First-round silicon proved to be fully functional.

Another key benefit of our verification methodology was that the same C code could run without modification on actual hardware or in the simulation environment. This capability allowed almost all low-level firmware functions to be developed and tested before the ASIC was available. Additionally, the tests used to verify the ASIC in simulation were used to verify the ASIC in silicon on camera boards.

* Bob Taylor is vice president of engineering at Sierra Imaging, Inc. (Scotts Valley, Calif.).

To voice an opinion on this or any Integrated System Design article, please e-mail your message to miker@isdmag.com.
Integrated System Design, January 1998. Copyright © 2000 Integrated System Design