
System Design

General Instrument Designs A Set Top Box Chipset

A design methodology of a satellite and broadcast terrestrial receiver decoder chipset.

By Brian Balthazor, Mike Lodman, Steve Pollmann, Rick Price, Serdar Yilmaz, and Linley Young


As computers and video begin to merge, the arrival of a whole host of digital video products is imminent. In this article we address the General Instrument (San Diego, CA) design approach to a five-chip DigiCipher II decoder implementation for both satellite and broadcast terrestrial applications. We fabricated the 2.5 million transistor chipset in the 0.65µm triple-layer metal (3LM) process from Motorola (Phoenix, AZ). This chipset represents more than 50 man-years of development time.

Figure 1 depicts the DigiCipher II satellite and cable digital services delivery system. A DigiCipher II encoder digitizes and compresses video and audio source material and combines it with additional access control information and digital data services. The merged data stream is then error correction encoded and transmitted to the satellite with each digital service occupying a fraction of the satellite transponder bandwidth.

A home backyard satellite dish, local broadcast station, or cable headend subsequently receives the satellite digital stream. In each case, the receiver performs the required demodulation, error correction, and decompression of the received digital stream. In the home, a consumer integrated receiver/decoder (IRD) receives the DigiCipher II digital data stream and reconstructs the video, audio, and data services for subsequent display and listening. The local broadcast station or cable headend uses a commercial grade receiver to provide either analog or DigiCable II digital services to the consumer via cable. The DigiCipher II system allows full flexibility in transmission data rates and service multiplexing.

The five ASICs in the DigiCipher II satellite/cable receiver application (see Figure 2) are: the forward error correction processor (FEC), the access control processor (ACP), the graphics and transport processor (GTP), the video decompression processor (VDP), and the audio decompression processor (ADP).

Architectural Overview

The incoming analog signal is demodulated and error-corrected by the FEC. This results in an MPEG-2-compatible data stream (Figure 2). The ACP accepts the MPEG-2 stream and performs authentication of the requested services, routing the MPEG-2 stream simultaneously to both GTP and VDP. The GTP interprets the audio MPEG-2 transport layers, routing digital audio information to the ADP for subsequent Dolby AC-3 decompression and output to D/A converters. In addition, the GTP processes the transport layer for other services simultaneously and provides a 16-bit data bus interface to external DRAM for the storage of dynamically loaded on-screen display (OSD) pixel information, and for data access by the 68xxx processor firmware.

The VDP accepts video related MPEG-2 transport packets from the ACP and builds video frames in external DRAM which has a 32-bit data bus to achieve the required bandwidth. The resulting CCIR-656 video stream is routed to the GTP for overlaying of OSD (if active), and the final signal is then routed to the video encoder.

Several requirements drove the described ASIC partitioning. General Instrument desired to license the video decompression technology and/or ASIC implementation, so the VDP was designed as a self-contained processor with an MPEG-2 transport level input and a CCIR-656 digital video output. This allowed the VDP to work in both multimedia applications and real time MPEG-2 video decompression systems. Similarly, it was desirable to license or market the audio decompression ASIC as a separate product and maintain system flexibility by providing both Dolby and pin compatible MUSICAM audio support. For these reasons, the ADP became its own functional entity. The FEC partitioning was driven primarily by power constraints. The ACP is the heart of General Instrument's proprietary security system and is also available to licensed third-party equipment designers.

The following sections provide a brief description of each chipset ASIC.

Forward error correction processor (FEC)

A critical part of the satellite data delivery system is forward error correction, whereby redundant data are inserted at the encoder and subsequently interpreted by the decoder, automatically detecting and correcting certain errors. The concatenated decoding algorithms implemented are Reed-Solomon decoding and Viterbi decoding (the latter is otherwise referred to as maximum likelihood decoding). The Viterbi decoder supports multiple data/code rates including rate 1/2, 2/3, 3/4, and 7/8. The Reed-Solomon decoder supports an m=8, t=8, 188/204 code. The FEC also contains a deinterleaver for spreading the Viterbi decoder output burst errors over several Reed-Solomon blocks. The FEC supports output clock rates up to 29.3MHz. Advanced concealment circuits at the receiver overcome transmission burst noise and interference, providing clean audio, video, and data services. The FEC implementation and physical device differ for cable and other media; however, the four other chipset ASICs are common to all applications.
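The code parameters quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original design flow; the function names are hypothetical. It confirms that a 188/204 Reed-Solomon code with t=8 carries 2t = 16 parity bytes per block and shows the overall code rate when each Viterbi rate is concatenated with the Reed-Solomon code.

```python
from fractions import Fraction

# Parameters taken from the article.
RS_N = 204   # Reed-Solomon block length in bytes
RS_K = 188   # Reed-Solomon payload bytes per block
RS_T = 8     # correctable byte errors per block
VITERBI_RATES = [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4), Fraction(7, 8)]


def rs_parity_bytes(n=RS_N, k=RS_K):
    """Number of redundant bytes appended to each Reed-Solomon block."""
    return n - k


def overall_rate(viterbi_rate):
    """Fraction of transmitted bits that are payload after both codes."""
    return viterbi_rate * Fraction(RS_K, RS_N)


if __name__ == "__main__":
    # A t-error-correcting Reed-Solomon code needs 2t parity symbols.
    assert rs_parity_bytes() == 2 * RS_T
    for rate in VITERBI_RATES:
        combined = overall_rate(rate)
        print(f"Viterbi {rate}: overall rate {combined} ({float(combined):.3f})")
```

For example, at Viterbi rate 1/2 slightly less than half of the transmitted bits are payload, while at rate 7/8 roughly four fifths are.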

Access control processor (ACP)

The ACP performs the necessary access control and authentication operations associated with the DigiCipher II transport layer. The ACP receives an encrypted DigiCipher II stream; if authorized, it decrypts the specified stream(s). The ACP is capable of decrypting six packet streams, each identified by separate packet identifiers (PIDs), and accepts two entitlement control message (ECM) streams for access control. Access control is achieved through hardware implementation of a modified version of the Data Encryption Standard (DES) controlled by an embedded microcontroller containing an 8-bit processor, ROM, battery-backed RAM, power management circuit, various security circuits, and a three-port DMA interface. The ACP also contains a universal asynchronous receiver/transmitter (UART) interface which communicates with an optional external ASIC via a secure serial protocol used to extend the on-chip access control features. The packet processor provides front-end MPEG-2 transport layer parsing to route ECM packets internally. This module is configured via a synchronous serial bus which is fully compatible with the Motorola serial peripheral interface (SPI).

Figure 1. The encoded data stream is transmitted via the satellite encoder and is subsequently received by a home backyard satellite dish, a local broadcast station, or a cable headend.

Figure 2. The demodulated MPEG-2 compatible data stream is corrected by the FEC, sent to the ACP for authentication, and simultaneously routed to the GTP, ADP, and VDP for OSD, audio, and video processing.

Graphics and transport processor (GTP)

The GTP performs three primary functions: (1) processing MPEG-2 transport stream and message oriented data, (2) routing audio data to the ADP, and (3) superimposing on-screen display images on VDP video and directing the combined image to the video encoder. The GTP also interfaces to the 68xxx processor to allow dynamic OSD font loading and manipulation, and several ancillary firmware functions.

The GTP extracts the audio, video, and control signals from data streams in the DigiCipher II system and processes them. A transport stream is constructed of consecutive fixed length packets with each packet being 188 bytes in length. The first four bytes of each packet contain a packet header, while the payload portion of each packet is normally 184 bytes and carries the actual data (audio, video, and control).
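To make the packet layout concrete, the sketch below parses the four-byte header of a 188-byte MPEG-2 transport packet and looks up a destination by PID. The field positions follow the MPEG-2 transport packet header layout; the `route` helper and the PID table are hypothetical illustrations and say nothing about how the GTP hardware is organized. The sketch also ignores adaptation fields, which can shorten the payload below the normal 184 bytes.

```python
from dataclasses import dataclass

PACKET_SIZE = 188   # bytes per transport packet
HEADER_SIZE = 4     # bytes of packet header
SYNC_BYTE = 0x47    # first byte of every MPEG-2 transport packet


@dataclass
class TsHeader:
    pid: int                  # 13-bit packet identifier
    payload_unit_start: bool  # set when a new payload unit begins in this packet
    scrambling_control: int   # 2-bit field, 0 means not scrambled
    continuity_counter: int   # 4-bit per-PID counter


def parse_header(packet: bytes) -> TsHeader:
    """Decode the fixed four-byte header of one transport packet."""
    if len(packet) != PACKET_SIZE:
        raise ValueError("transport packets are 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte")
    return TsHeader(
        pid=((packet[1] & 0x1F) << 8) | packet[2],
        payload_unit_start=bool(packet[1] & 0x40),
        scrambling_control=(packet[3] >> 6) & 0x3,
        continuity_counter=packet[3] & 0x0F,
    )


def route(packet: bytes, pid_table: dict):
    """Return the destination registered for this packet's PID, or None."""
    return pid_table.get(parse_header(packet).pid)
```

A packet whose header bytes are 0x47 0x41 0x00 0x15 carries PID 256 with the payload-unit-start flag set, so `route(packet, {256: "audio"})` would select the audio path.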

Audio data is extracted from MPEG-2 transport, preprocessed, and directed to the ADP for decompression and output.

The GTP OSD accepts decompressed video data routed from the VDP. This architecture allows more elaborate video blending functions as opposed to simply modifying the VDP video memory directly with the 68xxx processor. The OSD may be configured to display up to 4,096 colors simultaneously on screen from a palette of 65,536 colors with resolutions up to 704 x 576. In addition to video blending functions, the OSD also provides for transparent video, subtitles, and teletext. DRAM controlled by the GTP allows for support of ideographic fonts and large electronic program guides.

Video decompression processor (VDP)

The VDP chip is capable of decompressing video in NTSC or PAL mode and in a variety of resolutions: 352, 528, and 704 horizontal, and half or full vertical resolution. Transport packets containing video, audio, and data are sent to the VDP via a serial transport packet interface. The VDP extracts compressed video data from the transport stream and decompresses video according to the MPEG-2 and the DigiCipher II standards. These standards define both the coded representation of video data and the methods of reconstructing video data. DCII defines a standard for special prediction and for block motion compensation. The VDP outputs the decompressed video on an 8-bit bus in conformance with the CCIR-656 standard.

Audio decompression processor (ADP)

Dolby AC-3 is emerging as the leading digital compression algorithm for high quality multi-channel audio in systems requiring low data rates. The AC-3 algorithm is designed to accommodate a wide variety of input and output channel formats, from a single mono channel up to five full bandwidth audio channels plus a low frequency subwoofer channel (5.1 channels), in a variety of bit rates. AC-3 is a frequency domain-based audio compression algorithm which processes audio in discrete blocks representing 256 samples for each of the input channels. Input channels are encoded together as a single transmitted audio block, and groups of six blocks are combined into synchronization frames. Each frame includes a bitstream information (BSI) field that contains information about the current frame, e.g., the number of input channels. DigiCipher II decoders will decode an AC-3 bitstream containing any combination of encoded input channels at any bit rate up to 384 kbps, at either 44.1 or 48 kHz sampling rates, and will produce either stereo or mono output, including a Dolby Pro-Logic compatible output.
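The block and frame sizes above fix how much audio each synchronization frame represents. As a worked example using only the figures in the text (256 samples per block, six blocks per frame), the sketch below computes the frame duration and the average number of bytes per frame at a given bit rate. The function names are hypothetical.

```python
from fractions import Fraction

SAMPLES_PER_BLOCK = 256  # per channel, from the article
BLOCKS_PER_FRAME = 6     # blocks grouped into one synchronization frame


def samples_per_frame():
    """Audio samples per channel represented by one synchronization frame."""
    return SAMPLES_PER_BLOCK * BLOCKS_PER_FRAME


def frame_duration_s(sample_rate_hz):
    """Playback time covered by one frame, as an exact fraction of a second."""
    return Fraction(samples_per_frame(), sample_rate_hz)


def average_frame_bytes(bit_rate_bps, sample_rate_hz):
    """Average coded bytes per frame at a constant bit rate."""
    return Fraction(bit_rate_bps) * frame_duration_s(sample_rate_hz) / 8


if __name__ == "__main__":
    print(samples_per_frame())                  # samples per channel per frame
    print(float(frame_duration_s(48_000)))      # seconds per frame at 48 kHz
    print(average_frame_bytes(384_000, 48_000)) # bytes per frame at 384 kbps
```

At 48 kHz each frame covers 1,536 samples, or 32 ms of audio, and at the 384 kbps maximum that corresponds to 1,536 bytes per frame. At 44.1 kHz the average is not a whole number of bytes, so the value should be read as an average only.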

Figure 3. The design flow is an iterative process moving from an RTL description to a Verilog netlist and finally a layout database.

System requirements and ASIC partitioning

A 5V pad ring and a 3V core were used in each ASIC, thus allowing the chips to interface to 5V ICs while gaining the power dissipation benefits of a 3V design. To perform the required level shifting on signals running between the pad ring and core, Motorola provided a 5V-to-3V buffer for signals driving into the core and a 3V-to-5V differential transmitter/receiver pair for signals leaving the core. The level shift cells do increase propagation delay in the signal paths.

Power consumption and packaging

We desired conventional, low-cost plastic packaging for all ASICs to provide the lowest cost component, which in turn required limiting power to achieve targeted heat dissipation goals. All circuits ultimately satisfied their targeted limits and required no thermal enhancements.

Testability

Each ASIC included logic to facilitate both debug and production test, and all but the GTP utilized special operating test modes to provide visibility of (and, in some cases, control over) internal signals. The architecture of each particular ASIC dictated the design of the test logic. In the case of the FEC, test multiplexers are used to both view and supply internal values to the different stages of the pipeline. The VDP uses an 8-bit output port to view internal control signals, while the ACP uses two hardware test modes. The ACP also makes use of a suite of embedded firmware routines for built-in self-test (BIST).

The chipset also implements boundary scan technology to support board level testing. Each ASIC contains a Test Access Port (TAP) compatible with the IEEE boundary-scan standard 1149.1. The TAP is used to access the boundary-scan controller and chain which support the following test instructions: IDCODE, BYPASS, SAMPLE/PRELOAD, and EXTEST. On the ADP, the TAP is also used to access memory BIST logic.

In addition, care has been taken in design and cell utilization to allow IDDQ testing of each chip to complement full clock rate functional testing.

Development Methodology

Figure 3 presents the overall DigiCipher II ASIC design methodology. Designers initially worked from a set of system requirements and definitions generated by the GI systems group. A C-model demonstrating gross functionality was created by either systems or ASIC designers to assure correct bitstream processing. RTL descriptions were then created in Verilog and subsequently simulated to verify that the modeled hardware behavior was identical to the C-model. The design was then synthesized to gates and re-simulated to verify agreement with the RTL models. Following layout, we used back-annotated timing delays to simulate actual layout conditions and verify the RTL model behavior.

Figure 4. In subsystem verification, several transport streams are simulated to ensure MPEG-2 syntax and good functional coverage.

C modeling and compose bitstream

To quickly provide designers with bitstreams to test a wide range of decoder functions, the GI systems group developed C-models of both the encoder and decoder, which provided the benefit of a second source verification of decoder implementation and further minimized debug time.

The main functions of the encoder model include video data compression; video, audio, and ECM packetization; multiplexing of the video, audio, and data packets into bitstreams; and generation of a detailed trace file for each vector set. The VDP chip designers behaviorally simulated about 70 bitstreams representing 500-plus frames. The same bitstreams were also decompressed by the decoder model, which output reference video data that was compared bit by bit to the VDP's video data. The GTP chip designers simulated approximately 30 MPEG-2/DCII bitstreams with an average of 30 packets each for a total of about 2,000 ECM messages. About 50 AC-3 frames and 100 MUSICAM packets were simulated on the GTP and ADP designs hooked together. About 700 NTSC and 300 PAL frames were behaviorally simulated to verify OSD functionality in the GTP design, and over 50 packets were simulated to verify PLL control and master clock generation.
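The bit-by-bit comparison between the decoder model's reference output and the simulated VDP output can be illustrated with a small helper. This is only a sketch of the idea, assuming both outputs have already been captured as byte strings; the function name and the return convention are hypothetical and do not describe the tools the designers actually used.

```python
from typing import Optional, Tuple


def first_mismatch(reference: bytes, simulated: bytes) -> Optional[Tuple[int, int]]:
    """Return (byte offset, bit index) of the first differing bit, or None.

    Bit index 0 is the most significant bit of the byte. If one stream is a
    strict prefix of the other, the offset of the first missing byte is
    reported with bit index 0.
    """
    for offset, (ref_byte, sim_byte) in enumerate(zip(reference, simulated)):
        diff = ref_byte ^ sim_byte
        if diff:
            # The highest set bit of the XOR is the first difference when
            # bits are counted from the most significant end.
            return offset, 7 - (diff.bit_length() - 1)
    if len(reference) != len(simulated):
        return min(len(reference), len(simulated)), 0
    return None
```

Reporting the first mismatch rather than a simple pass/fail makes it easier to relate a failure back to a specific frame or sample in the trace file.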

Behavioral coding

Designers spent well over 50 percent of the design time coding each design in Verilog. Behavioral coding took on various forms depending on each designer's preference. Care was taken to use non-blocking assignments, which remove the order dependency within a begin/end block. This was needed to ensure that the behavioral model and the synthesized version of the model would be logically equal. Designers typically partitioned the design on a functional basis with special attention paid to isolating clock domains in an effort to minimize asynchronous interfaces.

Designers were able to choose between two simulators for running behavioral simulations: VCS from Chronologic Simulation (Los Altos, CA); or Verilog from Cadence Design Systems (San Jose, CA). There were trade-offs between the compile and run times of the simulators, so it was the designer's decision to choose the simulator with the least overall compile and run time. Because VCS has a slower compile time but a faster simulation time than Verilog, designers typically chose Verilog for simulating short vector sets, and VCS for long vector sets. Simulation results from both simulators were analyzed using a waveform display program called SignalScan.
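The trade-off can be expressed as a simple cost model: total turnaround is compile time plus run time, and run time grows with the length of the vector set. The sketch below uses purely hypothetical timing numbers and simulator labels (they are not measurements of VCS or Verilog) to show how the preferred simulator flips once the vector set passes a break-even length.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Simulator:
    name: str
    compile_s: float        # one-time compile cost in seconds (hypothetical)
    per_vector_s: float     # run-time cost per vector in seconds (hypothetical)

    def total_time(self, vectors: int) -> float:
        """Turnaround time for one simulation of the given length."""
        return self.compile_s + self.per_vector_s * vectors


def fastest(simulators: List[Simulator], vectors: int) -> Simulator:
    """Pick the simulator with the least overall compile and run time."""
    return min(simulators, key=lambda sim: sim.total_time(vectors))


# Illustrative values only: one tool compiles quickly but runs slowly,
# the other compiles slowly but runs quickly.
FAST_COMPILE = Simulator("fast_compile", compile_s=10.0, per_vector_s=1.0)
FAST_RUN = Simulator("fast_run", compile_s=100.0, per_vector_s=0.1)
```

With these made-up numbers the break-even point is 100 vectors: shorter runs favor the fast-compiling tool and longer runs favor the fast-running one.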

The systems group and designers of the VDP and GTP used two different tools to visually check the decoded video data and to aid in debugging: a shareware application called Xview, and the Viewgraphic Viewstore. Xview is a Sun tool used in the X-windows environment, which accepts a YUV data format and outputs video to a Sun workstation monitor. A C program was first used to convert video data from CCIR-656 format to YUV. Xview further allowed designers to zoom in on selected lines and samples of the frame.
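The conversion step can be pictured as separating the interleaved samples of each active video line. The sketch below is a Python illustration rather than the C program the designers used. It assumes the timing reference codes have already been stripped, so the input holds only active samples in the CCIR-656 multiplex order Cb, Y, Cr, Y, and the function name is hypothetical. The file layout expected by the viewer is not described in the article and is not modeled here.

```python
from typing import Tuple


def split_active_line(active: bytes) -> Tuple[bytes, bytes, bytes]:
    """Split one line of 4:2:2 samples (Cb Y Cr Y ...) into Y, U and V.

    Returns (y, u, v), where u holds the Cb samples and v the Cr samples.
    Each chroma plane has half as many samples as the luma plane.
    """
    if len(active) % 4 != 0:
        raise ValueError("active line must contain whole Cb-Y-Cr-Y groups")
    y = active[1::2]   # every second byte, starting with the first Y
    u = active[0::4]   # Cb samples
    v = active[2::4]   # Cr samples
    return y, u, v
```

A 704-pixel line therefore yields 704 luma samples and 352 samples in each chroma plane.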

Designers also used Viewgraphic's Viewstore hardware for real time playback of digitized video which provided the ability to visually measure video quality across several frames and perform side-by-side comparisons of video quality in the various prediction modes as defined by the DigiCipher II/MPEG-2 standards. We used a C program to convert frames of CCIR-656 video to the RGB format accepted by Viewstore. Users can step through frame by frame and display field one and field two separately for debugging purposes. Figure 4 graphically describes the video subsystem verification process.

Figure 5. GTP Simulation Environment. The LM1200 hardware modeler along with a Chronologic VCS PLI interface and a ViewStore box provides a system-level simulation environment for extensive testing of the GTP.

Dolby Test Vector Sets

To ensure compliance of the DigiCipher II ADP ASIC with the Dolby AC-3 audio standard, Dolby Labs generated a large number of compressed AC-3 audio files to exercise the various features and functional switches of AC-3. Over 400 different functional tests were run on the ADP Verilog simulation model, comparing the results with the Dolby golden AC-3 simulator.

MPEG-2 Test Vector Sets

Numerous transport streams were exchanged between General Instrument and external sources including Teracom, Scientific Atlanta, Sarnoff, GoldStar, Alcatel, and Samsung. The main goal was to verify our interpretation of the MPEG-2 syntax and to ensure good coverage of a wide variety of DigiCipher II/MPEG-2 modes.

Approximately 40 frames were behaviorally simulated on the VDP. This served to verify packet processing and video decompression at various transport rates. The GTP chip designers simulated about 20,000 packets to verify packet processing and PLL control. The ACP chip designers also simulated about 20,000 packets to verify forced encryption mode, unencrypted packet processing, and processing of packets which had the scrambling control bits randomly set to encrypted.

The LM1200 hardware modeler was selected for hardware modeling of the Motorola MC68xxx component and for exhaustive testing of the GTP. Figure 5 displays the simulation environment using the LM1200. Logic Modeling provides a Verilog Programming Language Interface (PLI) for the Chronologic VCS simulator which allows the 68xxx hardware model to be instantiated into the simulation. The 68xxx processor physically resides on a circuit card within the LM1200, which talks over the Ethernet network to the VCS PLI interface. As this is a functional-only simulation environment, all DRAMs, SRAMs, and ROMs were written as zero delay models to assure the quickest simulation time. The 68xxx firmware was created using the Microtek Development Systems (Hillsboro, OR) development environment and brought into the simulation environment as a single contiguous ROM image in a Verilog compatible hex ASCII format. A custom C utility performed the Microtek object code to hex ASCII translation.
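The last step, turning a contiguous ROM image into hex ASCII text that a Verilog memory model can load, might look like the sketch below. It is an illustration in Python, not the custom C utility, and it starts from a raw binary image because the Microtek object format is not described in the article. The 16-bit word width, the function name, and the one-word-per-line layout are assumptions; the output follows the general form accepted by Verilog's `$readmemh` (an `@` address line followed by hex words).

```python
def to_readmemh(image: bytes, word_bytes: int = 2, start_address: int = 0) -> str:
    """Format a raw ROM image as hex ASCII text, one memory word per line.

    The word width is an assumption (16 bits by default). Bytes are kept in
    image order within each word, which corresponds to big-endian words.
    """
    if word_bytes < 1:
        raise ValueError("word width must be at least one byte")
    if len(image) % word_bytes != 0:
        raise ValueError("image length must be a whole number of words")
    lines = [f"@{start_address:x}"]   # starting word address, in hex
    for index in range(0, len(image), word_bytes):
        lines.append(image[index:index + word_bytes].hex())
    return "\n".join(lines) + "\n"
```

For instance, the four bytes 4E 71 60 FE become an address line `@0` followed by the words `4e71` and `60fe`.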

A separate debug model was also included in the environment which monitored GTP pins and generated log messages, indicating GTP read/write register activity and providing further observability. Custom PLI provided the ability to write binary file video data at the output pads of the simulation, and then a post-processor transformed the data into ViewStore compatible format. The video data was then displayed on a monitor after downloading to the ViewStore.

Synthesis methodology

After encountering several performance problems with version 3.1a of the development tool suite from Synopsys (Mountain View, CA), we went with version 3.0 to synthesize the ASICs. We preferred Synopsys dc_shell scripts over interactive synthesis because of the iterative nature of synthesizing large designs. We used a combination of bottom-up and hierarchical compile strategies. Modules were partitioned into 5,000-10,000 gate blocks and were compiled hierarchically and individually. We then performed a bottom-up compile to connect these blocks together at the top level. As per the example in Figure 6, assume an ASIC had a top level called TOP and submodules called A, B, and C. Also, assume A has submodules A1, A2, and A3. Modules A, B, and C were compiled hierarchically and TOP was compiled with a dont_touch attribute on A, B, and C. Hierarchical compilation is the process of reading in modules A1, A2, and A3 and compiling them simultaneously under module A. This approach lets Synopsys optimize the intermodule paths and is the most straightforward because constraints need to be applied at module level A and not at the submodule level A1, A2, and A3.
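The order of operations in the TOP/A/B/C example can be written out as a small planning routine. This sketch only lists the steps in plain language; it does not generate dc_shell commands, and the function name and the dictionary representation of the hierarchy are hypothetical.

```python
from typing import Dict, List


def compile_plan(hierarchy: Dict[str, List[str]], top: str = "TOP") -> List[str]:
    """List synthesis steps for a hierarchical-then-bottom-up compile.

    Each direct child of the top level is compiled together with its own
    submodules, with constraints applied at the child. The top level is
    compiled last with the already-compiled children left untouched.
    """
    blocks = hierarchy.get(top, [])
    steps = []
    for block in blocks:
        submodules = ", ".join(hierarchy.get(block, [])) or "no submodules"
        steps.append(
            f"compile {block} hierarchically (reads in: {submodules}); "
            f"constrain at {block}"
        )
    steps.append(f"compile {top} with dont_touch on: {', '.join(blocks)}")
    return steps


if __name__ == "__main__":
    example = {"TOP": ["A", "B", "C"], "A": ["A1", "A2", "A3"]}
    for step in compile_plan(example):
        print(step)
```

Writing the plan down this way mirrors why scripts were preferred over interactive sessions: the same sequence can be rerun unchanged on every synthesis iteration.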

Static timing analysis

We performed a final static timing check for setup and hold following layout to ensure that all nets in the ASIC design meet timing requirements for best- and worst-case operating conditions. Prior static timing checks used estimated delays based on estimated wire lengths. For this timing check, we used actual calculated wire loads from the layout based on length, fanout, and technology. The delay calculator in the Motorola ASIC tool kit used the net loads from the layout to generate net delays for best- and worst-case operating conditions. This delay calculator returns a Verilog SDF format file, which Synopsys is able to read in to back-annotate the corresponding nets. Clock constraints, input timing constraints, and multi-cycle and false path attributes are then applied.

Figure 6. A typical design hierarchy illustrating the bottom-up synthesis in which constraints are applied only to modules A, B, and C.

Two reports need to be generated for a complete static timing analysis: (1) a setup check on all flip-flops using the worst-case timing files generated, followed by (2) a hold check on all flip-flops using the best-case timing generated. These two checks should ensure that the design will perform properly within the specified operating range. A constraint report for both operating conditions is then generated within Synopsys, with all violators listed. It is likely that one will see some false paths in the report which were missed when the constraints were written, for example, paths crossing clock boundaries or input and output ports of a memory tied to a bidirectional bus. These can be marked as false paths, after which the design is retimed and another report generated.
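The two checks reduce to simple slack calculations on each register-to-register path. The sketch below uses the textbook form of the setup and hold relations for a single clock and ignores clock skew; it is an illustration of the arithmetic, not the Synopsys report algorithm, and all names and delay values are hypothetical. Delays are in picoseconds so that integer arithmetic can be used.

```python
def setup_slack(period, clk_to_q_max, logic_max, setup):
    """Worst-case check: data must arrive one setup time before the next edge.

    Negative slack means the slowest path is too slow for the clock period.
    """
    return period - (clk_to_q_max + logic_max + setup)


def hold_slack(clk_to_q_min, logic_min, hold):
    """Best-case check: data must not change until the hold time has passed.

    Negative slack means the fastest path races through too quickly.
    """
    return (clk_to_q_min + logic_min) - hold


def violators(paths, period):
    """Return names of paths failing either check.

    Each path is a dict holding worst-case delays for the setup check and
    best-case delays for the hold check.
    """
    failing = []
    for path in paths:
        s = setup_slack(period, path["clk_to_q_max"], path["logic_max"], path["setup"])
        h = hold_slack(path["clk_to_q_min"], path["logic_min"], path["hold"])
        if s < 0 or h < 0:
            failing.append(path["name"])
    return failing
```

Using worst-case delays for setup and best-case delays for hold is what makes the pair of reports cover the full operating range: each check is run under the condition most likely to break it.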

Workstation configurations

Both synthesis and simulation can consume considerable workstation resources. Generally speaking, module level synthesis, RTL simulation, and gate simulation did not cause many problems. Assuming one uses the Synopsys recommended rule of 15 kgates or less per module, modules could be synthesized on a workstation with 128 MB of RAM and anywhere from 128 to 640 MB of virtual memory. Even when no logic exists at the ASIC top level, Synopsys has a voracious memory appetite, which is further whetted when multiple clocks are employed. Top level simulations almost always required a machine with 640 MB of RAM and several gigabytes of virtual memory. All machines were SPARCstation 10s with either one, two, or four processors.

Design verification methodology

We used a number of Motorola-supplied tools after netlist synthesis for clock tree insertion and to verify the correctness of each design and vector set.

Once each design was synthesized, designers used an electrical rules check (ERC) tool to check for maximum fanout violations, illegal I/O pad configurations, multiple drivers, and unconnected inputs. For those cases categorized as warnings, the designers had to determine whether the warnings were false or whether a fix had to be made. Error messages resulted in a design fix.

Next, designers ran a delay calculation tool which read in the netlist and user specified technology libraries for best and worst case conditions. The tool output SDF files for each condition, along with edge reports listing nodes having excessively slow rise/fall times. To solve these problems, designers opted to fix the netlist manually if the number of problems was manageable, or to re-compile the netlist with Synopsys.

After the problems were corrected, layout engineers worked with designers to develop a floorplan and place macrocells. This was followed by running a Motorola-supplied clock tree synthesis tool that bases its optimization algorithm on placement information. In addition, the designer also supplied the tool with a set of initial conditions and goals (such as the maximum skew across each clock network, the tree depth, and the preferred buffer type). Finally, the tool inserts clock buffers into the design and creates a new Verilog and DEF netlist.

After running and verifying best- and worst-case simulations, the designers fed the Verilog Value Change Dump (VCD) files from the sims into a tester tool that checks for violations of tester restrictions. The tool also indicates strobable regions within a cycle for each output.

The final step in vector verification involves running a dithering tool which attempts to mimic the tester skew on inputs. The tool randomly skews both clock and data by the maximum tester skew amount and checks whether the outputs still match the outputs from a pre-dithered simulation.
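The idea behind dithering can be shown with a deliberately small model. The sketch below does not reproduce the Motorola tool, which re-simulates the full vector set; it reduces the problem to a single input sampled by a single clock edge and asks whether the input still meets a setup requirement when both edges are randomly skewed. The function name, the single-flop model, and all parameters are assumptions made for illustration.

```python
import random


def dither_passes(data_edge, clock_edge, setup, max_skew, trials=1000, seed=0):
    """Return True if the sampled value is unaffected by random tester skew.

    data_edge and clock_edge are nominal edge times in the same unit. Each
    trial moves both edges independently by up to max_skew in either
    direction and checks that the data still settles at least one setup
    time before the clock edge.
    """
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    for _ in range(trials):
        skewed_data = data_edge + rng.uniform(-max_skew, max_skew)
        skewed_clock = clock_edge + rng.uniform(-max_skew, max_skew)
        if skewed_data + setup > skewed_clock:
            return False       # output would differ from the undithered run
    return True
```

In this model a vector survives dithering for certain only when its nominal margin, the clock edge time minus the data edge time minus the setup time, is at least twice the maximum tester skew.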

High-level design a necessity

In a demanding market such as home electronics, time to market is extremely critical. Schedules tend to be very demanding, providing little time for methodology shortcomings and surprises. VLSI design at this level of complexity (2.5 million transistors) is unusually intensive in computer hardware and software resources. It is estimated that close to 100,000 SPARCstation 10 equivalent CPU hours were consumed during the initial design phase through first-pass VLSI tapeout.

Managing the simultaneous development of five very complex VLSI devices can only be accomplished through the use of behavioral design languages and logic synthesis tools as described herein. Attempting this project with conventional schematic design entry and verification techniques, simultaneously with the evolution of compressed video/audio entertainment delivery systems, would be worse than impossible. It is estimated that General Instrument's cost of tools is in excess of $100,000 per VLSI engineer. While this appears to be very high, the alternatives are poor considering the marketplace opportunities.

Brian Balthazor has design experience in telecommunications, computers, and multimedia; Mike Lodman has design experience in digital audio, RISC CPU, memory cache, and DSP ASICs; Steve Pollmann has design experience in FPGA, neural network, forward error correction, and DSP ASICs; Rick Price has design experience in focal plane array (FPA) readouts and custom ASICs; Serdar Yilmaz has design experience in massively parallel computers and DSP ASICs; Linley Young has design experience in DSP ASICs.

To voice an opinion on this or any Integrated System Design article, please e-mail your message to: michael@asic.com.


integrated system design  March 1995



Copyright © 1996 - Integrated System Design Magazine
