System Design
Low Power: Key to Implementing an 8-mm Tape Drive
Advanced tape storage calls for higher capacities and faster data transfer, but the form factor and head-to-tape interface temperature must be maintained. Special low-power ASIC techniques step in to meet those demands.
by Richard Schmidt and Bob Sugars
Exabyte Corporation (Boulder, CO), in conjunction with AT&T Microelectronics (Allentown, PA), developed a new tape drive controller. Our 8mm tape drive, Mammoth, has six times the transfer rate, almost three times the capacity, and 25 percent greater reliability than any of our previous products. Mammoth is an 8mm helical-scan drive that stores 20 Gbytes in native mode and 40 Gbytes with compression. It transfers data at 3 Mbytes/s. The drive supports four SCSI varieties, including 20-Mbyte/s burst transfers from the host in native mode. We met these product specifications with two well-crafted, high-density 3.3V CMOS standard-cell ASICs.
Figure 1. The CBC interfaces to a SCSI chip, which then connects to the host. After the compression block takes in data for compression, the interface port packs it into the format suitable for the tape drive and sends it to an outside data buffer. The data port then takes data in and transmits it through the MEF ASIC and onto the tape.
The two ASICs control the drive. They minimize power consumption, deliver high performance, and interface efficiently to an existing 5V subsystem. One of the two ASICs is the compression and buffer controller chip (CBC). The second is the Mammoth ECC and format (MEF) chip. We worked jointly with AT&T Microelectronics to meet the following design objectives:
- Substantially reduce power consumption to stay within the same 5 1/4-inch form factor, and maintain a low temperature rise within the product.
- Increase data transfer rate and overall subsystem performance at that lower power.
- Stay in a low-cost plastic package (PQFP) without a thermal heat spreader. Power in excess of 1.5 W per ASIC would require a heat spreader, which would add unacceptable expense to this cost-sensitive program.
Selection criteria
Initially, we chose a 5V, 0.8µm, three-level-metal technology to develop these two ASICs. However, this proved inadequate, so we drew up new vendor selection criteria based on our prior experience. These were two 100K+ gate designs with a very aggressive design interval. As a result, the vendor's technology, design methodology, design expertise, manufacturing prowess, and project management would be critical to the success of the new tape drive.
Reducing the operating voltage would reduce power, but the ASICs would still have to interface with the rest of the 5V system. A 0.5µm process technology would achieve the ASIC performance goals. AT&T-ME met those requirements: its 3V, 0.5µm ASIC process delivers the performance of a 5V, 0.5µm process. Also, AT&T's process was fully production-ready at the time we needed it.
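Dropping the supply voltage pays off because first-order CMOS switching power scales as $P = \alpha C V^2 f$. The sketch below assumes the switched capacitance and clock frequency stay the same at both voltages. It compares 5 V with 3.3 V operation and checks an estimate against the 1.5 W per-ASIC limit quoted above. The function names are illustrative, and no Mammoth capacitance figures are assumed, because the article does not give any.

```python
HEAT_SPREADER_LIMIT_W = 1.5  # per-ASIC PQFP limit quoted in the article


def dynamic_power(c_switched_f: float, vdd_v: float, freq_hz: float,
                  activity: float = 1.0) -> float:
    """First-order CMOS switching power, P = activity * C * V^2 * f, in watts."""
    return activity * c_switched_f * vdd_v ** 2 * freq_hz


def voltage_scaling_ratio(v_new: float, v_old: float) -> float:
    """Power ratio when only the supply changes (same C, same f)."""
    return (v_new / v_old) ** 2


def needs_heat_spreader(power_w: float) -> bool:
    """True if an ASIC's estimated power exceeds the plastic-package budget."""
    return power_w > HEAT_SPREADER_LIMIT_W


ratio = voltage_scaling_ratio(3.3, 5.0)
print(f"3.3 V vs 5 V dynamic power: {ratio:.3f}x")  # 0.436x, roughly 56 percent less
```

Given a switched-capacitance estimate for either ASIC, `needs_heat_spreader(dynamic_power(...))` indicates whether the 1.5 W line is crossed.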
Figure 2. The MEF ASIC has a write path and a read path and operates at 53 MHz with a complexity of 203,000 equivalent gates.
The process supports device operation from 2.7 V to 3.6 V, with an intrinsic gate delay of 80 ps. Interconnect and isolation rules are optimized for standard-cell layout to achieve up to three times the density of earlier 0.6µm processes.
Linchpin
A 5V/3V interface I/O cell in each chip is the linchpin of this design, because these low-power ASICs interface directly to the 5V circuitry that makes up the remainder of the subsystem. The thinner oxide used in a 3V CMOS process poses a problem when interfacing 5V to 3V circuits (Figure 3A).
Alternative solutions involve one of two approaches:

(1) Selecting thicker oxides and special transistors, which leads to a more complex, expensive process.

(2) Recharacterizing a 5V process for 3V operation, which carries a performance penalty of more than 40 percent compared with a 3V-optimized process. In addition, the necessary translator cells degrade performance by another 16 percent.
Figure 3A. Thinner oxide used in a 3V CMOS process poses a problem when interfacing 5V to 3V circuits. Therefore, a 5V/3V interface I/O cell in each chip is necessary.
AT&T Bell Laboratories opted for a patent-pending circuit approach that reliably accepts input signal levels up to 5.5 V on transistors with thin gate oxides. The circuit is also ESD- and latchup-resilient and protects against hot-carrier degradation. It supports TTL output levels as well as low-voltage CMOS and TTL (see Figure 3B).
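The sketch below is a rough behavioral model of what the 5V/3V input cell has to accept. It treats any level from 0 V up to the quoted 5.5 V as safe and interprets it with the standard TTL input thresholds (0.8 V maximum for a low, 2.0 V minimum for a high). The 0 V lower bound and rejecting the undefined band are simplifying assumptions. This does not describe how AT&T's patent-pending circuit works internally.

```python
MAX_INPUT_V = 5.5   # input tolerance quoted for the interface cell
TTL_VIL_MAX = 0.8   # standard TTL input-low threshold (V)
TTL_VIH_MIN = 2.0   # standard TTL input-high threshold (V)


def read_input(v_in: float) -> int:
    """Behavioral model of a 5 V-tolerant, TTL-compatible input on a 3.3 V chip."""
    if v_in < 0.0 or v_in > MAX_INPUT_V:
        # Assumed safe range: 0 V to the 5.5 V tolerance from the article
        raise ValueError(f"{v_in} V is outside the 0 to {MAX_INPUT_V} V tolerated range")
    if v_in <= TTL_VIL_MAX:
        return 0
    if v_in >= TTL_VIH_MIN:
        return 1
    raise ValueError(f"{v_in} V falls in the undefined TTL region")


print(read_input(5.0), read_input(3.3), read_input(0.4))  # 1 1 0
```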
Methodology
We determined early on that designing ASICs with up to 200,000 gates involves several issues:
- Version control of behavioral vs. gate level netlists.
- Turnaround time to implement design changes.
- Testability.
- Layout-through-verification cycle times.
- Successful achievement of performance goals (post layout).
We took a hierarchical approach to behavioral design and partitioned it with Synopsys' synthesis and version control in mind. Our design methodology writes behavioral descriptions for small modules and inserts scan at the lowest level. Consequently, the upper levels of the design contain no logic, only the interconnects between compiled modules.
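The partitioning rule above is that logic lives only in leaf modules and upper levels are pure interconnect. The sketch below models the design hierarchy as a tree and checks that rule. It also lists the leaf modules that would be synthesized and scan-inserted one at a time. The `Module` class and the example module names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Module:
    """One node in the design hierarchy (example names are hypothetical)."""
    name: str
    children: list[Module] = field(default_factory=list)
    has_logic: bool = False  # True if this module holds behavioral logic of its own


def logic_outside_leaves(top: Module) -> list[str]:
    """Names of non-leaf modules that contain logic; empty if the rule holds."""
    offenders = []
    stack = [top]
    while stack:
        module = stack.pop()
        if module.children and module.has_logic:
            offenders.append(module.name)
        stack.extend(module.children)
    return offenders


def compile_units(top: Module) -> list[str]:
    """Leaf modules: the units synthesized and scan-inserted individually."""
    leaves = []
    stack = [top]
    while stack:
        module = stack.pop()
        if module.children:
            stack.extend(module.children)
        else:
            leaves.append(module.name)
    return sorted(leaves)


top = Module("mef_top", [
    Module("write_path", [Module("ecc_gen", has_logic=True),
                          Module("formatter", has_logic=True)]),
    Module("read_path", [Module("deformatter", has_logic=True),
                         Module("ecc_correct", has_logic=True)]),
])
print(logic_outside_leaves(top))  # []
print(compile_units(top))  # ['deformatter', 'ecc_correct', 'ecc_gen', 'formatter']
```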
What two complex ASICs provide a tape drive?
The CBC, with 173,000 equivalent gates and 40 MHz operation, is the modern counterpart of the buffer memory in a tape controller architecture. It supports a 20-Mbyte/s burst rate for SCSI transfers and a built-in 10-Mbyte/s data rate for compression and decompression.
The CBC ASIC manages up to 8 Mbytes of buffer space. It matches speeds between the host computer and the tape interface and also handles user-data caching functions. Buffer management also aids data-integrity checking throughout the compression process, so data integrity is maintained from the host CPU to the tape and back.
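Speed matching is easy to quantify with the article's own figures. The calculation below estimates how long a sustained 20-Mbyte/s host burst can run before an 8-Mbyte buffer fills while the tape drains it at 3 Mbytes/s. It assumes native mode (no compression), a buffer that starts empty, and constant rates on both sides.

```python
def burst_time_until_full(buffer_mb: float, host_mb_s: float, tape_mb_s: float) -> float:
    """Seconds a sustained host burst can run before an initially empty buffer
    fills, while the tape side drains it concurrently (constant rates assumed)."""
    net_fill_rate = host_mb_s - tape_mb_s
    if net_fill_rate <= 0:
        return float("inf")  # the tape keeps up, so the buffer never fills
    return buffer_mb / net_fill_rate


print(f"{burst_time_until_full(8, 20, 3):.2f} s")  # 0.47 s, i.e. 8 / (20 - 3)
```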
The CBC interfaces to a SCSI chip that connects to the host. The compression block accepts data for compression. The interface port then packs it into the appropriate format for the tape drive and sends it to an outside data buffer. The data port takes data in and transmits it through the MEF ASIC and onto the tape. The data port also wraps the data back for verification and decompression through the decompression circuitry (see Figure 1).
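The wrap-back path can be illustrated as an end-to-end check. A checksum of the host's uncompressed data is computed on the way in. The data is decompressed again on wrap-back and compared against that checksum. In the sketch below, Python's `zlib` compression and CRC-32 are stand-ins chosen for illustration. The article does not name the CBC's compression algorithm or its integrity code.

```python
import zlib


def write_with_check(host_data: bytes) -> tuple[bytes, int]:
    """Compress host data and tag it with a CRC-32 of the *uncompressed* bytes."""
    return zlib.compress(host_data), zlib.crc32(host_data)


def wrap_back_verify(stored: bytes, expected_crc: int) -> bool:
    """Decompress the wrapped-back data and confirm it matches what the host sent."""
    try:
        restored = zlib.decompress(stored)
    except zlib.error:
        return False  # corrupted stream: decompression itself failed
    return zlib.crc32(restored) == expected_crc


stored, crc = write_with_check(b"user data " * 100)
print(wrap_back_verify(stored, crc))  # True
```

Checking against the uncompressed data, rather than the compressed stream, also catches errors that the compressor or decompressor itself might introduce.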
Operating at 53 MHz with a complexity of 203,000 equivalent gates, the MEF ASIC has a write path and a read path (see Figure 2). The write path takes data from the CBC, routes it through the ECC, and then sends it to the formatter, which formats it for writing to the tape. The data then passes through write logic outside the MEF to the tape. The read path reverses the process. It takes data through the deformatter and error checking/correction and moves it to the CBC's CPORT.
The ECC generator RAMs use a 96 x 8 FIFO, and a three-stage pipeline performs error correction. Normally, to perform correction, the RAM reads data out and runs it through the pipeline. If correction is necessary, the RAM must read the data out again. In this design, however, the FIFO temporarily holds the data in uncorrected form. If the data needs correction, it is corrected from the FIFO copy and then written back to the RAM. This eliminates the second RAM read and saves RAM bandwidth.
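The bandwidth saving can be shown by counting RAM accesses. The sketch below assumes a block that fits in the 96-entry FIFO and that the conventional scheme re-reads the whole block when any byte needs fixing. The `errors` dictionary maps a byte offset to an XOR error pattern. It stands in for the output of the three-stage correction pipeline, which the article does not detail. Both function names are hypothetical.

```python
from collections import deque

FIFO_DEPTH = 96  # entries in the 96 x 8 FIFO described above


def correct_conventional(ram: bytearray, errors: dict[int, int]) -> tuple[int, int]:
    """Pass 1 streams the block through the pipeline; if anything needs fixing,
    pass 2 reads the whole block from RAM again. Returns (ram_reads, ram_writes)."""
    reads = len(ram)                   # pass 1
    writes = 0
    if errors:
        reads += len(ram)              # pass 2: second read of the block
        for index, pattern in errors.items():
            ram[index] ^= pattern      # apply the correction and write it back
            writes += 1
    return reads, writes


def correct_with_fifo(ram: bytearray, errors: dict[int, int]) -> tuple[int, int]:
    """Single RAM read: uncorrected bytes wait in the FIFO, and corrections are
    applied to the FIFO copy before being written back."""
    if len(ram) > FIFO_DEPTH:
        raise ValueError("block is larger than the FIFO")
    fifo = deque(ram)                  # the only read of the block from RAM
    reads = len(ram)
    writes = 0
    for index, byte in enumerate(fifo):
        if index in errors:
            ram[index] = byte ^ errors[index]  # corrected byte written back to RAM
            writes += 1
    return reads, writes


block = bytearray(b"tape data block")
block[3] ^= 0x40                       # inject a single-byte error
print(correct_with_fifo(block, {3: 0x40}), bytes(block))  # (15, 1) b'tape data block'
```

Both schemes write back the same corrected bytes. The FIFO version simply avoids the second pass over the RAM.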
Conventional Synopsys/Verilog methodology requires about two days of turnaround time to correct and resimulate a gate-level problem. With this "fine-grain partition" methodology, we reduced that turnaround time to two to three hours, because we could make logic changes within small areas of the design. Previously, the changes and resimulation associated with large partitions and boundary-scan reinsertion consumed inordinate time.
A UNIX-like make function is associated with the lower-level compiled modules. To avoid mismatches, it verifies that each module's current synthesized gate-level netlist corresponds to its current behavioral description. This technique automates the link between the behavioral and structural descriptions. It ensures that version control does not consume extra program-management time and reduces design risk.
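The make-like check can be sketched with make's own timestamp rule. A module needs resynthesis if its gate-level netlist is missing or older than its behavioral source. The sketch assumes that each module maps to one behavioral file and one netlist file. The `stale_modules` function and the file naming are hypothetical, not the tool Exabyte used.

```python
import os


def stale_modules(modules: dict[str, tuple[str, str]]) -> list[str]:
    """Return modules whose gate-level netlist is missing or older than the
    behavioral source it was synthesized from (make's timestamp rule).

    `modules` maps a module name to (behavioral_path, netlist_path).
    """
    stale = []
    for name, (behavioral, netlist) in sorted(modules.items()):
        if (not os.path.exists(netlist)
                or os.path.getmtime(netlist) < os.path.getmtime(behavioral)):
            stale.append(name)  # netlist is out of date: resynthesize this module
    return stale
```

Only the listed modules would be resynthesized and rescanned, which is what keeps fine-grain turnaround short. A content hash could replace timestamps if files are copied around in a way that disturbs modification times.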
One area that fit well into this methodology was AT&T-ME's flexible environment for gate-level Verilog simulation. With toolkits from other vendors, we found that the vendor had defined the directory structure of the design database needed to retrieve back-annotated timing.
In that situation, the post-layout verification process requires engineers to devote much of their time to writing "make-like" software. That software links divergent design-database structures across the behavioral, pre-layout structural, and post-layout structural levels. The Verilog and Synopsys Design Kits allow user-defined design environments, saving time and engineering resources.
AT&T-ME also helped the design by complementing its 3.3V standard-cell library with an array of utility tools:
- a golden timing calculator
- a topological analysis tool for ERC (Electrical Rule Checks)
- a graphical display tool for overlay viewing of results from various simulation runs
- a dynamic timing simulator called Hazard Analysis.
Figure 3B. The patent-pending circuit accepts input signal levels up to 5.5 volts on transistors with thin gate oxides. It is ESD- and latchup-resilient, protects against hot-carrier degradation, and supports TTL output levels as well as both low-voltage CMOS and TTL.
We also took advantage of AT&T-ME's floorplanning and place-and-route tool arsenal, so we did not need to pay for an on-site floorplanner. For both the MEF and the CBC, the prelayout timing estimates were accurate and the layout tools were powerful. As a result, the designs met their performance and die-size goals without layout iterations. In fact, both ASICs performed better with the extracted capacitance and RC parasitics from the layout than AT&T's prelayout estimate predicted.
Another feature of AT&T's layout tools is the ability to generate symmetrical, well-balanced clock trees, which is especially important in our high-gate-count devices. Our design goal was a clock skew of less than 0.75 ns, and the actual skew we achieved on these clocks is under 0.5 ns.
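Skew can be expressed as a fraction of the clock period to put these clock-tree numbers in perspective. The sketch uses the article's 53 MHz MEF clock. The per-sink insertion delays that `clock_skew_ns` takes would come from layout extraction. The article does not give them, so the values used in the test are made up.

```python
def clock_skew_ns(insertion_delays_ns: list[float]) -> float:
    """Skew = latest minus earliest clock arrival across the sinks of one tree."""
    return max(insertion_delays_ns) - min(insertion_delays_ns)


def skew_fraction_of_period(skew_ns: float, freq_mhz: float) -> float:
    """Skew as a fraction of the clock period (period in ns = 1000 / MHz)."""
    period_ns = 1_000.0 / freq_mhz
    return skew_ns / period_ns


# 0.5 ns achieved skew at the MEF's 53 MHz: about 2.65% of the ~18.9 ns period
print(f"{skew_fraction_of_period(0.5, 53):.2%}")
```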
Richard Schmidt is a senior ASIC designer and ECC engineer in the Mammoth Group at Exabyte.
Bob Sugar is a senior architecture engineer and ASIC designer in the Advanced Technologies Group at Exabyte.
Lowering power even more
Power reduction was a critical concern in developing the 0.5µm, 3.3V library. AT&T Microelectronics (ME) therefore focused on optimizing the performance of all compiled memories without increasing power. Exabyte took advantage of AT&T's enhanced memory compiler architectures for the SRAMs and register files used in both ASICs.
AT&T segmented the memory array into sub-blocks. This reduces the bit-line, word-line, and I/O-line capacitance driven for any read or write by a factor of two or four compared with a conventional array architecture. These reductions translate directly into power savings and performance improvements.
The data in Table 1 compare the conventional and block-partitioned architectures. In the block-partitioned SRAM structure, word-line capacitance is reduced to one-quarter, or $C_{WL}/4$. Also, the number of cells accessed in any one half of the array is only one-quarter of that in the conventional architecture. Hence, the power associated with driving the word-line and bit-line capacitances is $CV^2f/4$. Moreover, the I/O line does not have to be driven all the way across the bottom of the array, so its capacitance is essentially cut in half. This improves data-path speed to the output buffers.
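The quarter-power result follows directly from $P = CV^2f$ when only a quarter of the word-line and bit-line capacitance is driven per access. The sketch below encodes that relationship. The `partition_factor` parameter and the capacitance values in the test are illustrative assumptions, not figures from Table 1.

```python
def array_access_power(c_wordline_f: float, c_bitlines_f: float, vdd_v: float,
                       freq_hz: float, partition_factor: int = 1) -> float:
    """Power for driving word-line plus bit-line capacitance, P = C * V^2 * f.

    partition_factor = 1 models the conventional array; 4 models the
    block-partitioned array, where only a quarter of the word line and of the
    accessed cells (hence their bit lines) are driven on each access.
    """
    c_driven = (c_wordline_f + c_bitlines_f) / partition_factor
    return c_driven * vdd_v ** 2 * freq_hz
```

With the same supply, frequency, and array size, the conventional figure divided by the partitioned figure comes out to exactly the factor of four described above.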
The SRAM macrocell also features precharge timing that is independent of the clock duty cycle, plus a power-down mode. Both features are implemented with self-timed logic circuits. The initial bit-line precharge time does not depend on the width of the system clock to the macro. Instead, it varies with voltage, temperature, and process parameters.
To conserve power, all DC current paths are turned off and a partial precharge is initiated once a cell access is completed. Predecoding the addresses further reduces power by halving the capacitance on the internal address decode lines.
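Address predecoding is commonly done by splitting the address into small fields and turning each field into a one-hot bus. Each final word-line gate then needs only one input per bus instead of the full address. The article does not say how AT&T grouped the address bits, so the 2-to-4 grouping below is an assumption. The sketch does not model the capacitance saving itself.

```python
def predecode(address: int, addr_bits: int, group_bits: int = 2) -> list[list[int]]:
    """Split the address into group_bits-wide fields and turn each field into a
    one-hot predecode bus (with the default, 2 bits -> 4 lines)."""
    if addr_bits % group_bits:
        raise ValueError("addr_bits must be a multiple of group_bits")
    mask = (1 << group_bits) - 1
    buses = []
    for shift in range(0, addr_bits, group_bits):
        field = (address >> shift) & mask
        buses.append([1 if line == field else 0 for line in range(1 << group_bits)])
    return buses


def word_line_active(buses: list[list[int]], row: int, group_bits: int = 2) -> bool:
    """Final decode: a row's word line is the AND of one line taken from each bus."""
    mask = (1 << group_bits) - 1
    return all(bus[(row >> (k * group_bits)) & mask] for k, bus in enumerate(buses))


buses = predecode(0b1101, addr_bits=4)
print(buses)  # [[0, 1, 0, 0], [0, 0, 0, 1]]
print([row for row in range(16) if word_line_active(buses, row)])  # [13]
```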
The register file is a dual-port memory with separate read and write ports, each with its own address. The read port is asynchronous, and the write port is synchronous. A column-multiplexing scheme improves performance. It halves the number of cells in any one column, and with it the capacitance on those bit lines, permitting faster reads and writes.
Figure 5 is a high-level block diagram of the register file architecture, with the circuit blocks associated with the column-multiplexing scheme indicated. The cell read/write operations for a particular bit in a data word are performed between two adjacent columns. Each column has a column-select (cs) circuit and a precharge (pc) circuit, which select the column and precharge it before a read or write is performed.
The data input latch is common to both columns and is selected by the write column-select circuitry, which is controlled by A0. When A0 is high, CSA (the left column select) is active. When A0 is low, CSB (the right column select) is active. The two sense amplifiers are selected the same way by this least significant bit, and both share a common output buffer. A single data input and output buffer per bit within the I/O are thus shared between the two columns.
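The 2:1 column multiplexing can be modeled at the word level. A0 picks one of the two adjacent columns, and the remaining address bits pick the row, so an array of N words needs only N/2 rows. The sketch follows the article's A0 convention (high selects CSA, the left column). The class itself is a hypothetical behavioral model and ignores timing, precharge, and sensing.

```python
class ColumnMuxRegisterFile:
    """Each data bit of word `addr` lives in one of two adjacent columns that share
    a data-input latch and an output buffer. A0 (the address LSB) picks the
    column: A0 = 1 -> CSA (left), A0 = 0 -> CSB (right). Higher bits pick the row."""

    def __init__(self, words: int, width: int):
        if words % 2:
            raise ValueError("word count must be even for 2:1 column multiplexing")
        self.width = width
        rows = words // 2  # half as many cells per column as a non-muxed array
        # cells[row][column][bit]; column 0 = CSA (left), column 1 = CSB (right)
        self.cells = [[[0] * width for _ in range(2)] for _ in range(rows)]

    @staticmethod
    def _decode(addr: int) -> tuple[int, int]:
        row = addr >> 1
        column = 0 if addr & 1 else 1  # A0 high selects CSA, A0 low selects CSB
        return row, column

    def write(self, addr: int, value: int) -> None:
        row, column = self._decode(addr)
        self.cells[row][column] = [(value >> b) & 1 for b in range(self.width)]

    def read(self, addr: int) -> int:
        row, column = self._decode(addr)
        return sum(bit << b for b, bit in enumerate(self.cells[row][column]))


rf = ColumnMuxRegisterFile(words=8, width=8)
rf.write(2, 0xA5)
rf.write(3, 0x3C)  # same row as address 2, other column
print(hex(rf.read(2)), hex(rf.read(3)), len(rf.cells))  # 0xa5 0x3c 4
```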
Figure 5. Cell read/write operations for a particular bit in a data word are performed between two adjacent columns. CS and PC select the columns and precharge them before a read or write is performed.
The register file also has read-after-write control. An address-match circuit compares the read and write addresses to determine whether a simultaneous read and write is occurring at the same word. If so, the read-after-write circuitry ensures that the write is immediately followed by the read.
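The read-after-write behavior can be sketched at the cycle level. Normally the asynchronous read port returns the stored contents. When the address-match check sees both ports hitting the same word in one cycle, the read follows the write and returns the new data. This cycle model and its names are a hypothetical illustration, not AT&T's circuit.

```python
from typing import Optional


class ReadAfterWriteRegisterFile:
    """Cycle-level sketch: on an address match, the read returns the data being
    written in the same cycle; otherwise it returns the stored word."""

    def __init__(self, words: int):
        self.mem = [0] * words

    def cycle(self, read_addr: int, write_addr: Optional[int] = None,
              write_data: int = 0) -> int:
        stored = self.mem[read_addr]              # what the read port sees before the write
        address_match = write_addr == read_addr   # the address-match circuit
        if write_addr is not None:
            self.mem[write_addr] = write_data     # synchronous write commits this cycle
        return write_data if address_match else stored


rf = ReadAfterWriteRegisterFile(4)
print(rf.cycle(0, write_addr=0, write_data=7))  # 7: same word, read follows the write
print(rf.cycle(1, write_addr=2, write_data=9))  # 0: different words, no match
```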
A custom reset function was added to the register file design to meet the customer's needs in the CBC. A prime concern in developing the reset circuit methodology was to minimize the current spikes caused by resetting (writing) all the register file cells simultaneously. If all the memory cells were written at once, a significant glitch would occur on the power bus and could corrupt the operation of other circuits. Instead, a scheme was developed to reset rows of cells sequentially over a 16-clock-cycle period. The entire reset operation is controlled and timed internally in the register file.
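A staged reset spreads the switching load across cycles. The sketch below splits the rows as evenly as possible over the 16 cycles and reports the peak number of rows cleared in any one cycle, which is a rough proxy for the current spike. The article states only that rows are reset sequentially over 16 cycles. The even split and the helper names are assumptions.

```python
import math

RESET_CYCLES = 16  # reset window from the article


def reset_schedule(rows: int, cycles: int = RESET_CYCLES) -> list[range]:
    """For each clock cycle, the range of rows cleared in that cycle."""
    per_cycle = math.ceil(rows / cycles)
    return [range(c * per_cycle, min((c + 1) * per_cycle, rows)) for c in range(cycles)]


def run_reset(mem: list[list[int]]) -> int:
    """Clear `mem` one row group per cycle; return the peak rows switched in a cycle."""
    peak = 0
    for group in reset_schedule(len(mem)):
        for r in group:
            mem[r] = [0] * len(mem[r])
        peak = max(peak, len(group))
    return peak


mem = [[1] * 8 for _ in range(64)]
print(run_reset(mem))  # 4 rows per cycle instead of all 64 at once
```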
Mike DePaolis is a technical manager for AT&T Bell Laboratories in Allentown, PA.
Table 1. SRAM array architecture comparison.
Integrated System Design, February 1996
Copyright © 1996 Integrated System Design Magazine