The emergence of SoC has been described as a development that will require fundamental changes in the approaches to design-for-testability (DFT). This will take the form of a "test re-use" strategy or the adoption of logic BIST. However, analysis of the costs and design techniques associated with SoC design shows that these approaches offer no advantage in the majority of design scenarios. SoC will require an evolution rather than a revolution in the world of DFT.
The first issue that must be addressed is: "What is the fundamental difference between an SoC design and a conventional ASIC design?" The most significant difference is that an SoC design usually contains a number of blocks provided by external IP suppliers.
Other differences exist: an SoC may contain a mix of both digital and analog blocks. SoC designs are hierarchical; in practice, the design approach for any large chip will make extensive use of hierarchy. Another potential difference between SoC and ASIC design is shorter design times.
To understand the tradeoffs between different DFT approaches, it is important to understand the basic hierarchy of costs involved in a design. Let's look at the cost requirements for an SoC design that will be a $20 chip with a production volume of 1M parts:
- A leading edge Unix workstation costs about $20K
- An EDA ATPG licence costs about $100K for a three-year term
- ATE test time on a mid-range tester costs around $0.10 per second. For a ten-second test at both wafer and package level, this gives a total tester cost of $2M.
- The silicon costs will be $20M. An additional DFT overhead of 5% would add $1M to the overall product cost.
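As a sanity check on these figures, the arithmetic behind the tester and silicon costs can be sketched in a few lines (the constants are the article's illustrative assumptions, not measured data):

```python
# Toy cost model for the $20 SoC example above.
UNITS = 1_000_000          # production volume
TEST_COST_PER_SEC = 0.10   # mid-range tester, $ per second
TEST_TIME_SEC = 10         # test time per insertion
INSERTIONS = 2             # wafer test + package test
SILICON_COST = 20.0        # $ per part
DFT_AREA_OVERHEAD = 0.05   # 5% extra die area from DFT

tester_cost = UNITS * TEST_COST_PER_SEC * TEST_TIME_SEC * INSERTIONS
silicon_cost = UNITS * SILICON_COST
dft_overhead_cost = silicon_cost * DFT_AREA_OVERHEAD

print(f"Tester cost:       ${tester_cost / 1e6:.1f}M")        # $2.0M
print(f"Silicon cost:      ${silicon_cost / 1e6:.1f}M")       # $20.0M
print(f"DFT area overhead: ${dft_overhead_cost / 1e6:.1f}M")  # $1.0M
```

Set against these millions, a $20K workstation and a $100K tool licence are noise, which is the point of the comparison that follows.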
From this simple analysis, it is obvious that the processor time and EDA licences are some of the lowest cost items in the design process.
So, the DFT approach should always be to solve problems by use of CPU time and EDA tools rather than impacting the test time or die area. This argument might appear counter-intuitive, since DFT/ATPG are fundamentally based on the use of scan chains that have a significant impact on chip area and performance. However, in the case of scan chains, no realistic alternative exists. No known ATPG algorithm, no matter how CPU time intensive, can handle unconstrained sequential logic. Even the use of partial scan has largely been abandoned despite years of development due to the lack of reliable algorithms.
The relative typical costs in the different design phases for a 50K gate IP block also show dramatic differences:
- RTL code development and validation: about six man-months
- Place and route, timing closure and physical verification: four man-weeks
- Scan insertion and ATPG: one day
As a further example, on a recently completed design of around 4M gates, the final ATPG run using conventional "flat" ATPG required four days of CPU time. The back-end activities such as place and route, timing closure and DRC checking required over 300 days of CPU time. The manpower required for DFT was around 10% of that required for the back-end work. The cost of ATPG is thus very small in the context of the overall design costs. Consequently, impacting these other, more expensive, aspects of the design flow in order to reduce ATPG cost is not justified.
The proposed P1500 standard adds "wrappers" around IP blocks similar in nature to the 1149.1 JTAG standard. These provide isolation so that a block can be tested using standard vectors provided by the IP supplier. Essentially, the isolation guarantees that the results of the test are not dependent upon the environment in which the block is used, provided a standard "test access" mechanism is implemented.
The problem with this approach is that the wrappers impose restrictions and overheads on the design. The most obvious of these is the increase in die size due to the extra gates needed for the wrappers. The wrappers also impose additional delays in the paths between the various IP blocks, the equivalent of two multiplexor delays in each path. Traditionally, these top-level paths can be some of the most critical due to their long lengths.
In addition, the wrappers impose less obvious, but equally important, overheads. During synthesis, they constrain the tools by preventing functions from being merged across boundaries between IP blocks. The use of standardized pre-supplied vectors may also restrict synthesis by not allowing scan chains to be reordered or inversions along the chains to be changed.
In addition, the wrappers are expected to increase final device test times by imposing additional levels of protocols when accessing the test structures built into the chip.
The traditional approach to ATPG, starting with a complete chip netlist and generating vectors from it, has a number of significant advantages. Only a limited number of data views are required, and the task can normally be managed by a single person, which reduces the complexity of managing it. In the event that vector debug is required, the process is relatively straightforward.
In the "hierarchical assembly" approach to generating vectors, the process involves far more data views and sources of data. The process of debugging failing vectors will also be more difficult because the source of the problem may lie with the pre-supplied vectors, the surrounding access mechanism, the timing of the logic in test mode or the process of "expanding" the vectors to the chip level. As different parts of the process may be owned by different organizations, the debug route is potentially very fraught.
Logic BIST: the solution?
Logic BIST has been suggested as the best way forward for SoC test. It reduces the test data volumes and provides a simple mechanism to test blocks independently and in parallel. However, significant problems remain. The design must be completely timing and signal integrity clean to prevent any "unknown" states propagating into the signature registers and giving inconsistent results. As device geometries decrease and signal integrity becomes a greater problem, this will be increasingly difficult to guarantee. In comparison, conventional scan vectors can easily be modified to mask inconsistent results.
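The signature-corruption problem can be illustrated with a toy model: a MISR folds every captured response into a single signature, so one unknown capture renders the whole signature unpredictable, whereas a scan-based flow could simply mask that one bit. (This is a simplified sketch; the shift-xor feedback step below is illustrative, not a real MISR polynomial.)

```python
# Toy model of why unknown (X) states break logic BIST signatures.
def misr_signature(responses, width=8):
    """Fold a stream of captured response words into one signature.
    A None entry models an X (unknown) capture and poisons the result."""
    sig = 0
    for word in responses:
        if word is None:
            return None  # one unknown capture -> signature is unpredictable
        # simplified MISR step: rotate-style shift-xor, then mix in the word
        feedback = (sig >> (width - 1)) & 1
        sig = ((sig << 1) | feedback) & ((1 << width) - 1)
        sig ^= word
    return sig

clean = [0x3A, 0x5C, 0x01, 0xFF]
print(hex(misr_signature(clean)))                 # stable, repeatable signature
print(misr_signature([0x3A, None, 0x01, 0xFF]))   # one X ruins the whole test
```

With a stored scan vector, the single X-producing capture bit could be masked out of the expected data instead of invalidating the entire signature.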
Logic BIST is also difficult to implement at the RTL level since random pattern testability and test point insertion can only be performed at the gate level. The random pattern fault coverage of a function cannot be determined from a purely functional description. As most IP is developed at RTL and customers map it between different technologies, it is not practical for the IP developer to provide a standard logic BIST implementation.
Commonly proposed techniques for core-based test are unlikely to give any cost advantage. However, significant developments are taking place.
There are emerging standards to demonstrate that IP blocks are "DFT friendly". For example the Virtual Component Exchange (VCX) mandates that IP developers must demonstrate that synthesis, scan insertion and ATPG have been successfully run on IP blocks using an example target technology. Another instance is the recently announced MIPS32 4Kc synthesizable processor for which scan insertion, ATPG and fault grading scripts are provided.
A typical feature of IP designs is the large number of asynchronous clock regimes. To support these, new ATPG dynamic compaction algorithms generate very efficient vector sets without the need for a single clock in test mode. The requirement to have a single test clock usually creates significant problems in fixing all possible hold violations and achieving timing closure.
The problem of growing vector sizes and test times will continue to be a key issue. To address these concerns, there are new DFT/ATPG techniques that add on-chip decompressors and compressors to reduce the volume of externally stored test vectors. These exploit the high percentage of "don't care" terms in generated vectors, which have conventionally been filled with fixed or random values. These techniques have significantly less impact on the design than logic BIST and are now available in commercial DFT/ATPG tool sets.
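A back-of-the-envelope sketch shows why this works: if only a few percent of the bits in a generated pattern are actually specified, storing just those care bits is far cheaper than storing the fully random-filled pattern. (The 5% care-bit density and the position-plus-value encoding are hypothetical; real decompressors use LFSR-seed or similar schemes.)

```python
# Illustration: ATPG patterns are mostly don't-care ('X') bits, so
# storing only the specified bits beats storing the filled pattern.
import math
import random

random.seed(0)
PATTERN_LEN = 10_000
CARE_FRACTION = 0.05  # assume ~5% of bits are actually specified by ATPG

# Build a pattern where most bits are don't-cares.
pattern = ['X' if random.random() > CARE_FRACTION else random.choice('01')
           for _ in range(PATTERN_LEN)]

# Conventional flow: every X is filled, so the full width is stored.
filled_bits = PATTERN_LEN

# Compressed flow: store (position, value) only for the specified bits,
# at roughly log2(PATTERN_LEN) position bits plus 1 value bit each.
care_bits = [(i, b) for i, b in enumerate(pattern) if b != 'X']
compressed_bits = len(care_bits) * (math.ceil(math.log2(PATTERN_LEN)) + 1)

print(f"specified bits: {len(care_bits)} of {PATTERN_LEN}")
print(f"storage: {filled_bits} bits filled vs ~{compressed_bits} bits compressed")
```

The lower the care-bit density, the larger the win, which is exactly the property these on-chip decompression schemes exploit.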
The requirement to handle steadily increasing chip sizes will also lead to increased use of large "server farms" to provide computational power. In addition, parallel ATPG will be introduced in the form of algorithms that split the fault list and generation task between processors. Alternatively, configurable scan will be used to split the design into a small number of separate blocks; test generation can then be run for each block from the top level of the design with little or no loss of efficiency. A small number of additional vectors will then be generated to cover the interconnect between the blocks.
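The fault-list partitioning idea can be sketched as follows; `gen_vector` is a hypothetical stand-in for a per-fault ATPG run, and threads stand in for the separate processors or server-farm nodes a real tool would use:

```python
# Sketch of parallel ATPG by fault-list partitioning: split the fault
# list across workers, generate vectors per slice, then merge.
from concurrent.futures import ThreadPoolExecutor

def gen_vector(fault_id: int) -> str:
    # hypothetical stand-in for generating a test vector for one fault
    return f"vector_for_fault_{fault_id}"

def partition(faults, n):
    # deal the fault list round-robin into n roughly equal slices
    return [faults[i::n] for i in range(n)]

def atpg_worker(fault_slice):
    # each worker runs test generation over its own slice of faults
    return [gen_vector(f) for f in fault_slice]

fault_list = list(range(1000))  # hypothetical fault universe
with ThreadPoolExecutor(max_workers=4) as pool:
    chunks = pool.map(atpg_worker, partition(fault_list, 4))
vectors = [v for chunk in chunks for v in chunk]
print(len(vectors))  # every fault targeted exactly once
```

A production implementation would also merge and compact the per-partition vector sets; the sketch shows only the split-and-merge skeleton.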
Overall, the future of SoC testing will be based upon the use of more sophisticated ATPG algorithms that impose fewer restrictions on the design. The back-end design process is becoming more difficult as signal integrity and timing closure issues grow. As this happens, it is important to impose less DFT logic on the design and so minimize its impact.
More sophisticated algorithms and more computational power will address the problems of generating vectors. Most importantly, this task can be performed after the final chip netlist is available and does not need to be completed before tape-out for mask making. This will reduce the impact on the overall project timescales and costs.