The advanced microarchitecture and high interconnect speeds on circuit boards designed around the Intel Xeon Processor 5500 Series (codenamed Nehalem) present design validation and test challenges for board designers and manufacturers. Legacy test systems typically rely on a probe of some sort making physical contact with the circuit board or with chip pins on the board.
Unfortunately, the validation and test coverage delivered by this type of intrusive probe-based test and measurement system is rapidly eroding. With decreased validation and test coverage, product quality suffers, threatening the health of product sales over the long term.
Embedded instrumentation validation and test methods rely on instruments that are already embedded in the chips on the circuit board. As such, these non-intrusive board test (NBT) and validation methods can complement or replace legacy intrusive technologies and deliver the comprehensive coverage needed. Moreover, because software rather than hardware is the critical driver for embedded instrumentation, these methods are much more cost-effective than hardware-intensive legacy instruments and testers such as in-circuit test (ICT) systems, oscilloscopes, flying probe testers and logic analyzers.
Although seemingly simple, the concept of optimum validation and test coverage can be elusive. Whatever the method of quantification, coverage refers to that portion of a particular circuit board that can be validated or tested to determine whether manufacturing faults are present and whether the board meets the operational performance levels specified for the design. This definition raises several questions.
- How is coverage quantified?
- How much validation and test coverage is enough? Or, more specifically, what is optimum coverage?
Specifying a coverage level for a circuit board presupposes coverage can be quantified. Several methods for measuring coverage have been suggested as standards for the industry, but none has been universally adopted. A standardized way of quantifying coverage would create an effective comparative metric for evaluating design techniques relative to their contributions to the test and validation coverage requirements for a given design.
One example of a coverage model has been suggested by the International Electronics Manufacturing Initiative (iNEMI) and is known as PCOLA/SOQ/FAM. The letters in the method’s name refer to the coverage metrics defined in the methodology. Coverage is divided into three segments: structural-device coverage, structural-interconnect coverage, and functional-device and interconnect coverage.
The structural-device metrics are: presence of the devices on the board, correctness of the devices, orientation, live (whether the devices on the board are connected to power) and alignment (PCOLA). The structural-interconnect metrics are: shorts, opens and quality (SOQ). Lastly, the functional-device and interconnect metrics consist of feature, at-speed and measurement (FAM).
A description of iNEMI’s suggested methodology is featured on the group’s web site at http://www.inemi.org/cms/projects/test/FT_assess.html
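As a sketch of how such a model can be applied, the tally below scores a single board item against the PCOLA metric set. The equal weighting (a simple covered/total fraction) and the example device are assumptions for illustration, not part of the iNEMI methodology itself.

```python
# Illustrative tally of iNEMI-style coverage metrics. The equal weighting
# and the example data are assumptions for demonstration only.

PCOLA = ("presence", "correctness", "orientation", "live", "alignment")
SOQ = ("shorts", "opens", "quality")
FAM = ("feature", "at_speed", "measurement")

def coverage_score(covered, metrics):
    """Fraction of the given metric set marked as covered."""
    return sum(1 for m in metrics if m in covered) / len(metrics)

# Hypothetical device covered for presence, correctness and live only:
score = coverage_score({"presence", "correctness", "live"}, PCOLA)
print(score)  # 0.6
```

A real assessment would compute such scores per device and per net, then roll them up into a board-level figure that can be compared across test strategies.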
What is optimum coverage?
Of course, 100 percent validation and test coverage would be ideal because, at least in theory, all faults, failures, potential bottlenecks, manufacturing variances and other possible pitfalls would be identified well before the product is in the hands of a user. Unfortunately, 100 percent coverage is impractical even if it were technically attainable. As coverage increases so does the cost of validation and test until the price of the product in the marketplace becomes uncompetitive. But spending too little on validation and test can be just as hazardous. Without adequate coverage, the quality of the product will suffer, warranty returns will increase, and repair or replacement costs could get out of hand very quickly.
The optimum amount of test coverage is the point of diminishing marginal return: the point where the next dollar spent to increase test coverage will not return an equal or greater amount to the business. Spending beyond this point to increase test coverage would be too much; spending less, not enough. The real payoff for a breakthrough validation and test methodology like embedded instrumentation, and the non-intrusive technologies it enables, is that it delivers coverage far more cost-effectively than older, intrusive probe-based test technologies like in-circuit test (ICT), oscilloscopes, logic analyzers and flying probe testers.
Test technologies based on software-driven embedded instrumentation reduce the cost of validation and test while increasing coverage. This moves the point of diminishing marginal return closer to full 100 percent coverage without increasing the cost of validation and test. In fact, this can actually decrease the total cost of validation and test. Over the life of a system, this can have tremendous effects in terms of product quality, user satisfaction and market share.
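The trade-off above can be made concrete with a toy model. The spend and savings figures below are invented for illustration; the optimum is the last increment of test spending whose marginal return still exceeds one dollar per dollar spent.

```python
# Toy diminishing-marginal-return model (all dollar figures invented).
# spend[i]  : cumulative validation/test spend ($k)
# savings[i]: cumulative savings from faults caught before shipment ($k)

spend = [0, 10, 20, 30, 40, 50]
savings = [0, 40, 65, 80, 88, 92]

def optimum_spend(spend, savings):
    """Last spend level whose marginal return is still at least 1."""
    for i in range(1, len(spend)):
        marginal = (savings[i] - savings[i - 1]) / (spend[i] - spend[i - 1])
        if marginal < 1.0:  # the next dollar returns less than a dollar
            return spend[i - 1]
    return spend[-1]

print(optimum_spend(spend, savings))  # 30
```

Lowering the cost per unit of coverage, as embedded instrumentation does, flattens the cost curve and pushes this break-even point toward 100 percent.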
The coverage delivered by older intrusive validation and test technologies has eroded over the last decade as chips and packaging, as well as board design and fabrication techniques have evolved. Several of the technical difficulties intrusive technologies have encountered are described below.
Probe-based intrusive validation and test technologies typically place metal probes on one or more pins on chips or on test pads that have been designed into a circuit board for this purpose. Unfortunately, this type of physical access for probes has diminished in recent years as chips and circuit boards have evolved. Chip pins, for example, have become so fine-pitched that they are inaccessible to a probe, or the pins are hidden underneath the chip’s silicon die, as in a ball grid array (BGA) package. Heat sinks or a conformal coating over the entire board can also render device pins and test pads inaccessible. And, in general, test pads are disappearing as circuit boards become more and more dense.
Capacitive effects on interconnects
Validating or testing high-speed serial chip-to-chip interconnects or input/output (I/O) buses on a circuit board by placing a physical probe on the interconnect is also fraught with difficulties. The speed of these interconnects today often exceeds 5 gigabits per second (Gb/s). For example, PCI Express Gen 3 can achieve rates of 8 Gb/s, SATA III is already at 6 Gb/s, USB 3.0 runs at 4.8 Gb/s and HDMI 1.3 is capable of 10.2 Gb/s.
When a metal probe is placed on a circuit board test pad on one of these buses, the metal probe introduces capacitive effects on the bus. As a result, it is difficult for probe-based test and measurement equipment to distinguish between faults which may be disturbing the integrity of signals on the bus and the capacitive anomalies introduced by the test equipment itself.
Embedded instrumentation is a methodology which applies instruments that have been embedded into chips to perform validation and test functionality throughout the life cycle of chips, circuit boards and systems. Several specific validation and test technologies take advantage of embedded instruments. These include boundary scan (IEEE 1149.1), processor-controlled test (PCT) and Intel Interconnect Built-In Self Test (IBIST).
Boundary scan (IEEE 1149.1)
As a board test standard, boundary scan was ratified by the IEEE in 1990 as a reaction to disappearing test access to devices. Tests are applied to a circuit board through a connector and the four-wire serial interface known as the Test Access Port (TAP). On chips, this interface is commonly referred to as the ‘JTAG port’, from the informal name of the working group that began development of the boundary scan standard, the Joint Test Action Group. Because of its widespread acceptance and deployment, the boundary-scan infrastructure in chips and on circuit boards has become the basis for other standards, including the IEEE 1149.6 standard for testing high-speed AC-coupled interconnects, the IEEE 1149.7 so-called compact boundary-scan standard as well as several others.
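The TAP mentioned above is driven by a 16-state controller that advances on each TCK rising edge according to the TMS value. The sketch below models only the states on the instruction-load path of that state machine; it is a partial illustration, not a complete 1149.1 implementation.

```python
# Partial model of the IEEE 1149.1 TAP controller: the next state is a
# function of (current state, TMS), sampled on the rising edge of TCK.
# Only the states on the instruction-register path are included here.

TAP_NEXT = {
    ("TEST-LOGIC-RESET", 0): "RUN-TEST/IDLE",
    ("RUN-TEST/IDLE", 1): "SELECT-DR-SCAN",
    ("SELECT-DR-SCAN", 1): "SELECT-IR-SCAN",
    ("SELECT-IR-SCAN", 0): "CAPTURE-IR",
    ("CAPTURE-IR", 0): "SHIFT-IR",
    ("SHIFT-IR", 0): "SHIFT-IR",   # instruction bits shift in on TDI here
    ("SHIFT-IR", 1): "EXIT1-IR",
    ("EXIT1-IR", 1): "UPDATE-IR",
    ("UPDATE-IR", 0): "RUN-TEST/IDLE",
}

def walk(state, tms_bits):
    """Advance the TAP controller through one TMS bit per TCK."""
    for tms in tms_bits:
        state = TAP_NEXT[(state, tms)]
    return state

# The TMS sequence 0,1,1,0,0 moves the controller from reset into Shift-IR:
print(walk("TEST-LOGIC-RESET", [0, 1, 1, 0, 0]))  # SHIFT-IR
```

Everything a boundary-scan tool does on a board, from loading EXTEST to reading back capture data, reduces to sequences of TMS/TDI bits clocked through this state machine.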
Processor-controlled test (PCT)
PCT takes temporary control of a processor on a circuit board to read and write memory and I/O registers in addressable devices. In this way, PCT exercises the functionality of the board and as a result detects and diagnoses structural faults. It operates at CPU speeds and, therefore, PCT detects faults which only manifest themselves while the board is running at operational speeds. ICT, manufacturing defect analyzers, flying probe testers and boundary scan are static technologies which do not verify operational functionality. As a consequence, PCT’s fault coverage spectrum does not overlap with that of static test technologies like boundary scan.
Intel Interconnect Built-In Self Test (IBIST)
An example of proprietary embedded instrumentation is Intel Interconnect Built-In Self Test (IBIST), which is being embedded by Intel, Avago and other semiconductor and intellectual property (IP) providers into chips and chipsets. IBIST can be implemented in validation applications to validate the performance of high-speed serial buses and in structural tests on circuit boards. In validation applications, IBIST-based pattern generation and checking, bit error rate, and margining tests are applied to confirm a board design’s signal integrity. Intel IBIST uses the same debug port interface which is used for boundary scan and processor-controlled test.
Validating and testing a Xeon 5500 board: a case study
Given the technical difficulties that legacy intrusive test and measurement equipment have encountered in recent years, embedded instrumentation methodologies can provide extensive validation and test coverage for circuit boards based on the Intel Xeon Processor 5500 Series. The non-intrusive technologies that make use of embedded instruments, including boundary scan, PCT and Intel® IBIST, can provide coverage for virtually all of a Xeon 5500 circuit board. These boards are typically one- or two-processor designs, which can have one or two Input-Output Hubs (IOHs), which are also referred to as chipsets – see figure 1.
On Xeon 5500-based designs test access is provided via the eXtended Debug Port (XDP), a 60-pin, small-form-factor connector which provides access to Intel® silicon and system test and debug resources. Boundary scan, PCT and Intel® IBIST only require a small subset of the 60 XDP signals.
Although a 60-pin XDP header usually appears on a circuit board during prototype stages to allow for debugging, it is quite common for the XDP header to be removed from the design prior to high-volume manufacturing to reduce costs. When XDP headers are not available on a circuit board, alternatives can be provided. There are a number of different ways that boundary-scan and PCT run-control access can be provided, per the Intel Debug Port Design Guidelines and the board’s own individual design-for-test features.
Boundary scan coverage
Fig 1: A typical single-CPU Xeon 5500 board design with coverages for boundary-scan test (BST), PCT and Intel IBIST.
The green in figure 1 indicates the test coverage provided by non-intrusive boundary-scan tests. This coverage includes the QuickPath Interconnect (QPI) links. If the test pads that are required for intrusive test technologies were to be placed on these high-speed links, the intrusive technologies would only provide unreliable test results because of the capacitive effects of the test probes. Non-intrusive test technologies based on embedded instrumentation like boundary scan, PCT and IBIST do not require test pads.
Boundary scan is of particular utility for QPI nets, because there is no other deterministic means of detecting shorts and opens on these links. In addition, the performance of the QPI links is critical to the functioning of the circuit board because QPI constitutes the main bus through which all processor traffic flows.
Processor-controlled test coverage
PCT uses emulation technology to read and write memory and registers throughout the CPU-addressable devices on a Xeon board. By doing this, structural and functional faults can be identified. Therefore, PCT is a good complement to boundary scan, effectively extending the test coverage on the board.
PCT provides structural test coverage on nets where, for whatever reason, test points cannot be designed into the board. For example, a board’s PCIe nets may not be tested structurally by boundary scan because devices on the links may not be compliant with the boundary-scan standard or the board was not designed with boundary-scan DFT considerations in mind.
Moreover, since it operates at processor speeds, PCT can supplement the coverage provided by traditional static test technologies like boundary scan and ICT with an at-speed fault coverage spectrum. In addition, PCT test times are very fast for a functional tester, typically one to five minutes, depending on the comprehensiveness of the test.
PCT instructs the processor to sequentially test all addressable devices on the board under test. These tests are normally carried out without booting the board to its operating system, so device initialization is handled by the PCT system instead of the unit under test’s (UUT) BIOS. Test programming is greatly simplified by PCT’s automatic test generator (ATG), which identifies the devices present on the board and then assimilates the appropriate device profiles from a built-in library into a board-specific test script.
From a functional point of view, PCT can test all CPU-addressable devices, including the DDR3 memory and PCIe devices that boundary scan is unable to test. Certain structural faults on the board, including shorts and opens and other assembly-related faults, can also be detected and diagnosed as a by-product of the functional testing. Another very valuable feature of CPU emulation functional test is the extensive coverage reporting system that comes with PCT. This allows fault reporting to the component and pin levels and is achieved by importing the board’s netlist and then assigning parts and pins to specific device tests during the test development process.
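A sketch of how that netlist-driven diagnosis can work for a data-bus readback failure follows. The net names, component pins and bus width here are hypothetical, not taken from an actual Xeon 5500 netlist.

```python
# Hypothetical netlist fragment: net name -> list of (component, pin)
# connections. A real PCT tool would import this from the board netlist.
netlist = {
    "DDR3_DQ5": [("U1", "AB14"), ("J2", "13")],
    "DDR3_DQ6": [("U1", "AC15"), ("J2", "15")],
}

def diagnose(written, read, bus_nets):
    """Map each readback bit that differs to its net and component pins."""
    failing = []
    for bit, net in enumerate(bus_nets):
        if (written >> bit) & 1 != (read >> bit) & 1:
            failing.append((net, netlist[net]))
    return failing

# A two-bit slice of a data bus where bit 1 reads back stuck low:
print(diagnose(0b11, 0b01, ["DDR3_DQ5", "DDR3_DQ6"]))
# [('DDR3_DQ6', [('U1', 'AC15'), ('J2', '15')])]
```

Because every failing bit maps through the netlist to specific component pins, a functional readback miscompare can be reported at the same pin-level granularity as a structural test.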
Because of certain design-for-test restrictions on Xeon processor 5500 series boards, PCT has proven to be particularly useful for performing tests on DDR3 memory. Boundary scan, which could be implemented to test memory, is restricted from testing DDR3 memory because the memory I/O on Xeon Processor 5500 Series boards has been designed as output2-type, meaning that these cells can drive but not receive. And DDR3 memory itself is usually not designed with inherent boundary-scan support. Thus, pure boundary scan by itself cannot be used to detect shorts and opens on memory buses. Fortunately, PCT extends test diagnostic coverage to these memory buses.
Intel IBIST coverage
On Xeon circuit boards Intel’s suite of embedded instruments, IBIST, can be accessed and controlled via the IEEE 1149.1 boundary-scan test access port (TAP), which includes an instruction register and standard TAP controller functionality. Special 1149.1 instructions set up the tests, start them, determine when they complete, and read back failure information. The actual pattern generation and error checking is done by the embedded Intel IBIST hardware at speeds much greater than boundary scan can support.
IBIST is a functional test which uses pseudo-random bit sequence (PRBS) patterns as a foundation for pattern generation and checking, bit error rate testing (BERT) and margining. It does not provide total structural test coverage on a Xeon® board. As a result, it should certainly be deployed in conjunction with PCT and boundary scan to achieve comprehensive test coverage.
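As an illustration of the kind of pattern generation involved, the sketch below implements a standard PRBS-7 sequence (polynomial x^7 + x^6 + 1) in software. This is a generic linear-feedback shift register, not Intel’s actual IBIST generator, which runs in silicon at link speed.

```python
# Generic PRBS-7 generator: a 7-bit LFSR with taps at bits 7 and 6
# (polynomial x^7 + x^6 + 1). The maximal-length sequence repeats
# every 2**7 - 1 = 127 bits.

def prbs7(seed=0x7F):
    state = seed & 0x7F
    while True:
        fb = ((state >> 6) ^ (state >> 5)) & 1  # XOR of the two tap bits
        state = ((state << 1) | fb) & 0x7F
        yield fb

gen = prbs7()
bits = [next(gen) for _ in range(254)]
print(bits[:127] == bits[127:])  # True: the pattern repeats with period 127
```

Because both ends of a link can regenerate the identical sequence from the same polynomial and seed, the checker needs no copy of the transmitted data, only the same LFSR running in lockstep.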
When testing QPI links, IBIST directs the CPU(s) to act as masters and the IOH(s) to function as slaves so that all links are fully exercised bi-directionally. It should be noted that high-speed serial nets of this sort are designed to tolerate the common-mode noise induced by some structural faults, so a link can “appear” to operate normally even when faults are present on the bus. As a manufacturing test, IBIST captures link performance degradation indicators which could be caused by such structural faults. To detect QPI and PCIe lane performance degradation, the more advanced IBIST capabilities of BERT and margining are applied. For example, marginalities such as component drift across different device lots, trace defects, solder voids or micro-cracks, and missing or bad terminations or capacitors can result in a degraded eye or a high bit error rate.
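A toy version of the bit-error-rate side of such a test, with an invented reference pattern and a single injected error, can be sketched as:

```python
# Toy BERT check: compare a received stream against the expected pattern
# and report the fraction of bits in error.

def bit_error_rate(expected, received):
    errors = sum(e != r for e, r in zip(expected, received))
    return errors / len(expected)

expected = [1, 0, 1, 1, 0, 0, 1, 0] * 1000   # 8,000-bit reference pattern
received = list(expected)
received[5000] ^= 1                           # inject one bit error
print(bit_error_rate(expected, received))     # 0.000125
```

A hardware BERT runs the same comparison at line rate over billions of bits; margining then repeats it while deliberately stressing voltage or timing to measure how much headroom the link retains.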
Meeting challenges with embedded instrumentation
Advancements in chip technology, such as those which have been incorporated into the Xeon 5500 processors, often raise issues of validation and test, not just for the chips themselves but also for the circuit boards where they are deployed. Test and measurement methods and technologies have evolved to account for these issues. The newer, more advanced methods based on embedded instrumentation are able to overcome the limitations of older legacy intrusive test technologies. ASSET’s ScanWorks platform for embedded instruments, in particular, provides the tools and supports the validation and test technologies needed for leading-edge circuit boards based on the 5500 processors.
Non-Intrusive Board Test Strategies for the Intel Xeon Processor 5500 Series
iNEMI (International Electronics Manufacturing Initiative) Functional Test Coverage Assessment Project: http://www.inemi.org/cms/projects/test/FT_assess.html
Intel Debug Port Design Guide for UP/DP Systems, June 2006:
ASSET PCT Design For Test:
About the author:
Alan Sguigna is Vice President of Marketing and Sales at ASSET InterTech Inc.