Design Article
On-Chip Design Verification with Xilinx FPGAs
Adrian Hernandez
3/31/2003 12:00 AM EST
Xilinx Virtex-II Pro devices have redefined FPGAs. The Virtex-II Pro brings not only a denser and faster FPGA, but also an IBM PowerPC 405 core and up to twenty-four 3.125 Gb/s high-speed serial transceivers. In effect, a Virtex-II Pro device makes it feasible to fit an entire card design on a single chip. However, this new capability comes with a dark cloud: FPGA designers are concerned that there may not be enough visibility for verification and debug of large FPGAs.
Currently, one option for the FPGA designer is to use an HDL simulator such as Mentor's ModelSim. With a simulator you can verify all your HDL, modeling everything from internal flops to I/O pads.
Observability and controllability are terms used in the IC test field to describe how easily an internal gate can be accessed. Observability is the ease of viewing or probing the output of a gate. Controllability is the ease of manipulating the inputs of a gate. The same terms apply to in-circuit verification, with one distinction: in-circuit observability and controllability are not limited to individual gates. They can span entire modules, which may in turn encapsulate hierarchies of many more modules.
To address the controllability and observability issues of in-circuit verification, Xilinx has created ChipScope. ChipScope provides two fundamental intellectual property (IP) cores that give real-time observability and controllability. For observability, ChipScope has a logic analyzer soft core called the integrated logic analyzer (ILA). For controllability, ChipScope has a soft core called the virtual input/output core (VIO). Thus with the latest ChipScope 5.2i, you can examine and drive internal FPGA nodes deep in your design, in real time and in-circuit.
The basic structure of the ILA is similar to that of a typical logic analyzer. It consists of two main blocks, a comparator and a storage buffer (Figure 1). The comparator detects patterns or ranges of patterns in the data path and produces a trigger mark when one is detected. Coupled with the comparator is a storage buffer, which stores sampled data along with the trigger mark.
Figure 1: ILA block diagram
In the ILA, the comparator and storage buffer can probe the same or different nodes. When both are connected to the same nodes, the ILA behaves much like a typical benchtop logic analyzer. In this case you set the comparator to detect a pattern seen in the storage buffer data path.
When the comparator and storage buffer are connected to separate nodes, you gain a new degree of freedom. For example, you can attach the ILA comparator to the address and control lines of an internal bus and connect only the data bus to the storage buffer. In this way you can configure the comparator to search for certain addresses or address ranges and store only the data bus samples.
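The two hookups can be illustrated with a small behavioral model. The Python sketch below is not ChipScope code. The class name IlaModel, its clock method, and the simplification that capture starts at the first comparator hit (with no pre-trigger samples) are assumptions made for illustration only.

```python
from typing import Callable, List, Tuple


class IlaModel:
    """Toy model of an ILA: a comparator feeding a fixed-depth storage buffer."""

    def __init__(self, match: Callable[[int], bool], depth: int) -> None:
        self.match = match      # comparator condition on the trigger nodes
        self.depth = depth      # storage buffer depth, in samples
        self.triggered = False
        self.buffer: List[Tuple[int, bool]] = []  # (data sample, trigger mark)

    def clock(self, trigger_value: int, data_value: int) -> None:
        """Process one sample clock edge."""
        mark = False
        if not self.triggered and self.match(trigger_value):
            self.triggered = True
            mark = True         # the trigger mark is set on the first hit only
        if self.triggered and len(self.buffer) < self.depth:
            self.buffer.append((data_value, mark))


# Comparator on the address lines, storage buffer on the data bus.
bus_trace = [(0x000, 11), (0x050, 22), (0x104, 33), (0x200, 44), (0x108, 55)]
ila = IlaModel(match=lambda addr: 0x100 <= addr <= 0x10F, depth=2)
for addr, data in bus_trace:
    ila.clock(addr, data)
print(ila.buffer)
```

To model the benchtop-style hookup instead, pass the same value as both trigger_value and data_value.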
The ILA core gives you in-circuit observability, but there are some issues to consider. The first is that the ILA requires you to supply a clock. Unlike a benchtop logic analyzer, which can work with its own sample clock, the ILA core cannot operate without a user-supplied clock. This restriction can work against a design that has tight timing margins, since the additional clock and probe connections may reduce the design performance.
The second ILA issue to consider is the core size. The part of the ILA that can consume the most resources is the storage buffer, because it is built from internal block memory. This poses a problem for designs that already use the block memory or that require deep traces for in-circuit verification and debug. To alleviate this problem, Xilinx has worked with Agilent Technologies to extend the ILA with an Agilent Trace Core (ATC).
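A rough estimate shows how quickly a deep trace exhausts on-chip memory. The sketch below assumes one extra stored bit per sample for the trigger mark and an 18 Kbit block RAM, the block size in the Virtex-II family. The function name is hypothetical, and block RAM aspect ratios are ignored, so treat the result as a lower bound rather than the footprint of an actual core.

```python
def block_rams_needed(data_width: int, depth: int,
                      block_bits: int = 18 * 1024) -> int:
    """Lower-bound count of block RAMs for an on-chip trace buffer.

    Assumes each stored sample is data_width bits plus one trigger mark bit.
    """
    total_bits = (data_width + 1) * depth
    return (total_bits + block_bits - 1) // block_bits  # integer ceiling


print(block_rams_needed(32, 1024))        # a shallow 32-bit trace
print(block_rams_needed(32, 2_000_000))   # a 2-million-sample trace
```

Under these assumptions a 2-million-sample, 32-bit trace would need thousands of blocks, which is why a trace of that depth has to be stored off-chip.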
The ILA with ATC is similar to the basic ILA, the only difference being that the storage buffer is off-chip. The ATC creates a channel that sends internal data to pins, where it is captured by a 2-million-sample Agilent FPGA Trace Port Analyzer. The ATC is configurable so that more data can be sent off-chip with fewer pins. It accomplishes this pin reduction using time division multiplexing (TDM). TDM in the ATC clocks the data out at close to 4 times the input clock rate. This acceleration is what enables you to gain greater internal node visibility with fewer pins and no internal block RAM.
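The pin saving from TDM is simple arithmetic: if each pin carries several time slots per design clock, the pin count shrinks by the same factor. The Python sketch below illustrates this with a factor of 4, taken from the "close to 4 times" figure above. The function names and the slot ordering are assumptions, not the ATC's actual wire format.

```python
import math
from typing import List


def pins_required(num_signals: int, tdm_factor: int) -> int:
    """Pins needed when each pin carries tdm_factor time slots per clock."""
    return math.ceil(num_signals / tdm_factor)


def tdm_split(sample: List[int], tdm_factor: int) -> List[List[int]]:
    """Split one wide sample into tdm_factor narrower slots (zero padded)."""
    pins = pins_required(len(sample), tdm_factor)
    padded = sample + [0] * (pins * tdm_factor - len(sample))
    return [padded[s * pins:(s + 1) * pins] for s in range(tdm_factor)]


def tdm_merge(slots: List[List[int]], num_signals: int) -> List[int]:
    """Rebuild the wide sample on the analyzer side, dropping the padding."""
    flat = [bit for slot in slots for bit in slot]
    return flat[:num_signals]


sample = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]        # ten internal nodes
slots = tdm_split(sample, tdm_factor=4)        # four slots on three pins
assert tdm_merge(slots, len(sample)) == sample
print(pins_required(64, 4))                    # pins for 64 probed signals
```

The trade-off is that the pins must toggle at the multiplied rate, so the achievable factor depends on how fast the I/O and the capture hardware can run.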
The basic structure of the VIO core consists of four ports (Figure 2). Two of the ports are inputs and two are outputs. One input/output pair is asynchronous; the other pair is synchronous. The asynchronous ports are intended for module ports that do not require synchronized timing, such as asynchronous resets. For module ports that are synchronized, the VIO core furnishes synchronized inputs and outputs that are clocked by the module's clock.
Figure 2: VIO block diagram
The VIO output ports are connected to module inputs. The width of each port is configurable, so that narrow and wide ports can be driven by a single VIO core. One difference between the synchronous and asynchronous output ports is that only the synchronous port has an option to create a pulse train pattern. The pulse train consists of a 16-bit buffer attached to each synchronous output, which allows real-time creation of patterns. These patterns are clocked out at the module clock speed to produce a testcase that spans 16 test vectors.
The VIO input ports are connected to module outputs. Like the VIO output ports, the input port size is configurable. Both asynchronous and synchronous input ports have an activity indicator. This indicator works as a toggle flag that detects when edge transitions have occurred. The advantage of the synchronous port over the asynchronous port is that its samples are synchronous to the module clock, whereas the asynchronous port is sampled by the user interface.
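The value of an activity indicator is that it remembers a transition even when the signal has returned to its old level by the time it is read. A minimal Python sketch of such a sticky flag follows. The class name and the clear-on-read behavior are assumptions made for illustration.

```python
class ActivityFlag:
    """Sticky flag that records whether a signal changed since the last read."""

    def __init__(self, initial: int = 0) -> None:
        self.last = initial
        self.active = False

    def observe(self, value: int) -> None:
        """Called once per sample of the monitored signal."""
        if value != self.last:
            self.active = True      # any edge sets the flag
        self.last = value

    def read(self) -> bool:
        """Return the flag and clear it (clear-on-read is an assumption)."""
        seen = self.active
        self.active = False
        return seen


flag = ActivityFlag()
for level in [0, 1, 0]:             # a short pulse that ends back at 0
    flag.observe(level)
first_read = flag.read()
second_read = flag.read()
print(first_read, second_read)
```

Reading only the final level would show 0 and miss the pulse; the flag reports that an edge occurred.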
One point should be made about the VIO input port use model. A VIO input port takes a one-shot sample of an input. The time between samples is long because the communication mechanism is a JTAG cable. Sampling a VIO input port can therefore take hundreds of milliseconds, which makes it unusable for capturing real-time trace data. For that purpose, the ILA is the recommended core.
Figure 3: Testbench using VIO and ILA
One choice that in-circuit testbenches offer over simulation is whether the test data comes from real stimulus or from a pattern generator. With real-time data, the inputs of the design under test are all stimulated by the actual inputs to the FPGA. These inputs can be anything from high-speed serial data to data from an analog-to-digital converter that is processed by your FPGA. To observe and control this test setup, you place one or more ILAs at the outputs of the main processing module and use the VIO core to set up and enable a testcase. With this test setup you can control when the real-time data is fed to the design and observe the results of the testcase.
If real-time data is not available or controllable, you can use pattern vectors to create the testcase. Traditionally, pattern vectors used for in-circuit testing have required filling out tables of bits that are used to stimulate the design. With an FPGA, however, you have the freedom to create a module inside the FPGA that generates these patterns. This means that instead of using a large pattern stored in memory to test your design, you can hook a counter to the inputs of your design and test all possible input combinations. As in the real-time testcase, the ILA is used to check the outputs of your module and the VIO to set up and run the test. With this test setup you can observe and control the pattern data and run at design speed.
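The counter approach amounts to an exhaustive sweep: every value of an n-bit counter is applied to the design, and each output is compared with the expected result. The Python sketch below shows the idea in software. The 4-bit parity module, its deliberately broken variant, and all function names are hypothetical stand-ins for a real design under test.

```python
from typing import Callable, List, Tuple


def exhaustive_test(dut: Callable[[int], int],
                    reference: Callable[[int], int],
                    input_width: int) -> List[Tuple[int, int, int]]:
    """Drive every input combination and collect mismatches.

    Returns (stimulus, observed, expected) for each failing vector.
    """
    failures = []
    for stimulus in range(1 << input_width):    # plays the role of the counter
        observed = dut(stimulus)
        expected = reference(stimulus)
        if observed != expected:
            failures.append((stimulus, observed, expected))
    return failures


def reference_parity(x: int) -> int:
    """Expected result: 1 if x has an odd number of set bits."""
    return bin(x).count("1") & 1


def parity_dut(x: int) -> int:
    """Hypothetical 4-bit design: parity computed by XOR folding."""
    x ^= x >> 2
    x ^= x >> 1
    return x & 1


def buggy_parity_dut(x: int) -> int:
    """Same design with bit 3 left unconnected, to show a detected fault."""
    return parity_dut(x & 0b0111)


print(exhaustive_test(parity_dut, reference_parity, 4))
print(len(exhaustive_test(buggy_parity_dut, reference_parity, 4)))
```

In the FPGA the comparison would be made by inspecting ILA captures rather than by a reference function, and the sweep is only practical while the input width keeps the number of combinations manageable.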
Xilinx's ChipScope tool team understands this problem and is focused on developing solutions for it. With ChipScope 5.2i you gain controllability through the VIO core. For deep trace capture, ChipScope 5.2i provides the ILA with ATC, which is capable of storing up to 2 million samples with the Agilent FPGA TPA. Thus with ChipScope you can confidently verify and debug your Xilinx FPGA in-circuit and deliver your product to your customer on time.



