Tackling large-scale SoC and FPGA prototyping debug challenges
Brad Quinton, Tektronix
1/21/2013 11:06 AM EST
With the preceding analysis in mind, Tektronix set out to address the need for better FPGA prototyping tools by forming an embedded instrumentation group. The goal was to bring full RTL-level visibility to FPGA-based debugging. That goal has now been accomplished with the release of Certus 2.0.
Built entirely from software and RTL-based embedded instruments, this debug platform uses a highly efficient multi-stage concentrator as the basis for an observation network. The concentrator reduces the number of LUTs required per probed signal, which increases the number of signals Certus can probe in a given area. This architecture, coupled with advanced routing algorithms, keeps the concentrator small and lets Certus effectively provide the flexibility of a full crossbar mux while requiring no more die area than a standard simple mux. In practical terms, Certus enables engineers to instrument tens of thousands of signals using fewer FPGA resources than a standard FPGA-based debug tool requires to instrument 1,024 signals.
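The trade-off behind a multi-stage concentrator can be illustrated with a toy model. In the sketch below, which is an assumption for illustration and not Certus's actual architecture, stage one splits the probed signals into fixed blocks that can each forward only a few signals, and stage two can connect any stage-one output to any capture channel. The hypothetical `route` function checks whether a requested signal set fits and assigns capture channels.

```python
from collections import defaultdict


def route(requested, group_size, group_outputs, capture_width):
    """Assign requested signal indices to capture channels in a toy
    two-stage concentrator (hypothetical model).

    Stage 1: inputs are split into blocks of `group_size`; each block can
    forward at most `group_outputs` of its signals.
    Stage 2: any stage-1 output can reach any of `capture_width` channels.
    Returns {signal: channel} or raises ValueError if the set cannot be routed.
    """
    wanted = sorted(set(requested))
    if len(wanted) > capture_width:
        raise ValueError(f"{len(wanted)} signals requested, only {capture_width} channels")
    per_group = defaultdict(list)
    for sig in wanted:
        per_group[sig // group_size].append(sig)
    for grp, sigs in per_group.items():
        if len(sigs) > group_outputs:
            raise ValueError(f"block {grp} needs {len(sigs)} outputs, has {group_outputs}")
    # Stage 2 is modeled as a full crossbar, so channels are handed out in order.
    return {sig: ch for ch, sig in enumerate(wanted)}


print(route([900, 3, 17, 18], group_size=16, group_outputs=2, capture_width=8))
# {3: 0, 17: 1, 18: 2, 900: 3}
```

The per-block limit is what keeps the first stage cheaper than a full crossbar; the price is that a request crowded into one block, such as signals 16, 17 and 18 above, fails to route. The article attributes Certus's crossbar-like flexibility to combining its architecture with routing algorithms; this toy only checks feasibility.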
The focus on RTL-level signals creates even more efficiency. In an RTL-based design, many signals are equivalent. Consider a flip-flop that drives an I/O: by directly observing the state of the flip-flop, one can also infer the state of the I/O. For most circuits, there are between 3 and 5 inferable signals for each signal that is directly observed. The Certus analysis tools calculate all of these relationships automatically and make the inferred signals visible. Thus, if a system is instrumented for 30K signals, engineers can observe the equivalent of ~100K signals without needing to examine the low-level details of the design.
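One way to compute which signals become visible through equivalence is a union-find (disjoint-set) pass over exact-copy relationships. The sketch below is an illustration under that assumption: the article does not describe how Certus calculates its relationships, this version ignores inversions and other logic a real tool might exploit, and the signal names are hypothetical.

```python
class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def observable_signals(equivalences, probed):
    """Return every signal whose value is known once `probed` is captured.

    equivalences: (a, b) pairs meaning a and b always carry the same value,
    e.g. a flip-flop output and the I/O it drives through a plain wire.
    probed: signals that are directly instrumented.
    """
    uf = UnionFind()
    for a, b in equivalences:
        uf.union(a, b)
    probed_roots = {uf.find(p) for p in probed}
    return {s for s in list(uf.parent) if uf.find(s) in probed_roots}


edges = [("state_q", "state_o"), ("state_o", "pad_state"), ("cnt_q", "cnt_o")]
print(sorted(observable_signals(edges, ["state_q"])))
# ['pad_state', 'state_o', 'state_q']
```

Probing the single flop `state_q` here makes three signals observable, which is the effect the paragraph describes at larger scale.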
Tektronix has also simplified instrumentation by letting engineers select multiple probes with a single selection. For example, Certus can select all flip-flops, all interfaces, all inputs or outputs, or all state registers without recompiling the system. In practice, engineers typically instrument all interfaces and registers to ensure access to every key signal in the design.
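Group-based selection can be sketched as a filter over a signal inventory. The `Signal` record, its category names, and the `select` helper below are hypothetical illustrations, not the Certus interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    kind: str        # hypothetical categories: "flop", "state_reg", "input", "output"
    interface: bool  # True if the signal is on a module interface


def select(signals, kinds=(), interfaces=False):
    """Names of signals whose kind is in `kinds`, plus all interface signals
    when `interfaces` is True, in sorted order."""
    return sorted(
        s.name for s in signals
        if s.kind in kinds or (interfaces and s.interface)
    )


inventory = [
    Signal("fsm_state", "state_reg", False),
    Signal("data_q", "flop", False),
    Signal("rx_valid", "input", True),
    Signal("tx_data", "output", True),
]
print(select(inventory, kinds=("state_reg",), interfaces=True))
# ['fsm_state', 'rx_valid', 'tx_data']
```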
The ability to instrument every relevant signal and then view any combination of signals breaks through one of the most critical prototyping bottlenecks. As noted earlier, recompile time is a major factor in how quickly a bug can be identified and resolved. More comprehensive signal access minimizes the impact of recompiling on the debugging process, as shown in Figure 2. Instead of waiting multiple hours between debug sessions, engineers can select a whole new set of signals to monitor in a matter of minutes by reconfiguring the infrastructure over the JTAG port.

Figure 2. Because synthesis and place-and-route do not need to be rerun to access new signals, time-consuming “go-home” events are eliminated from the debug cycle.
Improving capture depth
Once the signals are instrumented, the focus turns to capture depth. To extend the size of capture buffers, some debugging tools use off-chip memory or high-speed cables. The challenge is that this creates a bandwidth bottleneck, because time-division multiplexing is required to offload the data. This restricts the maximum capture frequency to below 20 MHz and/or the number of probed signals to a few thousand, which in turn forces the system under test to be clocked at a lower frequency.
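The bandwidth limit follows from simple arithmetic. If every probed signal contributes one bit per capture cycle and nothing is compressed, an off-chip link carrying B bits/s can sustain at most B / N capture cycles per second for N signals. The sketch below assumes a hypothetical 40 Gb/s aggregate link and ignores protocol overhead; neither figure comes from the article.

```python
def max_capture_freq_hz(link_bits_per_s, num_signals):
    """Highest capture rate when num_signals one-bit signals are streamed
    off-chip uncompressed (protocol overhead ignored)."""
    return link_bits_per_s / num_signals


LINK_BPS = 40e9  # hypothetical aggregate off-chip bandwidth: 40 Gb/s

for n in (1_000, 2_000, 4_000):
    print(n, "signals:", max_capture_freq_hz(LINK_BPS, n) / 1e6, "MHz")
# 1000 signals: 40.0 MHz
# 2000 signals: 20.0 MHz
# 4000 signals: 10.0 MHz
```

Under these assumptions, doubling the number of probed signals halves the achievable capture frequency, so signal count and capture speed compete for the same link.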
An alternative approach, implemented in Certus, is to use compression to raise the information density of the captured data and minimize the number of block RAMs dedicated to debug. Conventional FPGA-based debug tools consume a block RAM entry for every signal on every clock cycle, which quickly uses up the limited block RAM. As it turns out, however, many signals exhibit recurring patterns that can be compressed substantially without any loss of information.
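To see how quickly one-bit-per-signal-per-cycle storage runs out, the sketch below computes uncompressed capture depth. It assumes 36 Kbit block RAMs (sizes vary by FPGA family) and ignores the width/depth configuration limits of real block RAMs; both are simplifications, not details from the article.

```python
BRAM_BITS = 36 * 1024  # assumed block RAM capacity in bits (varies by FPGA family)


def uncompressed_depth(num_signals, num_brams):
    """Cycles that fit when every signal takes one bit on every clock cycle."""
    return (num_brams * BRAM_BITS) // num_signals


def brams_needed(num_signals, depth):
    """Block RAMs needed to hold `depth` uncompressed cycles of `num_signals` signals."""
    return (num_signals * depth + BRAM_BITS - 1) // BRAM_BITS  # integer ceiling


print(uncompressed_depth(1024, 100))   # 3600
print(brams_needed(1024, 1_000_000))   # 27778
```

Under these assumptions, even 100 block RAMs hold only 3,600 cycles of 1,024 signals, which is why raising the information density of each stored bit matters.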
Because no single compression algorithm delivers optimum results on its own, Certus instead uses a compression cocktail: it dynamically applies a variety of compression and data-packing algorithms to minimize the block RAM needed to capture a signal accurately. The effects are dramatic. Trace depth is typically extended by 100X or more. How well a signal compresses depends on the signal itself, and for some signals capture depth can exceed millions of cycles per block RAM (see Table 1).

Table 1. Because many signals exhibit recurring patterns, compression can significantly improve capture depth.
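The idea of choosing among encoders on the fly can be sketched with just two of them: raw storage at one bit per cycle and run-length encoding. This toy version is an assumption for illustration; the article says Certus draws on a wider, unspecified set of compression and data-packing algorithms. Each chunk of a one-bit trace is stored in whichever form is smaller, with one tag bit recording the choice, and decoding reproduces the trace exactly.

```python
def rle(bits):
    """Run-length encode a list of 0/1 samples as (value, run_length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(value, length) for value, length in runs]


def encode(trace, chunk=1024):
    """Store each chunk of a one-bit trace as raw bits or RLE, whichever is smaller.

    Cost model (in bits): raw = 1 per cycle; RLE = 1 value bit plus
    chunk.bit_length() length bits per run; plus 1 tag bit per chunk.
    Returns (blocks, total_bits).
    """
    len_bits = chunk.bit_length()  # wide enough for a run spanning a whole chunk
    blocks, total_bits = [], 0
    for start in range(0, len(trace), chunk):
        part = trace[start:start + chunk]
        runs = rle(part)
        raw_cost = len(part)
        rle_cost = len(runs) * (1 + len_bits)
        if rle_cost < raw_cost:
            blocks.append(("rle", runs))
            total_bits += 1 + rle_cost
        else:
            blocks.append(("raw", list(part)))
            total_bits += 1 + raw_cost
    return blocks, total_bits


def decode(blocks):
    """Rebuild the original trace from encode()'s blocks (lossless)."""
    trace = []
    for kind, data in blocks:
        if kind == "raw":
            trace.extend(data)
        else:
            for value, length in data:
                trace.extend([value] * length)
    return trace


idle = [0] * 10_000 + [1] * 10 + [0] * 10_000
blocks, bits = encode(idle)
assert decode(blocks) == idle
print(len(idle), "cycles ->", bits, "bits")
# 20010 cycles -> 284 bits
```

A mostly idle signal like this one shrinks by roughly 70X, while a signal that toggles every cycle falls back to raw storage plus the tag bits. That difference is why compression gains depend on the signal, and why picking the encoder per chunk beats committing to a single algorithm.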

