Asymmetric digital subscriber line (ADSL) technology, delivering more than 100 times the performance of today's analog modems, is revolutionizing the market for remote access devices. To meet industry demand for the highest densities in the central office, silicon must provide cost-effective, flexible and somewhat customizable solutions for equipment manufacturers.
But higher chip capability usually brings commensurately higher chip complexity, and as complexity grows, so does the time needed for effective verification. Verification cycles of up to a year, with full-chip verification taking two to three months, are not uncommon. For our latest central office chip set, the AC5, an octal high-density ADSL infrastructure solution, our engineers had to run effective simulations as quickly as possible, catching and fixing bugs while still meeting time-to-market expectations.
The first step was identifying low-maintenance verification tools that would integrate easily into our existing flows, provide cost-effective acceleration and require minimal support. We also wanted a solution that would not demand a large investment of money or time.
We modeled the DSP core using VHDL, wrote the ADSL block in Verilog and verified it with a combination of behavioral and RTL Verilog testbenches, as well as C reference models. To check results, we compared the contents of internal memories with those produced by the C model.
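The memory check amounts to a word-by-word diff of two memory images. The following is a minimal Python sketch, assuming (hypothetically) that both the RTL simulation and the C model dump each internal memory as lines of hexadecimal "address data" pairs; the article does not describe the actual dump format or tooling, and parse_hex_dump() and compare_memories() are illustrative names only.

```python
from typing import Dict, List, Optional, Tuple

# (address, rtl_word, reference_word); None marks a word missing from one image.
Mismatch = Tuple[int, Optional[int], Optional[int]]

def parse_hex_dump(text: str) -> Dict[int, int]:
    """Parse a hypothetical dump format: one '<addr> <data>' pair (both hex) per line.
    Blank lines and lines starting with '//' are ignored."""
    mem: Dict[int, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue
        addr_str, data_str = line.split()[:2]
        mem[int(addr_str, 16)] = int(data_str, 16)
    return mem

def compare_memories(rtl: Dict[int, int], ref: Dict[int, int]) -> List[Mismatch]:
    """Return every address where the RTL image and the C-model image differ,
    in ascending address order."""
    mismatches: List[Mismatch] = []
    for addr in sorted(set(rtl) | set(ref)):
        got, expected = rtl.get(addr), ref.get(addr)
        if got != expected:
            mismatches.append((addr, got, expected))
    return mismatches

if __name__ == "__main__":
    rtl_dump = "0000 DEADBEEF\n0001 00000001\n0002 00000002\n"
    ref_dump = "0000 DEADBEEF\n0001 00000001\n0002 00000003\n"
    for addr, got, expected in compare_memories(parse_hex_dump(rtl_dump),
                                                parse_hex_dump(ref_dump)):
        print(f"addr 0x{addr:04X}: rtl={got} ref={expected}")
```

Reporting every mismatching address, rather than stopping at the first, makes it easier to tell a single corrupted word from a systematic offset or ordering error.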
The DSP subsystem included the C6000 DSP core, which we call the Megamodule, as well as peripheral logic. The peripheral logic was available both as a VHDL RTL netlist (unusable for our flows, since we're Verilog-based) and as a gate-level netlist in Verilog format using a standard cell library. The DSP core was available either as a C model ready to plug into a VHDL environment (again, unusable for our flows) or as a gate-level netlist in Verilog format using a custom cell library. This DSP subsystem had already been verified, but we still needed to test its interface and model the library cells.
The ARM7 core is imported external intellectual property (a hard macro), for which we had a Verilog module containing a PLI interface to the ARM7 C model. At the module level, the design team built separate testbenches for the various sub-blocks: the digital interface, the DSP accelerator and the analog front end. In the digital interface testbench, we called the "Golden C" model through PLI functions and compared C and Verilog results on the fly. For the DSP accelerator testbench, we ran the C model before simulation started, wrote data to files during simulation and compared the C and Verilog data at the end of the simulation.
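The two module-level checking styles differ mainly in when the comparison happens. In the Python sketch below, plain functions stand in for the Verilog design and the C reference model (both hypothetical); it is meant only to show the control flow, not the PLI mechanics. The on-the-fly checker stops at the first divergence, while the post-simulation checker precomputes the reference results, logs the design's results and diffs them afterwards.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical stand-ins: each maps one input sample to one output word.
Model = Callable[[int], int]

def check_on_the_fly(stimuli: Sequence[int], dut: Model, golden: Model) -> Optional[int]:
    """PLI-style flow: the reference model is consulted alongside the simulation and
    every result is checked immediately. Returns the first failing index, or None."""
    for i, stim in enumerate(stimuli):
        if dut(stim) != golden(stim):
            return i  # stop early; nothing after the first divergence is simulated
    return None

def check_post_simulation(stimuli: Sequence[int], dut: Model, golden: Model) -> List[int]:
    """File-based flow: reference results are produced before simulation, design
    results are logged during it, and the two logs are compared at the end.
    Returns every failing index."""
    expected = [golden(s) for s in stimuli]   # reference model run before simulation
    observed = [dut(s) for s in stimuli]      # design outputs logged during simulation
    return [i for i, (e, o) in enumerate(zip(expected, observed)) if e != o]

if __name__ == "__main__":
    golden = lambda x: (3 * x) & 0xFF                      # hypothetical reference model
    buggy = lambda x: 0 if x in (5, 7) else (3 * x) & 0xFF  # design with two bad outputs
    stim = list(range(10))
    print("first failure:", check_on_the_fly(stim, buggy, golden))
    print("all failures:", check_post_simulation(stim, buggy, golden))
```

The trade-off is the usual one: on-the-fly checking localizes a bug quickly, while deferred comparison keeps the reference model out of the simulation loop and reports every failure in one pass.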
The analog front-end testbench used Chronology bus functional models: testbench modules that generate external stimuli and receive data from the chip, modeling the correct behavior of the interface protocol the chip implements. In other cases, we created our own bus functional models that were "smart" enough to send data into the chip, receive it later and compare the received data against what had been sent. We used this capability for loopback testing, mainly in the Utopia/serial test cases.
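The "smart" loopback model reduces to a FIFO of everything sent, checked against whatever comes back. Here is a minimal Python sketch, assuming data returns in the order it was sent; the LoopbackBFM class, the transmit callback and the short byte strings standing in for cells are all hypothetical illustrations, not the team's actual bus functional models.

```python
from collections import deque
from typing import Callable, Deque, List

class LoopbackBFM:
    """Remembers every cell sent into the design and checks each returned cell
    against the oldest one still outstanding (assumes in-order loopback)."""

    def __init__(self, transmit: Callable[[bytes], None]) -> None:
        self._transmit = transmit               # drives a cell into the design
        self._expected: Deque[bytes] = deque()  # cells sent but not yet returned
        self.errors: List[str] = []
        self.received = 0

    def send(self, cell: bytes) -> None:
        self._expected.append(cell)
        self._transmit(cell)

    def receive(self, cell: bytes) -> None:
        self.received += 1
        if not self._expected:
            self.errors.append(f"unexpected cell {cell.hex()}")
            return
        sent = self._expected.popleft()
        if cell != sent:
            self.errors.append(f"cell {self.received}: sent {sent.hex()}, got {cell.hex()}")

    def outstanding(self) -> int:
        """Cells sent that have not come back yet."""
        return len(self._expected)

if __name__ == "__main__":
    # Stand-in for the chip in loopback mode: cells reappear after a fixed latency.
    pipeline: Deque[bytes] = deque()
    bfm = LoopbackBFM(transmit=pipeline.append)
    for n in range(4):
        bfm.send(bytes([n] * 4))
        if len(pipeline) > 2:          # hypothetical two-cell latency
            bfm.receive(pipeline.popleft())
    while pipeline:
        bfm.receive(pipeline.popleft())
    print("errors:", bfm.errors, "outstanding:", bfm.outstanding())
```

Checking outstanding() at the end of a test catches dropped cells, which a purely compare-on-receive model would otherwise miss.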
External interface verification evaluated the ARM subsystem and the DSL subsystem using bus functional models of all external interfaces.
For design and verification, we had to verify the imported IP (the C6000 DSP core and the ARM7 core), the external interfaces, module-level functionality, the connections between modules and full-chip integration. We estimated that running all 43 module-level test cases, totaling more than 70 million clock cycles, would take about three weeks on a software simulator running at approximately 40 cycles/s. For system-level tests, we estimated that a suite approaching 1,000 test cases and 285 million clock cycles would take more than 164 days on a software simulator running at around 20 cycles/s. Gate-level simulations with full timing would be even slower. That was far too slow for our product development cycle.
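These estimates follow directly from cycle count divided by simulator throughput. The short Python check below reproduces them; it uses the 70.4 million module-level cycle count reported later in this article, and sim_days() is simply an illustrative helper name.

```python
SECONDS_PER_DAY = 86_400

def sim_days(cycles: float, cycles_per_second: float) -> float:
    """Wall-clock days needed to simulate `cycles` clock cycles at a fixed throughput."""
    return cycles / cycles_per_second / SECONDS_PER_DAY

if __name__ == "__main__":
    module = sim_days(70.4e6, 40)   # 43 module-level tests at ~40 cycles/s
    system = sim_days(285e6, 20)    # system-level suite at ~20 cycles/s
    print(f"module level: {module:.1f} days (~{module / 7:.1f} weeks)")  # 20.4 days (~2.9 weeks)
    print(f"system level: {system:.1f} days")                            # 164.9 days
```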
To expedite the design flow, the team used Xcite 2000, a simulation acceleration system from Axis Systems. It is based on reconfigurable computing (RCC) technology, which allows a simulation to swap between hardware acceleration and software simulation. Xcite met the first of our requirements, easy integration into the existing design flow: we brought the new system up in less than one week.
We began with Xsim, a Verilog-XL-compatible software simulator tightly linked with Xcite, and the transition from XL to Xsim was almost trivial. Bringing up the design for simulation acceleration was nearly problem-free and very fast: because it closely resembled a previous design we had given Axis for benchmarking, all potential problems had already been identified and were easily fixed.
Using Xcite, we mapped our RTL into the RCC hardware. To speed up simulation further, the design and verification team developed synthesizable testbenches, which also mapped easily into the RCC hardware. To extend functionality, we augmented the Verilog-XL environment with Xsim, a natively compiled software simulator.
Next, the team compiled shared FPGA libraries: one for each submodule, so that each could be tested separately, and two for the top level. These FPGAs implement the RCC hardware. The Axis tools let users compile either as a shared library or as a separate compile. In some setups, users run both the Xcite compilation, which maps the RTL design to RCC elements, and the FPGA compilation every time they run a simulation. Because that would have required a great deal of compilation in our environment, we chose the separate compile technique and ran the FPGA compiles once each night. During the day, users compiled only the testbench and referenced the precompiled FPGAs.
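The separate compile policy is essentially a build cache: pay for the expensive design compile rarely and reuse its output on every daytime run. The Python sketch below is hypothetical. It keys reuse on a hash of the design source rather than on the nightly schedule the team actually used, and the compile_design callback stands in for the real Xcite/FPGA compilation, whose interface is not described here.

```python
import hashlib
from typing import Callable, Dict

class PrecompiledDesignCache:
    """Recompile a design only when its source changes; otherwise hand back the
    previously compiled image so a run only has to compile its testbench."""

    def __init__(self, compile_design: Callable[[str], str]) -> None:
        self._compile_design = compile_design  # stand-in for the long design/FPGA compile
        self._images: Dict[str, str] = {}      # design name -> compiled image id
        self._digests: Dict[str, str] = {}     # design name -> source digest at compile time
        self.design_compiles = 0

    def image_for(self, name: str, source: str) -> str:
        digest = hashlib.sha256(source.encode()).hexdigest()
        if self._digests.get(name) != digest:  # first use, or the design source changed
            self._images[name] = self._compile_design(source)
            self._digests[name] = digest
            self.design_compiles += 1
        return self._images[name]

if __name__ == "__main__":
    cache = PrecompiledDesignCache(compile_design=lambda src: f"img-{len(src)}")
    rtl = "module top(); endmodule"
    for tb in ("tb_utopia", "tb_serial", "tb_afe"):  # daytime runs, RTL unchanged
        print(tb, "->", cache.image_for("top", rtl))
    print("design compiles:", cache.design_compiles)
```

A content hash recompiles exactly when needed; a fixed nightly compile, as described above, is simpler to schedule and guarantees everyone tests against the same snapshot for the day.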
Before we upgraded our verification capabilities, simulation runs took hours or days. With the new setup, we had simulation results within a few hours, and the next night the process started over.
For regression testing, we used two methods. Where the testbench did not change, we compiled the RTL once as a shared library and then ran it against hundreds of different parameter sets. This eliminated repeated design compiles and let us complete simulation runs quickly. Where the testbench did change, we set up a process that compiled simulation jobs and placed each one in the RCC acceleration queue as soon as its compile finished. Compiles and simulations ran in parallel as much as possible to get the maximum run time out of the RCC hardware, bringing our RCC utilization close to 100 percent (that is, there were always jobs in the queue waiting to run).
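The second regression method is a producer-consumer pipeline: several compiles run in parallel on host machines, and each finished job goes straight into a single queue drained by the one accelerator. The Python sketch below models that arrangement with threads; compile_job and run_on_accelerator are hypothetical callbacks, not Axis tool commands, and the worker count is an arbitrary assumption.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List

def run_regression(jobs: List[str],
                   compile_job: Callable[[str], str],
                   run_on_accelerator: Callable[[str], bool],
                   compile_workers: int = 4) -> Dict[str, bool]:
    """Compile jobs in parallel and enqueue each one the moment its compile finishes,
    so the single accelerator never waits for a whole batch to compile."""
    ready: "queue.Queue[object]" = queue.Queue()
    done = object()                    # sentinel: no more compiled jobs will arrive
    results: Dict[str, bool] = {}

    def accelerator() -> None:
        # Single consumer: the accelerator runs one simulation at a time, in arrival order.
        while True:
            item = ready.get()
            if item is done:
                return
            name, image = item
            results[name] = run_on_accelerator(image)

    consumer = threading.Thread(target=accelerator)
    consumer.start()
    try:
        with ThreadPoolExecutor(max_workers=compile_workers) as pool:
            futures = {pool.submit(compile_job, job): job for job in jobs}
            for fut in as_completed(futures):
                ready.put((futures[fut], fut.result()))  # enqueue as soon as it is compiled
    finally:
        ready.put(done)                # always release the consumer, even if a compile fails
        consumer.join()
    return results

if __name__ == "__main__":
    import time

    def compile_job(name: str) -> str:         # hypothetical host-side compile
        time.sleep(0.01)
        return f"{name}.img"

    def run_on_accelerator(image: str) -> bool:  # hypothetical accelerated run
        time.sleep(0.005)
        return True

    print(run_regression([f"test_{i:02d}" for i in range(8)], compile_job, run_on_accelerator))
```

The first method (testbench unchanged) needs no such pipeline: one compiled image is simply run repeatedly with different parameters.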
With the new acceleration capabilities in place, the 43 module-level tests that would have run at 40 cycles/s ran at 3,500 cycles/s, completing the full 70.4 million clock cycles in only about 5.5 hours. At the system level, Xcite sped up the 758 test cases (285 million clock cycles) by a factor of 35, cutting a projected 164-day simulation to just four days and 17 hours.
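The reported figures can be cross-checked with the same throughput arithmetic. In the Python sketch below, 70.4 million cycles at 3,500 cycles/s comes to roughly 5.6 hours, in line with the quoted 5.5, and a module-level speedup of 87.5x. Dividing the software-only system-level time by 35 reproduces four days and 17 hours. The helper names are illustrative.

```python
def hours(cycles: float, cycles_per_second: float) -> float:
    """Wall-clock hours to simulate `cycles` clock cycles at a fixed throughput."""
    return cycles / cycles_per_second / 3600

def days_hours(total_hours: float) -> str:
    """Format a duration, rounded to the nearest hour, as 'D days H hours'."""
    d, h = divmod(round(total_hours), 24)
    return f"{d} days {h} hours"

if __name__ == "__main__":
    # Module level: 70.4 million cycles at 3,500 cycles/s instead of 40 cycles/s.
    print(f"module level: {hours(70.4e6, 3_500):.1f} hours, speedup {3_500 / 40:.1f}x")
    # -> module level: 5.6 hours, speedup 87.5x
    # System level: 285 million cycles at 20 cycles/s, then accelerated 35x.
    sw_hours = hours(285e6, 20)
    print(f"system level: {days_hours(sw_hours)} -> {days_hours(sw_hours / 35)}")
    # -> system level: 164 days 22 hours -> 4 days 17 hours
```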