Accelerator speeds HW/SW co-verification

By Michael Raam

As Chameleon Systems faced the challenge of delivering a highly complex communications processor under strong time-to-market pressure, software/hardware co-verification became an imperative for our product development plan. But software/hardware co-verification may not be appropriate for a project, and its benefits may not be fully realized, unless one makes carefully planned preparations. One such preparation is securing adequate simulation speed, and for this we turned to the Xcite-1000 accelerator from Axis Systems Inc. (Sunnyvale, Calif.), which boosted our simulation performance by more than 400 times, making software/hardware co-verification a practical reality.

Chameleon is a provider of reconfigurable communications platforms. These platforms must deliver exceptional processing performance for CPU, DSP, packet, protocol, and real-time processing, as well as for a variety of 3G and 4G (third-generation and fourth-generation) wireless and Voice over Internet Protocol (VoIP) algorithms. Often, the digital-signal processing requirements associated with these applications exceed tens of billions of operations per second to satisfy the stringent frame latency requirements of the communications standards.

The development time for a communications processor comprises two equally time-consuming processes: software development and hardware design. Typically, designers cannot initiate software verification until a hardware prototype is available, although software development and hardware implementation usually take place in parallel.
Unfortunately, when software verification occurs in a serial manner, software/hardware interaction problems may not be detected until late in the design stage.

To move software verification earlier in the design cycle, designers can verify software execution at the simulation stage by linking software development tools within the hardware simulation environment: the concept of software/hardware co-verification. By running software code on a simulated hardware model, one can verify system interaction between software and hardware before chip tapeout. In addition, software/hardware co-verification reduces the need to write, and self-check the results of, elaborate testbenches that generate stimuli to the hardware design. This is a substantial benefit, since testbench generation can absorb most of the verification time with little reuse from previous projects.

Although the concept of software/hardware co-verification has been around for some time, its practical usage has been hampered by HDL simulation speeds. For software simulators running on the fastest workstation, the average throughput ranges from 30 to 50 instructions per second. For software programs comprising tens of millions of instructions, each software execution on a software-simulated model may take weeks to complete. In addition to the current speed barrier, hardware complexity is increasing at twice the rate of software performance improvements. Thus within two years, software execution will average 15 to 25 instructions per second due to the increased complexity of chips.

Realizing the benefits of co-verification depends on several conditions. First, the system must have sufficient software content and heavy hardware interaction during execution. Second, both sides of the engineering team must agree on using co-simulation early in the design stage. This agreement guarantees smooth methodology flow and a common communication medium between the two teams.
Third, processor models, software development tools, and hardware design models are all essential. Since software and hardware teams operate with different tools, those tools must be able to interlink without changing any design methodology.

A successful implementation of software/hardware co-verification allows independent teams to develop their code within the engineering specification with an eye toward verifying all separate components together according to specification. During co-verification, design debugging proceeds faster than with software running on a prototype, because all internal node states are available for viewing and all simulation history can be captured. Thus, software/hardware co-verification captures design errors early in the design cycle and shortens the debugging time needed to isolate design problems and apply proper fixes. These benefits translate to shorter design times and faster time-to-market.

Chameleon's Reconfigurable Communication Processor (RCP) integrates tens of millions of transistors, the vast majority of which were created as a custom circuit design. Facing this high degree of complexity and substantial time-to-market pressures, the designers at Chameleon were forced to take innovative initiatives.
On the design side, we developed a proprietary custom datapath generator through complex RTL-driven Perl scripts to compact layout elements automatically. We also knew that the verification effort would be difficult. Besides the typical problems associated with a design of this size and complexity, our verification problems were further exacerbated by the flexibility of our custom "reconfiguration fabric." The fabric provides instantaneous reconfigurability of algorithms and, as such, required verification of a nearly infinite number of configurations.

We used conventional software simulation during the development. However, the low performance of software simulation would not allow exhaustive verification of the design in a timely manner. We had internally set our verification coverage goals at 100 percent statement and at least 95 percent branch. To meet these coverage numbers, we developed a variety of verification methodologies, including directed and random tests. These tests generated tens of millions of cycles of regression. Software simulation at a few cycles per second would have resulted in more than 10 days of verification per run, a luxury not available to us. We needed a verification environment that would run in a few hours so we could turn designs over quickly and fix bugs. Furthermore, we required software and application development in parallel with the hardware development, at speeds reasonable enough to allow simulating 3G wireless algorithms, for instance, which run for hundreds of millions of cycles.

Looking for solutions

We investigated a variety of solutions in our search for a high-speed simulation environment, including cycle-based simulation, emulators and simulation accelerators. Emulation platforms, while providing very high performance, require a few months of ramp-up effort at prices that were steep even for cash-rich startups. Also, debug on these platforms is not straightforward.
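The regression run-time estimates above follow from simple arithmetic, sketched here in Python. The article gives only ranges ("tens of millions of cycles", "a few cycles per second"), so the specific figures of 10 million cycles and 2.5 cycles per second are assumptions chosen for illustration, as are the function names.

```python
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def run_time_seconds(cycles: float, cycles_per_second: float) -> float:
    """Wall-clock seconds needed to simulate `cycles` at a given throughput."""
    if cycles_per_second <= 0:
        raise ValueError("cycles_per_second must be positive")
    return cycles / cycles_per_second


def run_time_hours(cycles: float, cycles_per_second: float) -> float:
    return run_time_seconds(cycles, cycles_per_second) / SECONDS_PER_HOUR


def run_time_days(cycles: float, cycles_per_second: float) -> float:
    return run_time_seconds(cycles, cycles_per_second) / SECONDS_PER_DAY


def speedup(fast_cps: float, slow_cps: float) -> float:
    """Ratio of two simulation throughputs."""
    return fast_cps / slow_cps


if __name__ == "__main__":
    # Assumed illustrative values; the article states only ranges.
    regression_cycles = 10_000_000   # "tens of millions of cycles"
    software_cps = 2.5               # "a few cycles per second"
    accelerated_cps = 1_000          # throughput reported for the accelerator

    print(f"software simulation: {run_time_days(regression_cycles, software_cps):.1f} days")
    print(f"accelerated:         {run_time_hours(regression_cycles, accelerated_cps):.1f} hours")
    print(f"speedup:             {speedup(accelerated_cps, software_cps):.0f}x")
```

Under these assumed inputs the software-only run takes weeks while the accelerated run takes a few hours, which is the gap the article describes.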
One company, Axis Systems, had introduced an innovative simulation accelerator that proved to be the right compromise of speed, bring-up time, debug capability and ease of use. As a result, we reduced our software execution from days to hours. This speedup enabled our software group to verify their code before the chip taped out and presented opportunities to correct chip design functionality.

Chameleon Systems acquired the Axis Xcite-1000 system, comprising 24 boards, each populated with 10 field-programmable gate arrays (FPGAs). The system is further equipped with a software simulator to run non-synthesizable testbenches and system models. Thus, while 99 percent of the design was downloaded onto the FPGAs, a very small portion of the design, including testbenches, ran on the software simulator, simulating the entire system, not just the chip. We enhanced this platform by adding a programming language interface (PLI) and a debugger. The PLI allowed system software written in C to interface with the Axis box simulating the RTL and the system around it, while the debugger enabled easy debug for the software developer.

The Axis system in our environment gave us an effective 1,000 cycles/second of performance, well above the software-only simulation speeds. With this kind of speed, we were able to run all our verification in a matter of hours, find bugs, fix RTL and be ready for the next verification run the next day.

In our design verification flow, there are three streams of simulation: the Axis RCC simulator, Axis software, and the reference model. In the case of Axis RCC and software simulation, the results are compared to the reference model for pass/fail decision making. The design in its original form, either RTL or gate-level, is compiled to generate an Axis database in preparation for download. In parallel with this activity, the image of all the regression testing is downloaded onto the Axis platform in preparation for execution.
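The pass/fail step of this flow can be sketched as follows. This is a minimal illustration, not Chameleon's or Axis's actual tooling: the names `Mismatch`, `compare_to_reference` and `check_streams`, and the representation of each stream's results as a list of integers, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Mismatch:
    """One disagreement between a simulation stream and the reference model."""
    stream: str
    index: int
    expected: Optional[int]  # None if the reference has no entry at this index
    actual: Optional[int]    # None if the stream ended early


def compare_to_reference(
    reference: Sequence[int], stream_name: str, outputs: Sequence[int]
) -> List[Mismatch]:
    """Compare one stream's outputs, entry by entry, with the reference."""
    mismatches = []
    for i in range(max(len(reference), len(outputs))):
        expected = reference[i] if i < len(reference) else None
        actual = outputs[i] if i < len(outputs) else None
        if expected != actual:
            mismatches.append(Mismatch(stream_name, i, expected, actual))
    return mismatches


def check_streams(
    reference: Sequence[int], streams: Dict[str, Sequence[int]]
) -> Dict[str, List[Mismatch]]:
    """Return the mismatches per stream; an empty list means that stream passes."""
    return {
        name: compare_to_reference(reference, name, outputs)
        for name, outputs in streams.items()
    }


if __name__ == "__main__":
    # Hypothetical result values, for illustration only.
    reference_model = [0x10, 0x20, 0x30]
    results = check_streams(
        reference_model,
        {
            "accelerated": [0x10, 0x20, 0x30],
            "software": [0x10, 0x2F, 0x30],
        },
    )
    for name, mismatches in results.items():
        print(name, "PASS" if not mismatches else f"FAIL {mismatches}")
```

Recording the index of each mismatch, rather than a bare pass/fail flag, gives a starting point for inspecting the captured simulation history around the failure.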
While it takes hours to recompile new RTL onto the Axis system, the hundreds-of-times-higher simulation performance more than makes up for the compilation time. With an incremental compilation feature recently introduced by Axis Systems, it is now possible to reload only the FPGAs affected by the RTL bug fix, in most cases in under an hour.

We achieved the 1,000-cycles/second performance by rewriting much of the testbenches as synthesizable RTL to facilitate downloading them onto the FPGAs, as opposed to running the testbenches on the software simulator. The bottleneck is the software simulator: the more you load onto the FPGAs, the higher the overall simulation performance. It is worth noting that our large design is also very interconnect-intensive, a feature endemic to reconfigurable platforms. Nonetheless, we brought the design up on the Axis system in only two weeks.

The system also allows standard waveform front ends to be used to view the simulation results. This enables capturing the waveforms at high speeds and dumping VCD files for viewing, leaving the box available to others for use. In a sense, working with the Axis system is like working with standard Verilog simulators, except at much higher speeds.

An attractive feature of the Axis system is the availability of many megabytes of real memory on the boards. These memories can emulate not only SRAMs in the design, but with proper wrappers they can look like DRAMs emulating on-board memory. These enabling features are essential for a complete system-level verification.

Looking for enhancements

In the future we intend to make heavy use of the incremental compilation feature that Axis recently added. This will enable us to simulate, debug, fix and recompile the design for download on the Axis system, all in under an hour. We anticipate Axis will further improve future products by providing higher capacity and higher simulation speeds.
Higher capacity will ease the concern about FPGA utilization and improve mapping time. It will also provide an upgrade path for design changes that result in higher gate counts. Higher speed is obviously a welcome side effect of using higher-performance FPGAs. Currently, some of our applications require billions of cycles to properly simulate at the system level. At 1,000 cycles/second, it will require at least three days to simulate the systems.

Another item we look for from Axis is a suite of libraries to make the setup effort easier. These libraries would provide wrappers around on-board SRAMs to emulate a variety of DRAMs, ROMs, and other components. Currently, we must spend time writing these wrappers ourselves.

Chameleon's RCP is, by every measure, a system on a chip. The only true way to verify the software prior to tapeout was to run it on the hardware simulator. We were able to reduce our design verification time by 60 percent via software/hardware co-verification using simulation acceleration. As a result, we made our tapeout schedule with the confidence that the hardware and software would perform to our expectations.

Michael Raam is Vice President of Hardware Engineering at Chameleon Systems Inc. (Sunnyvale, Calif.).