Design Article
Hardware/Software Co-Verification with RTOS Application Code
Michael Bradley and Kainian Xie
9/9/2002 12:00 AM EDT
ABOUT THE AUTHORS
Michael Bradley is a Seamless technical marketing engineer for Mentor Graphics. He has a background in using RTOSes, and has supported and developed co-simulation tools between various hardware simulators and accelerators. He received a B.S.E.E. from Rensselaer Polytechnic Institute.
Kainian Xie is a senior software developer for HyperChip in Montreal, QC. He received the B.S. and Ph.D. degrees from Xi'an Jiaotong University (Shaanxi, China) in 1992 and 1997, respectively, majoring in automatic control.
Hardware/Software Co-Verification is typically performed at a low level of abstraction, using an Instruction Set Simulation (ISS) model of a CPU in conjunction with a Verilog or VHDL model of the rest of the design. This article describes a higher level of software abstraction: the CPU subsystem is replaced by an RTOS simulator and application code written to the Application Programming Interface (API) of the RTOS. Verilog or VHDL is still used to model the rest of the design.
One approach to accurate hardware/software verification is to use the ISS of the target CPU and "connect" it to the hardware simulator that the hardware design group uses. One obvious disadvantage of this technique is that software execution is limited to the speed of the hardware simulator. The Seamless Co-Verification package from Mentor Graphics increases the speed of the ISS-to-hardware-simulator connection by allowing most of the ISS instruction cycles to run decoupled from the hardware simulator. This technology, termed optimizations, has been used to generate successful System-on-a-Chip (SoC) tape-outs, as well as CPU-based board designs.
Another tool available to the programmer is an RTOS simulator, which does not emulate the instruction set of a CPU but, instead, models the resources of the RTOS itself. This allows the programmer to develop and debug task-level operations such as pending and posting to a mutex, rescheduling of tasks, and mailbox operations. The RTOS simulator is a higher level of abstraction than an ISS: it is CPU independent and does not require (or allow) assembly code.
It is possible to connect an RTOS simulator to the hardware simulator through the Seamless co-verification tool. At this level of abstraction, it is possible to observe the threads of execution and how these threads interact with the hardware. The effect is the appearance that thousands of software cycles have run in conjunction with the hardware in essentially zero time. In other words, the RTOS can be initialized, application tasks started, and the software ready to interact with the hardware before the hardware simulator has advanced. Once in this state, the hardware will be initialized by the RTOS application, and hardware interaction begins. The software can now perform system-level transactions with the hardware. This test environment is not concerned with CPU instructions; it is used to exercise high-level operations in hardware and software. Test-environment performance will be bounded by the amount of hardware simulator time needed to perform a given software or testbench request.
The forwarding and traffic management engine, along with support functions, is implemented in several FPGAs. A CPU is connected to the datapath hardware via a PCI bus interface. The CPU runs the VxWorks RTOS from Wind River. Wind River also provides VxSim as an RTOS simulator for VxWorks.
In the deployed line card, VxWorks runs on the CPU's core. VxWorks' memory space is the CPU's local SDRAM. The PCI block within the CPU acts as a bridge and allows the core to communicate with the datapath hardware. Datapath hardware is able to communicate with the RTOS by depositing traffic information in SDRAM and sending a PCI interrupt to the CPU.
Figure 1 shows the major blocks in the line card. For the hardware/software verification environment, the hardware and software processes must communicate through some interface logic in the hardware simulator. This hardware/software interface is typically the pins of a CPU core or chip. However, in the line-card design, we are able to obtain a higher level of abstraction by interfacing at the PCI bus. To accomplish this, Seamless provides a PCI 2.1-compliant transactor model. This model converts I/O reads and writes within VxSim into PCI bus transactions in hardware. The PCI transactor also provides an interrupt facility from the hardware to VxSim.
In the simulation environment, the CPU and SDRAM are abstracted. VxSim will replace VxWorks running on the processor. VxSim is a simulated version of VxWorks, and runs on the workstation CPU. The workstation memory will replace the SDRAM. The Seamless PCI transactor model acts as the PCI bridge located in the CPU. Seamless implements the requested bus transactions from VxSim in the PCI transactor model. The PCI transactor model is instantiated in the VHDL design.
VxSim is integrated with Seamless via the HCE (Host Code Execution) mode. HCE is a special mode of Seamless that is activated when an ISS is not present. HCE mode allows the user to execute C code that references an HCE library and is compiled for the workstation. The HCE library interfaces to a Bus Interface Model (BIM) in the hardware simulator. In other words, the HCE library allows the user's C code to interact with the hardware simulator. The HCE library has four major functions:
- Advances time in the hardware simulator.
- Initiates PCI bus-master transactions.
- Creates a callback to accept and/or present data when the transactor is accessed as a target.
- Creates a callback to process PCI interrupts.
The PCI library is an extension of the HCE library used to configure the PCI transactor in various PCI modes (such as 32-bit vs. 64-bit), as well as to define its configuration registers as a PCI target (such as Vendor ID). Figure 2 shows the Seamless-enabled simulation environment.
The line-card software is designed so that the higher levels of software are independent of the underlying hardware platform or simulation environment. You can port the line-card code to different platforms by altering the hardware-abstraction layer. We created several abstraction-layer versions in order to support different environments: the CPU evaluation board, Seamless/VxSim environment, and final hardware.
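One common way to structure such a hardware-abstraction layer in C is a table of function pointers that the upper layers call without knowing which platform sits underneath. The following is only a minimal sketch of that idea, not the line-card code: the names hal_ops_t, register_selftest, hal_cosim, and hal_board are hypothetical, and both back-ends are simple in-memory stubs rather than real PCI or board accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical HAL interface: upper layers see only these operations. */
typedef struct {
    const char *name;
    uint32_t (*read32)(uint32_t addr);
    void     (*write32)(uint32_t addr, uint32_t value);
} hal_ops_t;

#define NUM_REGS 16

/* Stub back-end standing in for the co-simulation environment.
   A real version would issue PCI transactions instead. */
uint32_t cosim_regs[NUM_REGS];
uint32_t cosim_read32(uint32_t addr) { return cosim_regs[(addr / 4) % NUM_REGS]; }
void cosim_write32(uint32_t addr, uint32_t value) { cosim_regs[(addr / 4) % NUM_REGS] = value; }

/* Stub back-end standing in for an evaluation board. */
uint32_t board_regs[NUM_REGS];
uint32_t board_read32(uint32_t addr) { return board_regs[(addr / 4) % NUM_REGS]; }
void board_write32(uint32_t addr, uint32_t value) { board_regs[(addr / 4) % NUM_REGS] = value; }

const hal_ops_t hal_cosim = { "cosim", cosim_read32, cosim_write32 };
const hal_ops_t hal_board = { "board", board_read32, board_write32 };

/* Upper-layer code written only against hal_ops_t:
   write a pattern to a register and check that it reads back. */
int register_selftest(const hal_ops_t *hal, uint32_t addr)
{
    assert(hal);
    hal->write32(addr, 0xA5A5A5A5u);
    return hal->read32(addr) == 0xA5A5A5A5u;
}
```

Porting to another environment then means supplying another hal_ops_t instance; register_selftest and everything above it stay unchanged.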
The deployed hardware system will boot from FLASH, which will copy VxWorks to SDRAM, where VxWorks is initialized and started. In the Seamless/VxSim environment, the booting operation is not needed; execution begins in VxSim and the user's startup routine. The startup routine calls hardware initialization routines and starts the user's tasks. These tasks run various tests on the line card. A typical startup sequence is:
- Initialize Seamless PCI transactor.
- Search for PCI targets on the PCI bus. Configure targets as needed.
- Register PCI targets as I/O devices in the VxWorks I/O subsystem.
- Start tasks to run tests.
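The ordering of the steps above can be sketched as a single startup routine. This is a rough illustration under stated assumptions: every function here is a hypothetical stub written for this example (none of them is an HCE, Seamless, or VxWorks call), and the vendor ID 0x1234 is a made-up placeholder. In a real system the last two steps would go through the VxWorks I/O system and task-creation calls such as taskSpawn().

```c
#include <assert.h>

#define MAX_TARGETS 4

typedef struct {
    int      device;      /* device number on the bus */
    unsigned vendor_id;   /* placeholder value, not a real ID */
    int      registered;  /* 1 once added to the I/O subsystem */
} pci_target_t;

pci_target_t targets[MAX_TARGETS];
int transactor_ready = 0;
int tasks_started = 0;

/* Step 1 (stub): bring up the PCI transactor. */
int init_pci_transactor(void) { transactor_ready = 1; return 0; }

/* Step 2 (stub): pretend the bus scan found two targets. */
int scan_pci_bus(void)
{
    if (!transactor_ready) return -1;
    for (int i = 0; i < 2; i++) {
        targets[i].device = i + 1;
        targets[i].vendor_id = 0x1234u;
        targets[i].registered = 0;
    }
    return 2;
}

/* Step 3 (stub): mark each target as registered with the I/O subsystem. */
int register_io_devices(int count)
{
    assert(count <= MAX_TARGETS);
    for (int i = 0; i < count; i++) targets[i].registered = 1;
    return count;
}

/* Step 4 (stub): start the test tasks. */
int start_test_tasks(int n_devices)
{
    if (n_devices <= 0) return -1;
    tasks_started = 1;
    return 0;
}

/* Runs the four steps in order; returns 0 on success, -1 on any failure. */
int startup(void)
{
    if (init_pci_transactor() != 0) return -1;
    int found = scan_pci_bus();
    if (found <= 0) return -1;
    if (register_io_devices(found) != found) return -1;
    return start_test_tasks(found);
}
```

Each step checks the result of the previous one, so the test tasks start only once the targets have been found and registered.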
If the user's software is neither polling nor interrupt driven, or if the user needs more control over synchronization, the HCE library provides an additional facility. The HCE function hce_AdvanceHardware() tells the hardware simulator to advance simulation. Since the line-card software is interrupt driven, it was not necessary to precisely control the hardware advance time. It is more convenient to let the hardware advance function run periodically as a VxSim background task. Accordingly, the hce_AdvanceHardware() function is put in its own task and run at VxSim's highest task priority. This task also delays itself periodically so that the other tasks may run:
int hw_ready_to_go = 0;

void advanceHw()
{
    while (1) {
        if (hw_ready_to_go)
            hce_AdvanceHardware(100);
        taskDelay(100);
    }
}
Figure 3: Hardware simulator time advance
With the code in Figure 3, the hardware advances by 100 PCI clocks and then VxSim runs for 100 ticks. The global variable hw_ready_to_go delays the start of the hardware simulator until VxSim has completed its initialization. (In some of the tests, a VxSim task accepts user input so that the user can select which tests to run.) The hw_ready_to_go variable is set at the end of the software and hardware initialization tasks, and when a test is ready to run. You don't have to call hce_AdvanceHardware() during hardware initialization because Seamless automatically advances hardware simulation time for PCI bus transactions initiated by the RTOS.
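The gating effect of hw_ready_to_go can be seen in a small host-only model of the loop. This is a sketch for illustration, not Seamless or VxWorks code: fake_AdvanceHardware() and fake_taskDelay() are hypothetical stand-ins that merely count clocks and ticks, and the endless loop is replaced by a bounded one so the example terminates.

```c
#include <assert.h>

int hw_ready_to_go = 0;
unsigned long hw_clocks = 0;  /* simulated PCI clocks the hardware has advanced */
unsigned long sw_ticks = 0;   /* simulated ticks given to the other tasks */

/* Stand-in for the hardware advance call: only counts clocks. */
void fake_AdvanceHardware(int clocks) { hw_clocks += (unsigned long)clocks; }

/* Stand-in for the task delay: only counts ticks. */
void fake_taskDelay(int ticks) { sw_ticks += (unsigned long)ticks; }

/* Bounded version of the advance loop.
   Returns the number of PCI clocks advanced during this call. */
unsigned long advance_hw_iterations(int iterations)
{
    unsigned long before = hw_clocks;
    for (int i = 0; i < iterations; i++) {
        if (hw_ready_to_go)
            fake_AdvanceHardware(100);
        fake_taskDelay(100);
    }
    return hw_clocks - before;
}
```

While the flag is clear, software ticks accumulate but hardware time stays at zero, which matches the earlier point that initialization completes before the hardware simulator advances. Once the flag is set, each pass adds 100 clocks and 100 ticks.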
Since the RTOS is running on the host, we cannot install interrupt service routines as we normally would in the deployed system. Instead, Seamless provides an HCE callback routine that is called whenever a PCI interrupt occurs. An argument passed to the callback indicates the cause of the interrupt. The interrupt type covers all possible PCI interrupt types, plus an additional type indicating that the PCI transactor model has been accessed as a target (slave). Some skeleton code for the interrupt callback is shown in Figure 4.
In the example of Figure 4, software tasks poll global variables such as isrFlagA and isrFlagB. The tasks periodically sample these variables and execute the appropriate interrupt code if they are set. Alternatively, you can use a mutex to activate the interrupt handler, or place the handling code directly in the callback, as is done for INTERRUPT_TARGET.
If the interrupt is of type INTERRUPT_TARGET, the PCI transactor is being accessed as a target. In the case of the line card, the datapath hardware would only access the PCI bus to transfer packets of data into or out of memory. First, an HCE function is called to determine the type of transaction (read or write); then an additional HCE function is used to receive or send data for the appropriate address.
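The general shape of such a callback might look like the following sketch. Only the names isrFlagA, isrFlagB, and INTERRUPT_TARGET come from the text. Everything else is an assumption made for illustration: the other enum values, the pending structure (which stands in for whatever the HCE query and data-transfer functions would provide), and the small host_mem window are all hypothetical, and no real HCE call is used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interrupt causes; only INTERRUPT_TARGET is named in the text. */
typedef enum { INTERRUPT_A, INTERRUPT_B, INTERRUPT_TARGET } interrupt_type_t;
typedef enum { XACT_READ, XACT_WRITE } xact_type_t;

/* Flags sampled by polling tasks. */
volatile int isrFlagA = 0;
volatile int isrFlagB = 0;

/* Host memory window that the datapath hardware reads and writes. */
#define WINDOW_WORDS 8
uint32_t host_mem[WINDOW_WORDS];

/* Stand-in for the pending target access reported by the transactor. */
typedef struct {
    xact_type_t type;
    uint32_t    addr;
    uint32_t    data;
} target_access_t;
target_access_t pending;

/* Stand-in for the call that reports the transaction type. */
xact_type_t stub_get_xact_type(void) { return pending.type; }

/* Callback invoked with the cause of the interrupt. */
void interrupt_callback(interrupt_type_t cause)
{
    switch (cause) {
    case INTERRUPT_A:
        isrFlagA = 1;               /* a polling task handles it later */
        break;
    case INTERRUPT_B:
        isrFlagB = 1;
        break;
    case INTERRUPT_TARGET: {
        /* Handled directly: move one word into or out of host memory. */
        uint32_t idx = (pending.addr / 4) % WINDOW_WORDS;
        if (stub_get_xact_type() == XACT_WRITE)
            host_mem[idx] = pending.data;   /* hardware writes to memory */
        else
            pending.data = host_mem[idx];   /* hardware reads from memory */
        break;
    }
    }
}
```

Keeping the flag-setting cases short and deferring the work to tasks mirrors how an interrupt service routine would behave on the deployed system, while the target case does its transfer in place.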
In this environment at HyperChip, the software developer initiates testing by creating software applications within VxSim. Tasks are written to configure hardware registers and then read the status registers to ensure the hardware is in the proper state. Next, the tasks inject data packets into the driver. The software developer can then examine the ModelSim waveforms to confirm that the packet was actually injected into the hardware-simulation environment. After that, the hardware engineer can trace the packets through the datapath hardware. If everything runs correctly, and the packet loops back and returns to software via PCI, the software developer verifies that the returned packet data is correct.
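The final software-side check in this flow, comparing the returned packet against the one that was injected, can be as simple as the sketch below. It is an illustration only: first_mismatch(), stub_loopback(), and loopback_selftest() are hypothetical names, and the stub copies bytes in memory where the real test would send the packet through the driver and the simulated datapath.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns -1 if the packets are identical, otherwise the offset of the
   first difference (or the shorter length if only the lengths differ). */
long first_mismatch(const uint8_t *sent, size_t sent_len,
                    const uint8_t *recv, size_t recv_len)
{
    size_t n = sent_len < recv_len ? sent_len : recv_len;
    for (size_t i = 0; i < n; i++)
        if (sent[i] != recv[i])
            return (long)i;
    return sent_len == recv_len ? -1 : (long)n;
}

/* Stand-in for the hardware loop-back path: copies the packet unchanged. */
size_t stub_loopback(const uint8_t *in, size_t len, uint8_t *out, size_t cap)
{
    size_t n = len < cap ? len : cap;
    memcpy(out, in, n);
    return n;
}

/* Injects one test packet and checks what comes back. */
long loopback_selftest(void)
{
    const uint8_t sent[6] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06 };
    uint8_t recv[16];
    size_t got = stub_loopback(sent, sizeof sent, recv, sizeof recv);
    return first_mismatch(sent, sizeof sent, recv, got);
}
```

Reporting the offset of the first mismatch, rather than a plain pass/fail, gives the hardware engineer a starting point when tracing the packet in the waveforms.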
During the loop-back test, many bugs were easily caught because both the software algorithm and the hardware implementation were visible. Breakpoints were used on both sides to stop the simulation in order to analyze the state of the system.
Several hardware-design issues were revealed during co-verification. Since the hardware was still at the RTL stage, changes were easy to make. It would not have been possible to execute such a large amount of software in a typical hardware-simulation environment. Seamless allowed the software to run on the host workstation, and to periodically synchronize with the hardware simulator. For the loop-back test, the execution time of the Seamless simulation was essentially the same as the time needed for the packets to traverse the datapath in the hardware simulator. Throughout the entire process, it was possible to control and observe results in both the hardware and software environments. This capability provided insight into line-card operation that is not possible with other tools.



