United Business Media EE Times



Hardware and Software Co-development: A Software Engineer's Perspective

Enjoying design should be a rule, not the exception. Software engineers should remember as much when entering hardware environments.

By Andy Foote


Traditionally, many embedded software engineers have understood hardware more as an interface than as a major subsystem in its own right. We learned how to use oscilloscopes and look at a few waveforms, but nothing we did approached the complexity and diversity of a tightly integrated system-on-a-chip (SOC) co-development project, where the division between software and hardware isn't so clear-cut.

Then, a few years ago, SOC designs and HW/SW co-development projects emerged as a cutting-edge approach that promised cost and space savings. With them came a way to co-design and co-verify software and hardware in a completely virtual environment, raising the possibility of producing a verified device, complete with software drivers, on the first spin.

From a business perspective this is obviously great news, but it also presents the software engineer with a number of brand-new challenges. There is no longer a distinct, immutable hardware/software interface. Instead, the software engineer works on a single complex system whose components represent different levels of abstraction.

So without a prototype board to plug into the back of a PC, how does one download code? Where do you put your boot code? How does one debug when the failure seems to reside somewhere among dozens of layers of software and hardware components? And after a fix, can you prove you didn't break something else? In a two-million-gate system, will a test even finish running in simulation before your children get through school? Does a software engineer really need to understand how a direct memory access (DMA) engine works internally?

At Intrinsix, we have developed a number of tools, processes, and techniques to facilitate debug sessions, run a virtual system modeling both hardware and software, and give us visibility into all aspects of a software/hardware SOC. (The term "hardware" used here refers to hardware as it is represented in the virtual co-development environment. Real physical hardware will be described as such.)

Feedback and debug: bus monitors

The first thing any embedded systems engineer needs is some type of feedback to indicate events in the system, or the lack of them. To that end, we have created a number of bus monitors that log bus activity. These are hardware modules, written in both VHDL and Verilog, that act as bus snoopers. Each bus line is connected to a monitor pin, and on every clock the bus monitor samples all of its pins. Note that this pin connection isn't replicated in the real hardware; it is a connection to the virtual environment only. Each monitor can be configured to act on special values, on all values, or on none at all.

A bus monitor is typically attached to each bus in any of our verification systems, so every transaction on that bus can be logged. Logging is turned on simply by including the monitor's file in the testbench (for example, with a Verilog `include directive) and turned off by omitting that include. In addition, we have a message-logging task that is triggered by data written to "magic" addresses, each of which has a special meaning.

The figure below shows an example of a simulator display window before and after bus monitor implementation. The feedback generated by the monitors is an invaluable debugging tool. Imagine you have an arbiter handling access requests from four different buses, each requesting an external memory access. Monitors show in readable form what could otherwise take days to figure out. When regressing a test suite, a log that includes the bus monitor messages also provides quick visual feedback as to whether a change broke existing code. Listing 1 shows part of a typical sequence in which a DMA engine transfers data from external memory to internal RAM (see Figure 1). The last entry shows the monitors catching a failure: the DMA engine read aa55 from external memory, but 0000 was written to SRAM address 0804.

Listing 1: Typical bus monitor output

=> 430000 CORE Log progress
monitor message = 1311

=> 430800 DARAM A5 Write:
Addr: 1060 Data: 1311

=> 433100 PERI bus Read SECTION A:
Addr: 0054 Data: 0000

=> 436100 PERI bus Write SECTION A:
Addr: 0054 Data: 0014

=> 438400 DMA External Program Read
Request Addr = 8000

=> 438400 DMA External Acknowledge
from EXTMEM ARB

=> 446400 Read from external mem:
Addr: 8000 Data: aa55

=> 454400 DMA Internal Data Write
Request to Addr = 0804

=> 454400 SRAM S3 Write Addr: 0804
Data: 0000

=> 454400 ERROR Miscompare at addr 0804,
expected = 0xaa55, actual = 0000
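The monitors themselves are HDL modules that sample bus pins, but their logging logic is easy to sketch. The C++ below is a minimal model of that side only, assuming the monitor is handed one sampled transaction per clock. The `BusTransaction` type, the `MAGIC_LOG_ADDR` value, and the message format (loosely modeled on Listing 1) are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical snapshot of one bus transaction, as sampled on a clock edge.
struct BusTransaction {
    uint64_t time;   // simulation time stamp
    bool     write;  // true = write, false = read
    uint16_t addr;
    uint16_t data;
};

// Hypothetical "magic" address: a write here is a log request, not a memory write.
constexpr uint16_t MAGIC_LOG_ADDR = 0xFFF0;

class BusMonitor {
public:
    BusMonitor(std::string bus_name, bool enabled)
        : name_(std::move(bus_name)), enabled_(enabled) {}

    // Called once per sampled transaction. Returns the formatted log line,
    // or an empty string when logging is disabled.
    std::string observe(const BusTransaction& t) {
        if (!enabled_) return {};
        char buf[96];
        if (t.write && t.addr == MAGIC_LOG_ADDR) {
            std::snprintf(buf, sizeof buf, "=> %llu %s Log progress monitor message = %04x",
                          static_cast<unsigned long long>(t.time), name_.c_str(),
                          static_cast<unsigned>(t.data));
        } else {
            std::snprintf(buf, sizeof buf, "=> %llu %s %s: Addr: %04x Data: %04x",
                          static_cast<unsigned long long>(t.time), name_.c_str(),
                          t.write ? "Write" : "Read",
                          static_cast<unsigned>(t.addr), static_cast<unsigned>(t.data));
        }
        log_.emplace_back(buf);
        return log_.back();
    }

    const std::vector<std::string>& log() const { return log_; }

private:
    std::string name_;
    bool enabled_;
    std::vector<std::string> log_;
};
```

A disabled monitor plays the role of leaving the include out of the testbench. Transactions still occur, but nothing is logged.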

Performance: system run time

Although it is quite slow, an instruction set simulation (ISS) running a C or HDL core model of the target processor is generally considered accurate. It isn't unusual for a simulated core to execute only two to ten instructions per second. A test that takes two to three minutes to run with a BFM (described below) can take 20 minutes with an ISS, and in our experience that is a fairly short test relative to the complexity involved in properly testing an ASIC. Furthermore, an ASIC often requires several hundred tests that must be written, debugged, and saved as part of a permanent test suite that is regressed on a regular basis.
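A rough worked estimate shows why this matters. For illustration only, assume a suite of 300 tests (one reading of "several hundred") and the per-test times quoted above, taking the upper three-minute figure for the BFM:

```latex
\begin{aligned}
T_{\text{ISS}} &= 300 \times 20\ \text{min} = 6000\ \text{min} = 100\ \text{h} \\
T_{\text{BFM}} &= 300 \times 3\ \text{min} = 900\ \text{min} = 15\ \text{h}
\end{aligned}
```

On a single machine, the ISS suite would take more than four days per pass, while the BFM suite needs well under a day.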

Figure 1 - Simplified co-verification environment
Driver code interfacing to the BFM using HAL and smart pointers. The BFM routes SPI driver accesses to the SPI.
That's one compelling reason to write a bus functional model (BFM) of the core. A BFM is a simplified model of the core that interfaces to software written in a higher-level language (we use C/C++) and executes orders of magnitude faster than the true core IP. A BFM has the same bus interface as the core, and the software that accesses the BFM has no real knowledge of its implementation. No boot code is required to configure the BFM. It simply handles register and memory accesses, interrupts, and special instructions such as IDLE and NOP, as needed.
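To give a rough idea of how little a BFM has to do, the following C++ sketch models only the bus-level behavior listed above: register and memory accesses, interrupts, and IDLE/NOP. It is a behavioral stand-in under stated assumptions, not an HDL BFM with the core's real pin interface. `Op`, `Request`, and `CoreBfm` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical request kinds that test software (via the HAL) sends to the BFM.
enum class Op { Read, Write, Idle, Nop };

struct Request {
    Op       op;
    uint32_t addr;  // ignored for Idle and Nop
    uint16_t data;  // used only by Write
};

// Behavioral sketch of a core BFM: no instruction fetch and no boot code,
// just the bus-level effects the software can observe.
class CoreBfm {
public:
    // Executes one request; returns read data, or 0 for other operations.
    uint16_t execute(const Request& r) {
        switch (r.op) {
        case Op::Read:  return mem_[r.addr];           // unwritten locations read as 0
        case Op::Write: mem_[r.addr] = r.data; break;
        case Op::Nop:   ++cycles_; break;
        case Op::Idle:  idle_ = !irq_pending_; break;  // sleep until an interrupt arrives
        }
        return 0;
    }

    // Called by a peripheral model to signal an interrupt; wakes the core from IDLE.
    void raise_interrupt() { irq_pending_ = true; idle_ = false; }
    void acknowledge_interrupt() { irq_pending_ = false; }

    bool idle() const { return idle_; }
    bool irq_pending() const { return irq_pending_; }
    uint64_t cycles() const { return cycles_; }

private:
    std::map<uint32_t, uint16_t> mem_;  // registers and memory share one sparse map
    bool irq_pending_ = false;
    bool idle_ = false;
    uint64_t cycles_ = 0;
};
```

There is no instruction fetch, so nothing has to be booted. The first request the test software sends is already a useful bus transaction.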

The BFM is connected to test software through the hardware abstraction layer (HAL). One of our hardware engineers can typically write and debug a BFM in approximately 4 to 10 days, and we have a few BFMs in our library that serve as a baseline for new ones. Software can then be written and debugged using the BFM, and as cycle-accurate hardware modules are completed, they can be attached to the BFM. When the software has been debugged, one or more hardware models can replace the BFM. Because of the HAL interface, the software needn't change to accommodate the transition.

Co-development project phases

A co-development project typically has discrete phases quite similar to those of a more traditional embedded project, but the phases overlap rather than running serially. The basic tasks are the following: design and implementation of hardware modules; design and implementation of application code, software drivers, and the boot code required by the hardware; implementation of unit and system test code; and implementation of debug and framework tools. The tasks also include debugging and integration, from the most basic integration of two or three components up to the point at which the whole product is exercised. Once the system has passed verification, it will be burned into silicon and re-verified. Hopefully the results will be identical to those obtained on the virtual system.

Two techniques can be used to help optimize co-development projects. One increases the number of tasks that can be executed concurrently. The other increases the portability of software components so that they can be used in as many phases of a project as possible. Portability from project to project is also desirable.

For parallelism to occur, one phase must include both hardware and software design and implementation. Software engineers can write drivers and test software while the hardware engineers are implementing their modules. An accurate set of interface specs is essential. By using a BFM with stubbed-out peripherals wherever needed, software engineers can debug test code and driver code as the hardware becomes ready, and hardware modules can be added and swapped in as they are completed. The software engineer needn't know when hardware replaces stubs, except to understand that accurate timing will replace the default timing of the stubs.
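The following minimal C++ sketch shows what such a stub can look like. The class names are hypothetical. The stub has no real function and simply reports completion after a fixed default latency. The driver's polling loop is written against the shared interface, so it does not change when the stub is replaced by a cycle-accurate model with real timing.

```cpp
#include <cassert>

// Hypothetical interface a peripheral presents to the driver through the BFM.
class Peripheral {
public:
    virtual ~Peripheral() = default;
    virtual void start() = 0;        // kick off an operation
    virtual void tick() = 0;         // advance one clock
    virtual bool done() const = 0;   // status bit the driver polls
};

// Stub: no real behavior, just "done" after a fixed default latency.
class StubPeripheral : public Peripheral {
public:
    explicit StubPeripheral(unsigned default_latency) : latency_(default_latency) {}
    void start() override { remaining_ = latency_; busy_ = latency_ > 0; }
    void tick() override {
        if (busy_ && --remaining_ == 0) busy_ = false;
    }
    bool done() const override { return !busy_; }

private:
    unsigned latency_;
    unsigned remaining_ = 0;
    bool busy_ = false;
};

// Driver-side polling loop: identical whether a stub or a real model is attached.
// Returns the number of clocks waited (capped at max_cycles).
unsigned wait_for_done(Peripheral& p, unsigned max_cycles) {
    unsigned cycles = 0;
    while (!p.done() && cycles < max_cycles) {
        p.tick();
        ++cycles;
    }
    return cycles;
}
```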

Software portability

One of the major advantages of co-development is that, if the software is designed correctly, drivers need be written only once and simply recompiled from phase to phase of a project. To that end, we have created a hardware abstraction layer (HAL) that sits between the software and the hardware and presents a consistent interface to the software. The HAL consists of a base set of a few C++ classes and a few C functions that perform interrupt handling, IDLE and NOP instructions, and any other specialized instructions required by a particular processor.

Drivers and test code are written using the HAL interface, so the software engineer can perform the vast majority of debugging using a BFM. When the hardware is sufficiently debugged, it can be synthesized, permitting the BFM to be replaced by a cycle-accurate HDL model of the processor core. The software is simply recompiled with the target tool chain, and the resulting target executable is tested in simulation on the core model. Once that verification is complete, the software can be run on real hardware without further porting changes.

When a BFM is used, the HAL decouples driver software from hardware through a communications link over a standard Berkeley socket interface. The simulation therefore runs as a separate process and can even run on a physically different machine from the driver, should that become advantageous for performance reasons.

A high percentage of device driver code is register access. In the old days, most of us wrote or downloaded a header file containing the addresses of all the registers in the system and accessed them through C pointers. But in a BFM-based system that technique won't work, because the socket interface decouples the BFM from the driver. For each read or write transaction, a message must be exchanged between the driver and the BFM.
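To make "a message must be exchanged" concrete, here is a C++ sketch of one possible wire format for a single register transaction. The seven-byte layout, the opcodes, and the little-endian byte order are assumptions for illustration. The socket send and receive calls are left out so that only the encoding is shown.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical wire format for one register transaction (7 bytes):
// [0] opcode, [1..4] address, [5..6] data, multi-byte fields little-endian.
enum : uint8_t { OP_READ = 1, OP_WRITE = 2 };
using Message = std::array<uint8_t, 7>;

// Driver side: pack a read or write request before sending it over the socket.
Message encode(uint8_t op, uint32_t addr, uint16_t data) {
    Message m{};
    m[0] = op;
    for (int i = 0; i < 4; ++i) m[1 + i] = static_cast<uint8_t>(addr >> (8 * i));
    m[5] = static_cast<uint8_t>(data);
    m[6] = static_cast<uint8_t>(data >> 8);
    return m;
}

struct Decoded {
    uint8_t  op;
    uint32_t addr;
    uint16_t data;
};

// BFM side: unpack a received request into a bus transaction.
Decoded decode(const Message& m) {
    Decoded d{m[0], 0, 0};
    for (int i = 0; i < 4; ++i) d.addr |= static_cast<uint32_t>(m[1 + i]) << (8 * i);
    d.data = static_cast<uint16_t>(m[5] | (m[6] << 8));
    return d;
}
```

For a write, the driver side would send the encoded message and continue. For a read, it would send the request and block until the BFM returns the data.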

In an admittedly unscientific study of a few projects at Intrinsix, we found that on average 30% of driver code is register access. Ideally, register access would use normal C pointers, usable from either C or C++, that would not have to change between project phases. It then wouldn't matter whether we were using a BFM, a core model, or real silicon.

To that end, we have implemented two classes that interact to form a construct known as a smart pointer. The idea came from Scott Meyers's smart pointer class (first written years ago), which was similar to the auto_ptr class eventually incorporated into the C++ standard library. Our smart pointer classes use the same syntax as C pointers. They use templates to size themselves according to the data width and address-bus width, and they encapsulate socket communications for read and write accesses. The driver writer needn't program specially for either a BFM or a core model. By simply recompiling for the target processor with a flag on the command line, a smart pointer converts to a regular C pointer, usable with a core model and/or real hardware.
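The sketch below shows the general shape of such a construct in C++, under several assumptions. It is not Intrinsix's implementation. It is templated on data and address width in the spirit of the text:

- `BfmLink` stands in for the socket layer and just stores values in a map.
- `RegRef` is a proxy object that turns assignment and conversion into write and read transactions.
- `TARGET_BUILD` is a hypothetical command-line flag (for example `-DTARGET_BUILD`) that swaps the smart pointer for an ordinary volatile pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Stand-in for the socket link to the BFM (hypothetical). In a real HAL each
// read or write would exchange a message with the BFM process.
struct BfmLink {
    static std::map<uint32_t, uint32_t>& regs() {
        static std::map<uint32_t, uint32_t> r;
        return r;
    }
    static uint32_t read(uint32_t addr) { return regs()[addr]; }
    static void write(uint32_t addr, uint32_t v) { regs()[addr] = v; }
};

// Proxy returned by dereferencing a smart pointer.
template <typename Data, typename Addr>
class RegRef {
public:
    explicit RegRef(Addr a) : addr_(a) {}
    RegRef(const RegRef&) = default;
    // Assignment becomes a write transaction.
    RegRef& operator=(Data v) { BfmLink::write(addr_, v); return *this; }
    // *p = *q copies the register value, not the address.
    RegRef& operator=(const RegRef& other) { return *this = static_cast<Data>(other); }
    // Reading the value becomes a read transaction.
    operator Data() const { return static_cast<Data>(BfmLink::read(addr_)); }
private:
    Addr addr_;
};

// Smart pointer sized by data width and address width.
template <typename Data, typename Addr>
class SmartPtr {
public:
    explicit constexpr SmartPtr(Addr a) : addr_(a) {}
    RegRef<Data, Addr> operator*() const { return RegRef<Data, Addr>(addr_); }
private:
    Addr addr_;
};

#ifdef TARGET_BUILD
// Target or core-model build: a plain volatile pointer to the memory-mapped register.
#define REG16(addr) (*reinterpret_cast<volatile uint16_t*>(addr))
#else
// Host build against the BFM: same syntax, routed through the smart pointer.
#define REG16(addr) (*SmartPtr<uint16_t, uint32_t>(addr))
#endif
```

With these definitions, a hypothetical `#define PORTA REG16(0x0054)` lets a driver line such as `PORTA = 0x5555;` compile unchanged for either build. The copy-assignment operator forwards the value rather than copying the address, so `*p = *q` performs a read followed by a write, as it would with plain pointers.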

The other HAL calls are analogous to the smart pointer in that they insulate the driver code from the hardware. They are C functions that provide a generic, abstracted interface to the hardware for the non-register operations that drivers typically need. IDLE and NOP are two common functions. Interrupt enable, disable, ISR registration, and priority modification functions are also provided. A simple recompile makes whatever changes are needed when transitioning from BFM to core model to silicon, with no changes required to the driver.
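A host-side sketch of the interrupt calls might look like the following. It is written in C++ with C linkage so that C drivers can call it. Several details are hypothetical:

- the names (`hal_register_isr`, `hal_enable_interrupts`, and so on);
- the 16-entry vector table;
- the dispatch hook that the socket layer would call when the BFM reports an interrupt.

Priority modification is omitted. In a target build, the same declarations would be implemented with the processor's own vector table and interrupt-enable instructions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Handler signature shared by C and C++ drivers.
typedef void (*isr_t)(void);

namespace {
constexpr std::size_t kNumVectors = 16;        // assumed vector count
std::array<isr_t, kNumVectors> g_isr_table{};  // all entries start null
bool g_interrupts_enabled = false;
}  // namespace

extern "C" {

// Install a handler; returns 0 on success, -1 for an invalid vector.
int hal_register_isr(unsigned vector, isr_t handler) {
    if (vector >= kNumVectors) return -1;
    g_isr_table[vector] = handler;
    return 0;
}

void hal_enable_interrupts(void)  { g_interrupts_enabled = true; }
void hal_disable_interrupts(void) { g_interrupts_enabled = false; }

// Called by the HAL's socket layer when the BFM reports an interrupt.
// Returns 1 if a handler ran, 0 if interrupts were masked or no handler exists.
int hal_dispatch_interrupt(unsigned vector) {
    if (!g_interrupts_enabled || vector >= kNumVectors || !g_isr_table[vector]) return 0;
    g_isr_table[vector]();
    return 1;
}

}  // extern "C"
```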

In Listing 2, PORTA and PORTB are smart pointers that map to read and write calls of the write_16_bit(data) type, which in turn send requests over a socket to the BFM. When the code runs on the target or with a core model, a recompile turns PORTA and PORTB into ordinary dereferenced C pointers.

Listing 2: Sample test driver C code

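int main(void) {
    /* ... (declarations and setup elided) ... */

    PORTA = 0x5555;          /* smart-pointer write: one message to the BFM */
    val = PORTB;             /* smart-pointer read: request and reply */
    if (val == 0x000a) {
        register_interrupt(funcname);   /* HAL call */
        enable_interrupts();            /* HAL call */
    }
    /* ... */
}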

Reusable testbench components

In the traditional development environment for ASIC design, custom testbenches are written for each module. Similarly, in software engineering, customized test programs are usually written to run tests on a board as needed. Instead of this one-off, throwaway approach, another source of time savings is to develop standard testbench components within a standard verification architecture. These components can be reused from phase to phase within a project and across projects.

At Intrinsix, we have a number of transactors for typical hardware modules, as well as a verification framework that ties them together with product components. A transactor is a component, written in a combination of HDL and C, that interfaces to the device under test (DUT). It attaches to the pins of the DUT and can provide stimulus to the device, verify the device's output, monitor events, and/or synchronize with any other component of the system. Transactors are reusable for both module-level and system-level testing. They are never synthesized into hardware and, of course, never run on real hardware.
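A real transactor is part HDL (attached to the DUT's pins) and part C, but its stimulus-and-check bookkeeping can be sketched on its own. This C++ sketch is an assumption-laden simplification. The DUT is reduced to a function from one input word to one output word, and the `Transactor` class and its error-message format are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical DUT abstraction: one input word in, one output word out.
using Dut = std::function<uint16_t(uint16_t)>;

class Transactor {
public:
    // Queue a stimulus and the output the DUT is expected to produce for it.
    void expect(uint16_t stimulus, uint16_t expected) { items_.push_back({stimulus, expected}); }

    // Drive every queued stimulus into the DUT, check each result, and return
    // the number of miscompares. Error lines are kept for the test log.
    int run(const Dut& dut) {
        int errors = 0;
        for (const auto& [in, want] : items_) {
            uint16_t got = dut(in);
            if (got != want) {
                char buf[80];
                std::snprintf(buf, sizeof buf,
                              "ERROR Miscompare for input %04x, expected = 0x%04x, actual = %04x",
                              static_cast<unsigned>(in), static_cast<unsigned>(want),
                              static_cast<unsigned>(got));
                errors_.emplace_back(buf);
                ++errors;
            }
        }
        items_.clear();
        return errors;
    }

    const std::vector<std::string>& errors() const { return errors_; }

private:
    std::vector<std::pair<uint16_t, uint16_t>> items_;
    std::vector<std::string> errors_;
};
```

Because the transactor only sees the DUT through its interface, the same checks can drive a single module in a unit testbench or the whole chip in a system test.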

Booting and downloading

In the good old days, software engineers wrote boot code to configure the chip and burned that code into a ROM or equivalent, which was then plugged into a socket on the board. The board was plugged into the back of the engineer's PC, and applications were downloaded from a host through a serial port into RAM and/or flash. But without the hardware, how is the boot code accessed, and how is it run on an IP core? What does download really mean in a virtual system where all of the executable code runs on the same CPU, often from the same disk under the same OS?

In theory, an IP core behaves identically to a real core and, similarly, the simulated peripherals should behave as they do in real hardware. However, the software engineer must still write code to perform all of these same boot and download functions. Implementing these mechanisms involves some choices, driven by a combination of design and performance goals, but in general most implementations are fairly similar.

To start with, you'll need boot code (normally written in assembler) that will run on the hardware once it has been manufactured. In simulation, that code is read into a model of the ROM that will exist in hardware on the final product. After writing the appropriate boot code to configure the chip, use the core's tool chain to compile, assemble, and link it into an executable. That executable must then be converted to a programmation file readable by the Verilog system task $readmemh() or $readmemb(). The $readmemh() format may be preferable because it is ASCII hex, which is more readable. Some tool chains include a conversion program to perform this step; of necessity, we have our own (non-proprietary) program to do it. Although VHDL has no built-in equivalent of $readmemh(), it's easy to write one with the standard TEXTIO package.
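The conversion step itself is small. Here is a minimal C++ sketch of one, with some assumptions: the ROM holds 16-bit words stored little-endian in the raw image, and an odd-length image gets a zero pad byte. Real tool-chain output formats vary, so treat `to_readmemh` as hypothetical. It writes one hex word per line, which $readmemh() accepts, starting at the first ROM location.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Converts a raw ROM image into text for Verilog's $readmemh: one hex word per line.
// Assumptions (adjust per core): 16-bit ROM words, little-endian image,
// odd trailing byte padded with zero.
std::string to_readmemh(const std::vector<uint8_t>& image) {
    std::string out;
    char line[8];
    for (std::size_t i = 0; i < image.size(); i += 2) {
        unsigned lo = image[i];
        unsigned hi = (i + 1 < image.size()) ? image[i + 1] : 0u;
        std::snprintf(line, sizeof line, "%04x\n", (hi << 8) | lo);
        out += line;
    }
    return out;
}
```

The ROM model would then load the resulting file with $readmemh() in an initial block.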

ROM model warnings

The ROM model must be written to use the $readmemh() or $readmemb() (or equivalent) system task to read in the programmation file when the simulation initializes, before the core comes out of reset. When the core does come out of reset, the boot code instructions execute from the start address in ROM.

There are a few caveats to be aware of here. One is that the reset vector occupies one or two specific locations, and those locations must jump to the beginning of the boot ROM. Just as in real hardware, it's the hardware designer's job to ensure that the ROM is correctly located, so the map used by the linker/locator program must also be correct, even for virtual hardware. However, the ROM model itself typically has no knowledge of its location relative to other memories and peripherals. So if you have occasion to examine the programmation file, note that its first instruction is always at address zero relative to the ROM model only, not to any other memory.
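The address translation is worth writing down once. This C++ sketch assumes a hypothetical memory map:

- the boot ROM occupies 16 KB of byte-addressed space at `ROM_BASE`;
- it holds 16-bit words;
- it is loaded from a programmation file with no @address directives, so line 0 of the file is ROM word 0.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical memory map: the linker places boot ROM at ROM_BASE (byte addresses),
// but the ROM model's array, and the programmation file, always start at index 0.
constexpr uint32_t ROM_BASE   = 0xFF0000;
constexpr uint32_t ROM_BYTES  = 0x4000;  // assumed 16 KB boot ROM
constexpr uint32_t WORD_BYTES = 2;       // assumed 16-bit ROM words

// Maps a CPU address to the line index in the programmation file,
// or returns nothing if the address lies outside the ROM.
std::optional<uint32_t> rom_line(uint32_t cpu_addr) {
    if (cpu_addr < ROM_BASE || cpu_addr >= ROM_BASE + ROM_BYTES) return std::nullopt;
    return (cpu_addr - ROM_BASE) / WORD_BYTES;
}
```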

Now that your boot code is downloaded, what if it doesn't work? In the past, this was a challenging situation with the board plugged into the back of a PC; frequently, the software engineer used a logic analyzer or emulator, and/or a few LEDs on the board. The virtual environment offers greater visibility into the behavior of boot code in a couple of ways. One is through the use of the aforementioned bus monitors. Simply include these in your HDL code and a record of every bus transaction will be printed to the screen. That includes the program counter, so at minimum you can see which instruction is being executed at any given clock cycle.

And for those software types who are more inclined to deal with the internals of the hardware, you can, of course, use the waveform viewer of your favorite simulator to look at other pins and registers on any peripherals you want. (A waveform viewer serves as a very advanced logic analyzer with virtually unlimited probes, endless memory, and printout capability.)

At some point after the initial boot code works, you will undoubtedly need the ability to download to RAM. As with boot code, a downloader should be written just as if to run on real hardware. In a virtual system, the downloader can be included as part of the boot ROM.

When invoked, that code interfaces to the appropriate peripheral, say a serial port. The catch is that you need something on the other side of that serial port to feed it data, replacing the host that used to be there in the real world. At Intrinsix, we have standard components in our testing framework that serve this purpose. So if the product requires a serial port, JTAG, or BDM, we have components to fill that role.

Specifically, let's take the case of downloading through a serial port. We attach a serial port transactor to the serial port module during the hardware build. The test code to be downloaded must be built into an executable that uses the instruction set of the target processor. After the chip is booted, the download driver issues a receive-data request to the serial port module. That module in turn requests data from the outside world, which in this case is the serial port transactor. The transactor reads the test executable file and transfers its data in units sized appropriately for the serial port module. In that way, when the transactor is removed, the system can run correctly in real hardware. If verification has been completed, the entire operation should work correctly the first time it's run on real hardware.
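The two ends of that exchange can be sketched in C++ as follows. The protocol (a 16-bit length followed by the image bytes), the byte queue standing in for the serial line, and both function names are hypothetical. A real downloader would block on the serial port's status register rather than fail when no data is available, and the transactor would read the executable from a file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical protocol: 16-bit little-endian length, then the image bytes.
// The serial line is modeled as a byte queue between transactor and port.

// Transactor side: stands in for the host that used to sit on the other end.
std::deque<uint8_t> transactor_send(const std::vector<uint8_t>& executable) {
    assert(executable.size() <= 0xFFFF);
    std::deque<uint8_t> line;
    line.push_back(static_cast<uint8_t>(executable.size() & 0xFF));
    line.push_back(static_cast<uint8_t>(executable.size() >> 8));
    for (uint8_t b : executable) line.push_back(b);
    return line;
}

// Downloader side (would live in boot ROM): one receive request per byte.
// Returns the number of bytes written into RAM at load_offset, or -1 on error.
int downloader_receive(std::deque<uint8_t>& line, std::vector<uint8_t>& ram,
                       std::size_t load_offset) {
    auto rx = [&line](uint8_t& out) {
        if (line.empty()) return false;  // real hardware would wait instead
        out = line.front();
        line.pop_front();
        return true;
    };
    uint8_t lo, hi;
    if (!rx(lo) || !rx(hi)) return -1;
    std::size_t len = lo | (hi << 8);
    if (load_offset + len > ram.size()) return -1;  // image would not fit
    for (std::size_t i = 0; i < len; ++i)
        if (!rx(ram[load_offset + i])) return -1;
    return static_cast<int>(len);
}
```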

A paean to automated regression testing

This is another area in which co-development is a giant step forward for software engineers used to trying to verify their code on a board plugged into a PC or Unix workstation, where code just can't be verified as completely in a cost-effective manner.

Automated regression testing is simple to implement and easily pays for itself many times over. It often seems easier to run regressions manually as needed than to stop and write scripts that regress the suite and automatically report results with all the relevant information. We have found it unwise to succumb to this temptation.

As a project develops tests for a particular ASIC, they should be added to a suite of tests saved in a repository under source code control and regressed virtually every night. Once the test suite starts to approach a few hundred tests, running automated regressions becomes useful. It's the best way to ensure that a change that breaks a working component is flagged almost immediately. It also ensures that, when a component needs to be fixed, the flag doesn't slip through the cracks and get forgotten. It's this author's opinion that there is no such thing as a project that doesn't break working components during design and verification. Please email me if I'm wrong.

At Intrinsix, we have a standard directory structure used across projects and a baseline set of scripts that regress an existing test suite and report the results. We typically customize these scripts for each project; so far the changes haven't been too costly.

The procedure is straightforward. Each night a script analyzes each test directory and checks its dependencies. Each test is rebuilt if needed and then run, and its results are written to a log file. When the regression is complete, another script reads through all the log files and writes a summary report. Every morning, the team member responsible for the regression passes that report out to each team member.

To assist in formatting a summary, logging functions are centralized, and display messages are formatted consistently from project to project. Message types within test-generated log files are tagged so that they can be easily found by report-generation scripts. For example, every test ends with a summary of the pass or fail conditions for that test. Using bus monitors as part of the standard framework helps ensure this consistency.
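Once the tags are consistent, the report step is mostly string matching. The article doesn't say what language the report scripts use. This sketch is C++ only to match the other examples here, and the tag strings are hypothetical. Each element of `logs` holds the full text of one test's log file.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tags written by the centralized logging functions.
const std::string kPassTag  = "TEST PASS";
const std::string kFailTag  = "TEST FAIL";
const std::string kErrorTag = "ERROR";

struct Summary {
    int passed = 0;
    int failed = 0;
    int no_result = 0;  // log has neither tag, e.g. the test hung or crashed
    int errors = 0;     // total ERROR lines across all logs
};

// A test passes only if its log starts a line with the pass tag and contains
// no ERROR lines. Logs with no result tag are counted separately.
Summary summarize(const std::vector<std::string>& logs) {
    Summary s;
    for (const std::string& log : logs) {
        std::istringstream in(log);
        std::string line;
        bool pass = false, fail = false;
        int errors = 0;
        while (std::getline(in, line)) {
            if (line.find(kErrorTag) != std::string::npos) ++errors;
            if (line.rfind(kPassTag, 0) == 0) pass = true;  // line starts with tag
            if (line.rfind(kFailTag, 0) == 0) fail = true;
        }
        s.errors += errors;
        if (fail || (pass && errors > 0)) ++s.failed;
        else if (pass) ++s.passed;
        else ++s.no_result;
    }
    return s;
}
```

Counting logs with no result tag separately keeps a test that never finished from being silently treated as a pass.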

Tying it all together

There are a number of components that make up a co-development environment, and most software engineers can intuitively grasp the idea behind each one quite quickly. But don't be fooled by the apparent simplicity: an ASIC co-development process is still complex. Each project is different, and each requires a lot of work.

To support a good co-development environment, there must be a concise framework architected for flexibility and extensibility. All hardware and software modules in a co-development project should be written and executed within that framework. More reusable components will reduce cost and time to market, but they must fit the guiding philosophy behind the framework. That framework includes communications code, hardware synchronization, and testbench startup, shutdown, and coordination. It also includes the components described in this article, such as the HAL, the BFM, and the regression scripts. We continue to automate wherever possible, and because each project uses the framework in slightly different ways, we continue to learn as we progress.

Co-development is a necessary step forward for the industry, and most software engineers will find that with some guidance and experience, they will actually come to enjoy the virtual environment. Better verification techniques and more insight into the hardware will ultimately make us better software engineers, and we'll produce better products.


Andy Foote is a principal software engineer at Intrinsix Corp. in Rochester, MA, where he is responsible for research and application of hardware/software co-verification technologies. Previously, he was a senior software engineer at Johnson and Johnson Clinical Diagnostics.

The author would like to thank Bob Morasse, Fred Rakvica, Peter Spyra, and John Szybist for contributing to this article.

To voice an opinion on this or any other article in Integrated System Design , please e-mail your comments to mikem@isdmag.com.


Copyright © 2000 Integrated System Design Magazine
