Automatic C-to-VHDL testbench generation shortens FPGA development time
Sunil Sahoo (Aldec) and Brian Durwood (Impulse)
4/11/2012 12:05 PM EDT
Verifying behavior early and often has become critical with FPGAs. Newer generations of FPGAs have gate counts that rival the largest custom ASICs of five years ago. This fact, coupled with the broad use of FPGA-embedded processor cores, has resulted in the increased use of FPGAs for complex, algorithmic processing logic as well as the traditional uses of glue and control logic. The use of FPGAs for complex processing can create an overall design that may combine C, VHDL and Verilog. This creates a challenge when it comes to verification, and a particular challenge to the new generation of FPGA users who are migrating over from software development to hardware acceleration.
New C-based design methods and new testbench generators enable developers to mix C and HDL. In this extended tool flow, the core application is developed in C. First-level verification is performed in software simulation using Visual Studio or a comparable tool. After this functionality-only test, the developer selects a target FPGA or a target FPGA-enabled board and then verifies functionality at that level, either in-system or using a hardware simulator. The latter method saves time by reducing the iterations through FPGA place-and-route. A key productivity boost in this case is that the test files for the VHDL simulator are generated from the original C design files, so there is no disconnect between design engineering and design for test. The testbench design flow is illustrated as follows:

The tool flow uses the same project files to originate design and test code.
Design standards used at one leading US government development lab say: “The testbench effort is typically connected to the design effort, in that the designer of the source code is also writing the testbench. In some programs the testbench effort is performed by different engineers but only if required by schedule and size of the design. Current [government contractor] design process standards require simulation of all FPGA designs. This is for FPGAs varying from basic control support for a RF board, to complex DSP applications for data processing.”
This lab uses a combination of C and HDL for development. Their reason for a C-based approach to the testbench is to narrow the gap between the effort spent on testbench development and the effort spent on lab test. Having testbench code in C lets them take it right into the lab and – possibly with some minor tweaks – start testing their hardware.
Design challenges
Software/hardware co-design is a new mindset for both types of engineers. Software developers have to understand, for example, that memory isn’t free and that “compile” can now mean an overnight run. Hardware engineers get to tear down the wall a little between software and hardware, and actually negotiate partition points between processing elements.

Hardware/software co-design linkage is improved by combining hardware and software code in one unified project.
In many categories of FPGA, a low-level microprocessor core can be used either to perform low-level processing (typically multiple instantiations of the same process) or for micro-control. Because multiple processors can be deployed, the wall between software and hardware really is movable. For both teams, the worst disconnect is probably the unidirectional nature of the tools. Impulse Accelerated Technologies and others have worked to correlate post-synthesis results to specific lines of code, but the state of the art for software-to-hardware compilation is nowhere near the capabilities of software compilation. This disconnect, plus the multi-hour compile time, increases the value of step-by-step simulation and validation. Catching mistakes before place-and-route can easily save a day per iteration.
First level verification is performed using an untimed C representation, focusing mostly on function and less so on clock cycles. This functional verification is mostly about finding errors in the logic of the application, rather than simulating the circuit to find cycle-by-cycle errors. The conversion of microprocessor code to FPGA code is all about refactoring the original code into multiple streaming processes that can be machine parallelized to increase throughput. Microprocessor-targeted C assumes one or two processing streams. FPGAs run at much slower clock speeds and make up the throughput rates via higher degrees of parallelism, and more flexible input and output. To exploit multiple streams, logic needs to be written such that a C-to-FPGA compiler can unroll and parallelize the code. This requires the C code to be written as much as possible as individual processes or “coarse grained” parallel logic.
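As a rough illustration of what “coarse grained” streaming processes look like in plain C, the sketch below connects two small processes through FIFOs. The `fifo_t` type and the `scale_process`, `offset_process` and `run_pipeline` names are hypothetical and are not the Impulse C stream API; on the desktop the stages simply run one after another, whereas a C-to-FPGA compiler would be expected to run such stages concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-depth FIFO standing in for a hardware stream.
   This is NOT the Impulse C stream API, only an illustration. */
#define FIFO_DEPTH 16

typedef struct {
    int    data[FIFO_DEPTH];
    size_t head, tail, count;
} fifo_t;

void fifo_init(fifo_t *f) { f->head = f->tail = f->count = 0; }

int fifo_write(fifo_t *f, int v) {
    if (f->count == FIFO_DEPTH) return 0;   /* full: a writer would stall */
    f->data[f->tail] = v;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
    return 1;
}

int fifo_read(fifo_t *f, int *v) {
    if (f->count == 0) return 0;            /* empty: a reader would stall */
    *v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return 1;
}

/* Process 1: scale each sample. It touches only its own two streams. */
void scale_process(fifo_t *in, fifo_t *out, int gain) {
    int v;
    while (fifo_read(in, &v)) {
        int ok = fifo_write(out, v * gain);
        assert(ok);   /* sketch assumes the downstream FIFO has room */
        (void)ok;
    }
}

/* Process 2: add a fixed offset. No state is shared with process 1. */
void offset_process(fifo_t *in, fifo_t *out, int offset) {
    int v;
    while (fifo_read(in, &v)) {
        int ok = fifo_write(out, v + offset);
        assert(ok);
        (void)ok;
    }
}

/* Desktop harness: fill the input stream, run the stages back to back,
   drain the output stream. Returns the number of samples produced. */
size_t run_pipeline(const int *src, size_t n, int *dst, int gain, int offset) {
    fifo_t a, b, c;
    size_t got = 0;
    assert(n <= FIFO_DEPTH);
    fifo_init(&a); fifo_init(&b); fifo_init(&c);
    for (size_t i = 0; i < n; i++) fifo_write(&a, src[i]);
    scale_process(&a, &b, gain);
    offset_process(&b, &c, offset);
    while (fifo_read(&c, &dst[got])) got++;
    return got;
}
```

Because each process communicates only through its streams, the partition point between stages can be moved without rewriting the stages themselves.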
On the coding side, HDLs are often preferred for control, glue and static logic, and VHDL and Verilog are both well-accepted languages. The C-to-FPGA compiler generates synthesizable VHDL as the transfer format, or API, to the FPGA manufacturer’s place-and-route software. This is fully usable HDL, which means that additional VHDL and/or Verilog code can be combined in project files with this C-generated code.
Second-level verification takes the target FPGA into consideration. The C-to-FPGA compiler offers a pull-down menu that provides almost too much selection of FPGA targets. Users can select Xilinx or Altera FPGAs; they can select them with or without the optional processing cores; and they can even select entire development boards featuring FPGAs. In the latter case the compiler manufacturer often offers a Platform Support Package (PSP). The PSP automates the low-level code that gives the developer access to platform hardware features such as memory and I/O. At this level, verification uses streams, signals, or memory to synchronize the design.

Test processes written in Impulse C automatically generate VHDL test files for Active-HDL.
The main development task at this stage is iteration. FPGA resources differ greatly from microprocessor resources. Optimization isn’t magic but an iterative process of refining code and looking at how it propagates into hardware. Stage delay analysis and other ways to examine buffers and clock cycles are used to maximize parallelism and performance at this stage. More experienced developers will be able to leverage special-purpose areas in the FPGA such as DSP blocks.
Third level verification inserts detailed hardware parameters. Test processes written in Impulse C automatically generate VHDL test files. Test vectors are recorded while software is running in software simulation. The test vectors are essentially the data stream. The data under test in HDL is identical to what is running in software. Hundreds of lines of HDL test code are generated by C algorithms. The testbench writes its own output test vectors to separate files, so they can be compared to the outputs from desktop simulation. Validating the behavior of the generated HDL occurs by comparing the two sets of outputs. Testing occurs before place and route, saving hours of compile time.
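The comparison step described above can be sketched in a few lines of C. This is only an assumed arrangement: the one-hex-word-per-line file format and the function names `write_vectors`, `read_vectors` and `first_mismatch` are hypothetical, and the actual files generated by the tools may look quite different.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Write one vector per line as 8-digit hex (assumed format, chosen here
   only because it is easy for an HDL testbench to parse). */
int write_vectors(FILE *fp, const unsigned *v, size_t n) {
    assert(fp != NULL && v != NULL);
    for (size_t i = 0; i < n; i++)
        if (fprintf(fp, "%08X\n", v[i]) < 0) return -1;
    return 0;
}

/* Read up to max vectors from fp; returns how many were read. */
size_t read_vectors(FILE *fp, unsigned *v, size_t max) {
    size_t n = 0;
    assert(fp != NULL && v != NULL);
    while (n < max && fscanf(fp, "%x", &v[n]) == 1) n++;
    return n;
}

/* Compare outputs from desktop simulation (sw) with outputs written by
   the HDL testbench (hdl). Returns the index of the first mismatch, or
   -1 when both sets agree. A length difference counts as a mismatch at
   the end of the shorter set. */
long first_mismatch(const unsigned *sw, size_t n_sw,
                    const unsigned *hdl, size_t n_hdl) {
    size_t n = n_sw < n_hdl ? n_sw : n_hdl;
    for (size_t i = 0; i < n; i++)
        if (sw[i] != hdl[i]) return (long)i;
    return n_sw == n_hdl ? -1 : (long)n;
}
```

In use, the software simulation would call `write_vectors` on both its input and output streams, and the same comparison would be run against the file the HDL testbench writes, before any place-and-route run is started.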

Co-design methodology allows hardware and software tests to originate from common design files so functional equivalence can be compared.
For software users, the reasons for pushing to the third level, HDL simulation, are generally time and cost savings. If C-to-HDL synthesis is performed and the resulting netlist is used directly on an FPGA without HDL simulation, a wide range of potential bugs may go uncaught, because they do not show up when simulating just the C code. These bugs become apparent when testing the design on the FPGA, and if a bug is found during this stage, it is back to square one (modify the design, test it, synthesize, P&R, etc.). On a larger design, HDL synthesis and implementation take so long (ranging from an hour to an overnight run) that if issues are found after this stage, all the time spent on that process has effectively been lost.
Many of these bugs are only detected during post-synthesis simulation and timing simulation. Once the cause of a bug becomes apparent, the designer can avoid the same mistake in the design phase rather than finding it again in later stages. In any case, it is good practice to perform simulation after every stage of the process; this reduces the number of iterations needed to verify the design.
HDL Simulation allows complete and efficient design verification. It is much quicker to locate the source of the problem using the debugging tools (waveform viewer, breakpoints, etc.) included in the HDL simulator. Additionally, post-synthesis simulations and timing simulations can be performed which further help locate problems which cannot be found using functional simulation. Also, many projects (aerospace, defense) require that you perform HDL simulations and HDL code coverage in order to call the design completely verified.
Design example
The Impulse C tutorials include a 5x5 image processing pipeline for an FPGA, built from multiple distinct design elements. The columns process accepts incoming pixels, for example from a video stream, and stores them in an internal buffer large enough to hold a little more than four scan lines. When its internal buffers are filled, the process begins to emit five parallel streams of pixels representing five adjacent scan lines. This is referred to as a marching columns method of buffering.
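A minimal desktop sketch of this style of buffering is shown below. It is not the tutorial source: the `columns_t` type, the `columns_push` function and the tiny `WIDTH` of 4 pixels are assumptions made for illustration. The state keeps the previous four pixels of every column, so once four full lines have arrived, each new pixel completes a vertical group of five.

```c
#include <assert.h>
#include <string.h>

#define WIDTH 4   /* hypothetical scan-line width, kept tiny for desktop tests */
#define ROWS  5   /* height of the window in the example */

typedef struct {
    int hist[WIDTH][ROWS - 1]; /* previous four pixels per column, oldest first */
    int col;                   /* column of the next incoming pixel */
    int rows_done;             /* complete scan lines received so far */
} columns_t;

void columns_init(columns_t *s) { memset(s, 0, sizeof *s); }

/* Accept one pixel in raster order. Once four full lines are buffered,
   fills out[0..4] with five vertically adjacent pixels (top to bottom)
   and returns 1; before that it returns 0 and leaves out untouched. */
int columns_push(columns_t *s, int pixel, int out[ROWS]) {
    assert(s->col >= 0 && s->col < WIDTH);
    int *h = s->hist[s->col];
    int valid = s->rows_done >= ROWS - 1;

    if (valid) {
        for (int r = 0; r < ROWS - 1; r++) out[r] = h[r];
        out[ROWS - 1] = pixel;
    }

    /* Shift this column's history up by one row and store the new pixel. */
    for (int r = 0; r < ROWS - 2; r++) h[r] = h[r + 1];
    h[ROWS - 2] = pixel;

    if (++s->col == WIDTH) {   /* end of scan line */
        s->col = 0;
        s->rows_done++;
    }
    return valid;
}
```

The five values in `out` correspond to the five parallel output streams; in hardware each would be written to its own stream in the same cycle.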

Flow visualization tools within Impulse C illustrate how the hardware design will propagate from the C files.
The filter process executes in parallel with the columns process, accepting the five incoming streams and performing a 5-pixel by 5-pixel convolution to generate a stream of filtered outputs. The producer and consumer processes are used during software testing to read and write sample image files. Generating HDL is as simple as selecting a project and selecting “generate vhdl”.
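The filter stage can be sketched on the desktop as a 5x5 sliding window that takes one column of five pixels per step. Again this is an assumed illustration rather than the tutorial code: `filter_t`, `filter_push` and the kernel layout are hypothetical, and the sum is computed without flipping the kernel (correlation form), which gives the same result as convolution for symmetric kernels.

```c
#include <assert.h>
#include <string.h>

#define K 5   /* window is K x K pixels */

typedef struct {
    int win[K][K];   /* win[row][col]; column 0 is the oldest */
    int cols_seen;   /* columns shifted in so far, saturating at K */
} filter_t;

void filter_init(filter_t *f) { memset(f, 0, sizeof *f); }

/* Shift in one column of five vertically adjacent pixels (one value from
   each of the five incoming streams). Returns 1 and writes the weighted
   sum to *result once the window holds five columns; otherwise returns 0. */
int filter_push(filter_t *f, const int column[K],
                const int kernel[K][K], int *result) {
    assert(result != NULL);

    /* Slide the window one pixel to the left and append the new column. */
    for (int r = 0; r < K; r++) {
        for (int c = 0; c < K - 1; c++) f->win[r][c] = f->win[r][c + 1];
        f->win[r][K - 1] = column[r];
    }
    if (f->cols_seen < K) f->cols_seen++;
    if (f->cols_seen < K) return 0;

    /* 25 multiply-accumulates; these inner loops are the kind of code a
       C-to-FPGA compiler would be expected to unroll and run in parallel. */
    int acc = 0;
    for (int r = 0; r < K; r++)
        for (int c = 0; c < K; c++)
            acc += f->win[r][c] * kernel[r][c];

    *result = acc;
    return 1;
}
```

This sketch ignores the image edges, so the window simply keeps sliding across scan-line boundaries; a real filter would need a policy for border pixels.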

Drilling down into the automatically generated HDL testbench shows that it generates the required interfaces to test the streams.
The developer uses graphical visualization of register stages, parallel computation, and interconnects between design elements to iterate on and refine the design. After device selection comes system-level HDL development. Generally the flow is to:
- Upload to the system-level test.
- Pull the HDL block into the system-level test.
- Use the FPGA manufacturer’s bus functional models to simulate bus interaction, working toward a test of the whole SoC.
- Integrate with board-level simulation.
After selecting an appropriate algorithm or design component, the general design flow is:
- Import or enter C-based image, signal or data processing source algorithms.
- “Wrap” and download to the FPGA model for initial desktop functional verification.
- Analyze, refactor and iterate to achieve target FPGA performance.
- Verify the C code functionally in a desktop environment, e.g. Visual Studio, Eclipse, GCC, etc.
- Verify the optimized, scheduled hardware by automatic translation to the RTL model and simulation in Aldec’s Active-HDL. Automatically generated test vectors are used as the testbench in Active-HDL.
- Synthesize the VHDL or Verilog RTL model. Perform post-synthesis simulation using the same vectors in Active-HDL.
- Perform P&R and implementation. Perform timing simulation using the same vectors in Active-HDL.
Using FPGAs as co-processors and validating them as the design progresses reduces the long-term cost of ownership in several ways. It creates known-good code blocks that are more easily reused. Because the work is done at a higher level, design migration to newer technologies as they emerge is much easier. And because the flow stays with ANSI C, VHDL and standard development tools, the design tends to be easier for future members of a given design team to maintain and update.
What’s coming in the future?
Pardon us our imperfect crystal balls, but we do see some things on the horizon that might make things easier…
Back annotation or predictive models could improve pre-synthesis results. At present, users really need to compile all the way from C to RTL in order to verify results at the working device. Given the size of some of the upcoming FPGAs, the last leg of this, place-and-route, can take 4 to 8 hours. This is a reality given the amount of processing it takes to optimize P&R for this many gates, but the delay to results keeps some mainstream processor development teams away from FPGAs. Better predictive models would reduce the number of all-the-way-to-RTL compiles required. Even better would be some way to get meaningful errors or design advice in the early stages, either fed all the way back from RTL/layout or from a model that predicts the final results with better than 90% accuracy.
Partial reconfiguration is a methodology that can reduce end-to-end compile time. Announced in the latest revisions of FPGAs from Altera and Xilinx, partial reconfiguration isolates the part of the design that actually changed. In some cases, 8-hour compiles are replaced by partial compiles that take minutes. This feature is still finding its way upstream into the design tools.
Multiple FPGAs used as a single “platform” are another area to watch. In the same way that there is no “big red button” that perfectly configures C logic for RTL, there is no button for partitioning an algorithm or system block over multiple FPGAs.
About the authors
Aldec is one of the world leaders in VHDL simulation. Mr. Sahoo is a central member of their applications staff, helping debug hundreds of real world applications.
Impulse C is the most widely used software for refactoring C code to run on FPGA. Mr. Durwood is the co-founder of Impulse with roots that trace back to the ABEL project at Data I/O.
The authors wish to thank Ed Trexel, David Pellerin, Mike Kreeger and Scott Thibault for their assistance in this article.
dorecchio
4/12/2012 9:06 AM EDT
Any approach that raises the level of abstraction for design of the new SoC FPGAs is a good thing. Automating the flow through the design process is even better. I like the work that Impulse and Aldec have done here.