Last week there was a bustle of activity at the Embedded Systems Conference (ESC) in Boston. In particular, it was apparent that the use of programmable logic in the form of FPGAs is dramatically increasing in today's embedded systems designs.
One company that is heavily involved in this area is GateRocket, which specializes in technology used to verify large, complex FPGA designs. A couple of days ago, Dave Orecchio from GateRocket posted a blog entry on the GateRocket website.
In his blog, Dave presented an excerpt from a paper GateRocket presented at ESC. This excerpt is reproduced below (there's a link in the original blog that you can use to obtain a full copy of the paper).
******** Excerpt from GateRocket's ESC paper **********
Embedded system designers can leverage the incredible density in the latest FPGAs to mix processors and IP, but the design team must have a good plan for the verification and debug process.
Not long ago, embedded design teams used FPGAs to combine disparate glue logic into one chip and easily debugged the relatively low-density ICs. Today, many embedded designs are using FPGAs with density approaching that of ASICs. The designs often rely on an embedded processor in the FPGA – sometimes hard-wired in the IC and sometimes implemented with the programmable logic in a soft configuration. In either case, the design team faces a complex verification task that includes the embedded processor and a variety of functional blocks either designed in house or bought as third-party IP. A big challenge comes in the debug and the verification process, especially as a team mixes processor cores, other purchased IP, and their own circuit blocks. Verifying such a design expediently and meeting time-to-market requirements is best accomplished with a Device Native approach.
Embedded processors and IP
Embedded design teams have a plethora of choices when it comes to IP and processor cores for SOC designs based on state-of-the-art FPGAs. Altera and Xilinx both offer their own embedded processor cores, as well as support for popular third-party processors from suppliers like ARM, IBM and Freescale.
Other types of IP are readily available as well, ranging from the simple (e.g. UARTs) to the complex (e.g. PCI controllers). Today’s design tools allow you to efficiently assemble all of the blocks at a high level of abstraction. Then you only need to simulate your design to find bugs, synthesize the design, go through the place-and-route process, configure the FPGA, and you are off to the lab to debug the design instantiated in the FPGA.
Unfortunately, it’s not that simple.
Using embedded technology, whether from your FPGA supplier or a third party – or even internally developed – introduces issues with simulation, IP models, synthesis tools, and place-and-route tools.
Blocks such as embedded processors are especially problematic from a verification performance standpoint. An embedded processor may be sufficiently complex that the simulator is overwhelmed just trying to clock the processor. Moreover, design teams often rely on a testbench for verification at the RTL level that's implemented as an embedded program that executes on the processor core. The result is a simulation comprising billions of CPU cycles that takes forever even on top-of-the-line multi-core workstations.
If the simulation bottleneck isn’t bad enough, next come synthesis and place-and-route. In a perfect world, synthesis would deliver an exact gate-level representation of the RTL design that has been proven correct in RTL simulation. Rarely, however, will the design team find that a design works when first instantiated in the FPGA and tested in the lab.
Any number of things can go wrong in the synthesis process, most notably problems with integrating and verifying third-party IP blocks, and mismatches in the way that the RTL simulator and the synthesis tool interpret the design.
Bugs can creep into a design at the place-and-route stage as well. Frequently, in an attempt to optimize a design, the place-and-route tool will make an arbitrary choice that causes a functional bug. This creates a real problem for the embedded team. The legacy tools at designers’ disposal don't provide sufficient visibility into the design instantiated in the FPGA for the engineers to make educated guesses as to the cause of the problem. The team must often make guesses and chase the problem.
The guess, test, and revise process is not only extremely inefficient, it is even more time-consuming than you might first guess. It can take 18 hours to take a design through the synthesis and place-and-route cycle. Even with workstation performance constantly on an upward ramp, most teams run a new synthesis and place-and-route cycle at night, hoping the new design will yield a full productive day of debug in the lab. But any of the bugs described above can waste the lab day if the FPGA instantiation doesn't work.
Device-native debug and verification
To address these issues faced by designers looking to use embedded technology in FPGAs, GateRocket offers our Device Native approach.
With a Device Native approach, the design team can connect the actual processor instantiated in the target FPGA into the RTL behavioral simulation environment. Even prior to verifying the entire design in the simulator, the team can proceed through the synthesis, place-and-route, and FPGA configuration steps. But only known-good blocks are moved into the FPGA. Simulation can now proceed with the processor running at full speed in the FPGA while third-party IP and other functional blocks are simulated. The Device Native approach greatly accelerates the simulation process because the simulator doesn't have to deal with the embedded processor and the actual processor can execute the testbench.
The Device Native technology can also help the team quickly deal with the other types of issues that arise from errors in the design, bugs in third-party IP, and issues with synthesis and place-and-route tools. Moreover, it alleviates the problem of lab days that are wasted while a new synthesis and place-and-route cycle runs.
The debug advantage afforded by the Device Native technology is visibility into the design. For example, consider a situation with an IP block where the RTL and gate-level models differ in some way. With a device-native tool, the team can compare the simulation results directly with the results obtained when the block is instantiated in the FPGA. The team can precisely see where the two diverge and systematically determine the problem.
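As a rough illustration of the comparison described above (not GateRocket's implementation), the following Python sketch walks two cycle-by-cycle traces and reports the first cycle at which they differ. The trace layout, the signal names `valid` and `data`, and the function name `first_divergence` are all assumptions made for this example; a real flow would read the values from simulator and hardware dumps.

```python
from typing import Dict, List, Optional, Tuple

# Assumed layout: one dict of signal name -> sampled value per clock cycle.
Trace = List[Dict[str, int]]


def first_divergence(sim: Trace, hw: Trace) -> Optional[Tuple[int, List[str]]]:
    """Return (cycle, names of differing signals) for the first mismatch, or None."""
    for cycle, (s, h) in enumerate(zip(sim, hw)):
        # Compare every signal seen in either trace at this cycle.
        diff = sorted(n for n in s.keys() | h.keys() if s.get(n) != h.get(n))
        if diff:
            return cycle, diff
    if len(sim) != len(hw):
        # One trace ended early; report the cycle where the shorter one stops.
        return min(len(sim), len(hw)), ["<trace length>"]
    return None


# Hypothetical traces: RTL simulation versus the block running in the FPGA.
sim_trace = [
    {"valid": 0, "data": 0},
    {"valid": 1, "data": 5},
    {"valid": 1, "data": 6},
    {"valid": 0, "data": 6},
]
hw_trace = [
    {"valid": 0, "data": 0},
    {"valid": 1, "data": 5},
    {"valid": 1, "data": 7},
    {"valid": 0, "data": 7},
]

print(first_divergence(sim_trace, hw_trace))  # (2, ['data'])
```

Reporting the earliest differing cycle matters because the visible failure may appear many cycles after the two models first disagree; the sketch only locates that point and does not explain its cause.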
The RocketVision implementation goes one step further, allowing design teams to take full advantage of lab days even when one or more functional blocks aren't perfect. When the team identifies a problem in a functional block, the team can virtually move that block from the FPGA instantiation back into the RTL simulator. The team can even correct a bug once the block is back in the simulation environment. The team can continue to debug the remainder of the FPGA design, waiting until the next synthesis cycle to actually fix the block in the FPGA instantiation.
A lengthy verification and debug process is far more than an inconvenience. Such delays can cause a team to miss a market window and waste untold R&D dollars. The Device Native approach provides the best insurance against such a fate.
Design complexity naturally makes this difficult, but it could be made easier if the tools were user friendly.
The tools start by assuming the design is complete, and they do not provide efficient iterations as the design evolves. The early stages of design do not need synthesis to optimize away major pieces of logic, nor do they need place and route, and timing analysis should be deferred to the physical design stage. These steps are in the flow and consume a lot of time, and may be necessary to produce RTL that conventional simulators use.
Just realizing a few basics and producing user friendly tools can help a lot.
Hardware consists of data flow and control logic. Data flow is routine no matter the form of design entry. The "balloon/cloud" of control logic is the most error prone. The most concise way to define control logic is with Boolean algebra. Simulation is fast and easy, especially with synchronous logic. Boolean is text and easily parsed; it can be entered as text, or HDL can be parsed into Boolean, i.e. a string of ifs makes an AND, else is negation, etc.
On the programming side, optimizing C compilers love to throw away the code used to interface with memory-mapped I/O, so there should be an embedded C, not the application standard with all the libraries, memory allocation, pointer arithmetic, stack overflows, etc.
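To make the "string of ifs makes an AND, else is negation" remark concrete, here is a minimal Python sketch. The signals `req`, `err` and `busy` and the grant logic are hypothetical, chosen only for illustration. It writes the same control decision once as nested ifs and once as a single Boolean term, then checks exhaustively that the two agree.

```python
from itertools import product


def grant_nested_if(req: bool, err: bool, busy: bool) -> bool:
    """Control logic written as a chain of nested ifs (hypothetical signals)."""
    if req:
        if not err:
            if busy:
                return False
            else:  # the else branch contributes the negated condition: NOT busy
                return True
    return False


def grant_boolean(req: bool, err: bool, busy: bool) -> bool:
    """The same decision as one product term: req AND (NOT err) AND (NOT busy)."""
    return req and not err and not busy


def equivalent(f, g, n_inputs: int) -> bool:
    """Compare two Boolean functions over every input combination."""
    return all(
        f(*bits) == g(*bits)
        for bits in product([False, True], repeat=n_inputs)
    )


print(equivalent(grant_nested_if, grant_boolean, 3))  # True
```

Exhaustive comparison is only practical for small control blocks, since the number of combinations doubles with each input.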
I wish this kind of paper used more actual examples (maybe from happy customers) that showed a real success. These types of solutions often sound good on paper, but when you actually try to use the tool, tricky issues crop up. For example, the claim
"the team can compare the simulation results directly with the results obtained when the block is instantiated in the FPGA. The team can precisely see where the two diverge and systematically determine the problem"
sounds good, but many times it is difficult to really find out WHY there is a difference in the simulation results. An error may not show up for many cycles in an embedded design. Now, if you can help me with that problem, I'm interested...