In a previous blog, Brian talked about what makes a prototype unique and said he would come back to the issue of emulators and accelerators... so here he is...
In a previous blog, I talked about the confusion that seems to exist about the similarities and differences between emulators, accelerators, and prototypes. In that blog I explained what makes a prototype unique and said I would come back to the issue of emulators and accelerators, so here we are.
First off, most of the time, as a user, you don’t really care. The differences lie in the internals of how they are implemented, and this may affect the features they make available, but for the most part the confusion is created by the manufacturers of these devices so that they can claim to be #1, or unique, or to have the highest capacity or speed or whatever. Let’s cut through all of that and look at what is fundamentally different. The umbrella term for all of them is hardware-assisted verification.
The starting point is a model. Today most of those models will be at the register transfer level (RTL), but over time we can expect higher-abstraction models to be accepted. It is all a matter of the sophistication of the available synthesis technology. Today we are starting to see ESL (electronic system-level) synthesis being sold as a standalone product, so we can expect some of this technology to move into hardware-assisted verification products before long. The ultimate goal is to make the model execute faster than it could in a software simulator.
A second issue is that software simulation slows down significantly when the model exceeds the physical memory of the computer, so there is a capacity issue at play as well. There are two general ways to solve this problem, which I will refer to as direct and indirect implementations.
When we compile a model into an FPGA, we have created an actual implementation of that model in hardware. It may not be the same implementation as would be used in an SoC, but it is an implementation nonetheless. This is what we mean by direct. Devices that use this technique are generally referred to as emulators, because the mapping emulates the function of the intended hardware. We are directly executing an implementation of the model.
Let’s contrast that with a simulator. Here we execute the model indirectly, by keeping track of the changes that would propagate through it. Mechanisms are devised that allow the effects of concurrency to be evaluated even though the simulator is actually incapable of doing more than one thing at a time. So a simulator is an example of an indirect implementation.
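To make the idea of indirect execution concrete, here is a minimal sketch of an event-driven simulation kernel in Python. All names and the two-signal example are illustrative, not taken from any real simulator; the point is that concurrency is serialized into a single time-ordered event queue.

```python
# Minimal sketch of an event-driven simulation kernel: concurrent hardware
# is evaluated "indirectly" by serializing signal changes into a
# time-ordered event queue. Illustrative only, not any real simulator.
import heapq

class Simulator:
    def __init__(self):
        self.time = 0
        self.queue = []      # (time, seq, signal, value)
        self.values = {}     # current signal values
        self.fanout = {}     # signal -> processes to re-evaluate on change
        self._seq = 0        # tie-breaker for events at the same time

    def connect(self, signal, process):
        self.fanout.setdefault(signal, []).append(process)

    def schedule(self, delay, signal, value):
        heapq.heappush(self.queue, (self.time + delay, self._seq, signal, value))
        self._seq += 1

    def run(self):
        while self.queue:
            self.time, _, signal, value = heapq.heappop(self.queue)
            if self.values.get(signal) != value:   # propagate real changes only
                self.values[signal] = value
                for process in self.fanout.get(signal, []):
                    process(self)                   # may schedule new events

# Example: an inverter with a 2-time-unit delay driving "b" from "a".
def inverter(sim):
    sim.schedule(2, "b", 1 - sim.values.get("a", 0))

sim = Simulator()
sim.connect("a", inverter)
sim.schedule(0, "a", 1)
sim.run()
```

After `run()` finishes, `sim.values` holds the settled signal states and `sim.time` the time of the last event, even though every gate was evaluated one at a time.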
There are some simulation accelerators that contain a large number of simple processors, each of which simulates a small portion of the design, passing results between one another. Each of these processors runs slower than the processor on your desktop, but the accelerator may contain thousands or millions of them, and the net result is significantly higher execution performance. They can of course deal with parallelism directly, since all of the processors run in parallel. An example of this type of hardware-assisted solution is the Palladium product line from Cadence. Each of the processors could have arbitrary capabilities, such as support for visibility, debug, etc.
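The partitioned-evaluation idea can be sketched as follows. This toy Python model splits a half-adder across two hypothetical "processors" that each evaluate their slice every cycle and exchange boundary values; a real accelerator does this in parallel across thousands of processors in hardware, while this single-threaded sketch only shows the dataflow.

```python
# Toy sketch of processor-based acceleration: the design is partitioned so
# each "processor" evaluates its slice each cycle, then boundary values are
# exchanged. Real accelerators run the partitions in parallel in hardware;
# this model is sequential and purely illustrative.

def partition_a(inputs):
    # half-adder sum bit, evaluated by one "processor"
    return {"s": inputs["x"] ^ inputs["y"]}

def partition_b(inputs):
    # half-adder carry bit, evaluated by another "processor"
    return {"c": inputs["x"] & inputs["y"]}

def accelerate(stimulus):
    results = []
    for x, y in stimulus:
        boundary = {"x": x, "y": y}
        # in hardware these two evaluations happen at the same time
        outs = {**partition_a(boundary), **partition_b(boundary)}
        results.append((outs["s"], outs["c"]))
    return results

print(accelerate([(0, 0), (0, 1), (1, 1)]))  # [(0, 0), (1, 0), (0, 1)]
```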
Within the direct implementation solutions there are again two main types, based on either custom or off-the-shelf components. We will start with the custom solutions. With these there is an FPGA-like structure somewhere in the device, although in general they employ very different types of interconnect than would be seen in an FPGA. The custom chip can also contain debug circuitry, visibility mechanisms, and a host of other capabilities. Each chip is capable of emulating a small piece of a design, and larger designs are handled by interconnecting many chips together, again with sophisticated interconnect capabilities. An example of this type of emulator is Veloce from Mentor Graphics.
The other way to implement an emulator is by using off-the-shelf components such as FPGAs. Here we not only map the design into the FPGA, but also implement the visibility, debug and other such capabilities into the FPGA as well. As with the custom chip case, multiple FPGAs can be put together to handle arbitrary design sizes. An example of this type of emulator is ZeBu from EVE.
The next level of confusion comes when we talk about how hardware-assisted verification solutions are used. First there is in-circuit emulation. This is where an emulator or accelerator is connected into a real-world application. For example, you may be designing a USB device. In this case, you would connect the emulator to a physical layer for USB and then plug it into a computer, or some other device that forms the other end of the USB connection. Now you operate it as if it were the real device.
There is one issue here, which is that emulators and accelerators are generally not able to run nearly as fast as the real world. Most emulators can only muster a few MHz of clock speed, especially when full visibility is made available. So it is often necessary to insert a speed bridge that can handle the difference in execution rates on each side of the bridge. This may involve data buffering or manipulation of the protocols to artificially slow down the real world to the rate the emulator can handle.
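As a rough illustration of the buffering half of a speed bridge, the hypothetical sketch below uses a bounded FIFO with back-pressure: the fast side is refused (and must retry) whenever the buffer is full, which effectively slows the real world to the emulator's drain rate. All names are made up for the example.

```python
# Illustrative speed-bridge sketch: a bounded FIFO between a fast
# "real world" producer and a slow emulator consumer. When the FIFO
# fills, the fast side is refused (back-pressure, e.g. NAKs in USB
# terms) until the emulator drains it. Hypothetical names throughout.
from collections import deque

class SpeedBridge:
    def __init__(self, depth):
        self.fifo = deque()
        self.depth = depth

    def push_from_world(self, packet):
        """Fast side: accept a packet, or refuse it if the buffer is full."""
        if len(self.fifo) >= self.depth:
            return False          # tell the real world to retry later
        self.fifo.append(packet)
        return True

    def pop_to_emulator(self):
        """Slow side: the emulator drains one packet per (slow) cycle."""
        return self.fifo.popleft() if self.fifo else None

bridge = SpeedBridge(depth=2)
accepted = [bridge.push_from_world(p) for p in ("p0", "p1", "p2")]
# the third push is refused until the emulator drains the FIFO
drained = bridge.pop_to_emulator()
```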
The next major way they are used is standalone. This means that the entire model fits into the emulator or accelerator, along with a set of stimulus to exercise the model. They can run as fast as the emulator is capable of, stopping only when additional stimulus is required or when captured data has to be flushed out of the device. If the design contains a processor, it is also likely that a version of the processor will exist for the emulator.
Emulator vendors provide special boards that make many of the popular processors available. But if parts of the design or testbench cannot be mapped into the emulator, then it has to be coupled to a software execution environment. This is usually called co-simulation, as it inherently involves two “simulation” engines cooperating to solve the problem. This solution suffers from the same problem as most software co-simulators: communication slows them down considerably. The emulator can now run only as fast as the simulator, or in fact slower still, because the communication adds further overhead.
A more modern alternative is what is called co-emulation. The primary difference is that communication is raised to the transaction level rather than being at the implementation level, but a full description of this and the way it is done will have to wait for another blog.
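To give a feel for why raising communication to the transaction level helps, here is a deliberately simplified Python comparison of host-emulator crossings. The crossing model is an assumption for illustration only: cycle-level co-simulation pays one crossing per pin-level event, while co-emulation sends one transaction that a transactor inside the emulator expands into pin activity at emulation speed.

```python
# Simplified comparison of host<->emulator crossings for sending a payload.
# The counts are an illustrative model, not measurements of any real tool.

def cosim_crossings(data):
    # cycle-accurate link: every bit toggle is a separate host<->emulator hop
    crossings = 0
    for _byte in data:
        crossings += 8            # eight pin-level events per byte
    return crossings

def coemu_crossings(data):
    # one crossing carries the whole transaction; a transactor inside the
    # emulator drives the pin-level activity locally
    return 1

payload = bytes(range(64))
print(cosim_crossings(payload), coemu_crossings(payload))  # 512 vs 1
```

Even in this crude model the crossing count drops by orders of magnitude, which is why transaction-level links keep the emulator from being throttled by the host.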
Ralph Zak did a good job of discussing the differences between Emulation and FPGA prototyping.
I would like to elaborate a bit from my vantage point of working for S2C, a leading rapid SoC prototyping supplier in China and Taiwan, and now in the US. We have focused on the rapid prototyping market. FPGA boards are common to both prototyping and emulation. With rapid prototyping, our focus is on delivering a fully operational hardware platform, including the memory and I/O interfaces. Getting FPGAs to run at system speed, or close to it, is often not trivial. To that end, we have pre-engineered Prototype Ready(tm) IP solutions to speed the development of the SoC prototype.
FPGA prototypes benefit both software and hardware development. As Ralph pointed out, far more SoC prototyping boards are sold to support software development than hardware development. I concur with his usage figures. Currently we are seeing over 70% of our boards used for software development.
Once the SoC prototype is verified, simultaneous hardware and software development can begin. There is a huge payback from early software development on the target hardware platform. Not only are the software debug cycles orders of magnitude faster than computer simulation, but seeing the results on the actual platform is a much more effective debug environment. In addition, because of the significantly higher performance of the prototype platform, the software engineer can run orders of magnitude more tests.
Our customers are saving months on their development schedules by starting with SoC prototypes.
Here are some additional thoughts on how to distinguish prototyping from emulation.
The role of prototyping is to provide a pre-silicon platform for early system integration of the SoC design with firmware and applications software, not RTL debug. Runtime speed is critical because of the need to offer an environment similar to the normal host-based software development environment. Cost is also critical, as often 5, 10, or more systems are needed to support the large number of software developers. Note that the RTL must be 99.99% clean before you begin running the prototype, due to the limited debug visibility inherent in these tools.
Emulation systems can use FPGAs or custom emulation ASICs. Emulators provide both simulation acceleration and early system integration capability. By providing visibility into all, or at least tens of thousands, of the internal registers and nodes per device, emulation systems can be used for RTL debug, much like simulators. By reducing simulation run times by factors of hundreds or thousands, they shorten the time to tape-out by weeks or months.
Emulation software functionality is the key to mapping designs quickly, providing for debug, and managing host-emulator data flow and interactions. An emulation system is also capable of system integration and debug of software, like prototypes. However, due to the higher cost of custom-chip-based emulation systems, these typically need to be complemented with FPGA-based systems to provide access to large teams of software developers.
For emulation systems, another consideration is that the current largest FPGAs are so much bigger than the custom emulation chips that they can easily accommodate the emulation-specific IP and interconnect functions, such as debug capture and management, and still have two to four times the capacity per device of the custom chips. So such systems, while equivalent in functionality, tend to be smaller, less complicated, less costly, and much faster.
It's a funny old world, isn't it -- a little earlier today I posted a How-To design article that talks about a new concept in FPGA-based prototype platforms for SoC designs, in which a standard FPGA prototype board can be enhanced to offer high visibility and also co-simulation and co-emulation capabilities (http://bit.ly/ccWL6T) ... and then Brian goes and posts this blog ... it's almost as though we had planned it (but we didn't).