One of the cool aspects of writing my recent book "The Design Warrior's Guide to FPGAs" was that it introduced me to a variety of next-generation design tools and flows. On the downside, however, it really began to hit home that I actually pre-date much of EDA as we know it today.
When I started my career as a hardware design engineer way back in the mists of time (OK, 1980), we didn't have schematic capture systems (we used pencils, stencils, and paper); we didn't have minimization and optimization programs (we had a guy on the team with a size-16 brain with go-faster stripes); we didn't have timing analysis applications (once again, you were back to pencil and paper); and functional verification largely involved sitting round a table with your peers asking "What does this bit do?" and, given an explanation, saying "OK, that looks like it should work."
I'm sure that you can argue this sort of thing back and forth for the rest of time, but from my personal perspective in the trenches, there have been three major "periods" in EDA technology (as it pertains to the digital logic designer) thus far:
1) No EDA tools worth talking about (as noted above).
2) The advent of schematic capture, digital logic simulation, and static timing analysis.
3) The introduction of language-driven design in the form of RTL descriptions, simulation, and synthesis.
The point is that we may be poised on the brink of a new era featuring pure C/C++ synthesis technology. Allow me to elucidate.
Problems with traditional RTL-based flows
For the purposes of these discussions, let's assume that we are interested in the design of some compute-intensive functions such as DSP algorithms targeted toward wireless, radar, satellite communications, and video/image-processing applications. In this case, the system architects will get the ball rolling by making macro-architecture (mA) decisions, such as which portions of the design will be implemented in hardware and which will be realized in software (Figure 1).
Figure 1. A traditional design flow
This is followed by algorithmic verification using tools such as MATLAB or SPW. More often than not, a C/C++ model is generated toward the end of this process. In addition to providing an extremely fast simulation model (one that can run thousands of times faster than an equivalent RTL representation), this model can subsequently be used as a golden reference against which to compare the final RTL implementation. (Somewhere along the way we would move from floating-point to fixed-point representations, but that's a tale for another time.)
The problem arises when we wish to take the design from the untimed C/C++ domain into the timed RTL realm. In the vast majority of design flows this is currently performed by hand, which leads to a variety of problems:
- Capturing RTL is time-consuming. Even though Verilog and VHDL are intended to represent hardware, it is still time-consuming to use these languages to capture the functionality of a design.
- Verifying RTL is time-consuming. Using simulation to verify large designs represented in RTL is computationally expensive and time-consuming.
- RTL is less than ideal for hardware-software co-design. Generally speaking, it can be painful verifying (simulating) the hardware represented in VHDL and/or Verilog in conjunction with the software represented in C/C++ or assembly language.
- Evaluating alternative implementations is difficult. Modifying and re-verifying RTL to perform a series of "what-if" evaluations of alternative micro-architecture implementations is difficult and time-consuming. This means that the design team may be limited in the number of evaluations they can perform, which can result in a less-than-optimal implementation.
- Accommodating specification changes is difficult. If any changes to the specification are made during the course of the project, folding these changes into the RTL and performing any necessary re-verification can be painful and time-consuming. This is a significant consideration in certain application areas such as wireless projects, because broadcast standards and protocols are constantly evolving and changing.
Perhaps the biggest problem with this traditional flow is that the RTL is implementation-specific in that all of the implementation "intelligence" associated with the design is hard-coded into the RTL. Realizing a design in an FPGA almost invariably requires a different RTL coding style compared to that used for an ASIC implementation. This means that it can be extremely difficult to retarget a complex design represented in RTL from one implementation technology to another.
In fact, this implementation specificity goes beyond the coarse ASIC-versus-FPGA boundary, because even if we assume a single underlying device architecture, the way in which a set of algorithms is used to process data may require a variety of different micro-architecture implementations depending on the target application areas.
Over the last year or so I've talked to a number of designers using the above flow. One point I heard time and time again relates to the fact that capturing and verifying the design at the RTL level is painfully resource-intensive and time-consuming.
The end result is that they often opt for "safe" solutions (for example, massively parallel implementations) that they feel confident will meet the design goals. Once they have an implementation for a portion of the design that meets the requirements, they lock it down and move on to the next task. They know that they could almost certainly "do better" given enough time, but they also know that they don't have the luxury to experiment further.
The new Catapult C-based flow
Things are somewhat different in the case of the new C/C++ tools that have only recently been announced (only a day ago in the case of Mentor's new Catapult C product, as I pen these words). Once a C/C++ model has been captured and verified, Catapult C can be used to take that model and automatically synthesize corresponding RTL (Figure 2).
Figure 2. The new Catapult C-based flow
First, the Catapult C engine reads in and analyzes the C/C++ source code. By means of a graphical interface, the user gains visibility into the various algorithmic elements forming the design, along with constructs such as loops. Using this interface, the user can indicate how different structures are to be handled, along the lines of: "fully unroll this loop," "partially unroll that loop," "resource-share this operator with that statement," "pipeline this portion of the design," and so forth.
For each such design decision, the system provides immediate feedback in terms of predicted silicon area (resource utilization) and latency. This allows the user to quickly and easily evaluate a number of different "what-if" scenarios. Each of these scenarios can be independently named and saved as a side file. The end result is that the C/C++ source code remains untouched, so this single source representation can be used to drive a suite of different implementations.
This type of C/C++ based flow addresses all of the problem areas we identified with the RTL portions of traditional flows as follows:
- Creating pure C/C++ is fast and efficient. Pure untimed C/C++ representations are more compact and easier to create and understand than their RTL equivalents.
- Verifying C/C++ is fast and efficient. A pure untimed C/C++ representation will simulate 100 to 10,000 times faster than an equivalent RTL representation.
- Pure C/C++ representations facilitate hardware-software co-design.
- Evaluating alternative implementations is fast and efficient. Modifying and re-verifying pure untimed C/C++ to perform a series of "what-if" evaluations of alternative micro-architecture implementations is fast and efficient. This helps the design team arrive at fundamentally superior micro-architecture solutions, which in turn can result in significantly smaller and faster designs than those produced by traditional hand-coded RTL flows.
- Accommodating specification changes is relatively easy. If any changes to the specification are made during the course of the project, it's relatively easy to implement and evaluate these changes in a pure untimed C/C++ representation, thereby allowing the changes to be folded into the ensuing downstream RTL implementation.
In fact, a number of companies and academic institutions have been furiously working away on tools of this ilk, so what makes Catapult C different? Well, first of all, it's customer-tested and proven. Although Catapult C has been in "stealth mode" until now, selected customers have been working with it intensively for years, and Mentor can now boast ten tape-outs based on this technology.
Furthermore, in addition to the fact that Mentor has nine Catapult C-related patents either pending or granted, one point that really caught my eye is the recognition that you can't create different portions of a design in isolation. If you are trying to implement a particular algorithm, you are very interested in how that portion of the design will interface to the rest of the system; there's no point, for example, in making a portion of the design capable of performing ten times faster than its interfaces can provide input data (or accept output data).
Thus, Catapult C comes equipped with a library of built-in interface components such as standard wire interfaces, FIFOs, and single-port and dual-port RAMs. (You can also add your own interface components as required.) Using simple constraints, you can apply these components to the inputs and outputs of the portion of the design under consideration. Catapult C will then use this interface knowledge to fine-tune the design to achieve the optimum implementation.
Of course things are always more complex than they seem, but I've chatted to a number of different real-world design teams who have used Catapult C and they all seem very much enthused, so I have no hesitation in awarding Catapult C an official "Cool Beans" from me. Until next time, have a good one!
Clive (Max) Maxfield is president of Techbites Interactive, a marketing consultancy firm specializing in high-tech. Author of Bebop to the Boolean Boogie (An Unconventional Guide to Electronics) and co-author of EDA: Where Electronics Begins, Max was once referred to as a "semiconductor design expert" by someone famous who wasn't prompted, coerced, or remunerated in any way.