Since its debut in 2004, the current generation of high-level synthesis (HLS) tools has made tremendous progress in terms of both quality of results (QoR) and wider applicability. The success of this technology cannot be denied: HLS is here to stay. However, as in other arenas of electronic design automation, a language war threatens to divide the user community, pitting C/C++ against SystemC. While some rallied around a more abstract form of modeling in pure untimed C++, others argued for more detailed models, such as the explicit timing, structure, and parallelism of SystemC.
These two standard languages do serve different design needs, yet for this very reason they complement each other to great advantage in a mixed-language HLS flow. For example, while the complex algorithms of next-generation broadband modem ASICs are most effectively expressed in pure C++, the intricate control logic found in other parts of the same device benefits from SystemC cycle-accurate models.
Today, a new era of peace has already begun with the introduction of dual-language HLS tools. Designers can now express complex interface protocols using a timed SystemC source while keeping the rest of the design functionality in pure untimed ANSI C++. And they can express structure and hierarchy either by using SystemC modules or by inferring them from natural C++ boundaries. Now that HLS tools can deliver full-chip synthesis, both pure C++ and SystemC are needed to provide the most efficient and productive way to handle all the various parts of the system.
It is important to understand where and how to use these modeling options in order to increase productivity and QoR. For this discussion, we can divide a design into four modeling domains: processing, control, interfaces, and hierarchy. The guiding principle is to keep the models as simple as possible for as long as possible: a model must have enough detail to be meaningful, but no more. A corollary principle is that when modeling untimed aspects of a design, you should use purely functional C++ models; when timing and concurrency are involved, add the timing detail that SystemC classes provide.

Processing units

Processing is functionality that does not depend on specific timing properties. It is algorithmic in nature and can be expressed as a transfer function or data path where time is not a parameter. All that matters is getting data in, crunching it, and producing the results.
Time will be an artifact of the implementation, but it’s not an attribute of the functionality. Because time is not a part of the behavior, there is no need to add timing detail. Thus, purely functional C++ models are more appropriate for processing applications, following our primary directives not to add detail unnecessarily and to use C++ for untimed applications.
The implementation will almost always require some parallelism, but it has been proven that it is easy to extract parallelism from sequential sources. Thus, describing the parallelism in the source itself adds nothing; it only creates more work and more overhead: more coding, slower simulations, and harder debugging. Also, once you put anything in SystemC, everything else has to be written or wrapped in SystemC, because SystemC only communicates with SystemC; you end up snowballing this burden onto everything else when you don't have to.
Further, because SystemC offers no advantage for algorithmic development, it makes no sense to ask the algorithm developers to start writing in SystemC. In practice, the algorithmic model will often already be written in C. So you might as well keep it simple and untimed at this point, and use the language it was written in.
Below is a synthesizable C++ implementation of an 8-tap constant-coefficient FIR filter.
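One possible form of such a filter, reconstructed here as a sketch (the function name and coefficient values are illustrative, not the article's original listing):

```cpp
#include <cstdint>

// 8-tap constant-coefficient FIR filter written in the untimed, purely
// functional C++ style suited to HLS. Coefficients are placeholders.
static const int TAPS = 8;
static const int16_t COEFF[TAPS] = {-2, 8, 23, 50, 50, 23, 8, -2};

int32_t fir(int16_t sample) {
    static int16_t delay[TAPS] = {0};  // shift register; maps to flip-flops
    int32_t acc = 0;

    // Shift the delay line and accumulate products. An HLS tool can
    // unroll this loop into parallel multiply-accumulate units; no
    // timing or parallelism needs to appear in the source.
    for (int i = TAPS - 1; i > 0; --i) {
        delay[i] = delay[i - 1];
        acc += static_cast<int32_t>(delay[i]) * COEFF[i];
    }
    delay[0] = sample;
    acc += static_cast<int32_t>(sample) * COEFF[0];

    return acc;
}
```

Note that time appears nowhere in the source: the function simply consumes a sample and returns a result, and its impulse response is the coefficient list itself.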
What war? SystemC is just a C++ class library. It would be pretty lame not to be able to have them coexist.
How about analog and power? When will we see -
"The wait is over: C++ and Spice coexist in a single flow"?
DKC - yes, I was a bit perplexed by the title as well. Multiple HLS technologies should coexist in a single workflow, not only these two. It really comes down to what application you are designing for and what V&V activities are required, which can really impact the quality of the product and TTM.
Why is the example given for these HLS tools always a trivial datapath block such as an FIR filter?
It gives the impression, rightly or wrongly, that the tools are only good for simple pipelines. I am not losing sleep over those sorts of designs.
"... but it has been proven that it is easy to extract parallelism from sequential sources."
This will come as news to everyone in the High Performance Computing community, who have been attempting to do this unsuccessfully for over 40 years. It will also be news to the authors of the numerous textbooks on parallel algorithms (if extracting parallelism was easy, why would we need them?)
Sam Fuller (CTO Analog Devices) and co-author Lynette Millett have the opposite opinion: "Experience has shown that parallelizing sequential code or highly sequential algorithms effectively is exceedingly difficult in general." in their article "Computing Performance: Game Over or Next Level", IEEE Computer, January 2011, pp. 31-38, reporting on the NSF-sponsored study by the Computer Science and Telecommunications Board of the US National Academy of Sciences.
This article shows that SystemC is required to do any real design. Since SystemC is a C++ class library, and thus a superset of C++, and since SystemC processes can contain pure untimed C++ code, shouldn't the article be titled "SystemC is the language of ESL"?
Also why does the article avoid the very popular TLM standard which enables easy separation of the interface from the computation and yields large simulation speedups (instead of inserting an RTL interface from a library)?
For those interested in how to do production design with SystemC, there's an archived EETimes webinar by Mark Warren aptly titled "Practical application of high-level synthesis in SoC designs".
Maybe this example will help clear up some of the "huh?!" discussion: your task is to HLS a block that crunches data in a certain way, and is interfaced into a certain system environment.
For the crunching part, you want to use the HLS tool to help you examine lots of microarchitectures, meaning implementation possibilities (various data widths, depths of pipelining, etc.). You may even want to create two or three different implementations at different price/performance points, and the tool must create the RTL for all of those different implementations. You change only a few microarchitectural parameters (pipeline depth, for example), push a button, and out pops radically different RTL.
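A hedged sketch of that point (the pragma names below are illustrative, modeled loosely on common HLS directives; the real syntax varies by tool): the microarchitecture is chosen by directives while the C++ itself never changes.

```cpp
// The same functional C++ can yield very different hardware. Only the
// (tool-specific, illustrative) directives shown as comments would
// change between implementations, never the algorithm itself.
int dot8(const int a[8], const int b[8]) {
    int acc = 0;
    // #pragma HLS UNROLL         -- illustrative: fully parallel, 8 multipliers
    // #pragma HLS PIPELINE II=1  -- illustrative: one result per clock
    for (int i = 0; i < 8; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

Swapping one directive for another re-targets the generated RTL between area and throughput without touching the functional source.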
In the meantime, however, there are interfaces to the world around that algorithmic portion that absolutely must proceed according to a very exact, and possibly intricate, timing definition. You have to ensure that those timing concerns are never violated. It's possible, for example, that a certain implementation of that algorithmic logic will not be able to process the data quickly enough to satisfy the data rates of the interface.
If that's the case, we want the HLS tool, not the designer, to limit the available choices of microarchitectures so that the bandwidth requirements of the interface will never be violated.