Resorting to synthesizing hardware is usually just a way of parallelizing a serial algorithm, so I would like to break the problem down into a) how do you write very parallel code, and b) how do you compile that for different platforms. That is, translating serial code into parallel code is useful in itself: it lets you get onto SMP and GP-GPU platforms, with a bit more work FPGAs, and if you are really keen, ASICs.
If you can do the refinement in C/C++ down to the fine-grained parallel description before attempting the translation to hardware, you'll have a more flexible toolset.
I use the word mature to describe tools that now do resource allocation, scheduling and binding. Early offerings were little more than translators. But even these early tools did a good job of clock insertion and were able to reorder what was coded and determine what could be done in parallel. The current offerings of 2 of the big 3 EDA companies do far more and have very sophisticated scheduling capabilities which allow parallelism creation at quite a high level. They also know how to handle functions, pointers, and pointer math. Recursion tends to be a shortcoming, but that is only a matter of time.
With regard to describing control in a way that hardware engineers are familiar with... This was an emphasis in the article. ESL tools allow us to think primarily about the datapath while we code, which is what we think about when we architect. The examples shown were about what sort of datapath details one might have to pay attention to. The compiler isn't doing much work with the datapath; it simply does what the designer asks. The job of the compiler is in control. It is meant to extract control from procedural constructs. We want designers to stop thinking about control in the way in which they have thought about it traditionally. In my experience this orientation tends to be hard to break. Once done, though, it's quite liberating.
Seems there's always a reference to a mature ESL compiler, and there is no mention of the data flow and control logic which chip designers understand. The gcc compiler produces "rtl" that is then compiled to a target instruction set. The data flow generally specifies two operands and an operator, with a place to put the result. That more or less covers the data flow. Either the control logic is ignored or it is done by some magic. Well, the control logic is where the problems arise, and until there is some explanation of the algorithm for controls, count me out.
No, I don't like HDL, it is too verbose; schematics quickly turn to spaghetti; YES, I want a better way, but so far a cpu is the tried and true way to get from a procedural language to hardware. When an ESL compiler looks at the if/else stuff and extracts the Boolean to find the assignments that can be done in parallel and leaves the rest in sequential memory I will become very interested.
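To make the if/else point concrete, here is a minimal sketch in C of what "extracting the Boolean" can look like. It is not taken from any particular ESL compiler; the function names are hypothetical and the transformation shown is plain if-conversion. The branching form lets control flow decide which assignment runs. The second form computes both arms unconditionally (which hardware could do in parallel) and reduces the condition to the select input of a 2:1 mux.

```c
#include <assert.h>

/* Procedural form: control flow decides which assignment executes. */
int select_branching(int sel, int a, int b)
{
    int y;
    if (sel)
        y = a + b;
    else
        y = a - b;
    return y;
}

/* If-converted form (hypothetical illustration): both arms are computed
   unconditionally, and the condition only selects which result is kept.
   In hardware this is an adder and a subtractor feeding a 2:1 mux. */
int select_muxed(int sel, int a, int b)
{
    int sum  = a + b;        /* arm 1, always computed */
    int diff = a - b;        /* arm 2, always computed */
    int cond = (sel != 0);   /* the extracted Boolean  */
    return cond ? sum : diff;
}

/* Self-check: the two forms agree on a small range of inputs. */
void check_forms_agree(void)
{
    for (int sel = 0; sel <= 1; sel++)
        for (int a = -2; a <= 2; a++)
            for (int b = -2; b <= 2; b++)
                assert(select_branching(sel, a, b) == select_muxed(sel, a, b));
}
```

Calling check_forms_agree() from a test harness confirms the two forms are equivalent for those inputs. Whether a given tool performs this rewrite, and when, is up to the tool.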
The argument: why would I code in C if I have to pay attention to this level of detail? Why not just do it in RTL? I presented what I hoped would show two different dimensions of coding where at least some of the architecture is implied in the src. First, there are rules. There have to be if you want the src input to be descriptive. In the end, it turns out these rules are simpler than, but different from, all the rules that now come naturally to us in RTL coding. Second, the SHA example (which is too small to show much), where only a few lines of code differed from the original software, was meant to show that such rules are rather obvious. In summary, it's power. Power over what you are designing. When you are working with examples as small as SHA, I have to agree... Do it in RTL or whatever you like. C or C++ should not be an obstacle. It's the 100k lines of C with a system that effectively has infinite states where a mature ESL approach has clear advantages. Ironically, I have noted that offerings that require less attention to coding garner much criticism from RTL folks who want less of a black box. Again, you have those choices in the current compilers out there.
All of this comes down to having a successful result. One where you can look back and say, that was definitely easier than RTL. That does take a little investment. It's a pretty small investment in my opinion, but you have to have a reason to do it. Either because you think there has to be a better way, or because you need there to be a better way. ESL is an attempt to move the ball forward. As someone who has used RTL for most of my career (and continues to do so) I have some complaints about it. And I have complaints about ESL. I still see it as a natural and necessary step forward which is gaining focus.
All, thanks for the interesting feedback. It's always difficult to describe such a broad subject in such a short space. Much must be left out and this was not a technical article.
C-synthesis: Excellent points. As ESL compilers mature, the amount of architecture that must be expressed in the input shifts. Clearly there exists a spectrum of capabilities across EDA offerings now. There is also a philosophical aspect. Some offerings start with the objective that one be able to describe intricate details of the architecture in the src. An alternative approach is to give the compiler constraints that allow it to choose from multiple architectures through synthesis and give the designers choices through GUIs or other feedback methods. Both have upsides and downsides. The former requires careful attention to coding in order to direct architecture but allows a high degree of control. The alternative often results in GUI displays presenting options at a level a little above gates; arguably what we are trying to get away from. But this approach is very effective for small to mid-size designs, especially in what would be the DSP space. Another argument: if the compiler adds architecture not implied in the src code, executing the src natively means less. Fortunately, whatever best suits you is available. The comment about the compiler not using embedded arrays is part of this synthesis thread. It is in fact a feature of almost any of the current compiler offerings through various means. The conclusions this article attempted to draw were not synthesis related.
(I hit the 2000 word limit. To be continued)
So the compiler took what was defined as an array, broke it down into discrete registers and generated a maze of muxes, ignoring the fact that embedded arrays are plentiful, dense, fast, and are allocated from a million or more available bits.
The number of available registers is at least an order of magnitude smaller. No, the generated design is not even close to what a hardware designer would want.
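As a rough illustration of the difference being described, here is a small C sketch of the two ways a variably indexed array read can be realized. The names and the depth of 4 are made up for the example. The first function models discrete registers, where the index turns into a DEPTH-to-1 mux; the second models an embedded memory with a single addressed read port.

```c
#include <assert.h>

#define DEPTH 4  /* hypothetical array depth, chosen only for the example */

/* Register-style: every element is its own storage cell, so a variable
   index becomes a DEPTH-to-1 multiplexer over all of them. */
int read_registers(int r0, int r1, int r2, int r3, unsigned idx)
{
    assert(idx < DEPTH);
    switch (idx) {
    case 0:  return r0;
    case 1:  return r1;
    case 2:  return r2;
    default: return r3;
    }
}

/* Memory-style: one storage block with a single addressed read port. */
int read_memory(const int mem[DEPTH], unsigned addr)
{
    assert(addr < DEPTH);
    return mem[addr];
}
```

Both return the same value for the same index. The mux grows with the array depth, while the memory read stays a single port, which is the cost the comment above is pointing at.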
My parser has a simulation of the parsed C code, and this simple loop runs in 33 cycles (about 11 clocks per iteration):
for (x = 10; x < 13; x += 1)
    y = x + x;
If I could find a way to upload a file or get your email I would send more.
The best I have is a crude website where you can download some code I used to debug and a cycle log for the execution. There is a download .exe.
(all variables are signed int default and do not have to be declared.)
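For anyone who wants to check the arithmetic, here is the loop quoted above as standard C, assuming the comparison the forum stripped out is "less than" and that the variables are plain signed ints. The function names are hypothetical. The loop runs for x = 10, 11, 12, so 33 cycles over 3 iterations gives the 11 clocks per iteration mentioned.

```c
#include <assert.h>
#include <stddef.h>

/* Runs the quoted loop and returns how many iterations executed.
   The final value of y is written through y_out. */
int run_loop(int *y_out)
{
    int x, y = 0, iterations = 0;

    assert(y_out != NULL);
    for (x = 10; x < 13; x += 1) {   /* "<" is assumed, see above */
        y = x + x;
        iterations += 1;
    }
    *y_out = y;
    return iterations;
}

/* Average cycles per iteration for a given total cycle count. */
int cycles_per_iteration(int total_cycles, int iterations)
{
    assert(iterations > 0);
    return total_cycles / iterations;
}
```

With the reported total of 33 cycles, cycles_per_iteration(33, run_loop(&y)) comes out to 11.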
I am using Altera Quartus II 9.1 (.bdf) for the design, and the resource utilization looks good: less than 1% of whichever chip was auto-selected. I am not very far into the design, though.
Thanks for the thoughtful article. I particularly like your observation that the central capability of high-level or ESL synthesis is to "extract state from procedural code". Allowing you to write an implicit state machine instead of having to explicitly code the state vector and state transitions is a central high-level synthesis capability.
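A small C sketch may help show the difference between an implicit and an explicit state machine. It is only an illustration under my own assumptions: the names, the three states and the four-sample accumulation are invented for the example and do not reflect what any particular tool generates. The first function leaves the state implicit in the loop counter and program position. The second spells out the state vector and transitions by hand, with one call to fsm_step standing in for one clock edge.

```c
#include <assert.h>
#include <stddef.h>

#define N_SAMPLES 4  /* hypothetical block size */

/* Implicit state machine: the loop counter and the program position
   are the state; nothing is declared explicitly. */
int sum_procedural(const int samples[N_SAMPLES])
{
    int acc = 0;
    for (int i = 0; i < N_SAMPLES; i++)
        acc += samples[i];
    return acc;
}

/* Explicit state machine: state vector and transitions coded by hand. */
typedef enum { ST_IDLE, ST_ACCUM, ST_DONE } state_t;

typedef struct {
    state_t state;
    int index;
    int acc;
} fsm_t;

/* One call models one clock edge. */
void fsm_step(fsm_t *f, const int samples[N_SAMPLES])
{
    switch (f->state) {
    case ST_IDLE:                      /* reset the datapath registers */
        f->acc = 0;
        f->index = 0;
        f->state = ST_ACCUM;
        break;
    case ST_ACCUM:                     /* one sample per clock */
        f->acc += samples[f->index];
        f->index += 1;
        if (f->index == N_SAMPLES)
            f->state = ST_DONE;
        break;
    case ST_DONE:                      /* hold the result */
        break;
    }
}

/* Clocks the explicit machine until done; reports the cycle count. */
int sum_explicit_fsm(const int samples[N_SAMPLES], int *cycles_out)
{
    fsm_t f = { ST_IDLE, 0, 0 };
    int cycles = 0;

    assert(cycles_out != NULL);
    while (f.state != ST_DONE) {
        fsm_step(&f, samples);
        cycles += 1;
    }
    *cycles_out = cycles;
    return f.acc;
}
```

Both functions return the same sum. The explicit version also exposes a schedule (one cycle to initialize plus one per sample), which is the kind of decision a high-level synthesis tool takes over when it extracts state from the procedural form.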
Another important thing I get from your article is that while ESL synthesis automates some of the hardware design tasks, many of them are still in the hands of a skilled designer. Successful users of HLS certainly do "architect first and design second" as you suggest, and they do pay attention to the hardware architecture that the tool infers from their code and adjust the code to get the architecture they want.
High-level synthesis is not "g++ -O99". Don't expect it to compile software into hardware. C++ coded to run on a CPU is optimized for that particular environment: a limited set of computation resources and a single linear memory. C++ coded to be implemented as custom hardware also needs to be optimized for that very different environment.
Your commenter jnhong observes that "RTL is more direct and more transparent" and concludes that "if designers need to know how yet another productivity tool works beneath the hood, why bother when they can just as quickly translate what's in their head into RTL?"
It is true that RTL is more direct and transparent in some ways than high-level code. It could also be said that gate-level schematics are more direct and transparent than RTL, or that assembly code is more direct and transparent than C++.
In all these cases, the answer to the question "why bother" is that the "extra layer of automation" allows you to produce a better design with fewer errors and less effort, and makes the design more reusable and retargetable.
A good high-level synthesis tool should be expected to accomplish this. I'm sure you can guess that I have some thoughts on where you can find one! :)