As an old timer, one of the most frustrating things is to have to wait for synthesis and or placement to run first. Early in the design, it is the logic that matters, not whether it has been optimized or whether everything feeds an output pine, else it gets synthesized away. I have proposed to map the HDL into a set of generalized classes to be used in an OOP simulator so the hardware function can be used in the software development environment. This way, the existing IP as well as the new design all come together in a program IDE where the debug and compiler checking is much better than in the hardware design flow. Why not take an evolutionary approach such as what this article suggests without all the uncertainties of HLS?
After all when HLS matures and obsoletes HDL that too can be used in a similar way.
As a technical communicator writing while the IC architecture team worked with the RTL development and design teams, I encountered the frustrations the author, Odiz, seeks to prevent. This article is an excellent recommendation to improving IC development/design--making sure the teams all around the world compile, aggregate, and build/test designs implications more quickly. One of the solutions my company looked at was company-wide (global) use of an executable specification. Our teams wrote scripts to test the RTL code against the design recommendations. We were able to detect early on the changes that we needed to make before it hit a critical path.
Another aspect is that it is more important to get a big complex design to work than it is to optimize every path and every Boolean function.
Forty years ago the opposite was true.
Another thing is embedded processors are programmed in a high level language, but the compiling process breaks the code down into such tiny steps that it takes millions of cycles to do any meaningful function. Things like pipelining, branch prediction, multilevel caches are assumed to be absolute necessities but they are not. The typical if/else. for, while, do are just structured ways of expressing compare and jump constructs and assignments can be streamlined by putting them into Reverse Polish Notation. Then the cpu becomes a very small physically, and use of multiport memories allows the statements to execute in very few cycles. The result is that function is coded in high level language, code size is small, power is low because thousands of flipflops do not have to be clocked simultaneously. Changes are done by altering memory content rather than redoing the entire chip compilation and timing. One concern might be that a total chip design may be a few cycles faster, but effectively doing more function per cycle offsets that effect.