Evolution of design methodology II: The re-aggregation era

Paul McLellan

12/20/2010 7:17 PM EST

Editor's note: This is the second of a two part opinion piece authored by EDA luminaries Jim Hogan and Paul McLellan. The first installment was posted Nov. 24.

Unlike previous changes to the abstraction level of design, the block level not only goes down into the implementation flow, but also goes up into the software development flow. Software and chip-design must be verified against each other. Since the purpose of the chip is to run the software load, it can't really be optimized any other way.

There is, today, no fully automated flow from the block level all the way into implementation. A typical chip will involve blocks of synthesizable IP, usually in Verilog, VHDL or SystemVerilog, along with appropriate scripts to create efficient implementations. Other blocks are designed at a higher level or, perhaps, pulled from the software for more efficient implementation. These blocks are in C, C++ or SystemC. The key technology here is high-level synthesis (HLS), which provides the capability to reduce system behavioral models to SoC almost automatically.

Designs like this are difficult to verify efficiently because of the inevitable mixture of languages and levels of accuracy. Large FPGAs are the medium of choice: they can accept this mixture and they are fast enough to run a large verification load. FPGAs have another advantage in that they introduce no silicon variance. They are by definition already silicon proven.

Going up from the block level allows a virtual platform to be created. The big challenge here is getting fast hardware models with adequate fidelity for enough of the blocks; otherwise the delay and effort of modeling make the software development schedule unacceptable.

Virtual platforms, and some other hardware-based approaches such as emulation, straddle a performance chasm. Software developers require performance millions of times faster than is appropriate for chip design. Of course at some level, if the technology were available, everyone would like high accuracy and high performance. We would all use SPICE all the time if it ran faster than RTL, but that is impossible. Instead, performance is purchased by throwing away accuracy.

However, it is still necessary to be able to move up and down this stack dynamically: boot Linux at high performance (seconds, not hours), and then drop to a higher level of accuracy to run a couple of frames to a display processor to check that the hardware functions correctly. Run fast until just before a bug seems to occur, then drop down and investigate what is really going on. High performance alone or high accuracy alone is not good enough. Both are required: the software performance model doesn't have enough accuracy to debug the system hardware, and the slower models can only boot Linux on a geological timescale.

This approach (block-level IP integration with virtual platforms) considerably reduces the number of steps between simply expressing design intent and actually having working hardware and software. This change enables design creation once again to move to the electronic system company, where the most important knowledge, the system knowledge, is found. Implementation is then either directly in FPGAs, for the many systems that are relatively low volume and high software value, or, once FPGAs have been used for prototyping, transformed into silicon and manufactured in one of the big foundries. Either route largely bypasses the previous generation of semiconductor companies, which focused on trying to produce a one-size-fits-all standard product.

There are several technologies that seem to now be mature enough to enable this transition to software-centric block-level design, along with companies supplying them.





KarlS

12/21/2010 11:40 AM EST

Here's something that I need a little help with: HLS takes code to silicon and is a key factor. If I want to integrate IP into a system, then what I really want is to integrate the IP function into software. So parse the HDL (Verilog is easy) and produce code to use in the software development. The code is to run on a processor, but the processor itself is not being verified. Treat the IP the same way, verify IP separately and use it.
Mainframe computer systems have been built by connecting processors, channels (DMA), control units (IP), and external devices for about fifty years now. SoC is analogous.




DKC

12/30/2010 3:25 AM EST

Most of the C/C++ based ESL stuff fails in that it is a hardware-centric methodology as is programming in Verilog (as suggested by Karl). Software engineers don't like that kind of stuff much because the code isn't sufficiently abstract to be portable/reusable - aside from the fact the HDLs are ugly and dysfunctional.

The opportunity for aggregation lies in the fact that programming a 1000-core system and designing logic for SoC are very similar exercises once you get away from SMP and RTL. I.e. a programming methodology for fine-grained parallelism will work equally well for general purpose processors, FPGAs, GP-GPUs and ASICs (with HLS).

http://parallel.cc





KarlS

12/30/2010 10:12 AM EST

Yes, HDLs are ugly because they are a description/definition language and to some extent reflect some of the ugly things that have to be accounted for in the physics of the chip. HLS and multi-core in a very simple sense assume an unlimited number of registers (variables) that can be selected as inputs to a core (cpu/arithmetic/logic unit/DSP block) with no delay, and the result put back into any register. So much for abstraction: all of those things have to be connected by wires, dissipate power, and take up space. The dirty part is that physics is real, not abstract, and must be applied. Multi-core is really SMP in that all the instructions and data must be supplied by the memory hierarchy, which is limited by physics -- that UGLY word again! The old approach of "just get more memory and a faster cpu" won't cut it. It did not work for supercomputers either, so they went out and found some problems that could be solved by an array of processors.




DKC

1/2/2011 3:59 AM EST

I'm glad you agree HDLs are ugly, however it's not really a requirement. Having worked in simulation for decades I can say most of the ugliness is due to bad design (by committee) and a desire to support (bad) legacy approaches. Also, hardware guys are not that good at finding the right abstractions.

An ESL language really needs to be a programming language which is HLS friendly and usable as an HDL. Verilog/VHDL fail on the former, C/C++ on the latter. SystemVerilog is particularly bad since the committee could have helped bridge the SV/SystemC gap but completely failed to do so.

Inspired by my SV experience I decided the way to go was to add the HDL threading model to C++. The project is on-going, and you can find it at -

http://parallel.cc

I will be adding analog support at a later date.




KarlS

1/2/2011 11:32 AM EST

I did go to the web site. No matter what the ESL is, bad design is still bad design. Also, a bad programmer can create a mess with any language. If the hardware guys are not good with abstractions, does that mean the right ESL will replace them? I think that is the underlying idea. So my conclusion is that programmers and the right ESL are all that's needed. The dream lives on. The topic that "Sequential Programming is so Last Century" caught my eye. But, ParC is C++ with HDL threading. Where is the parallel programming?

CSP is the way to go and at first I thought Google's Go language would be usable, but they refuse to consider direct access to MMIO registers so all the "sharing by communicating" is unavailable.

My feeling is that several tiny CPUs in a CSP configuration controlling IP peripherals is simple and doable today. Altera's SOPC Builder follow-on, Qsys, looks pretty good. Amdahl's Law applies to multi-core, and when things don't work right with everything inside a chip, who has the skill to debug that maze?




DKC

1/2/2011 3:01 PM EST

"But, ParC is C++ with HDL threading. Where is the parallel programming?"

The parallelism in ParC is the same as in Verilog/VHDL: each "process" block is a separate thread that can be run in parallel. The assignment of threads to cores can be explicit or automatic - some of the tests have a call to "migrate(...)" which does a semi-automatic move. The communication mechanisms (signals & pipes) are opaque to the programmer; the runtime system looks after them so threads can be moved easily.
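A rough sketch of that model in standard C++ (this is not ParC syntax, which is not shown in this thread): two threads stand in for "process" blocks and a small blocking queue stands in for a pipe. The names `Pipe` and `run_pipeline` are made up for illustration, and the sketch assumes a C++17 compiler with thread support.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pipe: a blocking FIFO shared by two threads.
template <typename T>
class Pipe {
public:
    void write(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Marks end of stream; readers drain what is left, then get std::nullopt.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a value is available or the pipe has been closed.
    std::optional<T> read() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return std::nullopt;
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
    bool closed_ = false;
};

// Two "process blocks": a producer that feeds samples into the pipe and a
// consumer that squares each sample. Neither touches the other's state
// directly; the pipe is the only point of contact.
std::vector<int> run_pipeline(const std::vector<int>& samples) {
    Pipe<int> pipe;
    std::vector<int> results;

    std::thread producer([&] {
        for (int s : samples) pipe.write(s);
        pipe.close();
    });
    std::thread consumer([&] {
        while (auto v = pipe.read()) results.push_back(*v * *v);
    });

    producer.join();
    consumer.join();
    return results;
}
```

Calling `run_pipeline({1, 2, 3})` returns the squared samples in order. Because the two blocks only talk through the pipe, either one could in principle be moved to another core without changing the code, which is the property the runtime relies on.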

In order for code to be reusable long-term you want to write it to take advantage of as many cores as it can. Amdahl's law does not necessarily apply, because what it really says is that for a given sequential algorithm, if you start pulling it apart and adding communication on top of the existing processing, then you will get diminishing returns. If you write a specifically parallel algorithm then you can go lots faster - or, to look at it another way: if you have an MPEG decoder written in C for X86 and an MPEG decoder written in Verilog, the latter is a parallel description which is a lot more complicated - but a lot faster/more-efficient when compiled for the right platform (FPGA/ASIC). ParC is about making the latter description easier to write so that you can also run it efficiently on multi-core (as well as FPGA/ASIC).
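For reference, the textbook form of Amdahl's law gives the speedup on N cores as 1 / ((1 - p) + p / N), where p is the fraction of the single-core run time that can be spread across cores. A minimal sketch (the function name `amdahl_speedup` is made up for illustration):

```cpp
#include <cassert>

// Textbook Amdahl's law: speedup on `cores` cores when `parallel_fraction`
// of the single-core run time can be divided evenly across them.
double amdahl_speedup(double parallel_fraction, int cores) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores >= 1);
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / cores);
}
```

With p = 0.9 the result stays below 10 no matter how many cores are added, which is the diminishing return for a pulled-apart sequential algorithm. A specifically parallel algorithm is, in these terms, one that changes p itself rather than N.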

Debugging ParC/CSP is pretty much the same as debugging Verilog/VHDL, folks like SpringSoft have tools for that, and you can use formal methods and assertion-driven testing.




KarlS

1/4/2011 11:41 AM EST

There are two things that need to be separated: hardware design and system design. System design takes hardware that has an interface defined by MMIO registers and their implied functions and connects the application software. Generally there is a device driver that actually manipulates the MMIO regs and has a software interface to either an IOS or native code. During design the software (application/OS/driver) must only work to that interface and may use all the available computer science. On the hardware side the MMIO functions have to be translated into physical circuitry. If there is a driver used for some of the hardware, then the driver software interface replaces the MMIO.

The simulator/model/virtual prototype must focus on timing other than the traditional hardware waveform. Those waveforms can be used for pretty good hardware timing (it's the only game in town). That's the real-time aspect that matters. The other hardware function is the interrupt request, which signals when the hardware needs to be serviced. Since there is software, there must be a cpu, and the response time is usually so variable that it should be considered unknown. The idea of having multiple cpus (IO processors) dedicated to a few peripherals is a possible logical solution, but unaffordable. Using C++ aggravates the variability if object instantiation occurs dynamically along with garbage collection, and there is the fact that compiler optimization throws out code that only writes MMIO regs.
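On the last point, the usual guard in C and C++ is the `volatile` qualifier, which makes each access to the register an observable side effect the optimizer has to keep. A minimal sketch, assuming a made-up register layout: an ordinary variable stands in for the device register so the code runs on a host machine, and `kEnable`, `kStart`, `start_device` and `read_control` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a device register. On real hardware this would be a fixed
// address from the device's memory map; a volatile variable keeps the
// sketch runnable on a host machine.
static volatile std::uint32_t fake_control_register = 0;

// Hypothetical bit assignments for the made-up control register.
constexpr std::uint32_t kEnable = 1u << 0;
constexpr std::uint32_t kStart  = 1u << 1;

// Two back-to-back writes that are never read by this function. Without
// volatile the compiler would be free to drop the first one; with volatile
// both stores must be emitted, in this order.
void start_device() {
    fake_control_register = kEnable;
    fake_control_register = kEnable | kStart;
}

// Reads the register back; a volatile read is likewise never cached.
std::uint32_t read_control() {
    return fake_control_register;
}
```

Note that `volatile` only stops the compiler from removing or merging the accesses. It says nothing about response time, which is the other half of the concern above.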




DKC

1/4/2011 12:48 PM EST

"Since there is software, there must be a cpu"

- not really, it depends on what the code does. HLS will turn code into RTL for you.

Using C++ from top to bottom of the design flow - i.e. replacing Verilog/VHDL - makes life a lot easier, particularly for developing drivers.

Garbage collection is a Java thing rather than C++.




KarlS

1/4/2011 1:42 PM EST

In the real world of embedded systems, the application code runs on a cpu; in Utopia the application may get synthesized from the abstract.

Garbage collection is not unique to Java. If there is no garbage collection then that horrible thing dubbed "memory leak" may occur.
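For context on the C++ side of this exchange: standard C++ has no garbage collector, and the common way to avoid leaks is scope-bound ownership, where an object is freed at a known point when its owner goes out of scope. A minimal sketch, assuming C++17; `FrameBuffer` and `process_one_frame` are made-up names for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical buffer type that counts how many instances are alive.
struct FrameBuffer {
    inline static int live = 0;
    FrameBuffer() { ++live; }
    ~FrameBuffer() { --live; }
};

// Allocates a buffer dynamically and returns how many are alive while it
// is in scope. The unique_ptr releases the buffer when the function
// returns, at a fixed point in the code rather than at a collector's
// discretion.
int process_one_frame() {
    auto frame = std::make_unique<FrameBuffer>();
    return FrameBuffer::live;
}
```

After `process_one_frame()` returns, `FrameBuffer::live` is back to zero, with no collection pass involved. A leak is still possible if code uses raw `new` and forgets the matching `delete`.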

Driver development is part of the hardware design and provides a software interface to allow for re-usability.

If the process blocks that run on separate threads equate to Verilog always blocks, then HDL must exist first. In any case, it is not reasonable to expect all applications to be coded in a parallel rather than sequential fashion.




DKC

1/4/2011 3:00 PM EST

"...it is not reasonable to expect all applications to be coded in a parallel rather than sequential fashion."

Given that the trend is to more-and-more cores and possibly a mix of SMP, FPGA, and GP-GPU, I'd say anybody who thinks that they can keep going with sequential code or low-core-count SMP is not going to be around in a decade.

HDLs exist as they are for historical reasons. One of the aims of my ParC project is to support legacy code by translation, i.e. any old Verilog/VHDL should be translatable into ParC going forward. New tools would be written to work with ParC (in ParC), and older tools can be migrated to take advantage at fairly low cost. Eventually the HDLs will become unnecessary.



