
Reduce parallel programming pain with dataflow languages

6/27/2010 09:00 PM EDT
DKC   6/28/2010 4:34:35 PM
You can also just add the language features of HDLs (Verilog/VHDL) to (say) C++ and get a not-so-new language that handles dataflow and event-driven programming - http://parallel.cc The main issue is that neither shared-memory nor synchronous (RTL) design styles work efficiently on the latest silicon. Going forward, hardware design and software design are going to start looking very similar: asynchronous communication, FSMs, and lots of threads. http://www.linkedin.com/in/kevcameron
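[Editor's note: a minimal sketch, in plain Python rather than parallel.cc's actual syntax, of the style DKC describes - independent threads communicating over channels instead of shared memory, like an HDL process network. The stage function and queue topology are illustrative assumptions, not anything from the linked project.]

```python
import queue
import threading

def stage(fn, inbox, outbox):
    """Apply fn to each item arriving on inbox; forward results on outbox."""
    while True:
        item = inbox.get()
        if item is None:          # sentinel: propagate shutdown downstream
            outbox.put(None)
            return
        outbox.put(fn(item))

# Two stages wired together by channels; each runs as its own thread.
a, b, c = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(lambda x: x + 1, a, b), daemon=True).start()
threading.Thread(target=stage, args=(lambda x: x * 2, b, c), daemon=True).start()

for x in [1, 2, 3]:
    a.put(x)
a.put(None)

results = []
while (y := c.get()) is not None:
    results.append(y)
print(results)   # [4, 6, 8]
```

The stages never share state; the only coupling is the blocking get/put on each channel, which is the software analogue of the asynchronous handshaking discussed below.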

KarlS   6/28/2010 5:36:28 PM
In figure 3, what is the advantage of pipelining over simply having each core run on separate data streams?

DKC: What makes RTL synchronous? HDL blocks are typically edge-triggered, and typically use a clock edge so that only one signal is involved; otherwise bad things called glitches occur when detecting the "edge" of combinatorial logic. Asynchronous data transfers require deskew of the bits, which requires a time delay to wait for the slowest bit, and delays are not well controlled in silicon. How do you expect the FSM state changes to be triggered? HDL is compiled to RTL before anything useful happens, so if the latest silicon does not handle RTL, it is useless.

DKC   6/28/2010 10:39:35 PM
I'd say the difference between synchronous and asynchronous at a hardware level is whether the data and its status are combined on the same wire(s). If I send you data serially and you have to recover the clock, it's definitely asynchronous (or self-synchronizing); if I present a word across 64 separate wires and then give you a clock on a separate wire, it's synchronous (i.e. the data and clock need to be synchronized, as in RTL).

At a design-description level you can use pipe/channel constructs for data transfer so that there is no explicit clock. An asynchronous design description can be synthesized as either synchronous or asynchronous hardware. At the design level, the arrival of data is the event that triggers FSM changes.

The smaller the features on silicon, the larger the spread in timing behavior (it scales inversely). So trying to use the same clock across a bunch of RTL stages will force you to go at the speed of the slowest stage, which will mean having a lot more slack than you really need. At some point it becomes cost-effective to move to a (more) asynchronous hardware implementation where stages run at their own speed. GALS (globally asynchronous, locally synchronous) is an intermediate approach.
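[Editor's note: a toy Python model of "the arrival of data is the event that triggers FSM changes" - there is no clock anywhere; the FSM advances only when a token arrives on the channel. The two states and token names are invented for illustration.]

```python
import queue

def run_fsm(inbox):
    """Two-state FSM: each blocking get() is the 'event'; no clock exists."""
    state = "IDLE"
    log = []
    while True:
        token = inbox.get()          # block until data arrives
        if token is None:            # sentinel ends the run
            return log
        if state == "IDLE" and token == "start":
            state = "BUSY"
        elif state == "BUSY" and token == "stop":
            state = "IDLE"
        log.append(state)

ch = queue.Queue()
for t in ["start", "stop", "start", None]:
    ch.put(t)
log = run_fsm(ch)
print(log)   # ['BUSY', 'IDLE', 'BUSY']
```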

KarlS   6/29/2010 1:48:38 PM
RTL stands for Register Transfer Level. It says absolutely nothing about how the register is clocked; a register is a register is a register. If you follow the design flow you will find that the channels and whatever else the dataflow language uses will be turned into RTL before physical design.

It is absolutely obvious that data arrival must trigger FSM state changes; what must be accounted for is the skew between data bits and the metastability associated with generating clocks from data. The classic example used for metastability is an arbiter, where several asynchronous requests are resolved by priority: if more than one request arrives close enough in time, the circuit becomes unstable. It is true that clock skew and uncertainty in path delays affect performance, but the alternative is designing glitch-free logic.

I have been in the real world with a lot of years of hands-on time with hardware... What is your experience? I was around when circuit metastability was first actually demonstrated, so don't try to simply give me the glib answer that data arrival triggers the FSM. When you are ready to explain how, I will listen.

DKC   6/29/2010 5:27:01 PM
Hi Karl, if you want to check my real-world experience you can follow the link at the bottom of my first post. There's more analog design than digital, but a fair amount of digital verification/simulation work.

You are correct that RTL as an acronym doesn't say anything about clocks, but in practice the EDA synthesis tools target a common-clock implementation, which is why there is so much time spent on designing clock trees and timing closure. With GALS you can use higher clock speeds locally than you can achieve globally, so you can avoid the metastability issues by oversampling inputs - metastability occurs because you aren't meeting the Nyquist sampling criterion.

KarlS   6/29/2010 8:52:34 PM
Where does sampling enter into controlling an asynchronous transfer if there are no clocks? In digital circuits, metastability occurs when a bistable device is hit with a pulse that is not long enough to cause a reliable state change. I realize that metastability occurs in many systems, but this is the case that concerns me, and there is no sampling involved: it either works reliably or it doesn't.

DrLPR   6/30/2010 12:06:42 PM
As an original member of the LabVIEW development team, responsible for the entire data analysis library, and having developed many digital video applications requiring parallel processing, I wrote a brief introduction to G programming covering parallel program development. If you are interested in buying a copy, visit http://www.digitalgap.com/IntroToGProgramming.php Regards, Lalo Perez

MeirG   7/1/2010 8:26:45 AM
Once again: "How do we parallelize software?" This issue is regurgitated again and again, ad nauseam. Hey folks, there are just two ways to parallelize software:

1) Code decomposition, a.k.a. pipelining. In other words, decomposing the processing into chunks of CODE.

2) Data decomposition. Here you decompose the DATA into chunks and process each chunk on a separate core/processor, each one running the SAME code. (Obviously, there must not be data dependencies between chunks.)

Each processor needs input memory and output memory; call them "mailboxes." Each output mailbox of one process/task/thread is the input mailbox of another (process, mind you, not processOR!). Then the problem boils down to scheduling: whenever a data block is ready in a mailbox, a free processor is assigned the task, according to priorities. Who determines the priorities? Good question. That's what we engineers are getting paid for. You have to DESIGN the system and make sure bottlenecks are not created. Yes, you need visibility into where mailboxes are overflowing and why; that's a design issue. Take it up. Ah yes, you can call it "dataflow architecture". It is!
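[Editor's note: a minimal Python sketch of the data-decomposition-with-mailboxes scheme described above - every worker runs the SAME code, pulling independent chunks from a shared input mailbox and posting results to an output mailbox. The chunk contents and worker count are arbitrary for illustration.]

```python
import queue
import threading

def worker(inbox, outbox):
    """Same code on every worker: pull a chunk, process it, post the result."""
    while True:
        chunk = inbox.get()
        if chunk is None:             # sentinel: this worker is done
            return
        outbox.put(sum(chunk))        # the 'processing' is just a sum here

inbox, outbox = queue.Queue(), queue.Queue()
chunks = [[1, 2], [3, 4], [5, 6]]     # no data dependencies between chunks

for c in chunks:
    inbox.put(c)

workers = [threading.Thread(target=worker, args=(inbox, outbox)) for _ in range(2)]
for w in workers:
    w.start()
for _ in workers:
    inbox.put(None)                   # one sentinel per worker
for w in workers:
    w.join()

# Completion order is nondeterministic, so sort for a stable view.
results = sorted(outbox.get() for _ in chunks)
print(results)   # [3, 7, 11]
```

Note the scheduling falls out for free: whichever worker is idle picks up the next ready chunk, which is exactly the "free processor is assigned the task" behavior described.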

DKC   7/1/2010 6:03:20 PM
The metastability arises in latches/FFs because you are sampling a signal that you expect to be a "good" 0 or 1, but it is neither - i.e. you've caught it in transition, which means you sampled it at the wrong point. Digital logic is an abstraction built on top of analog components. You can try to reduce metastability by increasing gain in the circuits, but you can't avoid it completely that way. It's better to look at it as a sampled-data problem where what you are trying to do is reconstruct the signal that was driven onto the other end of the wire(s).

In an asynchronous transfer the data and clock are combined (e.g. serial communication), so you have to do clock recovery and then synchronize the data with your local logic. The clock-recovery part is the circuitry that needs to be running at least 2x faster than the data rate to meet the Nyquist criterion. Analog circuitry can do the job without explicit clocks (since it works at higher frequency).
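[Editor's note: a toy numeric illustration of the oversampling point - sample an NRZ line at several times the bit rate and recover each bit by majority vote, so no single sample caught mid-transition can decide the value alone. The oversampling ratio and waveform are invented; real clock-recovery circuits also track the bit boundaries, which this sketch does not.]

```python
OVERSAMPLE = 4   # samples per bit period (assumed well above the 2x minimum)

def recover_bits(samples):
    """Majority-vote each window of OVERSAMPLE line samples into one bit."""
    bits = []
    for i in range(0, len(samples), OVERSAMPLE):
        window = samples[i:i + OVERSAMPLE]
        bits.append(1 if sum(window) > len(window) // 2 else 0)
    return bits

# Line samples for bits 1, 0, 1 - with one corrupted sample in the middle
# bit's window (a sample taken during a transition):
line = [1, 1, 1, 1,   0, 1, 0, 0,   1, 1, 1, 1]
bits = recover_bits(line)
print(bits)   # [1, 0, 1]
```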

KarlS   7/2/2010 2:45:16 PM
So now all we have to do is clock recovery of all the individual bits in the bus and then synchronize with the local clock. It is the synchronization with the local clock that runs into the metastability problem, because there is no fixed time relationship between the clock domains: sooner or later a sample will occur at the wrong time. Furthermore, the variation in propagation time of the individual data bits is unknown. If there is only one data wire, then it is standard practice to recover the clock and drive a serdes/UART. NRZ and NRZI for parallel data use some kind of deskew, and that is missing here. If digital is an abstraction built on analog, then to run analog on the same chip and expect at least twice the frequency is a bit of a stretch - but of course we don't know what the digital frequency is, which is less than the analog frequency, which is somehow known to be higher.
