Alternatively, you can extend the languages that you already use to support dataflow. You mention Verilog and VHDL, which are more event-driven than dataflow but do have an efficient approach to parallelism. I've had a go at incorporating those into C++ -
- I added a low-level "pipe" construct for asynchronous design and to support CSP style programming - aka "dataflow".
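To make the "pipe" idea concrete, here is a minimal sketch in Python rather than C++. It is not the construct referred to above; it only assumes that a pipe behaves like a bounded blocking queue between two concurrently running processes. The names `producer`, `doubler`, `run_pipeline` and the `_DONE` marker are made up for illustration.

```python
import queue
import threading

_DONE = object()  # hypothetical end-of-stream marker, not part of any real pipe API


def producer(out_pipe, values):
    """Write each value into the pipe; put() blocks while the pipe is full."""
    for v in values:
        out_pipe.put(v)
    out_pipe.put(_DONE)


def doubler(in_pipe, out_pipe):
    """Wake up only when data arrives, transform it, and pass it on."""
    while True:
        v = in_pipe.get()  # blocks until the upstream process has written
        if v is _DONE:
            out_pipe.put(_DONE)
            return
        out_pipe.put(2 * v)


def run_pipeline(values):
    # maxsize=1 makes each pipe behave almost like a rendezvous channel
    a = queue.Queue(maxsize=1)
    b = queue.Queue(maxsize=1)
    threads = [
        threading.Thread(target=producer, args=(a, values)),
        threading.Thread(target=doubler, args=(a, b)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        v = b.get()
        if v is _DONE:
            break
        results.append(v)
    for t in threads:
        t.join()
    return results
```

Neither process mentions a clock; each one advances only when its pipe has data or space, which is the CSP flavour of dataflow.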
As you hint: I suspect the future of programming is going to look a lot more like hardware design, and FPGAs will be at the bleeding edge of that.
Back to the Future - CMOS logic is generally designed to swing rail-to-rail (hard on/off), while analog circuits usually work in the center zone with smaller swings. Differential circuits (like the old bipolar ECL - http://en.wikipedia.org/wiki/Emitter-coupled_logic) work more like analog. The reason you don't do differential in CMOS is that it's a continuous current sink and uses twice as many transistors, but if your leakage is bad anyway that's less of an issue (and if you are doing power management anyway it may be off for chunks of time too).
So I'm anticipating the return of differential logic along with the asynchronous design as folks push down past 22nm.
So now all we have to do is clock recovery of all the individual bits in the bus and then synchronize with the local clock. It is the synchronization with the local clock that runs into the metastability problem, because there is no fixed time relationship between the clock domains. Sooner or later a sample will occur at the wrong time. Furthermore, the variation in propagation time between the individual data bits is unknown. If there is only one data wire, then it is standard practice to recover the clock and drive a serdes/UART. NRZ and NRZI for parallel data use some kind of deskew, and that is missing here.
If digital is an abstraction built on analog, then to run analog on the same chip and expect at least twice the frequency is a bit of a stretch. But of course we don't know what is meant by the digital frequency, which is less than the analog frequency, which is somehow known to be higher.
The metastability arises in latches/FFs because you are sampling a signal that you expect to be a "good" 0 or 1, but it is neither - i.e. you've caught it in transition, which means you sampled it at the wrong point.
Digital logic is an abstraction that is built on top of analog components. You can try to reduce metastability by increasing gain in the circuits, but you can't avoid it completely that way. It's better to look at it as a sampled data problem where what you are trying to do is reconstruct the signal that was driven onto the other end of the wire(s).
In an asynchronous transfer the data and clock are combined (e.g. serial communication), so you have to do clock recovery and then synchronize the data with your local logic. The clock recovery part is the circuitry that needs to be running at least 2x faster than the data rate to meet the Nyquist criterion. Analog circuitry can do the job without explicit clocks (since it works at higher frequency).
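As a rough illustration of recovering bit timing by oversampling, here is an idealized UART-style sketch in Python. It assumes a noise-free line that idles high, a single low start bit, and no drift between sender and receiver; the function names and the `factor` and `phase` parameters are invented for this example.

```python
def oversample(bits, factor, phase=0):
    """Model a line carrying `bits`, sampled `factor` times per bit.

    `phase` drops the first few samples to mimic the receiver starting
    at an arbitrary point relative to the sender's bit boundaries.
    """
    samples = [b for b in bits for _ in range(factor)]
    return samples[phase:]


def recover_bits(samples, factor, n_bits):
    """Find the start bit's falling edge, then sample each data bit mid-cell."""
    start = samples.index(0)  # first low sample marks the start bit
    out = []
    for i in range(n_bits):
        centre = start + factor * (i + 1) + factor // 2
        out.append(samples[centre])
    return out


def demo(data, factor=8, phase=3):
    frame = [1, 1, 0] + list(data) + [1]  # idle, idle, start, data, stop
    return recover_bits(oversample(frame, factor, phase), factor, len(data))
```

With `factor=8` the receiver can start anywhere within the idle period and still land near the middle of each data bit. A real receiver also has to cope with noise, jitter and frequency offset, which this sketch ignores.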
Once again: "How do we parallel software?"
This issue is regurgitated again and again, ad nauseam. Hey folks, there are just two ways to parallelize software:
1) Code decomposition, a.k.a. pipelining. In other words, decomposing the processing into chunks of CODE.
2) Data decomposition. Here you decompose the DATA into chunks, and process each chunk on a separate core/processor, each one running the SAME code. (Obviously, there must not be data dependencies between chunks.)
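Option 2 can be sketched in a few lines of Python. This is only an illustration: `chunked`, `process_chunk` and `data_decomposition` are made-up names, and a thread pool stands in for separate cores (in CPython, threads will not actually speed up CPU-bound work, so treat it as a model of the structure, not of the performance).

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, n_chunks):
    """Split `data` into at most `n_chunks` contiguous, independent pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_chunk(chunk):
    """The SAME code runs on every chunk; squaring is just a placeholder."""
    return [x * x for x in chunk]


def data_decomposition(data, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in chunk order, so reassembly is trivial
        parts = list(pool.map(process_chunk, chunked(data, workers)))
    return [y for part in parts for y in part]
```

This only works because no chunk depends on another chunk's result, which is exactly the condition stated in option 2.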
Each processor needs input memory and output memory; call them "mailboxes." Each output mailbox of one process/task/thread is the input mailbox of another (process, mind you, not processOR!).
Then the problem boils down to scheduling: Whenever a data block is ready in a mailbox, a free processor is assigned the task, according to priorities.
Who determines the priorities? Good question. That's what we engineers get paid for. You have to DESIGN the system, and make sure bottlenecks are not created. Yes, you need visualization of where mailboxes are overflowing and why. That's a design issue. Take it up.
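Here is a toy, single-threaded Python model of the mailbox scheme, just to show the moving parts. Everything in it is an assumption made for illustration: the `Task` class, the convention that a lower `priority` number is more urgent, and the choice to count an overflow rather than drop or block when a mailbox exceeds its capacity.

```python
from collections import deque


class Task:
    def __init__(self, name, priority, func, capacity):
        self.name = name
        self.priority = priority  # assumption: lower number = more urgent
        self.func = func
        self.capacity = capacity
        self.inbox = deque()      # this task's input mailbox
        self.downstream = None    # task whose inbox is our output mailbox
        self.high_water = 0       # deepest the inbox ever got
        self.overflows = 0        # posts made while already at capacity

    def post(self, block):
        if len(self.inbox) >= self.capacity:
            self.overflows += 1   # record the bottleneck; block is still queued
        self.inbox.append(block)
        self.high_water = max(self.high_water, len(self.inbox))


def run(tasks, source_blocks, n_processors=2):
    """Each step, hand the free processors to the most urgent ready tasks."""
    for block in source_blocks:
        tasks[0].post(block)
    results = []
    while any(t.inbox for t in tasks):
        ready = sorted((t for t in tasks if t.inbox), key=lambda t: t.priority)
        for t in ready[:n_processors]:
            out = t.func(t.inbox.popleft())
            if t.downstream is not None:
                t.downstream.post(out)
            else:
                results.append(out)
    return results


def demo():
    scale = Task("scale", priority=1, func=lambda x: 2 * x, capacity=2)
    offset = Task("offset", priority=0, func=lambda x: x + 1, capacity=2)
    scale.downstream = offset
    results = run([scale, offset], [1, 2, 3])
    return results, scale, offset
```

The `high_water` and `overflows` counters are the crude version of the visualization mentioned above: after a run they show which mailbox was undersized for the priorities chosen.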
Ah yes, you can call it "Data-flow Architecture". It is!
As an original member of the LabVIEW development team, responsible for the entire data analysis library, and having developed many digital video applications requiring parallel processing, I wrote a brief introduction to G programming to cover parallel program development. If you are interested in buying a copy, visit http://www.digitalgap.com/IntroToGProgramming.php
Where does sampling enter into controlling an asynchronous transfer, since there are no clocks?
In digital circuits, when a bistable device is hit with a pulse that is not long enough to cause a reliable state change, metastability occurs. I realize that metastability occurs in many systems, but this is the case that concerns me, and there is no sampling involved: it either works reliably or it doesn't.
Hi Karl, if you want to check my real-world experience you can check the link at the bottom of my first post. There's more analog design than digital, but a fair amount of digital verification/simulation work.
You are correct that RTL as an acronym doesn't say anything about clocks, but in practice the EDA synthesis tools are targeting a common clock implementation, which is why there is so much time spent on designing clock trees and timing closure.
With GALS you can use higher clock speeds locally than you can achieve globally, so you can avoid the metastability issues by oversampling inputs - metastability occurs because you aren't meeting the Nyquist sampling criterion.
RTL stands for Register Transfer Level. It says absolutely nothing about how the register is clocked. The register is a register is a register. If you follow the design flow you will find that the channels and whatever else the data flow language uses will be turned into RTL before physical design.
It is absolutely obvious that the data arrival must trigger FSM state changes; what must be accounted for is the skew between data bits and the problem of metastability associated with generating clocks from data. The classical example used for metastability is an arbiter where several asynchronous requests are resolved by priority. If more than one request arrives close enough in time, the circuit becomes unstable.
It is true that clock skew and uncertainty in path delays affect performance, but the alternative is designing glitch-free logic. I have been in the real world with a lot of years of hands-on time with hardware. What is your experience? I was around when circuit metastability was first actually demonstrated, so don't try to simply give me the glib answer that the data arrival triggers the FSM. When you are ready to explain how, I will listen.
I'd say the difference between synchronous and asynchronous at a hardware level is whether the data and its status are combined on the same wire(s). For example, if I send you data serially and you have to recover the clock, it's definitely asynchronous (or self-synchronizing); if I present a word across 64 separate wires and then give you a clock on a separate wire, it's synchronous (i.e. the data and clock need to be synchronized - as in RTL).
At a design description level you can use pipe/channel constructs for data transfer so that there is no explicit clock. An asynchronous design description can be synthesized as either synchronous or asynchronous hardware. At the design level the arrival of data is the event that triggers FSM changes.
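At the description level, "data arrival triggers the FSM" can look like the following Python sketch. It says nothing about how the hardware underneath resolves timing; the state names, the `START`/`STOP` tokens and `run_fsm` itself are invented for illustration.

```python
import queue


def run_fsm(channel):
    """Advance a small FSM only when a token arrives on the channel."""
    state = "IDLE"
    trace = [state]
    payload = []
    while state != "DONE":
        token = channel.get()  # the only event: data arrives on the pipe
        if state == "IDLE" and token == "START":
            state = "RECEIVING"
        elif state == "RECEIVING" and token == "STOP":
            state = "DONE"
        elif state == "RECEIVING":
            payload.append(token)
        # any other token seen while IDLE is ignored
        trace.append(state)
    return payload, trace


def demo():
    ch = queue.Queue()
    for token in ["noise", "START", 1, 2, "STOP"]:
        ch.put(token)
    return run_fsm(ch)
```

There is no clock anywhere in the description; whether a synthesis flow maps `channel.get()` onto a clocked FIFO or a handshake circuit is a separate implementation decision.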
The smaller the features on silicon, the larger the spread in timing behavior (it scales inversely). So trying to use the same clock across a bunch of RTL stages will force you to go at the speed of the slowest stage, which will mean having a lot more slack than you really need. So at some point it becomes cost-effective to move to a (more) asynchronous hardware implementation where stages run at their own speed. GALS is an intermediate approach.
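A back-of-the-envelope way to see the slack argument, as a Python sketch with made-up numbers: with one shared clock, every stage costs a full period set by the worst-case delay of the slowest stage, whereas a self-timed pipeline pays only each stage's actual delay. Handshake overhead is ignored here, which flatters the asynchronous case.

```python
def synchronous_latency(worst_case_delays, margin=0.0):
    """One shared clock: every stage takes the slowest stage's worst case."""
    period = max(worst_case_delays) * (1 + margin)
    return period * len(worst_case_delays)


def self_timed_latency(actual_delays):
    """Each stage hands off as soon as it is done (handshake cost ignored)."""
    return sum(actual_delays)


def wasted_slack(worst_case_delays, actual_delays, margin=0.0):
    return synchronous_latency(worst_case_delays, margin) - self_timed_latency(actual_delays)
```

For hypothetical worst-case stage delays of 1, 2 and 4 ns, and actual delays of 0.5, 1 and 2 ns on a given die, the shared clock gives 3 x 4 = 12 ns end to end against 3.5 ns self-timed. The wider the process spread, the bigger that gap gets.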