@HankWalker: I'm glad to hear some comments from an old-school Caltech student. Today, this institution continues to lead research in asynchronous logic with people such as Alain Martin -- I know he is not a youngster, but he's still kicking ;-)
About your comments, I totally agree with your explanation about throughput. In addition to the classic laundry example, which covers the case of a straightforward linear pipeline, we should consider that micropipes can be used in more complex designs, including fork and funnel structures. For example, if they are applied to a pipelined processor whose ALU implements several operations, each with a different processing time, the throughput can potentially be higher than that of the synchronous counterpart, depending on the distribution and order of the instructions.
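A rough way to see the instruction-mix argument is a toy model of a single ALU stage. The sketch below is only illustrative: the operation names, the latencies in `LATENCY_NS`, and the `handshake_ns` overhead are made-up assumptions, and the clocked stage is assumed to give every operation one cycle sized for the slowest one (a real synchronous design would more likely use multi-cycle operations). It captures the distribution effect only, not the effect of instruction order.

```python
# Hypothetical per-operation latencies in ns (illustrative, not measured).
LATENCY_NS = {"add": 1.0, "shift": 1.0, "mul": 3.0, "div": 8.0}


def sync_time(ops, latency=LATENCY_NS):
    """Clocked stage: every op occupies one cycle sized for the slowest op."""
    period = max(latency.values())
    return len(ops) * period


def self_timed_time(ops, latency=LATENCY_NS, handshake_ns=0.5):
    """Self-timed stage: each op takes its own latency plus an assumed
    request/acknowledge overhead."""
    return sum(latency[op] + handshake_ns for op in ops)


# An assumed mix dominated by fast operations.
mix = ["add"] * 6 + ["shift"] * 2 + ["mul"] + ["div"]
print(sync_time(mix))        # 80.0
print(self_timed_time(mix))  # 24.0
```

With a mix made only of the slowest operation, the same model makes the self-timed stage slightly slower than the clocked one, because the handshake overhead is paid on every operation.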
In addition, when applying delay-insensitive or quasi-delay-insensitive approaches to designing micropipes, we can actually reach higher throughputs, as we don't need to make any worst-case timing assumptions about process/temperature/voltage variations: they just run as fast as they can -- of course, we could theoretically run a faster synchronous pipe when it is fully loaded with data, but in practice we would face serious clocking issues when calculating the optimal period.
About GALS, by following the included link, you'll find a previous blog post covering the clocking issues that lead to this compromise solution (synchronous islands plus asynchronous network-on-chip communication).
Finally, about the elastic/inelastic classification, I've taken it directly from Ivan's ACM Turing Award lecture -- the blog includes the link to this precious gem too. Quoting him:
"Some pipelines are inelastic; the amount of data in them is fixed. The input rate and the output rate of an inelastic pipeline must match exactly. Stripped of any processing logic, an inelastic pipeline acts like a shift register. Other pipelines are elastic; the amount of data in them may vary. The input rate and the output rate of an elastic pipeline may differ momentarily because of internal buffering."
Ivan was starting to cover this material when I was an undergrad at Caltech in the late 1970s. The micropipeline throughput is still limited to the same speed as the synchronous pipe, since they are both limited by the slowest stage. Consider a washer and dryer. You can pile up wet laundry on top of the dryer (micropipe), but it will still have to wait for the dryer. What the micropipe does provide is elastic buffering if the pipe input load is bursty, achieving higher throughput since the pipe source will not have to stall.
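The washer/dryer point can be sketched with a small tick-based model. Everything here is an assumption for illustration: the hypothetical `simulate` function, the gap pattern, and the 3-tick service time. A source hands items to one slow stage through a buffer of `capacity` slots, and any time the source spends blocked pushes its later items back.

```python
def simulate(gaps, service_ticks, capacity):
    """Return (finish_tick, total_source_stall_ticks).

    gaps[i] is the idle time the source wants between handing over
    item i-1 and item i (gaps[0] is the delay before the first item).
    capacity is the number of buffer slots in front of the slow stage;
    the stage itself holds one more item while it works on it.
    """
    accepted = []  # tick at which item i entered the buffer or stage
    done = []      # tick at which item i left the slow stage
    stalled = 0
    for i, gap in enumerate(gaps):
        want = (accepted[-1] if accepted else 0) + gap
        # A slot frees up once the item (capacity + 1) places ahead has left.
        j = i - (capacity + 1)
        free_at = done[j] if j >= 0 else 0
        start = max(want, free_at)
        stalled += start - want
        prev_done = done[-1] if done else 0
        accepted.append(start)
        done.append(max(start, prev_done) + service_ticks)
    return done[-1], stalled


# Two bursts of four items separated by a long idle period (assumed pattern).
bursty = [0, 1, 1, 1, 12, 1, 1, 1]
print(simulate(bursty, service_ticks=3, capacity=0))  # (33, 12)
print(simulate(bursty, service_ticks=3, capacity=3))  # (27, 0)
```

In this model the buffered version lets the source run without stalling and finishes the bursty load earlier, while a steady stream faster than the slow stage finishes at the same tick with or without the buffer.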
You have not discussed micropipe disadvantages. The loopback time of request/ack signaling adds a stage delay overhead that is not present in a synchronous clock design, so the synch pipe can potentially have a shorter cycle time. But this must be balanced against the effort (power/area) required to distribute a synch clock. This is why GALS is proposed - a synch clock is better over small regions and asynch over larger regions.
I think "elastic FIFO" is a redundant term. A FIFO that is not elastic is pointless.