Some 27 years back I worked on a project which involved software running on multicore hardware.
The term SoC was probably not known then, and the hardware was designed to work as some 128 individual cores.
The interprocessor communication mechanism was in the hardware itself.
I did not know the term "multithreaded" at that time but the hardware seemed to be working in that fashion.
As a programmer I could decide at compile time which of those 127 cores would run a particular piece of the software.
So there was no overhead on the interprocessor communications - it was achieved by some kind of dual-port memory buffers.
This hardware was a proprietary design and could appropriately handle the PABX functions such as handling of trunk lines, the PABX internal lines, the attendant console and providing all kinds of PABX features.
By extending the software, we even networked multiple PABXs so that they behaved as one large PABX over a distributed geographical area.
So in my opinion, the design of multicore processor SoCs has to be approached with a fresh look at how to build the interprocessor communication techniques in hardware, so that the software overheads of interprocess communication in multithreaded applications can be eliminated (or at least minimised).
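To make the dual-port buffer idea a little more concrete, here is a minimal sketch of a single-producer, single-consumer mailbox. It is only an illustration, not the original PABX design: the class name `Mailbox` and its methods are hypothetical, and Python is used as a stand-in for what would really be two cores reading and writing a shared memory block. The point it tries to show is that the sending side only ever writes `head` and the receiving side only ever writes `tail`, so neither side needs a lock.

```python
class Mailbox:
    """Single-producer, single-consumer ring buffer (hypothetical sketch).

    The producer is the only writer of `_head`; the consumer is the only
    writer of `_tail`. One slot is always left empty so that a full buffer
    can be told apart from an empty one.
    """

    def __init__(self, capacity):
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to write, owned by the producer
        self._tail = 0  # next slot to read, owned by the consumer

    def try_send(self, msg):
        """Producer side: store msg, or return False if the buffer is full."""
        nxt = (self._head + 1) % len(self._slots)
        if nxt == self._tail:
            return False
        self._slots[self._head] = msg
        self._head = nxt  # publish only after the slot has been written
        return True

    def try_receive(self):
        """Consumer side: return the oldest message, or None if empty."""
        if self._tail == self._head:
            return None
        msg = self._slots[self._tail]
        self._tail = (self._tail + 1) % len(self._slots)
        return msg


mailbox = Mailbox(capacity=2)
mailbox.try_send("trunk line 3 seized")
mailbox.try_send("extension 21 off-hook")
print(mailbox.try_receive())
```

Because `None` doubles as the "empty" signal here, this sketch cannot carry `None` as a message. On real hardware the two sides would also need the memory system to make the slot write visible before the index update, which is exactly the kind of guarantee a purpose-built dual-port buffer can provide cheaply.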
"Event-driven technique" sounds like a simplified (both in reduced fraction of parallelism exploited and increased ease of development) form of tasklet-based dataflow.
(Even such coarse-grained dataflow would not exploit all the available parallelism; even operation-granular dataflow misses some parallelism. [E.g., a reduction like summing a collection of numbers can unnecessarily stall given a static pairing of operands.] However, coarser-grained dataflow is more scalable. Some of the residual parallelism can be exposed as data-level/SIMD or instruction-level parallelism [an out-of-order processor is effectively a limited dataflow engine] within a tasklet, but the remaining low-quality fruit in the very top branches might not be worth picking until much better techniques are developed.)
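The reduction example can be illustrated with a toy timing model. This is only a sketch under assumptions of my own: each operand becomes available at a given time, every add takes one time unit, there are as many adders as needed, and the function names are made up for the illustration. The static version adds neighbours in a fixed balanced tree; the dynamic version adds whichever two values are ready earliest, which is legal only because the operation is treated as associative and commutative (not strictly true for floating point).

```python
import heapq


def static_tree_finish(ready_times, add_latency=1):
    """Finish time when operands are paired by position in a fixed tree."""
    level = list(ready_times)  # assumes at least one operand
    while len(level) > 1:
        nxt = [max(level[i], level[i + 1]) + add_latency
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an odd operand is carried up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


def dynamic_finish(ready_times, add_latency=1):
    """Finish time when the two earliest-ready values are always added next."""
    heap = list(ready_times)  # assumes at least one operand
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, max(a, b) + add_latency)
    return heap[0]


# Seven operands are ready at time 0; one straggler arrives at time 9.
arrivals = [0, 9, 0, 0, 0, 0, 0, 0]
print(static_tree_finish(arrivals), dynamic_finish(arrivals))
```

With the straggler paired at the bottom of the fixed tree, every level above it has to wait, so the static version finishes at time 12. The dynamic version sums everything else while waiting and needs only one add after the straggler arrives, finishing at time 10.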
I am skeptical that this technique will solve all the problems associated with exposing parallelism at the thread level. However, even effectively applying it to a limited portion of all programs would be useful if the transition cost is as low as you indicate. Sadly, to get enough attention to the actual benefits of new techniques, it seems the proponents must resort to hype.
(It can be even worse for revived techniques as people are inclined to think "We already know that doesn't work" even though targets and tradeoffs change and a revival of an old technique is likely to be more refined with old weaknesses corrected and new strengths devised. Technological amnesia is not necessarily entirely negative.)
There is also the constraint that the software must run on existing hardware which generally does not provide minimal-cost (in latency and energy) communication, being based on generic cache coherence. (Mapping work to caches and network nodes to minimize communication overhead is probably not addressed in the first release. Other issues are far more significant.)
Even from my minimal awareness, the Adobe 9 to 10.1 example seems significantly flawed. "they are no slouches on the programming front" may be true in terms of tuning software for a particular processor microarchitecture or to exploit specific instructions like the latest SIMD instructions. The frequency of security bugs in Flash might indicate that Adobe does not employ only the cream of the cream of programmers.
In addition, these programmers are likely to have a mindset less appropriate to scaling software across threads. (The frequency of security bugs would hint at this. Experts in spaghetti code can wring out exceptional single-thread performance but easily run afoul of subtle bugs.) A brilliant chess player will not trivially become good at playing go, even though both are turn-based strategy-oriented board games.
Now add that version 9 and version 10.1 may have had different feature sets such that the performance being measured is actually different. While there is some Gustafson's Law effect (adding work makes parallelism easier), this would not reduce time to completion of the main task. Furthermore, if much of the actual processing is off-loaded to the GPU, then multithreading in the CPU portion would have a disproportionately low influence on performance.
Add that Adobe had 9 versions of accumulated cruft (and that cruft was written specifically with single-threaded operation in mind). I doubt Adobe had the courage (or foolhardiness) to drop all the existing code and start from a fresh perspective, so significant effort must have been spent to get an irregularly worn square peg to fit in a round hole.
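To put rough numbers on that last point, here is a small sketch using the standard Amdahl and Gustafson formulas. The 30% figure and the four threads are made-up values for illustration only, not measurements of Flash Player, and the function names are my own.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Fixed workload: only `parallel_fraction` of the original run time
    benefits from the extra threads."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


def gustafson_speedup(parallel_fraction, threads):
    """Scaled workload: `parallel_fraction` is the share of the parallel
    run's time spent in parallel code, and the work grows with threads."""
    serial = 1.0 - parallel_fraction
    return serial + parallel_fraction * threads


# Assumed for illustration: the multithreadable CPU work is 30% of the
# total time (the rest is GPU or serial work), run on 4 threads.
p, n = 0.30, 4
print(f"Amdahl (same task finishes sooner): {amdahl_speedup(p, n):.2f}x")
print(f"Gustafson (more work in same time): {gustafson_speedup(p, n):.2f}x")
```

Under these assumed numbers the fixed task speeds up by only about 1.29x on four threads, while the Gustafson figure of 1.9x describes getting more work done in the same time rather than finishing the main task sooner.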
Early versions for a new product direction also tend to have teething issues.
In addition, it is not clear what the benchmark was testing. There is a substantial difference in available parallelism between a Flash-based game and a webpage with several Flash animations. In terms of such benchmarks, it would be really painful if a new version had energy-efficiency optimizations that throttled frame rate (or whatever performance metric is used) when such did not affect the user experience. I doubt Adobe's Flash Player had such optimizations, but the concept points to how difficult it can be to make meaningful measurements.
I realize that I am nitpicking and ranting, but I have issues with the common statements that writing multithreaded software is nearly impossible. The difficulty varies with the nature of the problem (and the way the problem has been stated and previous solutions developed), the degree of parallelism sought, the skill of the programmers, etc.
@Wilco1: Hmm, this all sounds too good to be true, especially the claim that pretty much any application can be sped up via multithreading. Let's say an interpreter or instruction set simulator or a compiler. All inherently single threaded tasks that cannot be multithreaded. Anyone claiming otherwise is clearly selling snake oil.
I know what you mean -- but let me say that I didn't say "any application can be sped up via multithreading" -- what I did say was that they took one application whose creator specifically said it could not be multithreaded, and then dramatically sped it up using their technique (also that when they told me how, I slapped my head and said "D'oh!")
Now, many applications actually would be amenable to true multithreading -- but making them multithreaded using traditional techniques is not a trivial task -- in this case the SVIRAL approach really scores... again, I will be able to provide more details in the not-so-distant future.
Hmm, this all sounds too good to be true, especially the claim that pretty much any application can be sped up via multithreading. Let's say an interpreter or instruction set simulator or a compiler. All inherently single threaded tasks that cannot be multithreaded. Anyone claiming otherwise is clearly selling snake oil.
Without more details it's hard to say whether this is a revolutionary new programming model that allows inexperienced programmers to write 100% efficient (ok, 96%) parallel software without any effort.