We've all heard reports about "latest and greatest" techniques to crack the multi-core programming problem, and we've all gotten tired of being disappointed, but I truly believe that this is "the real deal" -- this could change so many things...
Sounds really promising and I would hope that folks make the effort to at least conduct a trial of the approach. How has it been productized? Do they sell a tool, a service, license updated libraries, or what?
If I were them, I'd look for a problem that responds well to parallelism, and offer to solve it for a lower price than the present technique. Keep undercutting the competition, and pretty soon they'll notice you.
Hmm, this all sounds too good to be true, especially the claim that pretty much any application can be sped up via multithreading. Let's say an interpreter or instruction set simulator or a compiler. All inherently single threaded tasks that cannot be multithreaded. Anyone claiming otherwise is clearly selling snake oil.
Without more details it's hard to say whether this is a revolutionary new programming model that allows inexperienced programmers to write 100% efficient (ok, 96%) parallel software without any effort.
@Wilco1: Hmm, this all sounds too good to be true, especially the claim that pretty much any application can be sped up via multithreading. Let's say an interpreter or instruction set simulator or a compiler. All inherently single threaded tasks that cannot be multithreaded. Anyone claiming otherwise is clearly selling snake oil.
I know what you mean -- but let me say that I didn't say "any application can be sped up via multithreading" -- what I did say was that they took on an application whose creator specifically said it could not be multithreaded, and then dramatically sped it up using their technique (also that when they told me how, I slapped my head and said "D'oh!")
Now, many applications actually would be amenable to true multithreading -- but making them multithreaded using traditional techniques is not a trivial task -- in this case the SVIRAL approach really scores... again, I will be able to provide more details in the not-so-distant future.
- parallel programming often is hard and produces meagre results
- some folks claim they have magic pixie dust which can fix this, unlike any of the dust we have seen and tried in the past 50 years (some of which actually works and is widely used on all the low hanging fruit)
- tease, tease, tease: would you like me to tease you more.
OK Max we've plowed thru 3 pages of fluff and glimpsed plenty of ads: got anything real to tell? If not, no, not interested in repeating the exercise.
Sorry. Somebody has to say it, if your editor won't.
@TanjB: I strongly disagree with your opinion. This column was in my "TO-READ" list and I've really enjoyed the content presented in each of the three pages.
Maybe you are used to the subtleties of concurrent programming and even HW algorithm acceleration (as I am) and this stuff is obvious to you, but not everyone has this background. I think the examples that Max has presented really illustrate how hard it is to squeeze all the juice from a multi-core/multi-threaded architecture.
I've also enjoyed how he has exposed the real status of the Internet software and that most of the critical software building blocks are inherited from the early golden-age of open software (GNU, UNIX...).
And finally, as to whether this is an "ad": this is just a column presenting a glowing new technology from a start-up company. Before issuing a quick judgement and verdict, let's see if they are able to demonstrate their achievement in more applications and HW architectures: I'm frankly eager to know more about this.
Good post @Garcia-. I'd like to respond to every point, so to reduce the risk of losing the readers' attention, I'll do it in reverse chronological bullet format:
If we erased every EETimes article that seemed like an advertisement there would be a lot less to read. Sans Junko's most welcome cold-water post about her Washer and Grill on a Date, I enter the entire trail of IoT as evidence. It reeks of a huge PR campaign. But it's all over these pages and I accept it as inevitable. Are the flashy, obvious ad-banners on the sides and tops of the articles we read the only way publications like these make money?
I share your appreciation about the exposure of the real status of Internet software. I didn't know much of that legacy story but it makes perfect sense. Intuitively, a casual observer would think that all internet infrastructure would have moved along with the times, but then again, anyone in a corporate environment ought to be weary of dealing with old processes that are still in place for no other reason than reliability and familiarity. What's also interesting to me is that they're going after this market in the first place. If the whole system is disparate and kludgy then wow, what a nation-big task it'll be to find out who owns all this stuff and then convince them to pay for upgrading it? I suppose it'll involve calls to the likes of Jeff Bezos and Larry Flint.
Thwarting the poo-pooers is a needed function in discourse. Kudos to you. And that's coming from a serial poo-pooer.
@TanjB: OK Max we've plowed thru 3 pages of fluff [...] got anything real to tell?
Some stuff I'm not allowed to talk about yet, but I gathered a lot of input from comments and emails and sent it to the folks at SVIRAL and they've given me a lot of responses -- I'm travelling at the moment and up to my armpits in alligators fighting fires -- but I'll post a follow-up column with all the questions and answers when I get back.
Even from my minimal awareness, the Adobe 9 to 10.1 example seems significantly flawed. "they are no slouches on the programming front" may be true in terms of tuning software for a particular processor microarchitecture or to exploit specific instructions like the latest SIMD instructions. The frequency of security bugs in Flash might indicate that Adobe does not employ only the cream of the cream of programmers.
In addition, these programmers are likely to have a mindset less appropriate to scaling software across threads. (The frequency of security bugs would hint at this. Experts in spaghetti code can wring out exceptional single-thread performance but easily run afoul of subtle bugs.) A brilliant chess player will not trivially become good at playing go, even though both are turn-based strategy-oriented board games.
Now add that version 9 and version 10.1 may have had different feature sets such that the performance being measured is actually different. While there is some Gustafson's Law effect (adding work makes parallelism easier), this would not reduce time to completion of the main task. Furthermore, if much of the actual processing is off-loaded to the GPU, then multithreading in the CPU portion would have a disproportionately low influence on performance.
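To make the Gustafson's Law remark above concrete, here is a minimal sketch comparing the two textbook speedup formulas. The function names and the 50% parallel fraction are illustrative assumptions, not measurements from Flash Player or anything else discussed here. Amdahl's formula holds the workload fixed, which is the "time to completion of the main task" case; Gustafson's formula lets the workload grow with the core count.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Fixed workload: speedup is capped by the serial fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def gustafson_speedup(parallel_fraction, cores):
    """Scaled workload: the parallel part grows with the core count."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * cores


if __name__ == "__main__":
    # Hypothetical program in which half of the run time is parallelizable.
    p = 0.5
    for n in (2, 4, 8, 16):
        print(n, round(amdahl_speedup(p, n), 2), round(gustafson_speedup(p, n), 2))
```

With a fixed task the Amdahl figure can never exceed 1 / (1 - p), here 2x, no matter how many cores are added, while the Gustafson figure keeps growing only because more work is being done in the same time.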
Add that Adobe had 9 versions of accumulated cruft (and that cruft being specifically considered with single-threaded operation). I doubt Adobe had the courage (or foolhardiness) to drop all the existing code and start from a fresh perspective, so significant effort must have been spent to get an irregularly worn square peg to fit in a round hole.
Early versions for a new product direction also tend to have teething issues.
In addition, it is not clear what the benchmark was testing. There is a substantial difference in available parallelism between a flash-based game and a webpage with several flash animations. In terms of such benchmarks, it would be really painful if a new version had energy-efficiency optimizations that throttled frame rate (or whatever performance metric is used) when such did not affect the user experience. I doubt Adobe's Flashplayer had such optimizations, but the concept points to how difficult it can be to make meaningful measurements.
I realize that I am nitpicking and ranting, but I have issues with the common statements that writing multithreaded software is nearly impossible. The difficulty varies with the nature of the problem (and the way the problem has been stated and previous solutions developed), the degree of parallelism sought, the skill of the programmers, etc.
"Event-driven technique" sounds like a simplified (both in reduced fraction of parallelism exploited and increased ease of development) form of tasklet-based dataflow.
(Even such coarse-grained dataflow would not exploit all the available parallelism; even operation-granular dataflow misses some parallelism. [E.g., a reduction like summing a collection of numbers can unnecessarily stall given a static pairing of operands.] However, coarser-grained dataflow is more scalable. Some of the residual parallelism can be exposed as data-level/SIMD or instruction-level parallelism [an out-of-order processor is effectively a limited dataflow engine] within a tasklet, but the remaining low-quality fruit in the very top branches might not be worth picking until much better techniques are developed.)
I am skeptical that this technique will solve all the problems associated with exposing parallelism at the thread level. However, even effectively applying it to a limited portion of all programs would be useful if the transition cost is as low as you indicate. Sadly, to get enough attention to the actual benefits of new techniques, it seems the proponents must resort to hype.
(It can be even worse for revived techniques as people are inclined to think "We already know that doesn't work" even though targets and tradeoffs change and a revival of an old technique is likely to be more refined with old weaknesses corrected and new strengths devised. Technological amnesia is not necessarily entirely negative.)
There is also the constraint that the software must run on existing hardware which generally does not provide minimal-cost (in latency and energy) communication, being based on generic cache coherence. (Mapping work to caches and network nodes to minimize communication overhead is probably not addressed in the first release. Other issues are far more significant.)
@Paul: There is also the constraint that the software must run on existing hardware which generally does not provide minimal-cost (in latency and energy) communication, being based on generic cache coherence.
I have not yet been brought into the "inner circle" -- but I hope to be able to discover more and report back in the not-so-distant future.
Some 27 years back I worked on a project that involved software running on multicore hardware.
The word SoC was probably not known then, and the hardware was designed to work as some 128 individual cores.
The inter-processor communication mechanism was in the hardware itself.
I did not know the term "multithreaded" at that time but the hardware seemed to be working in that fashion.
As a programmer I could decide at compile time which of those 127 cores would run a particular piece of the software.
So there was no overhead on the inter-processor communications - it was achieved by some kind of dual-port memory buffers.
This hardware was a proprietary design and could appropriately handle the PABX functions such as handling of trunk lines, the PABX internal lines, the attendant console and providing all kinds of PABX features.
We even achieved the networking of multiple PABXs to achieve a large PABX effect over a distributed geographical area by extending the software.
So in my opinion, designing multi-core processor SoCs has to be done with a fresh look at how to build the inter-processor communication techniques in hardware, so that the software overheads of inter-process communication in multi-threaded applications can be eliminated (or at least minimised).
The SVIRAL "technology" link on their web pages mentions "patented technology", and a quick search of the USPTO database for patents with SVIRAL in the Assignee field came up with patent 8,782,196, which describes what looks like a hardware scheduler based on an implementation of Dijkstra semaphore-protected buffers.
A quick Google search also shows that they got $20M of funding to make "distributed data centers" out of black (idle) cores. I smell a bit of hype in that, but we'll eventually get to see what they *really* mean.
There haven't been many new ideas in this field for a long time, but as someone pointed out, their innovation may simply be a way of making something well-known more accessible. We'll see.
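For anyone who hasn't met the semaphore-protected buffers mentioned above, here is a minimal software sketch of the classic textbook construct: a bounded producer/consumer buffer guarded by two counting semaphores and a lock. To be clear, this is not the mechanism from the patent, which describes hardware; the class name, method names and demo are all hypothetical and only show the general idea.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Fixed-capacity FIFO guarded by two counting semaphores and a lock."""

    def __init__(self, capacity):
        self._items = deque()
        self._free_slots = threading.Semaphore(capacity)  # room left for producers
        self._filled_slots = threading.Semaphore(0)       # items ready for consumers
        self._mutex = threading.Lock()                    # protects the deque itself

    def put(self, item):
        self._free_slots.acquire()      # block while the buffer is full
        with self._mutex:
            self._items.append(item)
        self._filled_slots.release()    # signal that one more item is available

    def get(self):
        self._filled_slots.acquire()    # block while the buffer is empty
        with self._mutex:
            item = self._items.popleft()
        self._free_slots.release()      # signal that one more slot is free
        return item


def run_demo(n_items=100, capacity=4):
    """One producer and one consumer exchanging n_items through the buffer."""
    buf = BoundedBuffer(capacity)
    received = []

    def producer():
        for i in range(n_items):
            buf.put(i)

    def consumer():
        for _ in range(n_items):
            received.append(buf.get())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(run_demo(10, 2))
```

The producer stalls only when the buffer is full and the consumer only when it is empty, so neither side needs to poll. Doing the same bookkeeping in hardware is presumably where a dedicated scheduler would save overhead, but that is my guess, not something the patent summary confirms.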
The use of multi-threading in today's software is actually rampant, though often done behind the scenes in libraries where average programmers don't have to deal with the full complexity. Looking at the Activity Monitor on my Mac, I see Skype with 28 threads, iTunes with 24, Arduino SDK with 23, Safari with 22, Xcode with 18, Flash Player (Safari content) with 18, Mail with 17, etc., etc. Only a dozen (all classic UNIX code) of the roughly 500 active processes have 1 thread. The only way modern computers function at all, given the difficulty of doing good multi-thread locking/protection/deadlock prevention, is for the libraries to be automating a lot of it. Hopefully that's what SVIRAL is up to -- it's hard to move the needle on skill level of all the programmers in the world.
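As a small illustration of a library hiding the thread management, here is a sketch using Python's standard concurrent.futures module. The word_count task and the count_all helper are made-up examples, and this says nothing about how SVIRAL or any of the applications listed above are built.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """Trivial stand-in for an independent unit of work."""
    return len(text.split())


def count_all(documents, max_workers=4):
    # The executor creates, schedules and joins the worker threads;
    # the caller never touches a lock or a thread object directly.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(word_count, documents))


if __name__ == "__main__":
    print(count_all(["a b c", "", "hello world"]))
```

The calling code looks single-threaded, which is the point: the hard parts are confined to the library. It only works this cleanly because the tasks share no mutable state.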
As the patent explains, their approach is a lot more than "two additional constructs" as one might expect, and I'm under the impression that if you want to enjoy performance and true scalability you will have to heavily refactor/rewrite your code (see the "pixel" example). You have new keywords, new operators, new semantics, but all in all nothing unexpected to allow programmers to write parallel programs. By the way, when they say "multi core" they mean "heterogeneous cores", and this is even more interesting (and difficult). In a way they've created an OpenCL-like streaming language for all kinds of cores (the patent summary even includes FPGAs!)
That said, stream programming has existed for decades (as the patent's authors/SVIRAL founders themselves point out), so whether their flavor works better than past attempts remains to be seen. They also have another related patent on a specific hardware design to execute streaming programs faster: http://www.google.com/patents/US20110179252
This doesn't sound much more promising than the hype I've heard before. I think I can expect a few more years getting paid doing parallelism the old-fashioned way: Verilog.
One good problem to test parallel architectures is the N-body simulation (simulating the motion of say, hundreds of stars in a cluster). It's a good problem because at each time step each body's new velocity vector is a function of every other body's position, so it's not very easy to split the sim into separate tasks/threads.
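Here is a minimal sketch of one time step of such a simulation, to show where the coupling between bodies comes from. Everything in it is an illustrative assumption: two dimensions, arbitrary units with G = 1, a small softening term to avoid division by zero, and a simple semi-implicit Euler update rather than an integrator anyone would use for real work.

```python
import math

G = 1.0  # gravitational constant in arbitrary units (assumption)


def accelerations(positions, masses, softening=1e-3):
    """Direct O(N^2) sum: each body's acceleration depends on every other body."""
    n = len(positions)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi = positions[i]
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dist_sq = dx * dx + dy * dy + softening * softening
            inv_dist_cubed = 1.0 / (dist_sq * math.sqrt(dist_sq))
            acc[i][0] += G * masses[j] * dx * inv_dist_cubed
            acc[i][1] += G * masses[j] * dy * inv_dist_cubed
    return acc


def step(positions, velocities, masses, dt):
    """Advance all bodies by one time step (semi-implicit Euler)."""
    acc = accelerations(positions, masses)
    new_velocities = [
        [v[0] + a[0] * dt, v[1] + a[1] * dt] for v, a in zip(velocities, acc)
    ]
    new_positions = [
        [p[0] + v[0] * dt, p[1] + v[1] * dt] for p, v in zip(positions, new_velocities)
    ]
    return new_positions, new_velocities


if __name__ == "__main__":
    pos = [[-1.0, 0.0], [1.0, 0.0]]
    vel = [[0.0, 0.0], [0.0, 0.0]]
    pos, vel = step(pos, vel, [1.0, 1.0], dt=0.01)
    print(pos, vel)
```

Within a single step the outer loop over i only reads a snapshot of the positions, so those iterations could be handed to separate threads. The difficulty is that every thread needs every body's position and all of them must finish before the next step can start, so communication and synchronization costs grow with the number of workers.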