Ten quotes on parallel programming
Peter Clarke
11/4/2011 6:38 AM EDT
The second five quotes
"There are other pitfalls; concurrent code that is completely safe but isn’t any faster than it was on a single-core machine, typically because the threads aren’t independent enough and share a dependency on a single resource." Herb Sutter, chair of the ISO C++ standards committee, Microsoft.
"My hypothesis is that we can solve [the software crisis in parallel computing], but only if we work from the algorithm down to the hardware -- not the traditional hardware first mentality." Tim Mattson, principal engineer at Intel.
"We stand at the threshold of a many core world. The hardware community is ready to cross this threshold. The parallel software community is not." Tim Mattson, principal engineer at Intel.
"The wall is there. We probably won't have any more products without multicore processors [but] we see a lot of problems in parallel programming." Alex Bachmutsky, chief architect at Nokia Siemens Networks.
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." Brian Kernighan, professor at Princeton University.
While the many-core challenge is clear and immediate we also have to start looking even higher than many-core. There is AMD's work on an open Fusion System Architecture and beyond that the need to start thinking about how to design, in a top-down way, swarms of wirelessly connected devices as outlined by Professor Alberto Sangiovanni-Vincentelli, of the University of California, Berkeley electronic engineering and computer science department (see IEF: Group behavior is the design, says ASV).
wave.forest
11/4/2011 8:01 AM EDT
I like this: - "I decided long ago to stick to what I know best. Other people understand parallel machines much better than I do; programmers should listen to them, not me, for guidance on how to deal with simultaneity." Donald Knuth, professor emeritus at Stanford. -
nosubject
11/4/2011 12:17 PM EDT
This is the reality. "My hypothesis is that we can solve [the software crisis in parallel computing], but only if we work from the algorithm down to the hardware -- not the traditional hardware first mentality."
But this reality runs opposite to the regular standardization process, which needs a top-down and then finally a bottom-up process to characterize the chip architectures and features [especially for multi-player markets].
EREBUS
11/4/2011 5:42 PM EDT
If you do not understand your application then it just does not matter how many processors you throw at it.
Parallel processing is no different from multi-tasking. You need to know what you are doing and when you need to do it.
QED
cdhmanning
11/5/2011 8:32 PM EDT
I think parallel processing is a bit different to multi-tasking.
The purpose of multi-tasking is to simultaneously enable multiple independent tasks so that they can proceed on one computer. There is a need to protect shared resources etc.
Parallel processing takes a single task and tries to break it into smaller tasks to exploit multiple processors.
As an example, consider the difference in the way make can work.
Multi-tasking allows two separate makes to progress in parallel, building two projects.
Multi-processing allows multiple threads within the building of a single project to allow that build to proceed faster.
Design for multi-processing is completely different than for multi-tasking.
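The make example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dependency graph `DEPS`, the `build` stand-in and the function names are hypothetical, and real make does far more. It shows one build being split into waves of independent targets that may run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dependency graph: target -> list of prerequisites.
DEPS = {
    "a.o": [],
    "b.o": [],
    "c.o": [],
    "app": ["a.o", "b.o", "c.o"],
}

def build_order(deps):
    """Group targets into waves; a target only depends on earlier waves."""
    done, waves = set(), []
    while len(done) < len(deps):
        ready = sorted(t for t, pre in deps.items()
                       if t not in done and all(p in done for p in pre))
        if not ready:
            raise ValueError("dependency cycle")
        waves.append(ready)
        done.update(ready)
    return waves

def build(target):
    # Stand-in for invoking a compiler or linker (assumption).
    return f"built {target}"

def parallel_build(deps, workers=4):
    log = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for wave in build_order(deps):
            # Targets inside one wave are independent, so they may overlap.
            log.extend(pool.map(build, wave))
    return log
```

Here the three object files form one wave and the final link forms the next, so the link step still has to wait for everything before it.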
betajet
11/4/2011 7:15 PM EDT
Four other quotes come to mind. (4) is an example of an embarrassingly parallel problem. (1) is an example of the opposite.
1. "If one man can dig a post-hole in 60 seconds, how long will it take 60 men working together to dig a post-hole?" (Ambrose Bierce)
2. Definition of "peak performance": a guarantee that your parallel computer will not run any faster than this. (I don't remember source)
3. "Never take advice on parallel computing from a hardware guy." (I forget the source and exact wording.)
4. Famous example of the difficulty of French spelling: "Si six scies scient six cyprès, six cent scies scient six cent cyprès", which means "if six saws saw six cypress trees, 600 saws saw 600 cypress trees". In French, the words "si", "six", "scies", and "scient" are all pronounced the same, creating a nasty dictation exercise.
rogerrobie68
11/7/2011 12:54 PM EST
Those are even better
agk
11/5/2011 2:57 AM EDT
It can be parallel processing, multi-core, threads, etc. We see now that our computers and cell phones are faster than before, and that is a fact. They are all working.
bah
11/6/2011 10:56 PM EST
Sutter, not Suttler.
peter.clarke
11/7/2011 5:10 AM EST
@bah
Thanks, correction made
Marco Jacobs
11/7/2011 7:26 AM EST
Peter: thanks for re-publishing the quotes and the credit.
@bah: Sorry for the typo, we'll correct.
KB3001
11/7/2011 11:11 AM EST
"Everybody who learns concurrency thinks they understand it, ends up finding mysterious races they thought weren’t possible, and discovers that they didn’t actually understand it yet after all."
Sums up my experience. You get better at it with time but there is no room for complacency!
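For readers who have not met one of these "mysterious races", here is a minimal sketch of the classic lost update. To keep it reproducible it does not use real threads; it replays a fixed schedule of read and write steps by hand. The `run` function and the schedule format are assumptions made up for this illustration.

```python
def run(schedule):
    """Replay (thread, step) pairs against one shared counter."""
    shared = 0
    local = {}  # each thread's private copy of the value it last read
    for thread, step in schedule:
        if step == "read":
            local[thread] = shared
        elif step == "write":
            # Increment based on the (possibly stale) value read earlier.
            shared = local[thread] + 1
    return shared

# Thread A finishes its increment before thread B starts.
serial = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]

# Both threads read before either writes: one increment is lost.
interleaved = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]

print(run(serial))       # 2
print(run(interleaved))  # 1
```

Both schedules perform two increments, yet the interleaved one ends at 1 because B overwrites A's result using a value it read too early.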
adrianvons
11/14/2011 9:42 AM EST
I think the next quote will help Herb Sutter understand the situation.
'Experience is the name everyone gives to his mistakes' - Oscar Wilde
Eric Verhulst_Altreonic
11/7/2011 3:47 PM EST
Parallel processing is not that difficult. Software is modelling and the world is concurrent by nature. But what is difficult is to undo the brainwashing of a sloppy education in software engineering that starts bottom-up (from the hardware to the software: I call that the von Neumann syndrome), hence the two conflicting quotes from Tim Mattson, Intel.
We have a concurrent programming model that covers everything from a single multi-tasking processor to a networked system (theoretically with millions of nodes, even heterogeneous) in just 5 KB per node. Difficult? No, because it is based on a formal development and borrows from CSP (the Communicating Sequential Processes process algebra of Hoare). It's actually very natural. But where do they still teach CSP?
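For anyone who has not seen the CSP style, a rough sketch in Python follows. It is not the programming model described above, only an approximation: each process is a plain sequential function, and the processes interact solely through channels. Python's bounded `queue.Queue` stands in for a CSP channel (a true CSP channel is a synchronous rendezvous), and the names `producer`, `square` and `run_pipeline` are made up for this example.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream

def producer(out, items):
    """Sequential process: send each item, then signal completion."""
    for x in items:
        out.put(x)
    out.put(DONE)

def square(inp, out):
    """Sequential process: receive, transform, forward."""
    while True:
        x = inp.get()
        if x is DONE:
            out.put(DONE)
            return
        out.put(x * x)

def run_pipeline(items):
    # Bounded queues approximate channels; no other state is shared.
    a, b = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
    threads = [
        threading.Thread(target=producer, args=(a, items)),
        threading.Thread(target=square, args=(a, b)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        x = b.get()
        if x is DONE:
            break
        results.append(x)
    for t in threads:
        t.join()
    return results
```

Because nothing is shared except the channels, there are no locks to get wrong, and each stage can be read and tested as ordinary sequential code.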
adrianvons
11/14/2011 9:48 AM EST
'Life is not complex. We are complex. Life is simple and the simple thing is the right thing.' again good old Oscar Wilde.
Anyway, I think there is time for a paradigm shift!
Sheetal.Pandey
11/8/2011 2:27 AM EST
I like this: "My hypothesis is that we can solve [the software crisis in parallel computing], but only if we work from the algorithm down to the hardware -- not the traditional hardware first mentality." Tim Mattson, principal engineer at Intel.
jackOfManyTrades
11/8/2011 3:15 AM EST
I think it would have been better called: "Five quotes on parallel programming". The second five are a bit weak.
peralta_mike
11/8/2011 8:54 AM EST
We should take a lesson from the real world. There are about 3 supervisors for every worker.
The supervisors fight amongst each other (this is where all the parallelism happens) to see how much of the worker's time they can get. So we need 3 cores for every "working core" - that actually does the non-parallel task. In this way there are no conflicts within the "working core" which does all the actual work. The other 3 cores are just used to sort out all the parallelism.
Semiconductor Design Engineer
11/9/2011 5:02 PM EST
OMGosh, that's brilliant :-), and thanks for making me laugh so hard I shot hot coffee out my nose :-)
fundamentals
11/8/2011 10:10 AM EST
I like Kernighan's quote the best. It goes to the heart of what is most difficult about parallel programming: debugging. I don't listen to the arm-chair experts who keep telling us how to write code with accurate modelling so that the program will be bug free. That sort of thing is just wishful thinking. No large program that I know of is bug free, and almost all of them are sequential programs. Now try designing a bug-free concurrent program. If it is large enough, it will have bugs (most likely many more than an average sequential program). Then get ready for your hardest task ever: debugging it!
ooferwog
11/8/2011 3:01 PM EST
Brian Kernighan certainly has Microsoft pegged.
Hi Herb!
Tex
ooferwog
11/8/2011 3:25 PM EST
"The wall is there. We probably won't have any more products without multicore processors [but] we see a lot of problems in parallel programming." Alex Bachmutsky, chief architect at Nokia Siemens Networks.
Which reminds me of Microsoft's short-lived ad campaign whose tag line was: "Imagine life - without walls."
Now, I don't think Mr. Bachmutsky's 'wall' was the same sort of wall M$ had in mind (lord knows for them it's still over the horizon), but I must say that the very first thing that came to mind when I saw Microsoft's ridiculous advert on that billboard was this:
"Look, Microsoft, if you don't have walls then what possible use would you have for Windows?"
I wrote Microsoft and asked about it, but no one replied. Imagine my surprise.
Then, not long ago Microsoft released a product which they dubbed 'Azure.' Remind you of anything? Azure? Blue? Blue, as in "Blue Screen of Death?"
I think there's a Linux mole in Marketing.
Tex
wave.forest
11/9/2011 9:51 AM EST
It appears no one has mentioned "data dependency", which kills parallel computing because it is too much for a mediocre programmer to take on.
The hard part of parallel computing is to break a data-dependent sequential problem into multiple data-independent pieces to leverage multiple cores.
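A small sketch of what that break-up can look like, using a running sum as the data-dependent sequence (each output needs the previous one). The chunked version is a standard three-phase scan, chosen here purely as an illustration; the function names and the chunk count are assumptions. Phases 1 and 3 touch each chunk independently and could be handed to separate cores, while phase 2 is a short serial pass.

```python
from itertools import accumulate

def prefix_sum_serial(xs):
    """Loop-carried dependency: out[i] depends on out[i - 1]."""
    return list(accumulate(xs))

def prefix_sum_chunked(xs, chunks=4):
    n = len(xs)
    size = max(1, -(-n // chunks))  # ceiling division
    parts = [xs[i:i + size] for i in range(0, n, size)]

    # Phase 1: scan each chunk on its own (independent per chunk).
    local = [list(accumulate(p)) for p in parts]

    # Phase 2: short serial pass over the chunk totals.
    offsets, running = [], 0
    for scanned in local:
        offsets.append(running)
        running += scanned[-1]

    # Phase 3: add each chunk's offset (independent per chunk).
    return [v + off for scanned, off in zip(local, offsets) for v in scanned]
```

This sketch runs the phases in one thread; the point is only that the dependency has been confined to phase 2, whose length is the number of chunks rather than the number of elements.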
"I decided long ago to stick to what I know best. Other people understand parallel machines much better than I do; programmers should listen to them, not me, for guidance on how to deal with simultaneity." Donald Knuth, professor emeritus at Stanford
RoweBots1
11/11/2011 9:44 AM EST
This is a good comment. In image processing the way to do this is often very clear: cut the image into sections. However, in many problems the view of early parallel programmers seems more appropriate:
Paraphrased it was: "parallel algorithms for the real world are like so much smoke" - creators of Illiac IV, a 64 processor SIMD machine built in 1969.
I think that many problems become serial and this is the limitation of multicore. System designers and programmers can think and debug in parallel when natural parallelism exists. They can rise to the occasion and put multiple data processing streams on a shared multicore platform. What they can't do is take a clearly sequential problem and make it parallel. This has been the barrier for the past 30 years.
wave.forest
11/15/2011 3:38 PM EST
Thanks! You know what I meant.
KarlS
11/9/2011 10:30 AM EST
Then there's the overlooked fact that there's not enough "processing" to benefit from multi-things in many situations. In a decision/logic tree there can be many initial decisions that are very simple where results must be combined in order to proceed to the next level. Dispatching a task or thread to test if a number is less than zero probably has ten times as much overhead as the compare itself, and dispatching 10 compares to 10 processors wastes a lot of power and time. Here is where a chip can beat the processor by simply testing the sign bit of a register, for instance.
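The arithmetic can be made concrete with a toy cost model. Everything here is an assumption for illustration: time is counted in abstract units where one compare costs 1, the dispatcher hands out tasks one at a time, and the function names are invented.

```python
def serial_time(n_tasks, work):
    """One processor does every task back to back."""
    return n_tasks * work

def parallel_time(n_tasks, work, dispatch, workers):
    """Toy model: dispatch cost is paid serially, then workers run in rounds."""
    rounds = -(-n_tasks // workers)  # ceiling division
    return n_tasks * dispatch + rounds * work

# Ten one-unit compares, dispatch costing ten units each, ten workers.
print(serial_time(10, 1))            # 10
print(parallel_time(10, 1, 10, 10))  # 101

# The same dispatch cost is negligible once each task is large.
print(serial_time(10, 1000))            # 10000
print(parallel_time(10, 1000, 10, 10))  # 1100
```

Under these assumptions, farming out the ten compares is about ten times slower than just doing them, and the balance only flips when the work per task dwarfs the dispatch overhead.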
sharps_eng
11/10/2011 4:39 PM EST
Good question. CSP does look a bit ugly but not so bad as C. I guess it needs a makeover like occam got in its incarnation as a hardware language wrapped up in C. Better yet, a graphical (3D?) programming toolset.
RoweBots1
11/11/2011 9:50 AM EST
One benefit of the array computing approach is that natural parallelism can be exploited, albeit in an inefficient way. The difficulty with this approach is that it fails to deal with I/O. Possibly some mix of this approach with traditional approaches is required. I would note that in many ways the array computing approach maps into CSP.
The real problem is that we think sequentially, understand natural parallelism, and have difficulty being creative enough to turn serial streams of processing into parallel algorithms.
Eric Verhulst_Altreonic
11/11/2011 4:36 PM EST
My own, for 20 years now:
Design parallel, optimize by sequentialisation.
Software is modelling and most of the things we model are inherently concurrent (or parallel). Almost anything can be expressed as a set of "Interacting Entities".
The reason we can get more out of the computer by sequentialisation is that the hardware is inherently designed to be sequential. I call that the von Neumann syndrome. To get the most out of 2 or more processors, the program must be designed as a parallel one. Getting parallelism out of a sequential program is really an exercise in reverse engineering.
igrbt
11/11/2011 11:20 PM EST
A known phenomenon: back in the 60s it was discovered that adding more processors beyond 4 did not scale linearly, e.g. where we'd expect throughput to quadruple on a 4-way CPU system, it would only increase by a fraction on a 4+ system.
The overhead of system processing plus the intrinsically sequential nature of many typical algorithms would prevent them from benefiting. Our modern multi-core processors can simply increase throughput if programmed accordingly.
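The usual way to put numbers on this is Amdahl's law (the comment does not name it; it is brought in here as the standard model). A minimal sketch, assuming a fixed fraction of the work can be spread perfectly across processors and the rest stays serial:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup when only `parallel_fraction` of the work can be spread out."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# Assumed example: 80% of the work parallelizes perfectly.
for p in (1, 2, 4, 8, 64):
    print(p, round(amdahl_speedup(0.8, p), 2))
```

With that assumed 80% figure, 4 processors give about 2.5x, 8 give about 3.33x and 64 give about 4.71x, and no number of processors can exceed 5x because the serial 20% never shrinks.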
wave.forest
11/13/2011 2:46 PM EST
I like "programmed accordingly" ;)
How? Try M^E mod N - one of the most used functions to see how far you can go.
igrbt
11/13/2011 6:51 PM EST
Certain (many) algorithms cannot be "parallelized", including those related to the RSA generator. I should perhaps have used the word MAY instead of CAN (as far as throughput goes).
wave.forest
11/15/2011 12:58 PM EST
:)
This is the hard part regarding parallel computing: How to transform a seemingly hardcore sequential problem into a parallel problem?
For RSA decipherment, it's well known to use CRT. For exponentiation, you may try Montgomery Ladder. So if you have 4 cores and program them accordingly, you may have 12x to 16x throughput increase.
;)
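A minimal sketch of the CRT idea for M^E mod N, using a deliberately tiny textbook key (p = 61, q = 53, e = 17, d = 2753) that is for illustration only and not secure. The function names are assumptions. The two half-size exponentiations do not depend on each other, which is what makes them candidates for separate cores; this sketch simply runs them one after the other and makes no claim about the achievable speedup.

```python
def rsa_decrypt_direct(c, d, n):
    """One full-size modular exponentiation."""
    return pow(c, d, n)

def rsa_decrypt_crt(c, d, p, q):
    """Same result via two half-size exponentiations and Garner's recombination."""
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)        # modular inverse, needs Python 3.8+

    m_p = pow(c % p, dp, p)      # independent of m_q
    m_q = pow(c % q, dq, q)      # independent of m_p

    h = (q_inv * (m_p - m_q)) % p
    return m_q + h * q

# Toy key, far too small for real use.
p, q, e, d = 61, 53, 17, 2753
n = p * q
message = 65
cipher = pow(message, e, n)
assert rsa_decrypt_crt(cipher, d, p, q) == rsa_decrypt_direct(cipher, d, n) == message
```

Each of the two inner `pow` calls works with a modulus and exponent roughly half the size of the original, so the split reduces the work per exponentiation as well as exposing two independent pieces.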
DU00000001
11/14/2011 12:50 PM EST
I love quote 4: "... ends up finding mysterious races they thought weren’t possible ..."
I've assisted in identifying far too many race conditions during the last five years.
sharps_eng
11/15/2011 6:34 PM EST
Plenty of parallel (multiple-threaded) code runs today, but on a single core or a few cores, timeshared to simulate parallelism. This is fine when the processors would otherwise be spinning waiting for shared resources or slow I/O. When there is a real performance bottleneck like searching, codecs or graphics you need extra hardware, cores or logic; no software will help unless the algorithm is improved.
Meanwhile I suggest we use some of those spare millions of gates to provide proper transparent hardware memory virtualization so that any thread can allocate or free memory any way with no performance, fragmentation or garbage-collection concerns. That would be truly useful in a chipset.
I_O
6/26/2012 4:50 AM EDT
In 1995 I was working on a system of some 300 parallel DSP processors used to analyse acoustic data from multiple sources. This was a single purpose system with steady state data flows from one processor to the next. The system worked because the total processing algorithm could be split into parts with deterministic communications between processors.
Trying to replicate this with any more general purpose processing function is the stuff of insanity.