My thanks for compiling these ten quotes go to Marco Jacobs, marketing director of Vector Fabrics BV (Eindhoven, The Netherlands), who listed them in his most recent blog post.
"For over a decade prophets have voiced the contention that the organization of a single computer has reached its limits and that truly significant advances can be made only by interconnection of a multiplicity of computers." Gene Amdahl in 1967.
"The way the processor industry is going, is to add more and more cores, but nobody knows how to program those things. I mean, two, yeah; four, not really; eight, forget it." Steve Jobs, Apple.
"I decided long ago to stick to what I know best. Other people understand parallel machines much better than I do; programmers should listen to them, not me, for guidance on how to deal with simultaneity." Donald Knuth, professor emeritus at Stanford.
"Everybody who learns concurrency thinks they understand it, ends up finding mysterious races they thought weren’t possible, and discovers that they didn’t actually understand it yet after all." Herb Sutter, chair of the ISO C++ standards committee, Microsoft.
"Redesigning your application to run multithreaded on a multicore machine is a little like learning to swim by jumping into the deep end." Herb Sutter, chair of the ISO C++ standards committee, Microsoft.
In 1995 I was working on a system of some 300 parallel DSP processors used to analyse acoustic data from multiple sources. This was a single purpose system with steady state data flows from one processor to the next. The system worked because the total processing algorithm could be split into parts with deterministic communications between processors.
Trying to replicate this with any more general-purpose processing function is the stuff of insanity.
Plenty of parallel (multithreaded) code runs today, but on a single core or a few cores, timeshared to simulate parallelism. This is fine when the processors would otherwise be spinning, waiting for shared resources or slow I/O. When there is a real performance bottleneck such as searching, codecs, or graphics, you need extra hardware, cores, or logic; no software will help unless the algorithm is improved.
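A minimal sketch of that distinction, assuming CPython (where the interpreter lock timeshares threads on one core) and two hypothetical workloads: threads overlap nicely when they are only waiting on slow I/O, but a compute-bound loop gains nothing from extra threads without extra hardware or a better algorithm.

```python
# Illustrative only: io_bound_task and cpu_bound_task are stand-in workloads,
# not anything from the article.

import threading
import time

def io_bound_task():
    # Stands in for waiting on a slow device or a network reply.
    time.sleep(1.0)

def cpu_bound_task():
    # Pure computation: more threads on one core just timeshare it.
    total = 0
    for i in range(5_000_000):
        total += i * i
    return total

def run_threads(task, n=4):
    start = time.perf_counter()
    threads = [threading.Thread(target=task) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    # Four I/O-bound tasks overlap their waiting: roughly 1 s total, not 4 s.
    print("I/O-bound, 4 threads:", round(run_threads(io_bound_task), 2), "s")
    # Four CPU-bound tasks timeshared on one core take roughly 4x one task.
    print("CPU-bound, 4 threads:", round(run_threads(cpu_bound_task), 2), "s")
```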
Meanwhile, I suggest we use some of those spare millions of gates to provide proper, transparent hardware memory virtualization, so that any thread can allocate or free memory any way it likes with no performance, fragmentation, or garbage-collection concerns. That would be truly useful in a chipset.
This is the hard part of parallel computing: how do you transform a seemingly inherently sequential problem into a parallel one?
For RSA decryption, it is well known that you can use the CRT. For the exponentiation itself, you may try the Montgomery ladder. So if you have four cores and program them accordingly, you may get a 12x to 16x throughput increase.
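A minimal sketch of where the parallelism in the CRT split comes from, using toy textbook key values (far too small for real use) and Python's standard process pool; real code would use a vetted crypto library, and the Montgomery ladder mentioned above is not shown here.

```python
# The two modular exponentiations mod p and mod q are independent,
# so they can be handed to separate cores and recombined afterwards.

from concurrent.futures import ProcessPoolExecutor

# Toy RSA parameters (hypothetical, illustrative only).
p, q = 61, 53
n = p * q                     # 3233
e = 17
d = 2753                      # e*d = 1 mod lcm(p-1, q-1)

dp = d % (p - 1)
dq = d % (q - 1)
q_inv = pow(q, -1, p)         # q^(-1) mod p

def decrypt_crt(c):
    # Each half-size exponentiation could run on its own core.
    with ProcessPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(pow, c, dp, p)   # m1 = c^dp mod p
        f2 = pool.submit(pow, c, dq, q)   # m2 = c^dq mod q
        m1, m2 = f1.result(), f2.result()
    h = (q_inv * (m1 - m2)) % p           # Garner's recombination
    return m2 + h * q

if __name__ == "__main__":
    msg = 65
    c = pow(msg, e, n)                    # encrypt
    print(decrypt_crt(c) == msg)          # True
```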
A known phenomenon: back in the 1960s it was discovered that adding more processors beyond four did not scale linearly; e.g., where we would expect throughput to quadruple on a 4-way CPU system, it would increase only by a fraction once you went past four.
The overhead of system processing, plus the intrinsically sequential nature of many typical algorithms, prevents them from benefiting. Our modern multi-core processors can still increase throughput if programmed accordingly.
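The diminishing returns described above are commonly framed as Amdahl's law (the framing here is the editor's, not the commenter's): speedup = 1 / ((1 - P) + P / N), where P is the fraction of the work that can run in parallel and N is the number of processors. A minimal sketch:

```python
# Amdahl's law: even a modest sequential fraction caps the achievable speedup.

def amdahl_speedup(parallel_fraction, n_procs):
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_procs)

if __name__ == "__main__":
    # With 90% of the work parallelisable, 4 CPUs give well under 4x,
    # and piling on more CPUs yields only fractional further gains.
    for n in (1, 2, 4, 8, 16, 64):
        print(f"{n:3d} CPUs -> {amdahl_speedup(0.9, n):.2f}x")
```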