New programming tools are needed to analyze and partition applications to take advantage of the many multicore systems already shipping into the marketplace, and to harvest their available computing capacity.
Having attended several technology shows at the beginning of the year, it's clear to me that multicore is finally happening, and it's happening fast.
Intel's Sandy Bridge started shipping in January, with four x86 cores on board. Haswell, Intel's next architecture revision, is expected to default to eight cores.
At the Mobile World Congress, nVidia showed off Kal-El, their quad-core mobile phone and tablet processor, targeted to go into production in August this year. Qualcomm announced their new Krait architecture, which also scales up to four cores. TI announced their new 10GHz DSP, which packs eight high-performance VLIW cores, each running at 1.25GHz.
Graphics architectures have evolved from simple pixel-pushing pipelines to include vertex engines, then programmable shaders, and are now becoming general-purpose multicore compute engines seeing rapid market adoption.
This weekend, a friend of mine showed me his new phone, the LG Optimus 2X. The “2X” label stands for dual core; multicore is even becoming a topic the marketeers get excited about and highlight as a main product feature.
Why is everyone going multicore now? Multicore is here because it solves several challenges.
Many applications can be significantly sped up through parallelization. Higher-resolution, many-channel audio combined with high-definition 3D video makes for a better media experience. A media player contains graphics, audio, and video processing, each of which can be split across multiple cores for more performance.
Augmented reality and high-quality gaming require lots of demanding compute operations. Consumers want their browsers to load and render web pages instantly. Screen and camera resolutions keep increasing to present and capture the highest-quality imagery. Multicore is here because it addresses this need for speed.
Designing a processor that is twice as powerful as the previous generation is no small task. Deeper pipelines, out-of-order execution, speculative issue, and superscalar execution all improve performance, but at diminishing returns. Peak performance goes up, but performance per square millimeter of silicon actually goes down.
Not so with multicore, which is relatively easy to implement: simply replicate the design and add interconnect. It's much simpler to implement a quad-core processor than to increase a single processor's performance fourfold. Multicore is here because it solves the hardware design challenge of delivering more performance.
One way to make a processor more powerful is to add pipeline stages and increase the clock rate. More pipeline stages mean less work is done per stage. In addition, driving up the clock requires a higher supply voltage, and since dynamic power scales with frequency times the square of the voltage, power rises more than linearly.
This isn't a power-efficient approach, and in fact clock frequencies stopped scaling some time ago, even as new process technology nodes were introduced. Using multiple cores lowers the clock frequency of each core, reducing energy consumption even with more cores active at the same time. Multicore is here because it addresses the power consumption challenge.
When VLIW processors were introduced, they shifted computer architecture complexity toward the compiler. History now repeats itself with multicore architectures: complexity is shifted out of the hardware and into the software. It's too late, though, for a wholesale paradigm shift to new parallel programming languages.
There's too much legacy code, there's a large installed base of software tools, and learning a new language isn't easy. One way to hide parallel complexity is behind APIs, and there are many: Pthreads, OpenCL, OpenMP, CUDA, and others.
Using APIs is a good idea, but only solves part of the problem. Structuring your code to take advantage of these parallel APIs is the real challenge.
New programming tools are needed to analyze and partition the application in order to take advantage of the many multicore systems already shipping into the marketplace, and to harvest their available computing capacity. Multicore is here, and here to stay. The crux is in the programming.
Marco Jacobs is vice president of marketing at Vector Fabrics.