Decades of computer architecture evolution have shown that doing work in parallel lets it complete faster and more efficiently.
For example, modern CPUs such as the ARM Cortex-A15 support Single Instruction Multiple Data (SIMD) processing through NEON, which speeds up computation on multimedia data. Multiple CPUs can also be arranged in a cache-coherent system (multicore SMP) to deliver significant performance uplift and energy savings by executing multiple threads or programs in parallel.
Even CPUs with different characteristics can be coupled in this arrangement, adding flexibility and extending the range of performance/efficiency points along the DVFS (dynamic voltage and frequency scaling) curve. I invite you to attend the presentation by my colleague Brian Jeff on the ARM big.LITTLE scheme to learn more about how this works. In fact, SoC designers have historically combined diverse accelerators on the same die, sharing a unified bus matrix so that they can all compute in parallel. Modern GPUs also accelerate non-graphical workloads through GPU computing.
Parallelism is everywhere
A major challenge is that the programming approaches for each type of processor (CPU, GPU, ISP, DSP) differ. Optimizing code for a given accelerator requires specialized expertise, and code written for one accelerator is typically not portable to other architectures. The result can be suboptimal use of the platform's processing potential. Writing parallel code that scales is also very difficult, and has so far proven elusive for most applications in the mobile industry. Modern programming frameworks such as Khronos OpenCL and Android Renderscript are designed to address these issues.
OpenCL provides a solution that enables easier, better, portable programming of heterogeneous parallel processing systems and unleashes the computational power of GPUs needed by emerging workloads. OpenCL creates a foundation layer for a parallel computing ecosystem and takes graphics processing power beyond graphics. It is defined by the Khronos Group, and it is a royalty-free open standard, interoperable with existing APIs.
The OpenCL framework includes:
• A framework (compiler, run-time, libraries) to enable general-purpose parallel computing
• OpenCL C, a computing language portable across heterogeneous processing platforms (based on a subset of C99 that drops features such as recursion and function pointers, extended with vector data types, address-space qualifiers and other parallel computing features)
• An API to define and control (interrogate and configure) the platform and to coordinate parallel computation across processors
The developer identifies the performance-critical areas of an application and rewrites them using the OpenCL C language and the OpenCL API. An OpenCL C function declared with the __kernel qualifier, which the host launches on a compute device, is known as a kernel. Kernels and their supporting code are grouped into programs, which are equivalent in principle to DLLs.