Meet the Parallella
The photograph below shows the Parallella computer lying next to its box and a dollar coin. Despite its credit-card size, this little rascal packs a remarkable amount of processing power.
The chip under the heat sink is a Xilinx Zynq device, which combines a dual-core ARM Cortex-A9 processor and 7-series FPGA fabric on a single die. The Zynq is a game-changing device in its own right, but there is more inside the Parallella.
The shiny chip is an Adapteva Epiphany III multicore processor. Built on a 65 nm process, the Epiphany III embeds 16 cores that together deliver a peak of an awesome 25 GFLOPS in single precision (the newer Epiphany IV is built on a 28 nm process, sports 64 cores, and provides an impressive 90 GFLOPS).
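Peak throughput figures like these are usually obtained as cores × clock rate × floating-point operations per cycle. The Python sketch below assumes each core completes one fused multiply-add (2 FLOPs) per cycle. The clock rates passed in are hypothetical values chosen only because they reproduce the quoted numbers under that assumption; they are not figures from the original.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak throughput in GFLOPS.

    Assumes every core retires `flops_per_cycle` single-precision operations
    per clock cycle (2 for one fused multiply-add per cycle).
    """
    return cores * clock_ghz * flops_per_cycle


# Hypothetical clock rates, for illustration only.
print(peak_gflops(16, 0.8))  # Epiphany III, 16 cores -> 25.6
print(peak_gflops(64, 0.7))  # Epiphany IV, 64 cores  -> 89.6
```

Real applications rarely sustain peak throughput, so these numbers are best read as upper bounds.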
With a typical power draw of only 5 watts, it's clear why this tiny processing beast is claimed to be the world's most energy-efficient computational engine. Furthermore, the Parallella is designed to make building processing clusters easy, which makes the board an attractive choice for deploying low-cost, low-power supercomputers.
Processing information is a physical process that consumes energy, which ends up dissipated as heat. The more complex the simulation you intend to run, the more energy (and time) it will consume.
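The relationship between throughput, power, and energy can be made concrete with a little arithmetic. The sketch below divides the 25 GFLOPS quoted for the Epiphany III by the 5 W quoted for the board; because one is a chip peak and the other a board-level typical figure, the result is only a rough indication. The one-hour run is a hypothetical example.

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Energy efficiency, i.e. billions of floating-point operations per joule."""
    return gflops / watts


def energy_joules(watts: float, seconds: float) -> float:
    """Energy consumed by a run at constant average power: E = P * t."""
    return watts * seconds


# Figures quoted in the text: ~25 GFLOPS (Epiphany III) and ~5 W (board, typical load).
print(gflops_per_watt(25.0, 5.0))   # 5.0 GFLOPS/W
# A hypothetical one-hour simulation at 5 W:
print(energy_joules(5.0, 3600.0))   # 18000.0 J, i.e. 5 Wh
```

Doubling the run time at the same power doubles the energy, which is why longer, more complex simulations cost more.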
In electronic systems, energy consumption translates into thermal issues. This is why most of the processing engines used in supercomputers require advanced active cooling, such as industrial-grade air conditioning, liquid cooling, or hardware-specific heat pipes.
Although the Parallella is extremely power efficient, the board's small size does mandate some active cooling. Fortunately, a modest fan is enough to extract the excess heat from the board.
So, to avoid thermal issues when running at full steam, I built a Lego case for the Parallella that includes a standard low-end PC cooling fan. The two images below show my case with the fan raised for access and lowered for operation.
If you are the proud owner of a Parallella, you'll be interested to hear that Adapteva is working on active cooling cases for both cluster and standalone configurations.
All systems go!
In the photograph below, we see my Parallella setup running a full-blown Ubuntu desktop based on the Linaro project. With a full-HD-capable HDMI connector, USB, and Ethernet connectivity, the Parallella really is a fully functional, credit-card-sized supercomputer for just US$99.
On the number-crunching side, the Parallella toolchain is completely self-contained: an algorithm can be written, compiled, and executed on the board itself, with no need for an external host. What's more, you can easily upgrade the entire toolchain, the operating system libraries, and even the Zynq FPGA bitstream whenever a new release is published.
In the image above, the Parallella is simulating the evolution of an electron/positron field within a QED framework. More specifically, it is computing and plotting the relativistic evolution of a spin-1/2 fermion field in an electromagnetic potential.
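The original does not spell out the equations behind this simulation. For reference, a spin-1/2 field coupled to an external electromagnetic four-potential $A_\mu$ is governed by the Dirac equation. In natural units ($\hbar = c = 1$), with metric signature $(+,-,-,-)$, $q$ the particle's charge ($q = -e$ for the electron), $\gamma^\mu$ the Dirac matrices, and $\psi$ a four-component spinor, its covariant form and the equivalent Hamiltonian form typically used for time stepping are:

```latex
\left(i\gamma^\mu D_\mu - m\right)\psi = 0,
\qquad D_\mu = \partial_\mu + i q A_\mu

i\,\frac{\partial \psi}{\partial t}
  = \left[\boldsymbol{\alpha}\cdot\left(\mathbf{p} - q\mathbf{A}\right)
          + \beta m + q\phi\right]\psi,
\qquad \mathbf{p} = -i\nabla,\quad \beta = \gamma^0,\quad \alpha^i = \gamma^0\gamma^i
```

Here $A^\mu = (\phi, \mathbf{A})$. Which formulation and discretization the author's code actually uses is not stated.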
An unoptimized version of the algorithm can run on the Zynq's dual-core ARM Cortex-A9, just as it could on a low-end embedded computer such as the BeagleBone or the Raspberry Pi. But when the code is optimized to take advantage of the Epiphany III coprocessor, it runs approximately 15 times faster.
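Offloading a grid-based simulation to a 16-core coprocessor usually starts with domain decomposition: splitting the grid so that each core updates its own contiguous slice. The Python sketch below shows only that partitioning step; the function name `partition` and the grid size are hypothetical and not taken from the author's code.

```python
def partition(n_points: int, n_cores: int = 16) -> list[tuple[int, int]]:
    """Split grid indices [0, n_points) into n_cores contiguous half-open ranges.

    The first (n_points % n_cores) ranges get one extra point, so chunk sizes
    differ by at most one and the load stays balanced.
    """
    base, extra = divmod(n_points, n_cores)
    ranges = []
    start = 0
    for core in range(n_cores):
        size = base + (1 if core < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# 1000 grid points over 16 cores: 8 chunks of 63 points, then 8 chunks of 62.
chunks = partition(1000)
print(chunks[0], chunks[-1])  # (0, 63) (938, 1000)
```

On the real board, each range would be handed to one Epiphany core through the Epiphany SDK, and neighbouring cores would exchange boundary values between time steps; that host/device plumbing is not shown here.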
This may well be the world's most power-efficient quantum field theory computation ever performed on a classical computer, but that is not the important point here. What matters is that this experiment demonstrates how current embedded computer technology opens a new age of affordable, state-of-the-art physics simulation. We are facing a revolution in the way physics is performed and taught: a new paradigm in which computer science is an essential part of the search for a deeper understanding of our universe.