FPGA companies have built a multi-billion-dollar business by solving some of the toughest digital design problems in the industry. Radar, cryptography, WiMax/LTE, and software-defined radio (SDR) are some of the markets where very large Field-Programmable Gate Arrays (FPGAs) have found homes in the past decade.
Driven by cost pressure and stringent size, weight, and power (SWaP) requirements, these "constrained environments" demand extremely high-performance, high-efficiency architectures, playing directly to the performance-per-watt advantages that FPGAs offer.
A key difference in electronic design between the last decade and the next will be the definition of "Constrained Environments." The applications I've mentioned so far have system and market demands that squeeze out every last milliwatt of power, ounce of weight, and penny of cost to deliver the optimal solution for that target market. Looking at the next ten years, let's explore whether acceleration technology can help efficiently solve similar problems in other markets.
The world has already heard perhaps too much about "Green Computing," including low power and terms like "eco-friendly." This is only the beginning of a wave of requirements that will change server design for years to come. Linux clusters are great for scaling challenges and for leveraging commodity components to solve large computing problems.
However, how "GREEN" are they really? First, Linux clusters are built from x86 CPUs that typically use only a fraction of their transistors for computation. The rest of those transistors are burning power without generating a solution. Clusters use tons of memory and mechanical storage that wastefully burn still more power. Finally, the software running on these platforms typically can't take full advantage of today's multi-core CPUs, let alone tomorrow's "many-core" versions. The result is compounded inefficiency running on inefficiency.
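To see how quickly these factors compound, here's a minimal back-of-envelope sketch. Every utilization figure below is a purely illustrative assumption, not a measurement from any real cluster; the point is simply that independent inefficiencies multiply.

```python
# Illustrative only: every figure below is a hypothetical placeholder.
core_utilization = 0.5      # fraction of cores the software actually keeps busy
vector_utilization = 0.3    # fraction of each core's SIMD capability exploited
non_stall_fraction = 0.6    # fraction of busy cycles not stalled on memory

effective = core_utilization * vector_utilization * non_stall_fraction
print(f"Effective use of the silicon: {effective:.0%}")  # prints "9%"
```

Even with fairly generous per-layer numbers, only a single-digit percentage of the machine is doing useful work while the whole machine burns power.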
It hit me recently in a conversation that there are more than a few markets left where efficiency hasn't been a design priority and that are increasingly becoming "Constrained Environments." This intersection is where the next wave of application accelerators, such as FPGAs, will come into play.
Need a PetaFlop supercomputer? Better build a power plant!
High Performance Computing (HPC) is dominated by Linux clusters with a few tweaks and twists to make one more special than the next. These power-hungry solutions dominate the TOP500 because of their scalability, ease of programming, and general flexibility. What if someone built a specialized supercomputer? What benefits could be created?
In many embedded computing markets, such as WiMax/LTE wireless infrastructure, the norm is highly optimized, custom electronics that leverage technology purpose-built for each piece of the puzzle. The market is large enough to drive innovation at every step of the engineering process.
Many companies have been driven to build FPGA intellectual property (IP) for Digital Down Conversion or Crest Factor Reduction, specialized DSPs for modem functionality, and new PowerPCs for low-power control functions, all with the necessary built-in communications interfaces.
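For readers unfamiliar with the first of those functions, here is a minimal sketch of what a digital down-converter does, written in Python rather than FPGA logic. The sample rate, tuning frequency, and decimation factor are arbitrary illustrative values, and real IP would use NCO, CIC, and FIR blocks rather than this toy moving-average filter.

```python
import numpy as np

fs = 100e6                # input sample rate, 100 MS/s (illustrative)
f_tune = 20e6             # carrier frequency to shift down to baseband
decim = 10                # decimation factor

t = np.arange(10000) / fs
rf = np.cos(2 * np.pi * f_tune * t)          # stand-in for the RF input

# 1) Mix with a complex local oscillator to translate the carrier to 0 Hz
baseband = rf * np.exp(-2j * np.pi * f_tune * t)

# 2) Low-pass filter (a crude moving average here, for brevity)
kernel = np.ones(decim) / decim
filtered = np.convolve(baseband, kernel, mode="same")

# 3) Decimate to the lower output rate
output = filtered[::decim]
print(f"{len(rf)} samples in -> {len(output)} out at {fs/decim/1e6:.0f} MS/s")
```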
Good versus Great
Shouldn't the HPC market be looking at architectures for purpose-built petaflop machines instead of generic Linux clusters, as the embedded computing world has been doing for years? Some innovation and creative thinking in this space could yield dozens of machines that are great at one thing and one thing only – the job they were built for.
Additionally, if the cost, power, and size of these purpose-built machines were substantially less than the entire generic solution (GPP Linux clusters), then we would see a market shift. It's hard to imagine such a world, but when the benefits of virtualization come to an end, and they will, the next wave of computing dominance could be accelerators in one form or another.
Two months ago I was asked, "What could FPGAs do to solve a PetaFlop matrix multiply problem in double-precision floating-point?" Shocked by the question, I hesitantly sat down with a few experts from Altera and XtremeData. We had some real-world working examples, and with pen and paper we set about solving this seemingly colossal problem.
Example FPGA-based hardware acceleration module
In the end we were surprised by the results. In 2010, would it take 500 racks of servers with FPGA acceleration in them? 1,000 racks? 5,000 racks? The answer was ten (10)! Only ten racks of blades, with three large FPGA-based In-Socket Accelerators and one quad-core CPU per blade, and it didn't need its own power plant either.
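The pen-and-paper details aren't published here, but it's easy to invert the ten-rack answer and see what it implies. In the sketch below, the racks, accelerators per blade, and the 1 PFLOP/s target come from the text, while blades_per_rack is a hypothetical placeholder (for example, four 16-blade enclosures per rack).

```python
# What sustained throughput per accelerator does the ten-rack answer imply?
TARGET_FLOPS = 1e15          # 1 PetaFLOP/s double-precision matrix multiply
racks = 10                   # from the text
blades_per_rack = 64         # assumption: e.g., four 16-blade enclosures/rack
accels_per_blade = 3         # from the text

accels = racks * blades_per_rack * accels_per_blade
gflops_each = TARGET_FLOPS / accels / 1e9
print(f"{accels} accelerators -> {gflops_each:.0f} GFLOP/s sustained each")
# 1920 accelerators -> 521 GFLOP/s sustained each
```

Change the blade-density assumption and the per-accelerator requirement moves accordingly, but the arithmetic shows why a handful of racks, rather than hundreds, was on the table.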
The solution was not only surprising, but quite green, comparatively inexpensive, and actually pretty easy to design, since it used nothing but commercial off-the-shelf (COTS) components readily available from HP, AMD, Altera, and XtremeData. The power was low, the performance was high, and the time-to-market was fast. Could it really be this simple?
As researchers in industry and government continue to look at the most challenging computing problems, they realize they are becoming increasingly "constrained" – not by size or weight, but by power.
Power is the problem that will drive innovation in the next decade. In some cases the solution is simple; in others it's nearly impossible. But as technologies such as FPGA acceleration become better understood and IP becomes more available, it is increasingly feasible to rapidly build far more power-efficient HPC systems.
Looking at government programs like extensions to the NSF TeraGrid, CyberRange, and other recent DARPA BAAs focusing on cyber warfare or "beyond Petaflop" computing, it's increasingly clear that the approach described above isn't as far-fetched as many have thought.