Design Article


X Marks the spot...the intersection of eco- and financially-friendly computing

Geno Valente, XtremeData

11/12/2008 12:45 PM EST

FPGA companies have made a multi-billion dollar business by solving some of the toughest digital design problems in the industry. Radar, cryptography, WiMax/LTE, and software defined radio (SDR) are some of the markets where very large Field-Programmable Gate Arrays (FPGAs) have found homes in the past decade.

For many reasons, including cost pressure or stringent size, weight, and power (SWaP) requirements, these "constrained environments" have required extremely high-performance and high-efficiency architectures demanding the performance-per-watt advantages that FPGAs offer.

A key difference in electronic design between the last decade and the next will be the definition of "Constrained Environments." The applications I've mentioned so far have system and market demands that remove every last milliwatt of power, ounce of weight, or penny of cost to deliver the optimal solution for that target market. Looking at the next ten years, let's explore whether acceleration technology will help efficiently solve similar problems in other markets.

The world has already heard perhaps too much about "Green Computing," including low power and terms like "eco-friendly." This is only the beginning of a wave of requirements that will change server design for years to come. Linux Clusters are great for scaling challenges and leveraging commodity components to solve large computing problems.

However, how "GREEN" are they really? First, Linux clusters are built from x86 CPUs that typically only use some of their transistors for computation. The rest of those transistors are burning power and are not generating a solution. Clusters use tons of memory and mechanical storage that wastefully burn more power. Finally, the software running on these platforms typically can't take full advantage of multi-core CPUs, let alone tomorrow's "many-core" versions. The result is compounded inefficiency running on inefficiency.

It hit me recently in a conversation that there are more than a few markets left where efficiency hasn't been implemented and that are increasingly becoming "Constrained Environments." This intersection is where the next wave of application accelerators, such as FPGAs, will come into play.

Need a PetaFlop supercomputer? Better build a power plant!
High Performance Computing (HPC) is dominated by Linux Clusters with a few tweaks and twists to make one more special than the next. These power-hungry solutions dominate the TOP 500 because of their scalability, ease of programming, and general flexibility. What if someone built a specialized supercomputer? What benefits could be created?

Many embedded computing markets, such as WiMax/LTE wireless infrastructure, have already implemented highly optimized, custom electronics that leverage technology purpose-built for each piece of the puzzle. The market is large enough to drive innovation at every step of the engineering process.

Many companies have been driven to build FPGA intellectual property (IP) for Digital Down Conversion or Crest Factor Reduction, specialized DSPs for modem functionality, and new PowerPCs for low-power control functions, with all the necessary built-in communications interfaces.

Good versus Great
Shouldn't the HPC market be looking at architectures for purpose-built petaflop machines rather than generic Linux clusters, as the embedded computing world has been doing for years? Some innovation and creative thinking in this space could yield dozens of machines that are great at one thing and one thing only: the job they were built for.

Additionally, if the cost, power, and size of these specific machines were substantially less than those of the generic solution (GPP Linux clusters), then we would see a market shift. It is hard to imagine such a world, but when virtualization benefits come to an end, and they will, the next wave of computing dominance could be accelerators in one form or another.

Two months ago I was asked, "What could FPGAs do to solve a PetaFlop Matrix Multiply problem that was in double-precision floating-point?" Shocked by the question, I hesitantly sat down with a few experts from Altera and XtremeData. We had some real-world working examples, and with pen and paper we worked on solving this seemingly colossal problem.


Example FPGA-based hardware acceleration module

In the end we were surprised by the results. In 2010, would it take 500 racks of servers with FPGA acceleration in them? 1,000 racks? 5,000 racks? The answer was ten (10)! Only ten racks of blades, with three large FPGA-based In-Socket Accelerators and one quad-core CPU per blade, and the system didn't need its own power plant either.
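A back-of-envelope version of this kind of sizing can be sketched in a few lines of Python. The ten racks and three accelerators per blade come from the text above; the blades-per-rack figure is a hypothetical placeholder, and the function name is illustrative only. The sketch shows the sustained double-precision throughput each accelerator would have to deliver under those assumptions. It is not the calculation from the original pen-and-paper study.

```python
def required_gflops_per_accelerator(target_pflops, racks, blades_per_rack,
                                    accelerators_per_blade):
    """Return the sustained GFLOP/s each accelerator must deliver.

    Ignores any contribution from the host CPUs, so the result is the most
    each accelerator would need to sustain to reach the target.
    """
    total_accelerators = racks * blades_per_rack * accelerators_per_blade
    if total_accelerators <= 0:
        raise ValueError("the system must contain at least one accelerator")
    target_gflops = target_pflops * 1_000_000  # 1 PFLOP/s = 10^6 GFLOP/s
    return target_gflops / total_accelerators


RACKS = 10                   # from the article
ACCELERATORS_PER_BLADE = 3   # from the article
BLADES_PER_RACK = 64         # hypothetical; not stated in the article

per_accelerator = required_gflops_per_accelerator(
    1.0, RACKS, BLADES_PER_RACK, ACCELERATORS_PER_BLADE)
print(f"Each accelerator must sustain about {per_accelerator:.0f} GFLOP/s")
```

Changing `BLADES_PER_RACK` to match a real chassis shows how sensitive the per-accelerator requirement is to packaging density.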

The solution was not only surprising, but quite green, comparatively inexpensive, and actually pretty easy to design, because it used nothing but commercial off-the-shelf (COTS) components, readily available from HP, AMD, Altera, and XtremeData. The power was low, the performance was high, and the time-to-market was fast. Could it really be this simple?

As researchers in industry and government continue to look at the most challenging computing problems, they realize they are becoming increasingly "constrained" – not by size or weight, but by power.

Power is the problem that will drive innovation in the next decade. In some cases the solution is simple; in others it's nearly impossible. But as technologies such as FPGA acceleration become better understood and IP becomes more available, it is increasingly feasible to rapidly build far more power-effective HPC systems.

Looking at government programs like extensions to the NSF TeraGrid, CyberRange, and other recent DARPA BAAs focusing on cyber warfare or "beyond Petaflop" computing, it's increasingly clear that the above-mentioned approach isn't as farfetched as many have thought.

Database analytics
For years, the IT evolution (not revolution) has created more and more data, used more and more CPUs, and scaled datacenters until they were completely power constrained. No matter what term you use for Database Analytics [Data Marts, Data Warehouse, Decision Support Systems (DSS), etc.], these systems are increasingly becoming constrained as data storage growth continues to outpace Moore's law.

Can the IT data market learn any lessons from the Embedded Computing market now that they are in a constrained environment as well? Our belief is a resounding "YES" but only if accelerators can be abstracted from the IT user who doesn't know anything about designing them.

If an appliance vendor can provide acceleration and lower power "under the hood" so that IT pros can simply leverage the hardware/software engineering work then the benefits can be extraordinary. Lower power, far superior performance and increased scalability can all become a reality by simply finding the appliance that includes the proper mix of storage, CPU, software, and acceleration.

The appropriate solution should be a flexible, modularly-scalable system architecture that rides the silicon technology roadmap and provides a sustainable price/performance advantage over legacy solutions well into the future. One of today's trends is away from "big iron" (tightly-coupled, proprietary SMP (Symmetric Multiprocessing) Unix boxes) and towards "commodity" x86, loosely-coupled MPP (Massively Parallel Processing) Linux clusters.

Another major trend is the "burning" importance of power consumption in data centers. With the constraints of power supply and cooling severely limiting scale-up, it's time to explore co-processors as CPU accelerators in markets outside of HPC.

We believe that the best computing choice available today is x86 CPUs coupled to FPGA-based accelerators. FPGAs have several key attractive features:

  • They are very power efficient - they outperform CPUs on the Performance/Watt metric by 2-3 orders of magnitude. 
     
  • They follow the CPU's semiconductor process technology very closely – typically lagging by no more than 6-12 months. So all the "Moore's Law" benefits accrue to FPGAs. 
     
  • They are in-system re-configurable. Unlike fixed architecture accelerators, FPGAs can be instantly reloaded to optimally match the application requirements.

It is generally accepted that significant scale-up can only be achieved with the loosely-coupled MPP approach. We believe this scale-up philosophy, with FPGA "acceleration" inserted, will be leveraged in all next-generation Data Warehouse architectures. The benefits of performance/price, performance/watt, "under the hood" acceleration, and ease of use will dominate the minds of database and appliance makers for years to come.

Bioinformatics
BLASTn, BLASTp, Smith-Waterman, and other codes are used daily by university and commercial researchers to test and find drugs for the world's worst diseases. These codes are well understood by this community and even standardized by organizations like NCBI.

Additionally, these researchers need access to data quickly, which makes them constrained by performance. Lastly, this research is done from their wet lab, which is an environment that is constrained by space and power. This is a market ripe for innovation and we are all aware that, based on the profits of a new blockbuster drug, there is money to be invested in superior computing solutions if they exist.

In the past two weeks, I've read a lot about Mitrionics' Hybrid Computing Systems and Complete Genomics' gene sequencing service for this marketplace. From what I understand, both companies have recognized that a large opportunity lies in being able to solve this problem more efficiently than the generic solutions available today.

Mitrionics has taken a hardware-appliance-based approach to increase performance and lower power. Complete Genomics has basically taken an on-line cluster approach that you can rent for $5,000 a sequence when you need it. Both are creative and exciting, and both are helping to eliminate the generic computing problem by building specialized products and services to make drug discovery more efficient.

Conclusion
So how do you measure success of your "purpose built" solution in today's fast moving environment? "Number of Servers versus Number of Accelerated Servers" needed is how we typically evaluate solutions. This allows us to quickly derive costs for power, cooling, and floor space on a "per server" basis. Simply calculate how many servers you can replace by using acceleration and go from there.
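The "per server" comparison described here can be sketched as a small cost model. This is a minimal illustration, not XtremeData's actual evaluation method: the class and function names are hypothetical, and every dollar figure below is a made-up placeholder chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class PerServerCost:
    """Annual cost of running one server, in dollars (all values assumed)."""
    power_usd: float
    cooling_usd: float
    floor_space_usd: float

    def total(self) -> float:
        return self.power_usd + self.cooling_usd + self.floor_space_usd


def annual_savings(baseline_servers: int, accelerated_servers: int,
                   baseline_cost: PerServerCost,
                   accelerated_cost: PerServerCost) -> float:
    """Annual cost of the baseline fleet minus that of the accelerated fleet."""
    baseline_total = baseline_servers * baseline_cost.total()
    accelerated_total = accelerated_servers * accelerated_cost.total()
    return baseline_total - accelerated_total


# Hypothetical figures: an accelerated server draws more power and cooling
# than a plain one, but far fewer of them are needed for the same workload.
plain = PerServerCost(power_usd=1000.0, cooling_usd=500.0, floor_space_usd=300.0)
accelerated = PerServerCost(power_usd=1400.0, cooling_usd=700.0, floor_space_usd=300.0)

savings = annual_savings(100, 10, plain, accelerated)
print(f"Estimated annual savings: ${savings:,.0f}")
```

Keeping the accelerated server's per-unit cost separate from the plain server's matters, because an accelerator adds power draw per box even as it removes boxes from the room.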

We often hear people say, "to use an accelerator it has to be 10x faster or it isn't worth it." In the near future, we feel that thinking will not only be performance-driven but also power-driven by the requirements of our now increasingly green, eco-friendly world.

What does 10x faster really mean? Holding performance constant, it means ten servers turn into one, a hundred turn into ten, or a thousand turn into a hundred. In a datacenter it means that thousands of watts go away, racks of servers go away, dozens of IT pros can be assigned to other tasks, and thousands of dollars can be saved on software licensing.
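The consolidation arithmetic can be written down directly. The sketch below assumes the workload divides evenly across servers and that the speedup applies uniformly, which real deployments rarely achieve; the function name is illustrative only.

```python
import math


def servers_needed(baseline_servers: int, speedup: float) -> int:
    """Servers required to match the baseline fleet's throughput.

    Rounds up, because a fraction of a server still occupies a whole slot.
    """
    if baseline_servers < 0:
        raise ValueError("baseline_servers must be non-negative")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return math.ceil(baseline_servers / speedup)


for fleet in (10, 100, 1000):
    print(f"{fleet} servers at 10x -> {servers_needed(fleet, 10)}")
```

The same function makes it easy to see what a more modest speedup buys: at 3x, for example, a fleet of ten still shrinks to four.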

In short, 10x faster can save millions of dollars in power and cooling over the life of the accelerated server. This can no longer be ignored, especially if a datacenter is constrained or your competition is already doing it.

When you discover the intersection of "constrained environments" and "inefficient" solutions, you'll discover two kinds of green computing we're all looking for. There's the kind that's environmentally green and there's the kind that puts green in your bank account!

Geno Valente is vice president of sales and marketing for XtremeData, Inc., maker of very high-performance database Decision Support Systems (DSS) and other accelerated appliances.

Geno has spent over 13 years helping support, sell, and market FPGA technology into markets such as Financial Services, Bioinformatics, High Performance Computing, and WiMax/LTE, while working for Altera Corporation and XtremeData Inc.

