
FPGAs Ride HP’s Moonshot

SRC goes to data centers with Altera
5/28/2015 11:00 AM EDT
18 comments
Page 1 / 2   >   >>
dimopep
A GPU substitute for High Performance Computing
dimopep   6/1/2015 12:28:59 PM
It will be very interesting to see a comparison against GPU based physical simulations like the ones performed in structural mechanics and fluid dynamics.

A nice gimmick would be to convince Altera to port their synthesis and P&R tools to run on this board. FPGAs are getting bigger and bigger, and the run times for both the front-end and back-end tools are becoming unpleasant even with tricks like partial synthesis.

EldridgeMount
Re: C to FPGA?
EldridgeMount   6/1/2015 11:32:41 AM
@fragro - Oh, to be sure, a custom solution isn't going to be necessary all the time. In fact, with FPGAs becoming more of a mainstream option for acceleration in server racks, I suspect the percentage of deployed FPGA configuration bitstreams that are nothing more than a vendor-supplied datapath framework plus custom algorithms will be on the rise.

The level of configurability of an FPGA is such, however, that even with a fixed hardware design (i.e. a COTS FPGA board), there is still a significant amount of flexibility to what you do at the interface between the FPGA and the board-level components. With a microprocessor-based system, once you have the BSP code that configures the set of peripherals you have connected and their basic operating mode, that's kind of the end of it. On an FPGA, the possibilities are pretty wide open.

As an example, a stock framework might connect the 10G Ethernet core to a DMA-based stock peripheral, allowing packet traffic over the board's interface to be transmitted / received by a processor on another card, interacting using packet buffers and descriptors out in a shared memory space. However, I could write a custom subsystem which gets multiplexed along with the existing stock peripheral, which allows direct creation and inspection of network packets right in the FPGA fabric. The processor can still use the network port, unaware that another entity is arbitrating for access and deeply inspecting all of the received packets. This is a perfect framework for doing something like streaming bandwidth-intensive media (e.g. HD video content) across the network at near-wire-speed.

This is just an example; but even in the context of using an FPGA board for algorithm acceleration - offloading what used to be a processor-intensive task - the time and power savings from the acceleration core are only realized over the fraction of time you are actually able to keep it fed with data and fetch its output. In the context of an overall system design (such as a backplane with multiple FPGA and CPU cards), there may be a particularly effective way to do this for a given application - direct streaming to network or other high-bandwidth pipes (e.g. InfiniBand), or perhaps scatter / gather to one or more memories or non-volatile storage (e.g. SATA Flash drives). The more throughput you get, the better the return on power savings or, for that matter, on the investment in a $20,000 board in the system you are putting together to use or sell as a solution. The lowly datapath plumbing can very easily be what makes or breaks the viability of a system, and isn't always going to be well-served by a stock "Your Algorithm Goes Here" harness. And, as I have alluded to and others have mentioned - building an efficient system of this sort which meets timing, doesn't eat too much of the FPGA fabric in overhead, and above all is functionally debugged, is not an easy task. And can't be done in C. ;)
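
As a back-of-the-envelope illustration of the "keep it fed" point, here is a minimal sketch. It assumes (hypothetically) that the core delivers a fixed speedup over the CPU whenever it has data and does nothing otherwise. The function name and the figures are illustrative only, not measurements from any real board.

```python
def effective_speedup(peak_speedup: float, fed_fraction: float) -> float:
    """Speedup actually delivered when the core is only busy part of the time.

    peak_speedup: speedup over the CPU while the core has data (assumed constant).
    fed_fraction: fraction of wall-clock time the plumbing keeps the core busy.
    """
    if peak_speedup <= 0:
        raise ValueError("peak_speedup must be positive")
    if not 0.0 <= fed_fraction <= 1.0:
        raise ValueError("fed_fraction must be between 0 and 1")
    # Work completed per unit time scales with the time the core is busy.
    return peak_speedup * fed_fraction


if __name__ == "__main__":
    # A core that is 20x faster than the CPU but fed only a quarter of the time.
    print(effective_speedup(20.0, 0.25))
```

Under these assumptions, a 20x core that is starved three quarters of the time behaves like a 5x core, which is why the plumbing matters as much as the algorithm.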

fragro
Re: C to FPGA?
fragro   6/1/2015 11:02:32 AM
@ EldridgeMount: Thanks for the clarification...

interesting to see where the discrepancies between marketing and reality might be. In any case, it's a fact that

- Intel today bid USD 16.7 bln for Altera

- SRC Systems claims to have been able to save roughly USD 16.5 bln per year for the leading Web/IT companies 

I guess there's more to come.

Maybe "implementing a truly custom solution" is not required everywhere. And maybe we are also watching the technology mature, much like the move from assembler to structured programming, with a major impact on the world of computing as we have known it.

EldridgeMount
Re: C to FPGA?
EldridgeMount   6/1/2015 10:44:29 AM
They are by no means the only vendor selling tools for a design flow taking code expressed in C or another high-level language to synthesizable HDL for targeting an FPGA. Wikipedia has a decent list: http://en.wikipedia.org/wiki/C_to_HDL

Many of the implementations have special syntax to indicate parallelism, although not all do - in general, any tool which starts from a high-level model of an algorithm, whether that be C, C++, Matlab, Simulink, etc. and grinds out an HDL implementation can be something which accelerates development time and lowers risk. However, the output you get may not be something you have a lot of control over; one major advantage to FPGA fabric which ties in to parallelism is the dimension of space / time tradeoffs at your disposal. The same hardware implementing an algorithm may be multiplexed across multiple channels, if you are careful to design it in a way that lends itself to doing so. You can also run things at a higher clock rate than your datapath to increase throughput, but again need to make sure that is properly bolted into your overall design.
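
The space / time tradeoff above can be put into rough numbers. This is a minimal sketch, assuming a hypothetical core that needs a fixed number of clock cycles per sample and channels that all arrive at the same sample rate. The function and parameter names are made up for illustration.

```python
def channels_per_core(core_clock_hz: int, sample_rate_hz: int,
                      cycles_per_sample: int = 1) -> int:
    """How many channels one time-multiplexed core can serve.

    core_clock_hz:     clock rate of the shared processing core.
    sample_rate_hz:    per-channel sample rate (assumed equal for all channels).
    cycles_per_sample: core cycles needed to process one sample (assumed fixed).
    """
    if min(core_clock_hz, sample_rate_hz, cycles_per_sample) <= 0:
        raise ValueError("all arguments must be positive")
    # Each channel consumes sample_rate_hz * cycles_per_sample core cycles per second.
    return core_clock_hz // (sample_rate_hz * cycles_per_sample)


if __name__ == "__main__":
    # A 200 MHz core serving 25 MHz channels, one cycle per sample.
    print(channels_per_core(200_000_000, 25_000_000))
```

The sketch ignores the cost of the multiplexing logic and per-channel state storage, which is exactly the kind of detail that has to be "bolted in" by hand.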

However, what this does not necessarily do is replace the "plumbing" : the chip-level hardware which gets your data in and out, whether that is over interfaces like 10G Ethernet, PCIe, etc. or just to off-chip DRAM, etc. In my experience, that takes at least 50% of the time, and can make every bit as much of an impact in overall performance and / or resource & power utilization as the core algorithms. These guys may (probably do) have what amounts to a BSP for their FPGAs, a harness which can host algorithms written in C and targeted to the board. If you are selling this solution on the merits of putting FPGA-based acceleration in the hands of someone who only wants to work at the level of software algorithm implementation, it would be a requirement, really.

For implementing a truly custom solution, however, every non-trivial FPGA application is going to require banging out good old VHDL or (System)Verilog... or whatever else new comes along to model the "plumbing" at the level of abstraction of clock domains, asynchronous FIFOs, and technology-specific I/O and logic primitives - something a software language simply isn't well-suited to express. Not that that has stopped people from trying, by creating class and macro frameworks over top of C or C++ to create HDL constructs (deferred assignments, signal sensitivity lists, etc.) a la SystemC or the now-defunct Cynlib.

KarlS01
Re: C to FPGA?
KarlS01   5/30/2015 10:43:17 AM
@Rick:  It is more important to know what HW runs the output of the compiler.

"C to FPGA" can mean anything the reader imagines.  "C to H" and "C to HDL" have been catch phrases for many years.  They generally turn out to be a "subset" that focuses on algorithmic calculation.

Then "compiler" can mean many different things to different people.  Compilers in general compile to an intermediate language that in many cases is stack based.  The run time is a RISC CPU emulating a stack machine.  The last step, where the IL is compiled to native code, is where size and speed are ignored because one can always get a faster CPU with bigger memory.

One MS comment was that they designed a soft core so that only the memory had to be loaded to change function.  The process of re-configuring the FPGA by compiling a new bit stream and loading that is too slow and too complex.

I think they also explored HW accelerators (ImpulseC) and chose a different approach.  They probably realized they wanted no part of place and route, timing closure, etc.

So what is left?  A general purpose programmable core, as MS said.  Something that can do loops, if/else, switch, and assignments fast.  They did not say as fast as possible probably to avoid generating new bit streams. 

My CEngine seemed like a pretty good fit, so I mentioned it to them and pointed out that source code can be viewed as an FSM with the line numbers as the states. "We do have FSMs in our design" was the reply from one of the designers.
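
The "line numbers as states" view can be sketched in a few lines. This is only an illustration of the idea, not a description of how CEngine works: the three-line summing program, the `HALT` state, and the function name are all hypothetical.

```python
from typing import Tuple

# Hypothetical three-line program; each line number is one FSM state:
#   0: if i <= n goto 1 else goto HALT
#   1: total += i          ; goto 2
#   2: i += 1              ; goto 0
HALT = 3


def run_fsm(n: int) -> Tuple[int, int]:
    """Sum 1..n by stepping an FSM whose state is the current line number.

    Returns (total, number_of_state_transitions).
    """
    state = 0
    i, total, steps = 1, 0, 0
    while state != HALT:
        if state == 0:
            state = 1 if i <= n else HALT
        elif state == 1:
            total += i
            state = 2
        elif state == 2:
            i += 1
            state = 0
        steps += 1  # one transition per executed line
    return total, steps


if __name__ == "__main__":
    print(run_fsm(4))
```

Each pass through the `while` loop is one state transition, so the step count is a rough stand-in for clock cycles in a hardware FSM that executes one line per cycle.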

Or maybe they only talk to PhDs.

rick merritt
Re: C to FPGA?
rick merritt   5/29/2015 10:04:55 PM
I suspect SRC's C to FPGA Carte compiler is its secret sauce.

The Microsoft data center team would love better tools for the Altera chips they are running in production now. They kvetched over the state of the art in FPGA programming at Hot Chips in August.

There's a pain point here waiting to write a check.

KarlS01
Re: C to FPGA?
KarlS01   5/29/2015 12:12:11 PM
@kshores: FPGAs already have soft core CPUs.  I am talking about a "core" that executes C syntax w/o compiling to an ISA and doing it in very few cycles.  What it does is determined by memory content.  Also a Verilog FSM can be created from the C code if desired but then it does become a new design.  If you want to do things the hard way to maybe save whatever.


I think 1 cycle to read control and operand addresses and 1 cycle per operator in the expression is pretty good for something that takes 3 memory blocks and 200-300 LUTs.
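
The cycle figure quoted above (one cycle of setup plus one cycle per operator) can be turned into a quick estimator. This is a sketch under loud assumptions: it only handles expressions that are also valid Python syntax, it uses Python's `ast` module merely to count operators, and it says nothing about how the actual core is built.

```python
import ast


def estimated_cycles(expression: str) -> int:
    """Estimate cycles as 1 (read control and operand addresses) + 1 per operator.

    The expression must be valid Python expression syntax, e.g. "a + b * c".
    """
    tree = ast.parse(expression, mode="eval")
    operators = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.BinOp, ast.UnaryOp)):
            operators += 1                      # e.g. a + b, -a
        elif isinstance(node, ast.Compare):
            operators += len(node.ops)          # e.g. a < b
        elif isinstance(node, ast.BoolOp):
            operators += len(node.values) - 1   # e.g. a and b and c
    return 1 + operators


if __name__ == "__main__":
    print(estimated_cycles("a + b * c - d"))
```

For example, `a + b * c - d` has three operators, so the model gives four cycles.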

KarlS01
Re: C to FPGA?
KarlS01   5/29/2015 11:54:02 AM
@betajet:  If $ were the only thing.  Time critical control responses?  That is where the CPU branch time comes in.  Also the power and memory accesses/size. 

What if the FPGA is there for other reasons and off in the corner somewhere you can simply tuck in some memory and 200 or so LUTs (depending on ALU width and memory size)?

Yes it can be scaled.

kshores
Re: C to FPGA?
kshores   5/29/2015 11:34:29 AM
It depends upon a lot of things. ASICs have their place, when you're going to be making millions of units and power consumption is vital. At the other extreme lie generic processors, RISC or otherwise. Nowhere near as efficient, but easier to get going. They also tend to consume the most power. FPGAs are in between. It's a matter of using the right tool for the right job, and it's always a juggling act, never quite a perfect trade-off... (There are other, usually niche-specific considerations. Massive parallelism may be easier with FPGAs and ASICs.)

Now the second bit. Converting C or C# to an FPGA architecture (mostly LUTs) isn't necessarily hard. Doing it efficiently is. Converting it so that it uses minimal resources is. Verifying the timing chain (without using an HDL) is. Then there's properly capturing the programmer's 'intent' instead of what they actually wrote. If you write something in an HDL the same way you write software, you're going to have a bad day.

Producing an FPGA personality isn't too hard. Producing a good one that can still be debugged and verified properly is a -very- difficult problem.

betajet
Re: C to FPGA?
betajet   5/29/2015 11:31:49 AM
Karl wrote: FPGA FSMs change state sequentially don't they?  And they beat the RISC in speed.  They don't wait for memory all the time, just do what is to be done.

A CPU is also an FSM.  However, in each cycle it does one (or a few) ALU operations and then updates a few bits of state.  OTOH, an FPGA can do a large number of ALU operations in a single cycle, and can update a large amount of state.  The CPU time-shares one or a few ALUs.  The FPGA can have many ALUs, which can do a huge amount of computation if you can keep them busy.  "Aye, there's the rub."  In some applications (like signal processing and matrix math) you can keep them busy.  In general applications, not so much.  In that case, you save big bucks using a general-purpose CPU.
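
The "if you can keep them busy" caveat can be made concrete with a small cycle-count model. This is a minimal sketch assuming every ALU operation takes one cycle and that utilization is a single average figure. The names and numbers are hypothetical, and differences in clock rate between a CPU and an FPGA are deliberately ignored.

```python
import math


def cycles_needed(num_ops: int, num_alus: int, utilization: float = 1.0) -> int:
    """Cycles to finish num_ops ALU operations.

    num_alus:    ALUs available in parallel (1 for a simple CPU, many for an FPGA).
    utilization: average fraction of ALUs doing useful work each cycle.
    """
    if num_ops < 0 or num_alus < 1:
        raise ValueError("num_ops must be >= 0 and num_alus >= 1")
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    # Useful operations completed per cycle, on average.
    ops_per_cycle = num_alus * utilization
    return math.ceil(num_ops / ops_per_cycle)


if __name__ == "__main__":
    print(cycles_needed(1000, 1))          # one time-shared ALU
    print(cycles_needed(1000, 100))        # 100 ALUs, all kept busy
    print(cycles_needed(1000, 100, 0.25))  # 100 ALUs, mostly idle
```

With these illustrative figures, 100 fully busy ALUs finish 1000 operations in 10 cycles instead of 1000, but at 25% utilization the same fabric needs 40 cycles, so much of the area is paid for without being used.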
