I'm happy to see that FPGAs are finding their way into datacenters for unexpected uses (unexpected compared to their traditional applications, I mean), but at the same time they were far from being the first choice: "we looked at software, then GPUs and then FPGAs". And why?
Because "The FPGA tools are too slow, there are too many warnings and not great debugging -- but that's not new." Combine that with the fact that RTL is still horrible to write (and whether you use old or new HDLs does not change a lot IMO), and they had to come up with "a middle ground between C++ and RTL where people can program". Interesting, it sounds a lot like Cx.
Of course, if Microsoft and Baidu had known about ngDesign, they could have used an Eclipse-based IDE featuring on-the-fly error checking and fast code generation. We have plans for a fast simulator + debugger; Microsoft and Baidu, feel free to contact us!
At this stage, more of the engineering effort goes into system compatibility with the data center environment (power, form factor, networking, etc.) and into scheduling resources distributed across a network of FPGAs than into the relatively solved problem of programming individual FPGAs.
I would be curious to see a trade study between using FPGAs and using GPUs (with either CUDA or OpenCL) to accelerate the datacenter. As for the C++ interface, that's really nice for us old-timers, but show me a C++ programmer who is under 40-something and I'll buy you a Starbucks coffee :)
What intrigued me about Microsoft's use of the FPGAs is that they implemented ranking of web search results NOT with a HW implementation of SW, but with 60 SOFT processor cores on the FPGA! Since each core handles 4 threads, that's a total of 240 threads. (Though the micrograph in their slide pictures only 48 cores, i.e. 8 clusters of 6 cores.)
The processor is a multi-core, shared-ALU design, as opposed to a conventional FPGA design. It is also multi-threaded, so the speedup comes from multi-threaded execution rather than from traditional FPGA pipelined computation.
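A toy model of why that wins (thread count, miss rate and memory latency are made-up numbers here, not Microsoft's actual microarchitecture): with round-robin issue into one shared ALU, the cycles one thread spends waiting on memory get filled by the other threads.

    #include <array>
    #include <cstdio>

    // Barrel-style core: one shared ALU, one issue slot per cycle,
    // threads selected round-robin. A thread that misses in memory
    // simply sits out until the (made-up) latency has elapsed.
    static int run(int threads, int cycles) {
        constexpr int kMemLatency = 3;      // real cycles per miss
        std::array<int, 8> ready_at{};      // cycle when each thread may issue again
        int issued = 0;
        for (int c = 0; c < cycles; ++c) {
            int t = c % threads;            // round-robin issue slot
            if (c < ready_at[t]) continue;  // still waiting on memory: slot wasted
            ++issued;                       // ALU does useful work
            if (issued % 5 == 0)            // pretend every 5th op misses
                ready_at[t] = c + 1 + kMemLatency;
        }
        return issued;
    }

    int main() {
        std::printf("1 thread : %d ops in 1000 cycles\n", run(1, 1000)); // 625
        std::printf("4 threads: %d ops in 1000 cycles\n", run(4, 1000)); // 1000
    }

With one thread, every miss stalls the ALU (625 ops in 1000 cycles here); with four interleaved threads, each miss is fully hidden behind the other threads' slots (1000 ops), without raising fmax at all.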
This may mean that a different design approach is needed. Technology evolution may have made the traditional approach of maximizing fmax obsolete.
@cd2012: My comment alludes to the answer. It's because they did not implement SW as HW, which is what the FPGA suppliers will do with OpenCL. For whatever reason, Microsoft chose to implement many-core SOFT processors on the FPGAs, so the FPGAs are still running SW, albeit multi-threaded.
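To illustrate the distinction (with a made-up 3-instruction ISA, nothing like the actual core): an OpenCL/HLS flow turns the computation itself into a wired datapath, while a soft processor is generic fetch/decode/execute hardware, so the ranking logic stays software that can be swapped without resynthesizing the FPGA. A minimal sketch of the latter:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Tiny soft-core interpreter: the "hardware" (this loop) is fixed;
    // the behavior comes from whatever instructions sit in memory.
    enum Op : uint8_t { LOADI, ADD, HALT };
    struct Insn { Op op; uint8_t rd, rs1, rs2; int16_t imm; };

    static int32_t run(const std::vector<Insn>& program) {
        int32_t reg[4] = {0, 0, 0, 0};
        for (size_t pc = 0; pc < program.size(); ++pc) { // fetch/decode/execute
            const Insn& i = program[pc];
            switch (i.op) {
                case LOADI: reg[i.rd] = i.imm; break;
                case ADD:   reg[i.rd] = reg[i.rs1] + reg[i.rs2]; break;
                case HALT:  return reg[0];
            }
        }
        return reg[0];
    }

    int main() {
        // "Software" for the soft core: new programs need no resynthesis.
        std::vector<Insn> prog = {
            {LOADI, 1, 0, 0, 2},
            {LOADI, 2, 0, 0, 3},
            {ADD,   0, 1, 2, 0},
            {HALT,  0, 0, 0, 0},
        };
        std::printf("result = %d\n", run(prog)); // prints 5
    }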