Ranking is a particular kind of processing. It is uniform in that the same functions are used by all rankers. However, there are in practice hundreds of ranker models (think about all the languages, and then all the distinct modes of query - science, news, celebrity, commerce, etc) which use those functional primitives in different subsets and with different combinations. A particular model can project into the fabric a set of standard primitives it has chosen to use and then bind them in a custom data flow and weighting.
@cd2012: My comment alludes to the answer. It's because they did not implement SW as HW which is what the FPGA suppliers will do with OpenCL. For whatever reason Microsoft chose to implement many-core SOFT processors on the FPGA's, so the FPGA's are still running SW, albeit multi-threaded.
The processor is multi-core shared ALU design as opposed to conventional FPGA design. It is also multi-threaded so the speedup is due to multi-thread execution rather than traditional FPGA pipe-lined computation.
This may mean that a different design approach is needed. Technology evolution may have made the traditional approach of maximizing fmax obsolete.
What intrigued me about Microsoft's use of the FPGA's is they implemented ranking of the web search results NOT with HW implementation of SW, but with 60 SOFT processor cores on the FPGA! Since each core handles 4 threads, that's a total of 240 threads. (Though the micrograph in their slide pictures only 48 cores (8 clusters of 6 cores)
I would be curious to see the trade study between using FPGAs and using GPUs with either CUDA or OpenCL to accelerate the datacenter. As for the C++ interface, that's really nice for us old-timers, but show me a C++ programmer who is under 40-something and I'll buy you a Starbucks coffee :)
At this stage more of the engineering effort revolves around system compatibility with the data center environment (power, form factor, networking, etc.), and scheduling use of resources distributed across a network of FPGAs, than the relatively solved problem of programming individual FPGAs.