We did some research on oversampling CDR in FPGAs, and implemented it, a few years ago. The target application was a special device for a local company. We successfully used a Cyclone-II as a 4-channel transceiver in this application and later implemented the same algorithms in another product based on a Spartan-3 FPGA.
The algorithms were initially based on Xilinx XAPP224, but we had to improve them to get reliable data recovery (in the end with very little hardware overhead). With the improvements we were able to compete with traditional PLL-based CDRs. During the research we also evaluated FPGA transceivers in oversampling mode; see this paper: http://www.radioeng.cz/fulltexts/2010/10_01_074_078.pdf
Our conclusion is that oversampling data recovery is efficient and reliable. Asynchronous (blind) oversampling data recovery is extremely efficient in multichannel (multilane) applications, as you need only a single clock management block (e.g. a PLL), which need not even be synchronized to the data. However, the limiting factor in today's FPGAs is the sampler itself, if you use general-purpose I/Os. Naturally, in an ASIC this is not an issue.
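To illustrate the idea of blind oversampling with one shared clock, here is a minimal software sketch. It is not our implementation and not the XAPP224 design; the 4x oversampling ratio, the function names `oversample` and `recover`, and the bit pattern are all assumptions made for the example. Each lane is sampled by the same free-running clock at an unknown phase, and each lane independently finds where its edges fall and samples half a bit away from them.

```python
from collections import Counter

OSR = 4  # samples per bit; 4x is an assumption for this sketch


def oversample(bits, offset, osr=OSR):
    """Model a sampler on a free-running clock: osr samples per bit,
    starting `offset` samples into the first bit (unknown phase)."""
    stream = [b for b in bits for _ in range(osr)]
    return stream[offset:]


def recover(samples, osr=OSR):
    """Find the dominant edge phase, then sample half a bit away from it."""
    # Histogram of sample positions (mod osr) at which the value changes.
    edges = Counter(
        i % osr for i in range(1, len(samples)) if samples[i] != samples[i - 1]
    )
    edge_phase = edges.most_common(1)[0][0] if edges else 0
    # The safest sampling point is in the middle of the bit (eye center).
    pick = (edge_phase + osr // 2) % osr
    return samples[pick::osr]


# Hypothetical test pattern; four lanes share one sample clock but each
# sees the data at a different phase.
bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0]
lanes = {lane: oversample(bits, offset=lane) for lane in range(OSR)}
recovered = {lane: recover(s) for lane, s in lanes.items()}
```

Every lane recovers the transmitted bits, although a lane whose first bit was caught too late can lose that one bit. The sketch picks a single static phase per buffer; a real design has to update the phase continuously and insert or drop a bit when the phase wraps because of the frequency offset between the sampling clock and the data.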
Our evaluation of FPGA transceivers (Virtex-5 GTP) was only shallow. The oversampling feature is actually supported in the transceiver core wizard, so it was very straightforward to implement, as the data recovery engine is already part of the IP. The results were somewhat worse than we expected, which might be caused by a poor data extraction algorithm. However, we did not try to fix this by using our own data recovery engine implemented within the FPGA.