Many FPGA tools fall short for creating parallel multicore systems because they are infrastructure for design assembly and implementation, and they lack analysis capability. At Space Codesign, one of the ways our SpaceStudio ESL hardware/software codesign tool can be used is as a design creation front end for FPGA tool infrastructures like Xilinx Vivado (and likely others). We published a position paper on this topic on this site a few weeks ago ...
The key to supercomputer performance is that your architecture is optimized for an application, or family of applications. Knowing the internal details of a processor core or FPGA device is useful (there are architecture diagrams available, after all!), but it is the system-level performance that comes into play at the end of the day.
Peter Kogge has an interesting article called "Next-Generation Supercomputing" (IEEE Spectrum, January 2011). In it, he states that the bottleneck in next-generation supercomputing is not the speed of the floating-point processors. The problem is that the power needed to transfer data to and from those processors is much higher than the power used by the processors themselves. So a conventional computer memory hierarchy with caches and main memory becomes impractical.
A possible solution? How about FPGAs, as I mentioned above: you arrange the FPGA logic implementing your problem so that each result is pumped to adjacent, or at least nearby, processing elements, without bothering with register files and caches. However, it's not practical to do this because of... FPGA tools, as I just described. JMO/YMMV
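To picture the neighbor-to-neighbor dataflow described above, here is a small software simulation of a systolic array. It is only a sketch under stated assumptions: the function name `systolic_fir`, the choice of an FIR filter as the workload, and the register layout are all hypothetical illustrations, not something taken from the Kogge article or from any FPGA vendor's tools. Each processing element (PE) holds one coefficient and talks only to its right-hand neighbor through local registers, with no shared register file or cache.

```python
def systolic_fir(samples, taps):
    """Simulate a linear systolic array computing y[n] = sum_k taps[k] * x[n - k].

    Assumed layout (illustrative): PE k holds taps[k]. Samples move right
    through two registers per PE, partial sums move right through one
    register per PE, so each PE only ever reads from its left neighbor.
    """
    n_pe = len(taps)
    x_regs = [[0, 0] for _ in range(n_pe)]  # two-stage sample delay per PE
    y_regs = [0] * n_pe                     # one-stage partial-sum delay per PE
    outputs = []

    # Append zeros so the last real results drain out of the array.
    stream = list(samples) + [0] * (n_pe - 1)

    for x_new in stream:  # one loop iteration = one clock cycle
        next_x = [[0, 0] for _ in range(n_pe)]
        next_y = [0] * n_pe
        for k in range(n_pe):
            # Inputs come from the left neighbor's registers (or the array input).
            x_in = x_new if k == 0 else x_regs[k - 1][1]
            y_in = 0 if k == 0 else y_regs[k - 1]
            next_y[k] = y_in + taps[k] * x_in   # local multiply-accumulate
            next_x[k] = [x_in, x_regs[k][0]]    # shift the sample delay line
        x_regs, y_regs = next_x, next_y         # all registers update together
        outputs.append(y_regs[-1])

    # The first n_pe - 1 cycles are pipeline fill, not valid results.
    return outputs[n_pe - 1:]


def direct_fir(samples, taps):
    """Reference implementation using ordinary indexed memory access."""
    return [
        sum(t * samples[n - k] for k, t in enumerate(taps) if n - k >= 0)
        for n in range(len(samples))
    ]


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    h = [1, 2]
    print(systolic_fir(x, h))
    print(direct_fir(x, h))
```

The two functions should agree on any input. The difference is that the systolic version never indexes into a shared array of samples: every value a PE uses arrives from the PE next to it, which is the property that makes this style attractive in FPGA fabric.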
An FPGA-based reconfigurable computing engine has the potential to be a superb high-performance supercomputer. Unfortunately, FPGA tools are not up to the task, as discussed in this 2007 article. It has to be as easy to design parallel hardware data paths as it is to write code for general-purpose CPUs, and that's not the case with current FPGA design languages and tools. FPGA tool research has always been stymied by the fact that no major FPGA manufacturer publishes its internal architecture so that the research community can develop efficient design tools for reconfigurable computing. It would be like Intel refusing to publish the x86 instruction set and requiring everyone to program in PL/M using a compiler provided by Intel. I believe this is the primary reason CPU makers sell billions and FPGA makers have stayed small. JMO/YMMV