Not everyone working on CPU architecture is with one of the big manufacturers.
In part 3, Ivan talks a little bit about the structure of the company. At this point, it is a completely self-funded "company of incorporators," a common arrangement in which contractual agreements spell out how things will eventually be divided. They have been working on this for 10 years, and over that span the team has changed many times.
When asked how close they are to producing something, Godard offers his best estimate at roughly one year to 18 months. He elaborates that it could be two to three years before they have "saleable silicon," but again this is all an extremely rough estimate.
If you're wondering why others haven't done all this already, Godard has some thoughts for you. He explains his theories as to why this approach has gone unexplored, pointing out that others do in fact make changes, but those tend to be fairly small improvements to existing designs.
This interview was quite enlightening. The Mill seems like something we'll be hearing a lot more about in the coming years. I'm very curious to see how much of the expected power efficiency actually carries over from simulation to the physical product. I'll be keeping my eye on these folks for sure.
When it comes to creating parallel multicore systems, many FPGA tools fall short because they are design assembly and implementation infrastructure, with little in the way of analysis. At Space Codesign, one of the ways our SpaceStudio ESL hardware/software codesign tool can be used is as a design creation front end for FPGA tool infrastructures like Xilinx Vivado (and likely others). We published a position paper on this topic on this site a few weeks ago ...
The key to supercomputer performance is that your architecture is optimized for an application, or family of applications. Knowing the internal details of a processor core or FPGA device helps (there are architecture diagrams available, after all!), but it is the system-level performance that comes into play at the end of the day.
Peter Kogge has an interesting article called Next-Generation Supercomputing (IEEE Spectrum, January 2011). In it he states that the bottleneck with next-generation supercomputing is not the speed of floating-point processors. The problem is that the power needed to transfer data to and from those processors is much higher than the power used by the processors themselves. So a conventional computer memory hierarchy with caches and main memory becomes impractical.
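As a rough back-of-the-envelope illustration of that point, here is a quick Python calculation comparing the power needed for arithmetic against the power needed just to fetch operands from DRAM. The energy-per-operation figures below are illustrative assumptions on my part (ballpark orders of magnitude), not numbers taken from Kogge's article.

# Sketch of Kogge's argument: data movement, not arithmetic, dominates power.
# Energy figures are illustrative assumptions, not from the Spectrum article.
FLOPS_TARGET = 1e18            # target: one exaflop per second
E_FLOP_PJ = 50.0               # assumed pJ per double-precision operation
E_DRAM_PJ = 5000.0             # assumed pJ to fetch one 64-bit word from DRAM
DRAM_FRACTION = 0.1            # assume only 10% of operands come from DRAM

compute_mw = FLOPS_TARGET * E_FLOP_PJ * 1e-12 / 1e6
memory_mw = FLOPS_TARGET * DRAM_FRACTION * E_DRAM_PJ * 1e-12 / 1e6
print(f"arithmetic: {compute_mw:.0f} MW")      # ~50 MW
print(f"DRAM traffic: {memory_mw:.0f} MW")     # ~500 MW, ten times the arithmetic

Even with the generous assumption that 90% of operands are found on-chip, moving data burns an order of magnitude more power than computing with it, which is exactly why the conventional cache-and-main-memory hierarchy becomes impractical at that scale.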
A possible solution? How about FPGAs as I mentioned above -- you arrange the FPGA logic implementing your problem so that each result is pumped to adjacent or at least nearby processing elements, not bothering with register files and caches. However, it's not practical to do this because of... FPGA tools, as I just described. JMO/YMMV
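To make that dataflow idea concrete, here is a toy software model (hypothetical Python, not real FPGA code) of a neighbor-to-neighbor pipeline: each processing element applies one stage of the computation and hands its result straight to the adjacent element, with no register file or cache in between. The stage functions are placeholders invented for the example.

# Toy model of a neighbor-to-neighbor pipeline of processing elements (PEs).
# Each PE transforms a value and passes it directly to the next PE.
def make_pipeline(stages):
    def run(stream):
        for x in stream:
            for pe in stages:      # result flows only to the adjacent PE
                x = pe(x)
            yield x                # the last PE's output leaves the array
    return run

pipeline = make_pipeline([
    lambda v: v * 3,               # placeholder stage 1
    lambda v: v + 7,               # placeholder stage 2
    lambda v: min(v, 100),         # placeholder stage 3
])
print(list(pipeline(range(5))))    # [7, 10, 13, 16, 19]

In real hardware every PE works concurrently on successive elements of the stream; this sequential model only shows the data routing, which is the part today's FPGA tools make painful to express.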
An FPGA-based reconfigurable computing engine has the potential to be a superb high-performance supercomputer. Unfortunately, FPGA tools are not up to the task as discussed in this 2007 article. It has to be as easy to design parallel hardware data paths as it is to write code for general-purpose CPUs, and that's not the case with current FPGA design languages and tools. FPGA tool research has always been stymied by the fact that no major FPGA manufacturer publishes their internal architecture so that the research community can develop efficient design tools for reconfigurable computing. It would be like Intel refusing to publish the X86 instruction set and requiring everyone to program in PL/M using a compiler provided by Intel. I believe this is the primary reason CPU makers sell billions and FPGA makers have stayed small. JMO/YMMV