RISC-V is not following the worn path of past instruction sets, which traditionally grow over time, leaving compilers to figure out how to incorporate new instructions every year or so. The 80x86 has added on average one instruction per month over its 30+ year lifetime. Instruction set sizes kind of track Moore's Law, just not that fast.
RISC-V has a well-designed integer base (RVI) that will never change. The optional compressed instructions (RVC) are handled by the assembler, since there is a 1-1 mapping from every 16-bit format to the equivalent 32-bit one.
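To make the 1-1 mapping concrete, here is a minimal sketch that expands one compressed instruction, C.ADDI, into its 32-bit ADDI equivalent. It assumes the C.ADDI bit layout from the later ratified compressed extension, which may differ from the draft encoding this comment had in mind; the function name `expand_c_addi` is made up for illustration.

```python
def expand_c_addi(half: int) -> int:
    """Expand a 16-bit C.ADDI encoding into the equivalent 32-bit ADDI.

    Assumed layout (ratified C extension): funct3=000 in bits 15:13,
    imm[5] in bit 12, rd/rs1 in bits 11:7, imm[4:0] in bits 6:2, op=01.
    """
    if half & 0x3 != 0x1 or (half >> 13) & 0x7 != 0b000:
        raise ValueError("not a C.ADDI encoding")
    rd = (half >> 7) & 0x1F                       # rd doubles as rs1
    imm = (((half >> 12) & 0x1) << 5) | ((half >> 2) & 0x1F)
    if imm & 0x20:                                # sign-extend the 6-bit immediate
        imm -= 0x40
    # ADDI layout: imm[11:0] | rs1 | funct3=000 | rd | opcode=0010011
    return ((imm & 0xFFF) << 20) | (rd << 15) | (rd << 7) | 0x13


print(hex(expand_c_addi(0x0405)))  # c.addi s0, 1  -> addi s0, s0, 1   (0x140413)
print(hex(expand_c_addi(0x147D)))  # c.addi s0, -1 -> addi s0, s0, -1  (0xfff40413)
```

Because the expansion is a pure bit rearrangement, an assembler (or a decoder front end) can apply it without the compiler proper knowing about the 16-bit forms.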
As those who read the RISC-V manual can see, we recommend that people target software for RVG, which is shorthand for the following optional extensions: IMAFD (Integer, Multiply, Atomic, Single-Precision Floating Point, Double-Precision Floating Point).
Many of the other optional extensions will be done in libraries (e.g., decimal floating point).
As the tables in the technical report show, there are surprisingly few instructions that need to be added when going from 32-bit addresses to 64-bit addresses to 128-bit addresses. Basically, all the registers just get wider.
RISC-V also has a very fast unimplemented-instruction trap to user mode, as well as 16-bit and 32-bit jump-and-link instructions that a linker can use to replace any unimplemented instruction with a jump and link to library code that implements the missing instruction.
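The linker-side substitution can be sketched symbolically. This is a toy model, not the actual RISC-V toolchain: the tuple representation, the `IMPLEMENTED` set, and the `__emulate_*` routine names are all hypothetical, and a real linker would also have to arrange for the library code to find the original operands.

```python
# Toy model: a program is a list of (mnemonic, operands) tuples.
# Hypothetical core that implements only a few base instructions (no multiply).
IMPLEMENTED = {"add", "addi", "lw", "sw", "jal"}


def patch_unimplemented(program, implemented=IMPLEMENTED):
    """Replace each unimplemented instruction with a jump and link
    to a (hypothetical) library routine that emulates it."""
    patched = []
    for mnemonic, operands in program:
        if mnemonic in implemented:
            patched.append((mnemonic, operands))
        else:
            # One instruction in, one instruction out: the layout of the
            # surrounding code is left untouched.
            patched.append(("jal", ("ra", f"__emulate_{mnemonic}")))
    return patched


program = [
    ("addi", ("a0", "a0", "1")),
    ("mul", ("a2", "a0", "a1")),   # not implemented on this core
    ("sw", ("a2", "0(sp)")),
]
for insn in patch_unimplemented(program):
    print(insn)
```

One reading of why both jump-and-link widths matter is that a same-size replacement leaves every other address in the program unchanged; that is an interpretation, not something stated above.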
By having all this instruction planning laid out up front, we believe the compiler issues are well in hand, and yet we can adapt to the needs of the SoC by either leaving out what you don't need or adding extensions that you do need.
We need to prove this, but we thought about it carefully while designing RISC-V, and so we're aware of the implications of what we're doing.
Sorry that we did not have enough space to fully describe that particular experiment. The area numbers we pulled from ARM exclude floating point and NEON; see http://www.arm.com/products/processors/cortex-a/cortex-a5.php. Also, the RISC-V core used had TLBs, branch prediction (BTB, BHT, and return address stack), and caches designed to match the Cortex-A5 memory hierarchy. We didn't have time to strip our core down to 32 bits to match the ARM core, so unfortunately we were handicapped with the full 64-bit virtual address width in the TLB/BTB/RAS, as well as in the integer regfile.
Variants of this RISC-V "Rocket" core have been fabricated multiple times in 45nm and 28nm processes, with the resulting chips booting Linux. Some variants run well over 1GHz. Some of the variants include full 64-bit IEEE-754/2008 vector floating-point units, with well over 10 GFLOPS/W energy efficiency running actual kernels (not just peak). Some of the variants are cache-coherent multicores. Other variants run below 0.5V with extremely high energy efficiency. One of our first chip publications will appear at ESSCIRC in September if you'd like to see more concrete details.
We look forward to seeing ARM publish SPEC numbers, or other open benchmark scores, for representative versions of their cores, so we can have fair and open comparisons.
To be clear, we believe ARM is a great company that has built a very productive ecosystem for SoC designers. However, there are significant markets that don't match ARM's business model and we would like to provide an alternative.
Rocket is 1/2 the size, 1/2 the power, and 10% faster at the same GHz.
Should we have compared size and power to a larger ARM implementation?
We'd LOVE to get SPEC numbers for ARM. We asked our friends at ARM, and they said there are no such numbers available. We even asked them to recommend a platform on which we could run it ourselves, and they couldn't come up with one that would run SPEC2006.
Alas, the only benchmark that runs on ARM that we can compare against is Dhrystone (!).
Hennessy and I dropped the pitfall about not running Dhrystone in the 3rd edition of Computer Architecture: A Quantitative Approach because we thought Dhrystone was dead. Apparently, Dhrystone is the Dracula of bad benchmarks.
Until we can get our hands on something that runs full Linux on ARM, it's the best we can do, as we're anxious to show off RISC-V on real programs.
"Thanks in part to the open-source Chisel hardware design system, one 64-bit RISC-V core is half the area, half the power, and faster than a 32-bit ARM core with a similar pipeline made in the identical process."
This is hardly a valid comparison. The Cortex-A5 supports fast multiplies, DSP extensions, SIMD extensions, large TLBs and caches, branch prediction, compressed instructions, hardware Java execution, security extensions, interrupt control, multi-core, etc. The base RISC-V ISA is more similar to a Cortex-M0, which is significantly smaller and more efficient than a Cortex-A5. But like RISC-V's basic ISA, it is not suitable to run, e.g., Android.
It is easy to make a simple MIPS-like RISC ISA and a bare-bones CPU which appears to do well on Dhrystone. However, that's hardly proof of anything. MIPS used to have various CPUs that showed that with 64-bit load/store and delayed branches you get amazing 32-bit Dhrystone scores from a simple pipeline - nice trick, just a shame that it didn't help nearly as much when running real code. Cortex-A5 actually runs Android pretty well, and I bet RISC-V with its very basic ISA won't be able to keep up.
Also, this appears to be a comparison of a 5-year-old, widely used CPU with a simulation of an unfinished CPU. Let's compare when actual hardware is available - and instead of Dhrystone, compare with the 64-bit A53 running, e.g., SPEC2000.
Do you envision having variants of the architecture for different domains (like MIPS and ARM), or do you think we should move forward with a heterogeneous approach (different ISAs/approaches for different tasks)?
BTW, love the fact that it runs on Zynq! We will give it a try on Parallella immediately. :-)
Thanks for the quick reply. I've used it as an excuse to go off and read the V2.0 ISA spec for RISC-V instead of doing the day job :-) I see that RISC-V does have multicore support in its memory model.
OpenRISC has a core instruction set, which is stable, and then various extensions around that core. We have debated the number of extension sets - there are too many at present, meaning compilers need too many multilibs to use them efficiently.
As we have found with the OpenRISC GCC implementation, a combinatorial explosion of multilibs may be a challenge for RISC-V. There are a 32-bit base, a 64-bit base, a 128-bit base and 10 standard extensions, so a compiler potentially needs 8192 multilib variants to efficiently support all possible ISA combinations. It is possible that some extensions will have no impact on compiled code, but the number of multilibs will still be too high, so the compiler will need to restrict itself to likely popular combinations, degrading efficiency - 15-20 multilibs is a practical limit. Alternatively, the user builds a compiler just for their specific architecture, but that still puts a demand on the compiler writer to be able to juggle all the possible options (consider, for example, how to optimally compile a * b for all C/C++ types with all possible combinations of optional ISA extensions).
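The arithmetic behind the multilib count can be checked with a short sketch. The 8192 figure corresponds to treating all 13 options (3 bases plus 10 extensions) as independent on/off switches; if instead exactly one base width is chosen per target, which is an assumption about how the bases combine, the count is smaller but still far above the 15-20 practical limit. The function names below are made up for illustration.

```python
NUM_BASES = 3        # 32-bit, 64-bit, 128-bit
NUM_EXTENSIONS = 10  # standard extensions, each either present or absent


def count_independent(num_bases, num_extensions):
    """Every base and every extension treated as an independent switch."""
    return 2 ** (num_bases + num_extensions)


def count_exclusive_base(num_bases, num_extensions):
    """Exactly one base per target, any subset of the extensions."""
    return num_bases * 2 ** num_extensions


print(count_independent(NUM_BASES, NUM_EXTENSIONS))     # 8192
print(count_exclusive_base(NUM_BASES, NUM_EXTENSIONS))  # 3072
```

Either way, the count doubles with every extension that affects code generation, so the point about restricting multilibs to popular combinations holds under both readings.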
BTW, OpenRISC made delayed branches optional a few years ago. All the recent implementations (e.g. Julius Baxter's mor1kx implementations) don't have delayed branches. The online version of the OpenRISC 1000 architecture spec should reflect this.
I hope RISC-V is successful, but I would rather it focussed on forward-looking innovation in ISA development, instead of industry standardization, which is inevitably backwards looking.