I'm not sold on the idea that we have to start from scratch to get an industry standard open ISA. As the article notes, we already have OpenRISC, which comes with an open bus standard (WishBone). The architecture is based on something well proven (DLX, of which David Patterson was half the design team), for which there is plenty of tutorial material.
It takes a long time to build all the software infrastructure around a new ISA. Surely far better to start with something like OpenRISC that has spent 15 years invested in its software.
The OpenRISC architecture doesn't tick all the authors' boxes, but it is extensible, and the missing features (do we really need 128-bit addressing?) could be added.
One feature that is not mentioned is multiprocessor support. Thanks to the work of Stefan Wallentowitz at TU Munich and others, this is something that OpenRISC now supports. It seems to me this ought to be a key feature of any new ISA.
The authors are two engineers for whom I have the greatest respect. I wish RISC-V well, because this team is certain to innovate, and that can only be good for the field. But as the basis of an industry standard open ISA? I wish they had built on what was already there, rather than starting again from scratch.
We started in 2010, when OpenCore only had a 32-bit address space, which was a fatal flaw that was later corrected. It is still missing the small code size option, which is a requirement for IoT.
And I am not sure if everyone understands the importance of the "Base+Extension" approach to instruction sets. This is a new approach to coping with software compatibility of instruction sets. As we wrote in the associated technical report:
"RISC-V is aimed at SoCs, with a base that should never change given the longevity of the basic RISC ideas; a standard set of optional extensions that will evolve slowly; and unique instructions per SoC that never need to be reused."
Software compatibility with controlled evolution.
(And it's really hard in 2014 to embrace an ISA that offers delayed branches:)
Thanks for the quick reply. I've used it as an excuse to go off and read the V2.0 ISA spec for RISC-V instead of doing the day job :-) I see that RISC-V does have multicore support in its memory model.
OpenRISC has a core instruction set, which is stable and then various extensions around that core. We have debated the number of extension sets - there are too many at present, meaning compilers need too many multilibs to use them efficiently.
As we have found with the OpenRISC GCC implementation, combinatorial explosion of multilibs may be a challenge for RISC-V. With a 32-bit base, a 64-bit base, a 128-bit base and 10 standard extensions, a compiler potentially needs 8192 multilib variants to efficiently support all possible combinations of ISA. It is possible that some extensions will have no impact on compiled code, but the number of multilibs will still be too high, so the compiler will need to restrict itself to likely popular combinations, degrading efficiency - 15-20 multilibs is a practical limit. Alternatively the user builds a compiler just for their specific architecture, but that still puts a demand on the compiler writer to be able to juggle all the possible options (consider for example how to optimally compile a * b for all C/C++ types with all possible combinations of optional ISA extensions).
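To put rough numbers on the combinatorics above, here is a minimal Python sketch. It assumes every standard extension can be switched on or off independently, which the spec may not actually allow, and the variable names are purely illustrative. Counting all 13 options as independent switches reproduces the 8192 figure; treating the three base widths as mutually exclusive gives 3072. Either way the total is far beyond the 15-20 practical limit.

```python
# Rough count of multilib variants, assuming each of the 10 standard
# extensions is an independent on/off choice (an assumption; real
# extensions may depend on one another, which would lower the count).
BASES = ["RV32", "RV64", "RV128"]
NUM_EXTENSIONS = 10
PRACTICAL_LIMIT = 20  # upper end of the 15-20 figure quoted above

# Every base and every extension treated as an independent switch: 2^13.
all_independent = 2 ** (len(BASES) + NUM_EXTENSIONS)

# Exactly one base chosen, extensions free: 3 * 2^10.
one_base_each = len(BASES) * 2 ** NUM_EXTENSIONS

print(all_independent, one_base_each)
```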
BTW, OpenRISC made delayed branches optional a few years ago. All the recent implementations (e.g. Julius Baxter's mor1kx implementations) don't have delayed branches. The online version of the OpenRISC 1000 architecture spec should reflect this.
I hope RISC-V is successful, but I would rather it focussed on forward-looking innovation in ISA development, instead of industry standardization, which is inevitably backwards looking.
The big OpenRISC enhancements in the last few years were the new implementations without branch delays, with much shortened pipelines, and the improvements to multicore support. The community is very active at present, so it can be hard to keep track of all the changes (see the #openrisc IRC channel on freenode.net for the ongoing conversations).
It's always hard to know with open processors who is using them - the community tends to hear after the event. OpenRISC is in some Samsung set top box chips, and in NXP/Jennic Zigbee chips (in the BA Semi variant). OpenRISC is used as a power controller in the AllWinner A1000, part of their A31 ARM based SoC. OpenRISC flew in NASA's TechEdSat a year or two ago (declaration of interest, we did a commercially robust version of the GCC C compiler for that project).
There is an ongoing project at the University of Genoa and ETH Zurich, supported by ST, which is developing a low energy multicore SoC based on OpenRISC.
I'd be interested if other readers had heard of new commercial uses.
The OpenRISC community has for a year or two considered specifying a future architecture, known as OpenRISC 2000. Perhaps we should be looking to RISC-V as the basis of that architecture? There would be a certain intellectual consistency, given OpenRISC 1000 was based on DLX. Doubtless this will be discussed at ORConf 2014 in October.
RISC-V is not following the same worn path of past instruction sets, which traditionally grow in size over time, and then compilers need to figure out how to include new instructions every year or so. 80x86 has added on average one instruction per month over its 30+ year lifetime. They kind of track Moore's Law, just not that fast.
RISC-V has a well designed integer base that will never change (RVI). The optional compressed instructions (RVC) are handled by the assembler, since there is a 1-1 mapping of every 16-bit format to the equivalent 32-bit one.
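As an illustration of what a 1-1 mapping from a 16-bit format to its 32-bit equivalent looks like, here is a small Python sketch that expands one compressed instruction. It assumes the C.ADDI encoding from the later ratified compressed extension, which may differ from the draft under discussion here; the function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = 1 << (bits - 1)
    return (value ^ mask) - mask

def expand_c_addi(half):
    """Expand a 16-bit C.ADDI encoding into the equivalent 32-bit ADDI.

    Assumed layout (ratified C extension): funct3=000 in bits 15:13,
    imm[5] in bit 12, rd/rs1 in bits 11:7, imm[4:0] in bits 6:2, op=01.
    """
    if half & 0x3 != 0b01 or (half >> 13) & 0x7 != 0b000:
        raise ValueError("not a C.ADDI encoding")
    rd = (half >> 7) & 0x1F
    imm = (((half >> 12) & 0x1) << 5) | ((half >> 2) & 0x1F)
    imm = sign_extend(imm, 6)
    # ADDI rd, rd, imm: imm[11:0] | rs1 | funct3=000 | rd | opcode=0010011
    return ((imm & 0xFFF) << 20) | (rd << 15) | (rd << 7) | 0b0010011

# c.addi x10, 1 expands to addi x10, x10, 1
print(hex(expand_c_addi(0x0505)))
```

Because each compressed form has exactly one such expansion, the choice between the two encodings can be left to the assembler and the compiler never needs to know about it.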
As those who read the RISC-V manual can see, we recommend people target software for RVG, which is shorthand for the following optional extensions: IMAFD (Integer, Multiply, Atomic, Single Precision Floating Point, Double Precision Floating Point).
Many of the other optional extensions will be done in libraries (e.g., decimal floating point).
As the tables in the technical report show, there are surprisingly few instructions that need to be added when going from 32-bit addresses to 64-bit addresses to 128-bit addresses. Basically, all the registers just get wider.
RISC-V also has a very fast unimplemented instruction trap to user mode as well as 16-bit and 32-bit jump and link instructions that can be used by a linker to replace any unimplemented instruction with a jump and link to library code that implements the missing instruction.
By having all this instruction planning laid out up front, we believe the compiler issues are well in hand, and yet we can adapt to the needs of the SoC by either leaving out what you don't need or adding extensions that you do need.
We need to prove this, but we thought about it carefully while designing RISC-V, and so we're aware of the implications of what we're doing.
"I hope RISC-V is successful, but I would rather it focussed on forward-looking innovation in ISA development, instead of industry standardization, which is innevitably backwards looking."
Thanks for reading the spec. One of the primary reasons we designed RISC-V was to support architecture research (the other was to support education). Only later did we realize what we'd built might make a good industry-wide standard. We believe the way we provide sufficient opcode space and easy-to-parse variable-length instructions, along with our conventions on ISA extensions, will support a very rich set of new instructions, while never burying the very solid core.
Do you envision having variants of the architecture for different domains (like MIPS and ARM) or do you think we should move forward with a heterogeneous approach (different ISAs/approaches for different tasks)?
btw. Love the fact that it runs on Zynq! We will give it a try on Parallella immediately.:-)
As the technical report says, we intend RISC-V to be used for everything from as small as Internet of Things (IoT) to as large as Cloud Computing (Warehouse Scale Computers or WSCs).
As some of the other posts hint, RISC-V is a very modular ISA, while still a reasonable target for compilers.
For IoT, you'd want to use 32-bit addresses, integer instructions (I), and compressed instructions (C). We call that RV32IC.
For WSC, you'd want to use 64-bit addresses (and over the next decade maybe even 128-bit addresses), integer instructions (I), multiply-divide (M), atomic instructions (A), single (F), double (D), and even quadruple (128-bit or Q) floating point instructions. We call that combination RV64IMAFDQ.
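The naming scheme above is regular enough to decode mechanically. The following Python sketch is only an illustration: `parse_isa` is a hypothetical helper, and it handles just the single-letter extensions and the G shorthand mentioned in this thread.

```python
import re

G_EXPANSION = "IMAFD"  # RVG is shorthand for these extensions

def parse_isa(name):
    """Split a name such as 'RV64IMAFDQ' into (address width, extension letters)."""
    match = re.fullmatch(r"RV(32|64|128)([A-Z]+)", name.upper())
    if match is None:
        raise ValueError(f"not a recognised ISA name: {name!r}")
    width = int(match.group(1))
    letters = match.group(2).replace("G", G_EXPANSION)
    # Drop duplicates while keeping the order in which letters appear.
    extensions = list(dict.fromkeys(letters))
    return width, extensions

print(parse_isa("RV32IC"))
print(parse_isa("RV64IMAFDQ"))
```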
"Thanks in part to the open-source Chisel hardware design system, one 64-bit RISC-V core is half the area, half the power, and faster than a 32-bit ARM core with a similar pipeline made in the identical process."
This is hardly a valid comparison. The Cortex-A5 supports fast multiplies, DSP extensions, SIMD extensions, large TLBs and caches, branch prediction, compressed instructions, hardware Java execution, security extensions, interrupt control, multi-core etc etc. The base RISC-V ISA is more similar to a Cortex-M0 which is significantly smaller and more efficient than a Cortex-A5. But like RISC-V's basic ISA it is not suitable to run eg. Android.
It is easy to make a simple MIPS-like RISC ISA and a bare-bones CPU which appears to do well on Dhrystone. However that's hardly proof of anything. MIPS used to have various CPUs that showed that with 64-bit load/store and delayed branches you get amazing 32-bit Dhrystone scores from a simple pipeline - nice trick, just a shame that it didn't help nearly as much when running real code. Cortex-A5 actually runs Android pretty well, and I bet RISC-V with its very basic ISA won't be able to keep up.
Also this appears to be a comparison of a 5 year old widely used CPU with a simulation of an unfinished CPU. Let's compare when actual hardware is available - and instead of Dhrystone, compare with the 64-bit A53 running eg. SPEC2000.
Rocket is 1/2 the size, 1/2 the power, and 10% faster at the same GHz.
Should we have compared size and power to a larger ARM implementation?
We'd LOVE to get SPEC numbers for ARM. We asked our friends at ARM, and they said there are no such numbers available. We even asked them to recommend a platform on which we could do it ourselves, and they couldn't come up with one that would run SPEC2006.
Alas, the only benchmark that runs on ARM that we can compare against is Dhrystone (!).
Hennessy and I dropped the pitfall about not running Dhrystone in the 3rd edition of Computer Architecture: A Quantitative Approach because we thought Dhrystone was dead. Apparently, Dhrystone is the Dracula of bad benchmarks.
Until we can get our hands on something that runs full Linux on ARM, it's the best we can do, as we're anxious to show off RISC-V on real programs.
Sure it may run Linux, but the Rocket variant that was compared with the A5 doesn't appear to have an FPU or any other extension beyond the very basic 64-bit ISA. So that variant most certainly can't run SPEC well or do anything that you expect in a modern CPU (multi-core, debug, performance counters, timers, interrupt controllers and so on). Cortex-A5, while one of the smallest ARMv7-A cores, is far more advanced, significantly faster and has good code density (unlike RISC-V).
So if you wanted to do a barebones CPU comparison you should compare with an M0 or M3 with an added MMU. That would be a reasonable comparison as they have very similar features. Performance will be close on 32-bit code, but M0/M3 obviously wins big time on power, code density and area.
On the other hand, if you wanted to compare a full SPEC capable Rocket version, you'd have to give the size/power for the full version including all the extensions, FPU, MMU, caches, etc. No bait and switch like giving area/power results for a minimal version while quoting performance of the most advanced version!
The thing is, when you aim for a similar level of performance as modern ARM CPUs then you'll need a real memory system. Not a 1-way 8KB cache like the very first Alpha used more than 2 decades ago!!! That means a much larger die size and higher power.
There are SPEC scores available for various ARM cores, however you can buy pretty much any device nowadays and just run benchmarks yourself. You can use the NDK on Android devices (or Linux if you root it), but I find a Chromebook is a perfect ARM Linux development machine (lots of people run SPEC on it).
Anyway, if you actually ran the full SPEC2006 on Rocket, just publish the scores. I don't see why you have to run it first on ARM.
Dhrystone is a zombie benchmark indeed, partly because it is easy to use, gives a quick&dirty estimate of core-only performance, and all benchmarks that tried to replace it turned out to be far worse. So it won't die any time soon...
Here is an idea: to give people an idea what the ISA is like, why not publish the Dhrystone disassembly? I bet not everybody will have a spare week for downloading, building and troubleshooting the RISC-V tools...
Not sure it is possible to argue "goodness" based on reading the spec. Things don't get much better with more quantitative analysis like kernel benchmarks. Silicon area, frequency, # issues are first order effects, other factors less so imho. Personally I certainly wouldn't argue too hard with Professor Patterson's design decisions:-) I learned pretty much everything I know about computer architecture from reading his books.
It's an interesting discussion for sure, but let's not forget that the REALLY big news is the fact that the world now has a first rate free to use openly available ISA for all to use.
Who knows...maybe this will be the Linux equivalent moment for chip design...
Andreas, if you have the right ISA experience then yes, one can gather a lot from just a spec. And the spec basically shows a cut-down MIPS with a few Alpha features. Neither is well known for their great code density or IPC due to the simple instructions. So nothing revolutionary - just a simple evolution of the original MIPS with various mistakes removed (and IMHO a few new mistakes added). It's interesting but try to compare it against ARM64 for example (next year we'll likely see a big shift towards 64-bit in the mobile space).
I don't think this is the first free/open ISA and I don't understand the advantage of being so. For software it makes sense as there are literally millions of developers who benefit. For ISAs/CPUs there are maybe a hundred or so companies with the resources to make a competitive SoC, and most already license existing "closed" ISAs and CPUs.
I think you are underestimating the value of collaboration. As smart and talented as my friends at ARM are, they are no match for the combined intelligence and engineering experience of the whole industry combined.
Creating a good ISA and base set of tools is non-trivial, but the really hard part is making it stick. There were already tons of processors (>100?) on opencores, all of them open source.
In my opinion, RISC-V is the open source processor with the best chance to date of making a broad long-term impact.
Getting an open source project to critical mass and keeping it from crashing is a very difficult task, but when it's done right it benefits the whole world (see Linux and the Eclipse foundation as good examples).
Andreas, I do appreciate the benefit of collaboration, but having first-hand experience with the open source community I know that it is no panacea. It works at a much slower pace than a commercial closed environment and involves a lot of compromises to get anywhere near where you'd like to get.
You're quite right, like with any project (open or not), the most difficult part is to build critical mass. I don't have an opinion on whether that will happen for RISC-V - time will tell whether open ISAs/cores are a fashion fad or here to stay, but given the fortunes of its predecessor MIPS, I suspect it is not going to be easy.
"Sure it may run Linux, but the Rocket variant that was compared with the A5 doesn't appear to have an FPU or any other extension beyond the very basic 64-bit ISA. So that variant most certainly can't run SPEC well or do anything that you expect in a modern CPU (multi-core, debug, performance counters, timers, interrupt controllers and so on). Cortex-A5, while one of the smallest ARMv7-A cores, is far more advanced, significantly faster and has good code density (unlike RISC-V)."
If you had read our response, you would see that the A5 compared doesn't have FPU or NEON either.
Lacking your ability to project SPEC scores from Dhrystone results, we are hard at work porting SPEC codes to both ARM and RISC-V to obtain fairer comparisons. But mostly we're porting SPEC in the interests of improving our cores' performance rather than marketing. ARM and RISC-V have very different business models, but it's very educational to compare performance metrics.
"If you had read our response, you would see that the A5 compared doesn't have FPU or NEON either."
That's not the issue at all. First of all what exactly is Rocket? I assume it just implements the base 64-bit RISC-V ISA and uses 1-way 8KB L1 caches. Now if we agree on that then that means that:
1. Any comparison with Cortex-A5 is fundamentally flawed (irrespective of the FPU)
2. It won't run anything other than Dhrystone well, if at all (even CoreMark does multiplication for example, many integer codes in SPEC use floating point, and any code that uses serious memory bandwidth will be slow)
Yes it would be very educational to get some SPEC scores. It'll show that designing an all-round CPU is hard. It took ARM quite a few generations to get decent performance.
[ I'm having trouble posting the whole response in one go, so will try posting as a sequence of messages. Please read the sequence before responding and reply to last one in sequence. I'll mark it as the last one, because I don't know a priori how many I'll need. ]
We pulled the Dhrystone comparison together quickly, as we kept getting asked about how we compared to ARM cores and these were the only publicly available numbers we could easily compare against. We didn't spend a lot of time on it, as we're not particularly interested in "Dhrystone Drag Racing" with minimal stripped-down cores. Basically, we sized the caches to match ARM's configuration and just removed the vector floating-point unit we usually add, to be a fairer comparison with the ARM core, which also doesn't have an FPU or vector unit (you are incorrect, these are optional in ARM A5). We didn't strip out a lot of other stuff that we could have. Specifically:
The Rocket core implements RV64IMA, i.e., base integer, integer multiply/divide, and atomic operations (which are quite extensive in RISC-V and go unused in Dhrystone). Our registers are twice as wide (64 vs 32) and we have twice as many user registers (32 versus 16) as ARM. The 64-bit width does help Dhrystone, but also lots of other code, and they are obviously included in our area number. The instruction cache was 16KB 2-way set-associative, 64-byte lines, and blocking on misses.
Data cache was also 16KB 2-way set-associative, 64-byte lines but because it was designed to work with our high-performance vector unit, it's non-blocking with 2 MSHRs, 16 replay-queue entries, and 17 store-data queue entries for D$. Obviously, none of these help Dhrystone, which never misses in the caches.
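For readers who want to see what the stated cache parameters imply, here is a small Python sketch of the arithmetic. It assumes power-of-two sizes and the usual split of an address into tag, set index and line offset; `cache_geometry` is a hypothetical helper, not something taken from the Rocket sources.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (number of sets, offset bits, index bits) for a set-associative cache.

    Assumes all three parameters are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 of the line size
    index_bits = sets.bit_length() - 1         # log2 of the set count
    return sets, offset_bits, index_bits

# 16KB, 2-way set-associative, 64-byte lines, as described above.
sets, offset_bits, index_bits = cache_geometry(16 * 1024, 2, 64)
print(sets, offset_bits, index_bits)
```

With these figures each way holds 128 lines, so 6 address bits select the byte within a line and 7 bits select the set.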
When we compared numbers with and without caches, we weren't sure what ARM left out, so we only removed the SRAM tag and data arrays and left in all of the above cache control logic in our core area. The MMU has 8 ITLB and 8 DTLB entries, fully associative, and the MMU has a hardware page-table walker. Obviously, the hardware page table walker doesn't help Dhrystone.
The branch prediction hardware is a BTB with 64 entries, a BHT with 128 entries, and a RAS with 2 entries. This amount of branch prediction helps Dhrystone, but would help a lot of other codes too.
As I said, we don't spend our lives worrying about Dhrystone, so when compiling the code we followed the guidelines from our friends at ARM in this document: DAI0273A_dhrystone_benchmarking.pdf.
Our standard C library does include hand-optimized assembly, and does make use of all 64 bits (of course!), but we did this for functions not used by Dhrystone as well, since a standard library helps all code. We'll be posting our disassembled Dhrystone on the website shortly (bit big for a blog post).
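To give a feel for what a 128-entry BHT does, here is a minimal Python model. Only the entry count comes from the description above; the 2-bit saturating counters, the initial state and the PC-based indexing are assumptions for illustration and may not match the actual core.

```python
class BranchHistoryTable:
    """Toy BHT: one 2-bit saturating counter per entry (assumed design)."""

    def __init__(self, entries=128):
        self.entries = entries
        self.counters = [1] * entries  # start in the weakly not-taken state

    def _index(self, pc):
        # Drop the low two bits of a 4-byte-aligned PC, then wrap.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Predict taken when the counter is in one of the two upper states."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Move the counter one step towards the observed outcome."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

bht = BranchHistoryTable()
for outcome in (True, True, False):  # a loop branch: taken twice, then exit
    bht.update(0x100, outcome)
print(bht.predict(0x100))
```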
Same as ARM, we didn't actually fabricate this version but we have fabricated and measured enough variants in different processes to be confident in our layout results.
So, overall, we're pretty sure it's a reasonable comparison, though we're not completely sure about all the details in ARM's result to make sure we're being fair.
The appropriate initial reaction to your skepticism about our Dhrystone numbers would have been to ask us for more details so you would actually have some facts on which to base your opinion. Instead, in your very first post, you incorrectly assumed the worst possible behavior on our part, and incorrectly glorified what ARM had included in the core they measured. You might want to reconsider posting pejorative assertions that you have no way of knowing are correct. We're not above accepting apologies, if you're not above admitting your mistakes. ***
Sorry that we did not have enough space to fully describe that particular experiment. The area numbers we pulled from ARM exclude floating point and NEON, see http://www.arm.com/products/processors/cortex-a/cortex-a5.php. Also, the RISC-V core used had TLBs, branch prediction (BTB, BHT, and return address stack), and caches designed to match the Cortex-A5 memory hierarchy. We didn't have time to strip our core down to 32-bits to match the ARM core, so unfortunately we were handicapped with the full 64-bit virtual address width in TLB/BTB/RAS, as well as in the integer regfile.
Variants of this RISC-V "Rocket" core have been fabricated multiple times in 45nm and 28nm processes, with the resulting chips booting Linux. Some variants run well over 1GHz. Some of the variants include full 64-bit IEEE-754/2008 vector floating-point units, with well over 10 GFLOPS/W energy efficiency running actual kernels (not just peak). Some of the variants are cache-coherent multicores. Other variants run below 0.5V with extremely high energy efficiency. One of our first chip publications will appear at ESSCIRC in September if you'd like to see more concrete details.
We look forward to seeing ARM publish SPEC numbers, or other open benchmark scores, for representative versions of their cores, so we can have fair and open comparisons.
To be clear, we believe ARM is a great company that has built a very productive ecosystem for SoC designers. However, there are significant markets that don't match ARM's business model and we would like to provide an alternative.
"Cortex-A5 actually runs Android pretty well, and I bet RISC-V with its very basic ISA won't be able to keep up"
I'm curious. What instructions do you imagine we're missing in RISC-V G that would make Android run better? Nearly all compiled integer code I see is dominated by the very simplest instructions and it is extremely hard to add an instruction that really improves performance across all applications.
Well, to be honest I don't know where to start - a LOT of useful instructions are missing in RISC-V. Eg. load/store indexing, load/store of 2 or more registers, shift+add, conditional move/select, conditional compares, branch/call instructions with a larger range, immediate instructions that allow efficient 2-instruction sequences, rotate, bitfield operations, multiply accumulate (and yet there is REM???), add-carry etc.
Then there is DSP, SIMD, CLZ, etc, and although these are less frequently used, they do provide significant speedups. It's important to understand there is never a need for every instruction to improve performance across all applications - if you go down that path you end up with something like Alpha, ie no byte or halfword loads/stores as 80% of applications hardly need them! All you need is to show the benefit of an instruction outweighs its cost.
Note also the codesize impact is an important consideration - I proposed several instructions in Thumb-2 solely for the benefit of improving code density (ultimately that means lower power and improved performance). Given RISC-V will have pretty bad code density due to all the missing instructions, adding a compressed form of the ISA should be a priority.
We are in contact with some other OS developers, but are focused on finishing the privileged architecture specification and system binary interface specification to reduce the effort of porting OSs to different RISC-V implementations.
V8 was the open standard with IEEE. Although Sun, to their credit, released a few open-source designs of 64-bit SPARC V9 cores, I don't believe the ISA was officially freed, and now Oracle owns the architecture.
There are possibly a small set of pipeline designs for which delayed branches might make some sense as a simple way to reduce some control hazards. But there is a far larger universe of pipeline designs where they only hinder performance. This is not a controversial view point amongst architects. Given that even a small investment in branch prediction hardware has high rewards, I'm not even sure there are any processor implementation budgets for which delayed branches in ISA make sense on general-purpose code.
"Thanks in part to the open-source Chisel hardware design system, one 64-bit RISC-V core is half the area, half the power, and faster than a 32-bit ARM core"
How exactly does Chisel enable that? I had a look at it, and it seems to be a Scala-driven HDL generator. It probably makes design easier and faster, but the area/power should be the same as when using HDL directly.
Chisel helps precisely because it's easier to change the core around, and to build parameterized cores in the first place. It's difficult to predict how things will turn out once pushed all the way to layout, so the design iterations involve a lot of rewriting of RTL or searching over design parameters.
1. I've done similar code density comparisons before and MIPS always came out significantly larger than ARM (in fact even MIPS-16 was usually a little larger than ARM!). Also I typically see x86 and ARM being similar in size. So these results seem a bit odd, and I wonder whether they have been done correctly.
2. I disagree. One core has all the modern features you'd expect in a CPU including virtualization, DSP, SIMD, multipliers, multi-core support, the other has none of those, so that clearly skews the comparison. We also don't exactly know the size of the branch predictor, TLB entries etc, so these are not comparable either. Given that all this affects performance, area as well as power it is important to do a like with like comparison.
For Linux, Android and SPEC you actually need large advanced branch predictors, a good memory pipeline, a large and highly associative L1, big TLBs, big low latency L2, prefetchers, fast memory controller etc. You may have a great integer pipeline but if you don't have all of those features your performance is going to suck (unless you only ever run Dhrystone or CoreMark).
3. CoreMark is a bad benchmark, I class it as worse than Dhrystone. At best you can call it an indirect branch predictor torture test. I recently heard that even the Pro version of CoreMark is horribly broken as it appears to spend >75% of its time in strcat...
When we are talking about a single issue in-order pipeline executing mostly single-cycle instructions then the only thing that matters for performance is reducing the number of instructions executed. So clearly any 2+ instruction sequence that can be combined into single-cycle instructions will improve performance.
On the other hand, if you consider a 3/4 way out of order CPU then I'd agree that a simpler ISA works fine there as any extra instructions can often be executed by the OoO machinery without penalty (you still have to worry about latency, eg. not having LDR R0, [R1, R2, LSL #2] costs you a few extra cycles latency to calculate the address plus AGU stalls).
I do like minimal ISAs, but this has gone too far. On cheap MCUs you have fast multipliers and dividers, DSP extensions, and nowadays even floating point! Today it is no longer about minimizing transistor count of just the CPU, it's the whole SoC you need to consider. If not having certain instructions means you need extra library code (eg. division or floating point emulation), then that also has an area, power and performance cost.
Anyway don't take my word for it. Just try selling a SoC without division or multiply to people used to single-cycle multiplies and 10 cycle divisions. Good luck!
1. Yes my experience is different so I question the results. Without more details it's hard to believe the numbers are right. Of course this guy had all the reasons to show RISC-V in the best possible light...
2. I'm well aware of the Cortex-A5 features, which is why I keep reminding you that this core is far more advanced than you believe. Again, it has multiplies, DSP instructions, SIMD instructions, performance counters, debug, virtualization, interrupt controllers, support for ARM and compressed Thumb-2, support for multiple cores, load/store exclusive etc etc etc in the base configuration. This is not something you can remove as all of this is part of ARMv7-A. None of the features I mentioned above are in Rocket. So no, these cores are nothing alike, and claiming they are is simply being dishonest.
As for performance, so far we haven't seen any evidence that Rocket is faster on real benchmarks. Even the Dhrystone score is questionable without more details (there are a few tricks you can pull to get a good score in 64-bits). But... are you really suggesting that Rocket will win on SPEC with its 8KB 1-way set associative caches??? I'd really love to be in the meeting where you try to pitch that to a potential customer :-)
3. I have personally benchmarked CoreMark extensively on various CPUs and no it does not stress the caches at all. It does stress the branch predictor, especially the indirect predictor, but that's pretty much it. And there are a few compiler tricks that defeat the benchmark and make it run 40-50% faster. Some may well recommend it (presumably after they broke the benchmark), but good luck comparing CPU performance based on that!
4. Surely the ISAs matter, that's the whole point of this conversation. For example this is how you do an array access on ARM:
LDR R0, [R1, R2, LSL #2]
And this is what you do on RISC-V:
slli x3, x2, 2
add x3, x3, x1
lw x3, 0(x3)
This is an important difference due to the load instructions in the ISAs. In the second case we not only have more instructions which require serialized execution, we may also get an ALU->AGU stall depending on the details of the pipeline.
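The two sequences above compute the same effective address, base + (index << 2); the difference is only in how many instructions it takes. A toy Python model makes that concrete. The function names and the dictionary standing in for memory are purely illustrative, and the returned instruction counts say nothing about cycle counts on any real pipeline.

```python
def arm_style_load(memory, base, index):
    """One load with a scaled-register addressing mode: [base, index, LSL #2]."""
    return memory[base + (index << 2)], 1  # (value, instructions used)

def riscv_style_load(memory, base, index):
    """The same access as three dependent instructions: shift, add, load."""
    x3 = index << 2   # shift the index left by 2
    x3 = x3 + base    # add the base address
    value = memory[x3]  # load with a zero offset
    return value, 3

# A small word array at address 0x1000 holding 0, 10, 20, ...
memory = {0x1000 + 4 * i: 10 * i for i in range(8)}
print(arm_style_load(memory, 0x1000, 3), riscv_style_load(memory, 0x1000, 3))
```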
Yes, benchmark results would be great, but we have not seen any definitive benchmark results on similar microarchitectures.
"As to what sells in the market, I think I have a pretty good idea ! Please see our website"
I like the enthusiasm, but it would be nice to see some actual SoCs that are competitive with high-end ARM ones. Then people might take it seriously. I think you have no idea how much work goes into designing a real commercial CPU. Intel has been pumping many billions of dollars into their Atom line over the last 6 years, trying to make it competitive with ARM cores - and despite their huge advantage in process technology it has been without any success.
I have a few complaints (as an outsider interested in computer architecture) about RISC-V and the RISC-V project. It looks like only recently has a mailing list been promoted on the riscv.org site (the archive for the Hardware Developers list indicates the first message was 7 Aug 2014) and as I recall (my memory is far from perfect) the contact information was not clearly presented.
(I had emailed Andrew Waterman quite some time ago about RISC-V Compressed—after reading his Masters thesis—and other aspects of RISC-V. I did not receive a response. This is somewhat understandable as it was a long email from a nobody naturally leading to a longer delay to provide a decent answer and delayed low priority tasks are often forgotten. I myself have delayed responses so long that I eventually decided not to respond, so I cannot justly complain. However, if I had known about a mailing list or forum, I could have posted comment there.)
This is not a problem unique to RISC-V. Even though the OpenRISC project had a mailing list/forum, I found myself losing all interest in posting there as posts on architectural and microarchitectural ideas received little interest. I have some talent in computer architecture, and I would have liked to have contributed something of real value, beyond some edu-tainment from Internet posts. (Andy Glew, 13 Aug 2003: "You have the sort of obsessive attention to tradeoffs in computer architecture that is typical of some of us computer architects. ... But you have some talent. / If this was an Open Source project, I'd try to drag you in. / For a company, yeah, your resume scares me. / But I still see promise in your posts.")
(It is still not clear where Architectural thoughts should be posted. Microarchitectural thoughts should probably go to the Hardware Developers list, but Architectural thoughts might not be appropriate for Software Developers or Hardware Developers.)
Another complaint I have about RISC-V is that version 2.0 was finalized before the compressed format was established and the placement of instruction fields does not account for 16-bit instructions. (I am also more inclined to marker bits than a contiguous field for size indication, at least for 16-/32-bit; I am guessing that such would be slightly more decode friendly for wide superscalar with 2 instruction sizes.) Since microcontroller implementations would benefit most from compression and have the tightest size and energy constraints, optimizing field placement for the compressed extension seems desirable.
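For readers unfamiliar with the "contiguous field for size indication" mentioned above, the following Python sketch shows how the length of a RISC-V instruction can be read from the low bits of its first 16-bit parcel, as I understand the base encoding scheme: low two bits other than 11 mean a 16-bit instruction, and 11 with bits [4:2] other than 111 means a 32-bit instruction. The function name is hypothetical, and longer encodings are deliberately left unmodeled.

```python
def riscv_insn_length(parcel):
    """Return the instruction length in bytes given its first 16-bit parcel.

    Only the 16-bit and 32-bit cases are modeled; anything else returns None.
    """
    if parcel & 0b11 != 0b11:
        return 2                    # low two bits != 11: 16-bit encoding
    if (parcel >> 2) & 0b111 != 0b111:
        return 4                    # bits [1:0] == 11, bits [4:2] != 111: 32-bit
    return None                     # longer encodings, not modeled in this sketch


# 0x0013 is the low parcel of the 32-bit NOP (addi x0, x0, 0).
print(riscv_insn_length(0x0013))
```

A wide decoder has to evaluate this check on every parcel before it knows where the next instruction starts, which is the cost the marker-bit suggestion is aimed at.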
(The ABI will also have a significant impact on the encoding. The choice of R0 as a zero register also seems to work against simple RVC decode and works against 16-GPR variants if the ABI uses lower-numbered GPRs for arguments. [I think providing 16-GPR options could be useful for increasing thread count, vaguely similar to the flexible allocation of registers to threads in a GPU.])
I have a few other similarly minor complaints about the ISA (e.g., it looks like there is no canonical register clearing instruction—such can be used for renamer "zeroing elimination"—using ADDI would have the advantage of being already special, being used for NOP, and allowing potential small immediate setting instructions to use "register inlining" in the renamer). Many of my thoughts are too late given the finalization of the relevant portions of the ISA, but some would still probably have some value.
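As a rough illustration of the ADDI point above, here is a Python sketch that encodes ADDI in the standard I-type layout and checks whether an instruction word is "addi rd, x0, 0", the form a renamer could treat as a canonical clear. The helper names are hypothetical and the check is only my reading of the suggestion, not something the specification defines.

```python
ADDI_OPCODE = 0b0010011  # OP-IMM major opcode; funct3 = 000 selects ADDI


def encode_addi(rd, rs1, imm):
    """Encode addi rd, rs1, imm as a 32-bit I-type instruction word."""
    if not (0 <= rd < 32 and 0 <= rs1 < 32 and -2048 <= imm < 2048):
        raise ValueError("register or immediate out of range")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (0b000 << 12) | (rd << 7) | ADDI_OPCODE


def is_addi_clear(insn):
    """True for addi rd, x0, 0 with rd != x0 (hypothetical renamer check)."""
    opcode = insn & 0x7F
    rd = (insn >> 7) & 0x1F
    funct3 = (insn >> 12) & 0x7
    rs1 = (insn >> 15) & 0x1F
    imm = (insn >> 20) & 0xFFF
    return opcode == ADDI_OPCODE and funct3 == 0 and rs1 == 0 and imm == 0 and rd != 0


nop = encode_addi(0, 0, 0)      # the NOP encoding
clear_x5 = encode_addi(5, 0, 0)  # a candidate "clear x5" encoding
print(hex(nop), hex(clear_x5), is_addi_clear(clear_x5), is_addi_clear(nop))
```

The same decode path that already special-cases NOP would see the clear form differ only in the rd field, which is the "already special" advantage mentioned above.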
I really appreciated the inclusion of rationales in the specification (even when I disagree). For some instructions it may also be useful to provide examples of use, and I think greater similarity to existing manuals (rather than the less formally structured current presentation) might be good. I think that having a wiki where such rationales could be expanded upon, code examples could be provided, and microarchitectural tricks could be described (with few length constraints) could be useful. Of course, a wiki could also serve as a teaching resource.
I do not intend to be excessively negative. I realize that even graduate student time is not an unlimited resource ☺ and that ISA design and project management choices must be made in a more complex context than an outside "computer architecture hobbyist" would be aware of (and that commitment to a specification must be made before everything is perfect, or even practically perfect in every way).
I like the idea of a free/open standardized ISA. Unfortunately I don't have enough time right now to sit down and read through the RISC-V documentation, but overall it sounds like a solid ISA. My main criticism of this however is that it has been done more or less behind closed doors, which unfortunately is far too common in academia, and therefore seems to suffer a lot from the NIH syndrome. I'm for example interested in how much time was spent to create and implement Chisel, and why none of the existing HDL generator languages were used. You say you started in 2010 and yet we only hear about this now.
As several others have already pointed out, the big task is not defining the ISA, but making it stick. Out of the hundreds of open source CPU designs available, only a few have got any momentum. On the OpenRISC side we have several OS ports, both GCC and LLVM, several libc implementations, and boot loaders, and I'm inclined to believe that much more effort has been spent on the software support than on the hardware. A big risk with a new arch is that the users have lost interest before the required tools are implemented. We have noticed this for OpenRISC in the past: the lack of clear documentation and an easy way to get started cost us several potential contributors. lm32 is a good example of an arch that suffers from the lack of a large software collection rather than any technical deficiencies.
"My main criticism of this however is that it has been done more or less behind closed doors, which unfortunately is far too common in academia, and therefore seems to suffer a lot from the NIH syndrome...You say you started in 2010 and yet we only hear about this now."
The RISC-V user spec v1.0 was published in 2011. It has since been cited at least 15 times, and we have presented work using RISC-V in a number of conferences and workshops. We also have at least 3 different external groups from around the world using RISC-V. They found out about it on their own and decided it was the superior option for their needs. Also, you should read the spec; it's a very enjoyable read (imo) and it lays out both the history of RISC-V (i.e., its gentle evolution over 20 years) and a list of external contributors who have provided invaluable feedback on the ISA.
You can learn more about chisel at chisel.eecs.berkeley.edu. We got sick of HDLs getting in our way of building processors, and we felt that all existing solutions were inferior to creating our own. The Chisel DAC 2012 paper's intro explains this in more detail.
Regarding OpenRISC, I'm not in any position to make a claim on which ISA a new user in 2014 should choose. What I can say is that, in 2010, OpenRISC was not a viable solution for us to use in our research. We needed 64b, we needed a LOT more opcode space for research extensions, no delayed branches, and we needed 2008-revised IEEE 754 FP. That meant writing a new ISA.
Thank you for the clarifications. I realize that I made some claims without looking at the history of RISC-V. My apologies for that, and I will make sure to read more about this when I get the time. As Jeremy has already pointed out, RISC-V would make a lot of sense as an OpenRISC 1000 successor, especially given that our team is too small to develop and maintain a new ISA and the existing OpenRISC at the same time.
Regarding Chisel, we're all sick of the main HDL languages. I just wish there were some consensus regarding the alternatives. Right now everyone seems to think that their language is the only sensible option. I guess time will tell though...
And if you are interested in knowing more about what we're doing in the OpenRISC project, I will be speaking at FPGA world in Stockholm and Copenhagen next month, and the yearly OpenRISC conference will be in Munich in October. Hope to see you there!
"Regarding Chisel, we're all sick of the main HDL languages. I just wish there were some consensus regarding the alternatives. Right now everyone seems to think that their language is the only sensible option. I guess time will tell though..."
I think that the fact that we're reaching this "pain point" is a good thing, as is the fact that there are now several credible alternatives to VHDL/Verilog. I may be biased since I'm also working on a new programming language to design hardware called Cx (formerly C~) :-) I do think that each alternative has its merits, and it is possible for several to cohabit, much like in software you have several languages, each with its own strengths and weaknesses depending on your application (embedded, networking, games, web applications, etc.)
What I'm hoping for is that these languages will allow people without an electronic engineering degree (or years of experience with VHDL/Verilog) to design hardware. I agree with Andreas and Paul, community and collaboration are very important. And what better way to increase collaboration than to make code easier to write, and more importantly, easier to read?
«"My main criticism of this however is that it has been done more or less behind closed doors, which unfortunately is far too common in academia, and therefore seems to suffer a lot from the NIH syndrome...You say you started in 2010 and yet we only hear about this now."
The RISC-V user spec v1.0 was published in 2011. It has since been cited at least 15 times, and we have presented work using RISC-V in a number of conferences and workshops. We also have at least 3 different external groups from around the world using RISC-V. They found out about it on their own»
I may have first found out about RISC-V through a comp.arch post by Brett Davis (Message-ID: <firstname.lastname@example.org>; 2 Dec 2011). However, I could not find any way (other than contacting authors of papers) to offer suggestions or ask about rationales. I think only recently have people associated with RISC-V (and then only secondarily, i.e., non-Berkeley people) posted on comp.arch and that more as project announcements. (I realize comp.arch has suffered along with the decline of the rest of USENET, but I would not have thought posting an RFC there would have cost that much effort.)
A major project without a mailing list (seemingly only started 7 August 2014) or even a wiki does not seem very open to me. (It looks like the github repository was started in July 2013, but that is really only useful to hardware designers and even then requires a willingness to use Chisel. The number of people that can propose useful architectural and microarchitectural features is many times larger than the number of people that can use an HDL. I think I am among that number, but I was excluded—in effect, not intent—from contributing. I realize ideas are cheap and filtering out the dross consumes resources, but even a small contribution to the 10% that is inspiration would seem to be useful.)
Even if it was only the doors to the world outside academia that were closed (and by no means locked), this still seems a less than ideal approach for an open project, even if platform projects benefit from more of a cathedral than a bazaar design orientation. (A good cathedral design has one architect [rf. Ch. 4, Mythical Man-Month] but many builders and even broader community involvement. I believe even a poor villager who can only give honor to the builders contributes to the building: "To praise good actions heartily is in some measure to take part in them." [Francois Duc De La Rochefoucauld, Reflections; or Sentences and Moral Maxims, 432].)
Krste created an email list of interested parties based on people who sent him email, which allowed people outside Berkeley to comment on the drafts, and many people did.
Note that our original goal was to make something for us to use, not for the world to use. It was only 6 months or so ago that we realized that there was external demand for use of RISC-V; people were using it based on material from the websites of our classes, even though we hadn't released it yet!
Once we realized why people were trying to use RISC-V on their own, we decided to try to push it as a free, open standard, hence this blog and technical reports.
There are still parts of the user ISA that will be standard extensions of RISC-V that have not been completed, and we still haven't released the system specification.
If you might be interested in participating in the RISC-V community, please attend the first RISC-V workshop and boot camp. It's open to everyone, and will be held January 14-15, 2015 in Monterey, CA.
"Krste created an email list of interested parties based on people who sent him email, which allowed people outside Berkeley to comment on the drafts, and many people did."
That explains the history.
"Note that our original goal was to make something for us to use, not for the world to use. It's only in the last 6-12 months or so that we realized that there was external demand for use of RISC-V; people were using it based on material from the websites of our classes, even though we hadn't released it yet!"
That shows a surprising lack of foresight. Even as a teaching tool, it should have been obvious that collaboration among universities would be desired. (While DLX had the advantage of being introduced/used in a certain popular textbook ☺, it seemed to have significant attention given to it even though it was not that different from and was later replaced by MIPS.) The network effects for any kind of intellectual content are significant, so it should have been obvious that collaboration and more widespread use would provide significant benefits.
This also seems to show a lack of awareness of the existence of computer architecture hobbyists. There are non-professionals who create ISAs and develop higher-level microarchitectural concepts for fun! While this resource might not be easy to utilize (as areas and degrees of skill vary greatly), ignorance of its presence or dismissing its potential seems wrong. Even people who ask "stupid questions" can be useful for developing clear and extensive documentation.
Academia should be the place where collaboration among organizations is the default mode of operation. (I realize there are first-to-publish pressures for researchers and some universities are getting heavily involved in acquiring patents—monopolies on ideas—, but a project like RISC-V is less subject to such concerns.) If one university sees development of a teaching/research ISA as worth the effort, would it not be obvious that at least a few other universities would want to avoid redundant effort and benefit from network effects?
The commercial potential would be less obvious (though China's work on MIPS-based designs seems to foreshadow India's interest in RISC-V and OpenRISC has had some commercial use). However, if good designs were produced as a side effect of research effort and were made open source, it should be obvious that at least a few hardware developers would be interested.
Again, I do not mean to rant (and as a complete outsider I do not understand the tradeoffs in academic collaboration much less extending that collaboration more broadly), but the lack of effort to draw in people seems both surprising and disappointing.
Paul, I understand your frustration, but your understanding of motives and actions in academia is way off. The last thing on our minds was to keep it behind closed doors. Few academic architecture patents are worth anything (though a tiny number do hit the jackpot), and most schools refuse to patent anything, unless there's a licensee already lined up, because historically the patenting costs far outweigh the licensing returns. We joke about architecture academics fighting tooth and nail to have their ideas stolen so they will actually get used. We have shared RISC-V lecture slides and labs with a few other schools who asked to use the material.
When sending out some of the original RISC-V drafts to leading architects in both academia and industry, I worried about spamming busy important people with email about our pet project, so only sent to those I knew reasonably well or had chatted to about the project before. Most didn't reply, but we did get invaluable feedback from those who did. In addition, our ideas were regularly exposed to a large cross section of leading companies through our lab retreats.
We would have been accused of excessive hubris if we'd tried to make a big splash 2-3 years ago and called on the whole community to contribute with the state of everything back then (and I'm sure we'll be accused by some of hubris now). We only pushed things out now because we think we have enough of the important pieces in place, and have also developed the arguments of why a free ISA makes sense.
I agree hobbyists will be an important force in RISC-V development, but you have to understand the realities of our limited time. I used to read and contribute to comp.arch (and comp.sys.super etc.) regularly about twenty years ago, but once I got a real job, I stopped having time. I thought about posting something about RISC-V to comp.arch, but didn't think it right to post there if I couldn't make a commitment to monitor and follow up on the same forum. I do occasionally peek in there to see what's going on, but the signal-to-noise ratio is much lower than in the "good old days" and there seem to be only about a dozen total participants, so I simply can't justify spending the time to engage there. We have created email lists on riscv.org and those will be a more appropriate place for RISC-V development discussions.
Krste, thank you very much for explaining further. Project management (like economics) is a significant and potentially very interesting side aspect of computer architecture.
I am somewhat disappointed that academics did not seek collaboration. As I stated earlier, this saves time/effort in the longer run. For businesses, I receive the impression that even if individuals wanted to respond, corporate policy may have prevented such; I recently read that some corporations filter incoming mail not merely to avoid wasting the highly valuable time of the skilled but to avoid accusations of idea stealing.
I can sympathize with not wanting to waste others' time and respect that honorable choice.
I am disappointed with the lack of signal on comp.arch—much of the noise is easy to filter if one is willing to sacrifice a little signal—, but there are not many places on the Internet to post or discuss thoughts on computer architecture.
Anyway, I apologize for whining in a public forum. (I am sending a longish email; I hope you are good at skimming as there might be some copper nuggets hidden in the wall of text. I am selfish, but I do not want to completely waste your time.)
We did share it with many universities, which was an interest that we expected; the surprise was that there is interest outside academia, which we only realized 6 months ago when non-academics were downloading from our course website and then complained if we made changes.
I've got to say that I'm surprised that you believe academics don't open their processes to outsiders. We even put our software that is under development publicly on GitHub, and will talk to anyone who is willing to let us give a talk.
I believe there are a few other parts of our field that are a bit more closed than academia.
The OpenRISC 1000 architecture does however have several flaws. The biggest is perhaps code density, something that has already been pointed out. The other big ones that I can think of right now are the flag, interrupt vector layout and delay slots. Just fixing these would probably bring OpenRISC much closer to the feature list set by the RISC-V authors while allowing us to reuse most of the already available software.
"What is going on with this comment system? I keep getting 403 errors when I try to post something"
My guess based on your relatively high post frequency (recently) is that you are running into some kind of primitive spam filtering mechanism. Your "Rookie" status may (or may not) be a factor (i.e., I could see a filter considering long term involvement for the constraints placed on posting).
For fun, I did a quick analysis of the RISC-V ISA using the Epiphany architecture (another modern RISC machine) as a reference point. Adding a link to the summary here in case others might find it interesting/useful. (Too cumbersome to add the whole write-up as a comment to this thread)
Irrespective of the analysis of the relative merits and defects of the ISA, what is important now is to quickly come out with appropriately configured SoCs and low-cost evaluation/development boards along the lines of Raspberry Pi; along with a robustly engineered ecosystem, for under $50/-. This will quickly proliferate in academia and industry. Once the community starts designing actual products around them, the commercial microprocessor families are going to get a real run for their money!
RISC-V really looks like MIPS done right, it is a quite solid base, albeit not an original one.
I dream that some other university had the resources to build a competing CISC ISA, an x86 without all the problems, but keeping the variable instruction length, complex addressing, read/modify/write instructions, memcopy instructions...
The SPARC standard and brand belong to www.sparc.org, both the V8 and the 64-bit V9 edition. The "SPARC architecture licence" has a one-time cost of $99. The situation of recent instruction set extensions by Sun/Oracle and Fujitsu is not clear though.
While playing with my own CPU designs, I decided to create yet another on-chip interface bus, as both Wishbone and AMBA AHB/APB looked terrible and unsuitable for my purposes. (Like CPU ISAs, there are gazillions of protocols like CoreConnect, Avalon, OCP...)
Is AMBA AXI good enough? Do you work on an open ASIC bus protocol?
RISC-V already has variable-length instructions and read/modify/write instructions (in the standard "A" extension). We are working on experimental memcpy instructions as part of a "virtual local store" subsystem. The vector units we design have some very complex addressing modes. So, RISC-V might be the CISC you're looking for.
Many people have asked about AMBA/AXI and though we haven't implemented it yet, it would be very straightforward to support given that our existing interconnect is based on similar point-to-point links.
Krste & team, It is good to have free IP for a core but it comes late in the day. Looks like the cores (including their cache) occupy 10% of a modern SOC and the rest of it is an ever growing sea of IP blocks for everything from graphics to encryption. Such blocks may offload specific workloads at 100x the power efficiency of a general core and as the whole SOC is commonly power limited, that is making more and more sense to have much of the die idle until needed. It will be as unremarkable to have idle silicon as it will be to have software which is called only when needed. In some ways power has replaced clock speed as the limit to complexity metric which RISC was founded on. So the size and complexity of the SOC can sensibly balloon far beyond what RISC prescribed, as a network of cooperating specialists.
Would you agree, and how would you compare that to experimenting with reserving extensions of the instruction set? It seems like those extensions assume the functionality is under control of the core, but the modern trend seems to be separation and independence.
While 99% of the compute operations in an SoC might be done on specialized accelerators, 99% of the needed total lines of code run on the general-purpose processors. You need a capable CPU inside any SoC.
Unlike ARM, RISC-V has a slim base ISA on which to build programmable accelerators, many of which fetch and decode an instruction stream. Unlike Tensilica and ARC, the RISC-V base ISA was designed to extend up to powerful general-purpose systems.
It should be far easier to develop software for an SoC full of RISC-V general-purpose cores and RISC-V-based specialized accelerators, than one that has a smorgasbord of home-brewed ISAs everywhere.
Purely fixed-function (non-programmable) pipelines are also important in some places. We'd recommend Chisel for those (chisel.eecs.berkeley.edu).
The other IP blocks are increasingly programmable. In the mobile world the die is taken up with GPU, modem, DSPs. All heavily programmable but radically different style from the cores, optimized with in some cases decades of history. Server SOCs will add networking processors. There are a few things which might be "Chisel simple": crypto, compression. We will probably see growth in both kinds of functionality.
Perhaps RISC-V could make sense inside the smart accelerators but since the proposition driving them is often specialized data flow for minimal power, the instruction set "extension" is likely to be more significant than the classics, and freedom to pivot the instruction set around the needs of the data flow may trump benefits in reusing a proven ISA.
The RISC-V ISA looks good. I've written code generators for several RISC architectures including MIPS and generally like the trade-offs you have chosen; they fit well with a modern short pipeline with multiple instruction dispatch and parallel execution units.