I haven't had a chance to check out Zynq, plus the eval boards are pretty expensive. There are some cheaper ones coming in 2014, so we'll see. Given that the Xcell Journal article in 2Q2011 claimed "a starting price below $15", the current chip price is still pretty high. I noticed at the zedboard.org site that the original Zedboard is still US$395 for up to 5, but higher quantities have an "anti-discount" which raises the price to US$495. The US$395 price includes "manufacturers' subsidies". This is JMO/YMMV, but this engineer who is leery of being an "early adopter" wonders whether a manufacturer is having yield problems.
I hope Xilinx learned lessons from the Virtex-II Pro, which had built-in PowerPC cores. We considered that chip at one time since we were using PowerPC SoCs, but we gagged on the price. We later went with an IBM/AMCC 405EP and a Spartan-IIE, with a 33 MHz 32-bit PCI bus between them. That was very cost-effective and worked out really well.
So I'd like to try Zynq some day, but I'm waiting for pricing to get closer to $15.
@Sanjib,A: If you go to the LinkedIn FPGA group, Steve Leibson of Xilinx marketing has several posts about the Zynq. Adam Taylor has an OS running on one core and bare metal on the other -- I have seen nothing about a real app. Since the hard core on Zynq is not as fast as an ASIC, I guess they threw in a second to see if anyone could use it.
They also have the standard mix of ARM interfaces on Zynq. That may enable usage of some existing MCU tools.
The ARM is still a RISC, just not implemented as a soft core, so the benefit is not real clear, except that you get the baggage of a memory controller and 2 levels of cache if you really want them.
For performance you would probably go for an optimizing C compiler, but if there is a problem, it is probably not debuggable.
Altera Forums has an SoC category where you might find what issues the users have.
1. VLE will be used but Berkeley pulled back the VLE encoding to improve the implementation.
2. We went thru a massive exercise of comparing various ISAs and initially settled on the Power 2.07. But some of our collaborators had patent issues in the US, and by that time RISC-V was stable, so we switched. I can understand the concerns about the encoding but all extant ISAs have some tradeoff. I placed more emphasis on the superscalar performance that this encoding could achieve. Figure any minor ISA changes can be done in the course of 2014/2015. Actually the compiler folks in Cambridge also had significant inputs into the encoding. I also saw some extensive review feedback from MIT.
But overall, I tend not to worry too much about ISAs beyond a certain point. Doing a new ISA is out of the question. In any case compilers are lagging way behind in properly using existing ISAs as it is! But I would like to add your concerns to our internal mailing list, if I could trouble you to send me an email.
We do plan to add our own variants for the supervisor mode, SIMD/Vector and 128 bit variants.
For the simple 5 stage pipeline (in-order) we are able to do test synthesis on Synopsys DC at about 1.7 GHz on a 65 nm UMC library. So decode penalty is pretty low. Do plan to justify our design decisions on the comp.arch list to see if our design choices hold up. But that is probably at least 6 months away.
3. This effort is actually a formal Govt. of India project to standardize ISA for critical applications. So it is more than an alternate source for RISC-V, it is a massive effort to develop a new family of processors with commensurate staffing. For a lot of Indian applications, this ISA will probably get mandated. So it is going to be a huge market. You are talking smart cards, energy meters, POS terminals, critical servers. But unlike other attempts at setting national standards, this one will be open, royalty/patent free and with commercial grade reference RTL. I am hoping SoCs in future will be priced based just on silicon area!
4. Supervisor arch. is one area where I foresee having variants. For our microkernel OS, a std Linux friendly MMU will not cut it. Need good support for zero cost domain crossing - optimized protected call gate type mechanisms perhaps?
5. Server variants will probably go directly to Hybrid Memory Cubes. So server CPUs will have just two main interfaces, SRIO and HMC. And if we can have the same SERDES/PHY for both (25 Gb/s per lane), we can have a single physical interface for all I/O and memory. Internal protocol engines can be switched to configure the ports as memory or I/O. Want to extend this further by shifting the MMU to the HMC die, so the CPU will have only virtual memory. Clusters of CPUs sharing an HMC cube in this configuration make for an interesting proposition. In such a system, the memory has to boot first and then allocate virtual memory chunks to different CPUs. Plan to prototype this with the new FPGAs that support HMC.
If there is anything in particular you would like to see implemented, do let me know.
It is great to see so many expert comments drawn from extensive experience!! What is your opinion about the recently launched SoCs? For example the Cyclone SoC from Altera (which has an ARM core built in) or the Zynq from Xilinx. Anybody tried those? Any pros/cons?
To All: I would like to join in with a different perspective. There are already soft cores available for FPGAs. Their reputations boil down to "too big, too slow"; the real point is that an FPGA is not well suited for RISC implementation. FPGAs are ideal for a programmable design. No, I am quite serious and have a lot of experience, so I will try to explain.
The "back end" of the compiler process, where the intermediate language is mapped to a RISC architecture, is the weak link. RISC uses many instructions on the assumption that clock speeds can be infinite. FPGAs have lower clock speeds than ASICs of the same generation. The solution is to reduce the number of cycles needed to execute HLL statements and to evaluate expressions.
Some strong points of FPGAs:
FPGAs have block memories with true dual port capability, practically unlimited interconnect, and 6 input LUTs that can evaluate incredibly complex Boolean expressions in a couple of levels of logic.
IBM used micro-code control for high end main-frames with great success and FPGAs have memory available.
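To make the LUT point concrete, here is a minimal Python sketch, assuming a 6-input LUT behaves as a 64-entry truth table indexed by its inputs. The names `make_lut6` and `eval_lut6` and the example expression are made up for illustration; they are not from any vendor tool.

```python
from itertools import product

def make_lut6(func):
    """Build a 64-bit truth-table mask for a Boolean function of 6 inputs."""
    mask = 0
    for bits in product((0, 1), repeat=6):
        # Input a is bit 0 of the table index, input f is bit 5.
        index = sum(b << i for i, b in enumerate(bits))
        if func(*bits):
            mask |= 1 << index
    return mask

def eval_lut6(mask, a, b, c, d, e, f):
    """Look up one output bit; at run time a LUT does nothing more than this."""
    index = a | (b << 1) | (c << 2) | (d << 3) | (e << 4) | (f << 5)
    return (mask >> index) & 1

# Hypothetical six-input expression, chosen only as an example.
def example(a, b, c, d, e, f):
    return (a and b) or (c ^ d) or (e and not f)

LUT = make_lut6(example)
```

However tangled the expression, any function of six inputs collapses to one table lookup, which is why a couple of LUT levels can cover a lot of logic.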
Program control flow is done by evaluating relational expressions and choosing one of two execution paths. Very straight forward.
Expression evaluation operators require 2 operands that can be supplied from a dual port RAM if all variables and constants are kept in that RAM.
The cycle time per operator is about the same as the typical cycle for the technology.
The operands are not loaded into registers from memory with the result stored back into memory for expression evaluation. They are local.
The hardware design takes a couple of hundred LUTs and 3 block RAMs.
The software is C#: a parser and control word builder that generates content for the control word memory, and code that generates the operand memory content.
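A rough Python model of the scheme described above, purely as a sketch under my own assumptions: the control word layout (`op`, `a`, `b`, `dest`), the operator set, and the names `run`, `OPS` and `RELS` are hypothetical, not the poster's actual C# tooling or hardware. Each control word reads two operands from one memory in the same step (the dual port read), applies one operator, and either writes the result back or picks one of two next addresses.

```python
import operator

# Arithmetic operators: the result is written back to operand memory.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
# Relational operators: the result selects one of two execution paths.
RELS = {"lt": operator.lt, "eq": operator.eq, "ne": operator.ne}

def run(control, mem, max_steps=10_000):
    """Execute control words against operand memory; return the step count.

    control: list of (op, a, b, dest) tuples (hypothetical layout).
      - arithmetic: mem[dest] = mem[a] op mem[b], then fall through
      - relational: go to control word `dest` if true, else fall through
      - ("halt", 0, 0, 0) stops execution
    """
    pc = cycles = 0
    while cycles < max_steps:
        op, a, b, dest = control[pc]
        if op == "halt":
            return cycles
        # Both operands come from the same memory in one step,
        # modelling a true dual port block RAM read.
        x, y = mem[a], mem[b]
        cycles += 1
        if op in OPS:
            mem[dest] = OPS[op](x, y)
            pc += 1
        else:
            pc = dest if RELS[op](x, y) else pc + 1
    raise RuntimeError("step limit reached")

# Operand memory holds variables and constants side by side.
# 0: i, 1: total, 2: constant 1, 3: constant 5
mem = [0, 0, 1, 5]
program = [
    ("add", 1, 0, 1),    # total = total + i
    ("add", 0, 2, 0),    # i = i + 1
    ("lt", 0, 3, 0),     # if i < 5 go back to control word 0
    ("halt", 0, 0, 0),
]
cycles = run(program, mem)
```

The loop sums 0 through 4 in 15 control word steps, with no separate load or store steps, which is the cycle saving being argued for.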
There are so many ISAs and variations available that if there were an ISA appropriate for FPGAs, chances are that it would exist already. Going off to design still another one is probably just a waste of time and effort.
Notice that there is no cache. Cache is there to hide some of the external program memory access time. External memory is only for transient system data that can be accessed as needed, probably via DMA to local memory.
Broadly there will be 5-6 micro architecture families, corresponding roughly to a Cortex-M4, Cortex A7, Cortex A53/57, Core i5/i7, Xeon 4-12 core and 64-100 core Xeon Phi type HPC. Instruction set is the Berkeley RISC-V.
It is neat that Berkeley's RISC-V is actually being used. Will the variable length encoding be used?
(I am disappointed about the instruction encoding, particularly with respect to supporting VLE. While the length indication encoding is similar to something I thought of [my thought was a slight modification--using two bits like RVC--of per-parcel end of instruction indicator bit, inspired by similar predecode bit per byte in some x86 implementations; RVC puts those bits in the first parcel], the placement of register fields is very different in 16-bit and 32-bit instruction formats. [A tiny side benefit of greater compatibility in 16-bit and 32-bit encodings could be greater similarity in placement within a parcel and bit pattern between the function field for R-type format and the opcode field for I-type format as well as similar placement within a parcel of opcode field bits for 16-bit formats and function field bits for 32-bit instructions.] The register field packing also works against a simple extension to 64 registers, which might be useful for FP/SIMD. [The alternate encoding that I found would probably not improve decode efficiency significantly, but even trivial weaknesses bother me when I think I could do better.])
(I tend to disagree with some of the other design choices for RISC-V, but I do not feel I understand the trade-offs even as little as I understand the trade-offs for instruction encoding.)
While RISC-V may not be perfect (even for its design goals--I would be tempted to sacrifice some conceptual and implementation simplicity for other benefits), ISA fragmentation has significant costs (even with highly similar, RISCy ISAs). After watching the lack of progress in the OpenRISC 2k project, reading that RISC-V will be used outside of Berkeley sounds encouraging.
It is precisely to address such issues that we are developing a family of BSD licensed open source cores (royalty free and patent free) at IIT-Madras. If operating systems and compilers are open source, it is high time CPUs become so too.
Broadly there will be 5-6 micro architecture families, corresponding roughly to a Cortex-M4, Cortex A7, Cortex A53/57, Core i5/i7, Xeon 4-12 core and 64-100 core Xeon Phi type HPC. Instruction set is the Berkeley RISC-V. Other than the low end parts, all others will be 64 bit. Experimental versions will have 128 bit support and security support similar to the UPenn-DARPA crash-safe, basically fat pointers and hardware capability support.
MMU is similar to Power ISA 2.07 and will have full hypervisor support. 4 level page tables for 64 bit with multi-level TLB, variable page size support and hardware page table walk. Plan to have experimental MMU with virtual caches and single address space OS support.
We will provide full toolchain support, ISA simulators and support for Linux and our version of the L4 microkernel. We will provide ongoing support and bug-fixes but obviously cannot provide commercial grade support. But bugs will be fixed ASAP. We have a small army of students who are tasked to do that!
Hopefully someone will create a Redhat like model to support them. All cores will be validated on FPGAs and some will be silicon proven. Obviously this is a long drawn out effort and will take some time before all the cores come out. I am hoping we will get feedback and source contributions from CPU architects. This year I plan to release a base core for the 2 lowest end cores (5-8 stage pipeline, in and out of order, MMU, BP, L1/L2 cache, single core). Focus is on correctness rather than perf. but we have an extensive CPU arch. research program so in the longer run, I expect these cores to become class leading. Nothing less will suffice.
Of course there is a caveat! The cores are written in Bluespec, which means unless you have a Bluespec lic (free to univ.) you cannot play with the source RTL. But we do provide the Bluespec code and the generated Verilog. So you can take the Verilog and run it through your favourite tool chain. We plan to offer a version of all the cores using Chisel, Berkeley's open source alternative to Bluespec, once Chisel is mature.
We have already released source to our Serial RapidIO logical and transport layer at bitbucket.org/casl. So take a look. We use SRIO as our I/O interconnect (instead of PCIe) and also as our cache coherent CPU-CPU interconnect. Think of it as an open source alternative to QPI.
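For readers who have not met a multi-level page table walk, here is a minimal Python sketch. The field widths are my assumptions (4 KiB base pages, 9 index bits per level, so 48 translated bits), picked because that layout is common; the post does not give the actual split for this MMU. The names `split_va` and `walk` and the nested-dict tables are hypothetical stand-ins for what the hardware walker would do in memory.

```python
PAGE_BITS = 12    # assumed 4 KiB base page
INDEX_BITS = 9    # assumed 512 entries per table
LEVELS = 4        # 4 level page tables, as stated in the post

def split_va(va):
    """Return ([top_index, ..., leaf_index], page_offset) for a virtual address."""
    offset = va & ((1 << PAGE_BITS) - 1)
    indices = []
    for level in reversed(range(LEVELS)):
        shift = PAGE_BITS + level * INDEX_BITS
        indices.append((va >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset

def walk(root, va):
    """Walk nested dict tables; return a physical address, or None on a fault."""
    indices, offset = split_va(va)
    node = root
    for idx in indices:
        node = node.get(idx)
        if node is None:
            return None  # no mapping at this level: page fault
    # The leaf entry holds a physical frame number in this sketch.
    return (node << PAGE_BITS) | offset

# Map one virtual page to physical frame 0x1234 (made-up numbers).
va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0xABC
root = {1: {2: {3: {4: 0x1234}}}}
pa = walk(root, va)
```

A TLB hit skips all four lookups, which is why the multi-level TLB and hardware walker matter for performance. Variable page sizes would let the walk stop at a higher level; that is left out here.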
you could potentially add the VHDL to internally replicate an existing MCU for which compilers and assemblers already exist, but then you'd probably be in violation of someone's copyright, and most one-off projects aren't large enough to justify independently developing BOTH an instruction set and the support tools to develop the code with.
First a niggle: one would almost certainly not be violating copyright with an independently developed implementation since one does not generally have access to the HDL source code. You presumably meant that you would probably be in violation of someone's patent. There are two dangers here. 1) That your design violates a valid patent. For the kinds of ISAs and microarchitectures likely to be implemented in an FPGA, this is unlikely. 2) That you will be unjustly sued (or threatened to be sued) for patent violation. This seems unlikely. Even ARM, which has the FPGA-targeted Cortex-M1, would generally have little incentive to pursue implementers of its ISA for low-volume internal use. Even apart from ill-will generated by such actions, the benefit (perhaps a few would be frightened or compelled into licensing a core design) seems unlikely to justify the cost. (Trademarks are a different matter, but one does not need to claim that one implemented an ARM, MIPS, or other trademarked name brand core. Trademarks also lose their power if not enforced--pressuring companies to more aggressively pursue possible violators--; patents and copyright are valid independent of previous lack of active enforcement.)
I would also argue that producing an ISA definition is not that difficult when the ISA is simple (as appropriate for an FPGA soft core) and similar to established ISAs. If one is willing to accept the limits of GNU tools, even the porting of such tools is not (from what little I have read) overwhelmingly difficult (again assuming a simple ISA similar to existing ones). Of course, it seems odd that one would bother creating a new ISA and implementing a core when there are already cores available for free (unless one considers such part of the fun of the project). (I suspect licenses for Nios II [Altera] and MicroBlaze [Xilinx] soft cores are not extremely expensive, but I have not looked into such.)
JeffL_2 wrote: The two biggest issues about developing with FPGAs I find is 1) the "core" voltage is some low non-standard value which you may have to provide at a fairly high current, and ESPECIALLY 2) sockets for these devices either do not exist or are two orders of magnitude more expensive than the device itself! (Not that current MCUs and MPUs are devoid of this issue either, for reasons that I still find inexplicable.)
1) You want the voltage to be as small as possible to save power, but not so small that you lose performance. Usually the voltages are standard values like 1.2V, 1.5V, and 1.8V, but generating a non-standard value is usually as simple as adding a couple of 1% resistors. Needing a lot of power-on in-rush current can be a pain -- especially at very low temperatures -- but I think they're getting better at that.
2) Sockets are nice for 84-pin PLCC and smaller, but IMO aren't reliable for dense TQFPs and BGAs. The sockets are expensive because very few people use them, because those who have tried them (like moi) have found that they're much more trouble than they're worth. If you want access to signals, add some high-density headers. As for MCUs and MPUs, manufacturers probably find that few customers come begging for larger packages. Plus, larger packages have longer internal wires (this applies to FPGAs as well), and those longer internal wires add inductance, leading to ground bounce and similar electrical problems.
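As a worked sketch of the "couple of 1% resistors" point, assuming a typical adjustable regulator whose output follows Vout = Vref * (1 + R_top / R_bottom). The 0.8 V reference and the resistor values below are hypothetical; the real numbers come from the regulator's datasheet.

```python
def vout(vref, r_top, r_bottom):
    """Output voltage of an adjustable regulator with a feedback divider."""
    return vref * (1 + r_top / r_bottom)

def vout_range(vref, r_top, r_bottom, tol=0.01):
    """Worst-case output range due to resistor tolerance alone."""
    low = vout(vref, r_top * (1 - tol), r_bottom * (1 + tol))
    high = vout(vref, r_top * (1 + tol), r_bottom * (1 - tol))
    return low, high

VREF = 0.8                           # assumed feedback reference, volts
R_TOP, R_BOTTOM = 2490.0, 10000.0    # hypothetical 1% resistor values, ohms

nominal = vout(VREF, R_TOP, R_BOTTOM)             # target core rail near 1.0 V
low, high = vout_range(VREF, R_TOP, R_BOTTOM)
```

This only covers resistor tolerance. The reference's own tolerance and load regulation add to the error budget, so compare the total against the FPGA's core voltage limits.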
JeffL_2 wrote: Also if you pick a device that's large enough you could potentially add the VHDL to internally replicate an existing MCU...
The FPGA implementation is going to be a lot slower than a custom-designed CPU. Plus, the logic cells needed to implement the CPU in an FPGA are probably going to cost a lot more than the CPU, and take more power. If you don't need much CPU performance, you can get by with a simple CPU and it's quite practical.
JeffL_2 wrote: Also I believe there's way too much "diversity" in the interfaces for these devices, something like the JTAG standard that caught hold for some MCUs isn't all that commonly used for FPGAs...
Most if not all current FPGAs have JTAG. Xilinx does an excellent job of documenting its JTAG instructions and data so you can do your own programming and debugging using a wide variety of JTAG host devices, including MCU GPIOs. I haven't looked closely at other vendors.
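If you do roll your own JTAG host on MCU GPIOs, the first thing you need is the TAP controller state machine, which comes from IEEE 1149.1 and is the same for every compliant device. Below is a minimal Python sketch of it; the helper names `run_tms` and `tms_path` are my own, and nothing here touches real hardware or any vendor-specific instruction. A bit-banged host would drive TMS with these bits while pulsing TCK.

```python
from collections import deque

# TAP controller: state -> (next state if TMS=0, next state if TMS=1)
TAP = {
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE": ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_DR_SCAN": ("CAPTURE_DR", "SELECT_IR_SCAN"),
    "CAPTURE_DR": ("SHIFT_DR", "EXIT1_DR"),
    "SHIFT_DR": ("SHIFT_DR", "EXIT1_DR"),
    "EXIT1_DR": ("PAUSE_DR", "UPDATE_DR"),
    "PAUSE_DR": ("PAUSE_DR", "EXIT2_DR"),
    "EXIT2_DR": ("SHIFT_DR", "UPDATE_DR"),
    "UPDATE_DR": ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_IR_SCAN": ("CAPTURE_IR", "TEST_LOGIC_RESET"),
    "CAPTURE_IR": ("SHIFT_IR", "EXIT1_IR"),
    "SHIFT_IR": ("SHIFT_IR", "EXIT1_IR"),
    "EXIT1_IR": ("PAUSE_IR", "UPDATE_IR"),
    "PAUSE_IR": ("PAUSE_IR", "EXIT2_IR"),
    "EXIT2_IR": ("SHIFT_IR", "UPDATE_IR"),
    "UPDATE_IR": ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
}

def run_tms(state, bits):
    """Apply a sequence of TMS values (one per TCK rising edge)."""
    for tms in bits:
        state = TAP[state][tms]
    return state

def tms_path(start, goal):
    """Shortest TMS sequence from start to goal, found by breadth-first search."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for tms in (0, 1):
            nxt = TAP[state][tms]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [tms]))
    return None

to_shift_ir = tms_path("RUN_TEST_IDLE", "SHIFT_IR")
```

Holding TMS high for five clocks lands in Test-Logic-Reset from any state, so that is the usual way to get into a known state before shifting an instruction.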
JeffL_2 wrote: I would NOT say that learning VHDL SHOULD be a problem since the benefits appear to outweigh the learning curve and it's very widely accepted (although I'm no expert in it yet either).
Personally, I prefer Verilog. Its C-based syntax is more concise than VHDL, which is based on Ada. Chacun à son goût (YMMV).
JeffL_2 wrote: There's also issues about understanding how a particular device architecture "maps" its resources and how to best "tweak" your design to fit those resources but that probably deserves an entirely different article.
This is indeed a problem with VHDL and Verilog. You have to write your source code carefully so that the synthesizer generates the hardware you really want, and if you don't get it right the synthesizer may do wildly unexpected things.