JeffL_2 wrote: The two biggest issues about developing with FPGAs I find are 1) the "core" voltage is some low non-standard value which you may have to provide at a fairly high current, and ESPECIALLY 2) sockets for these devices either do not exist or are two orders of magnitude more expensive than the device itself! (Not that current MCUs and MPUs are devoid of this issue either, for reasons that I still find inexplicable.)
1) You want the voltage to be as small as possible to save power, but not so small that you lose performance. Usually the voltages are standard values like 1.2V, 1.5V, and 1.8V, but generating a non-standard value is usually as simple as adding a couple of 1% resistors. Needing a lot of power-on in-rush current can be a pain -- especially at very low temperatures -- but I think they're getting better at that.
2) Sockets are nice for 84-pin PLCC and smaller, but IMO aren't reliable for dense TQFPs and BGAs. The sockets are expensive because very few people use them, because those who have tried them (like moi) have found that they're much more trouble than they're worth. If you want access to signals, add some high-density headers. As for MCUs and MPUs, manufacturers probably find that few customers come begging for larger packages. Plus, larger packages have longer internal wires (this applies to FPGAs as well), and those longer internal wires add inductance, leading to ground bounce and similar electrical problems.
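To make the "couple of 1% resistors" remark in 1) concrete, here is a minimal sketch of the usual feedback-divider arithmetic for an adjustable regulator, Vout = Vref * (1 + R_top / R_bottom). The 0.8 V reference and the 10 k bottom resistor are assumptions for illustration only; the real reference voltage comes from the data sheet of whatever regulator you pick, and the function names are hypothetical.

```python
def top_resistor_for(vout, vref=0.8, r_bottom=10_000.0):
    """Top feedback resistor (ohms) so that Vout = Vref * (1 + R_top / R_bottom).

    vref=0.8 and r_bottom=10k are assumed example values, not from any data sheet.
    """
    if vout < vref:
        raise ValueError("a simple divider cannot set Vout below Vref")
    return r_bottom * (vout / vref - 1.0)


def vout_spread(vref, r_top, r_bottom, tol=0.01):
    """Worst-case (low, high) output voltage for resistors with +/- tol tolerance.

    The reference itself is treated as ideal, which is an optimistic assumption.
    """
    low = vref * (1.0 + r_top * (1.0 - tol) / (r_bottom * (1.0 + tol)))
    high = vref * (1.0 + r_top * (1.0 + tol) / (r_bottom * (1.0 - tol)))
    return low, high


if __name__ == "__main__":
    r_top = top_resistor_for(1.2)           # target a 1.2 V core rail
    low, high = vout_spread(0.8, r_top, 10_000.0)
    print(f"R_top = {r_top:.0f} ohm, Vout between {low:.4f} V and {high:.4f} V")
```

Under these assumptions the resistor tolerance alone moves a 1.2 V rail by well under 1%, which is one reason 1% parts are the usual choice.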
JeffL_2 wrote: Also if you pick a device that's large enough you could potentially add the VHDL to internally replicate an existing MCU...
The FPGA implementation is going to be a lot slower than a custom-designed CPU. Plus, the logic cells needed to implement the CPU in an FPGA are probably going to cost a lot more than the CPU, and take more power. If you don't need much CPU performance, you can get by with a simple CPU and it's quite practical.
JeffL_2 wrote: Also I believe there's way too much "diversity" in the interfaces for these devices, something like the JTAG standard that caught hold for some MCUs isn't all that commonly used for FPGAs...
Most if not all current FPGAs have JTAG. Xilinx does an excellent job of documenting its JTAG instructions and data so you can do your own programming and debugging using a wide variety of JTAG host devices, including MCU GPIOs. I haven't looked closely at other vendors.
JeffL_2 wrote: I would NOT say that learning VHDL SHOULD be a problem since the benefits appear to outweigh the learning curve and it's very widely accepted (although I'm no expert in it yet either).
Personally, I prefer Verilog. Its C-based syntax is more concise than VHDL, which is based on Ada. Chacun à son goût (YMMV).
JeffL_2 wrote: There's also issues about understanding how a particular device architecture "maps" its resources and how to best "tweak" your design to fit those resources but that probably deserves an entirely different article.
This is indeed a problem with VHDL and Verilog. You have to write your source code carefully so that the synthesizer generates the hardware you really want, and if you don't get it right the synthesizer may do wildly unexpected things.
you could potentially add the VHDL to internally replicate an existing MCU for which compilers and assemblers already exist, but then you'd probably be in violation of someone's copyright, and most one-off projects aren't large enough to justify independently developing BOTH an instruction set and the support tools to develop the code with.
First a niggle: one would almost certainly not be violating copyright with an independently developed implementation since one does not generally have access to the HDL source code. You presumably meant that you would probably be in violation of someone's patent. There are two dangers here. 1) That your design violates a valid patent. For the kinds of ISAs and microarchitectures likely to be implemented in an FPGA, this is unlikely. 2) That you will be unjustly sued (or threatened to be sued) for patent violation. This seems unlikely. Even ARM, which has the FPGA-targeted Cortex-M1, would generally have little incentive to pursue implementers of its ISA for low-volume internal use. Even apart from ill-will generated by such actions, the benefit (perhaps a few would be frightened or compelled into licensing a core design) seems unlikely to justify the cost. (Trademarks are a different matter, but one does not need to claim that one implemented an ARM, MIPS, or other trademarked name brand core. Trademarks also lose their power if not enforced--pressuring companies to more aggressively pursue possible violators--; patents and copyright are valid independent of previous lack of active enforcement.)
I would also argue that producing an ISA definition is not that difficult when the ISA is simple (as appropriate for an FPGA soft core) and similar to established ISAs. If one is willing to accept the limits of GNU tools, even the porting of such tools is not (from what little I have read) overwhelmingly difficult (again assuming a simple ISA similar to existing ones). Of course, it seems odd that one would bother creating a new ISA and implementing a core when there are already cores available for free (unless one considers such part of the fun of the project). (I suspect licenses for Nios II [Altera] and MicroBlaze [Xilinx] soft cores are not extremely expensive, but I have not looked into such.)
It is precisely to address such issues that we are developing a family of BSD-licensed open source cores (royalty free and patent free) at IIT-Madras. If operating systems and compilers are open source, it is high time CPUs were too.
Broadly there will be 5-6 microarchitecture families, corresponding roughly to a Cortex-M4, Cortex-A7, Cortex-A53/57, Core i5/i7, Xeon 4-12 core and 64-100 core Xeon Phi type HPC. The instruction set is the Berkeley RISC-V. Other than the low-end parts, all others will be 64-bit. Experimental versions will have 128-bit support and security support similar to the UPenn-DARPA crash-safe, basically fat pointers and hardware capability support.
The MMU is similar to Power ISA 2.07 and will have full hypervisor support: 4-level page tables for 64-bit with a multi-level TLB, variable page size support and hardware page table walk. We plan to have an experimental MMU with virtual caches and single address space OS support.
We will provide full toolchain support, ISA simulators and support for Linux and our version of the L4 microkernel. We will provide ongoing support and bug fixes but obviously cannot provide commercial grade support. But bugs will be fixed ASAP. We have a small army of students who are tasked to do that!
Hopefully someone will create a Redhat like model to support them. All cores will be validated on FPGAs and some will be silicon proven. Obviously this is a long drawn out effort and will take some time before all the cores come out. I am hoping we will get feedback and source contributions from CPU architects. This year I plan to release a base core for the 2 lowest end cores (5-8 stage pipeline, in and out of order, MMU, BP, L1/L2 cache, single core). The focus is on correctness rather than perf., but we have an extensive CPU arch. research program so in the longer run, I expect these cores to become class leading. Nothing less will suffice.
Of course there is a caveat! The cores are written in Bluespec, which means unless you have a Bluespec license (free to universities) you cannot play with the source RTL. But we do provide the Bluespec code and the generated Verilog. So you can take the Verilog and run it through your favourite tool chain. We plan to offer a version of all the cores using Chisel, Berkeley's open source alternative to Bluespec, once Chisel is mature.
We have already released the source to our Serial RapidIO logical and transport layer at bitbucket.org/casl. So take a look. We use SRIO as our I/O interconnect (instead of PCIe) and also as our cache coherent CPU-CPU interconnect. Think of it as an open source alternative to QPI.
Broadly there will be 5-6 microarchitecture families, corresponding roughly to a Cortex-M4, Cortex-A7, Cortex-A53/57, Core i5/i7, Xeon 4-12 core and 64-100 core Xeon Phi type HPC. The instruction set is the Berkeley RISC-V.
It is neat that Berkeley's RISC-V is actually being used. Will the variable length encoding be used?
(I am disappointed about the instruction encoding, particularly with respect to supporting VLE. While the length indication encoding is similar to something I thought of [my thought was a slight modification--using two bits like RVC--of per-parcel end of instruction indicator bit, inspired by similar predecode bit per byte in some x86 implementations; RVC puts those bits in the first parcel], the placement of register fields is very different in 16-bit and 32-bit instruction formats. [A tiny side benefit of greater compatibility in 16-bit and 32-bit encodings could be greater similarity in placement within a parcel and bit pattern between the function field for R-type format and the opcode field for I-type format as well as similar placement within a parcel of opcode field bits for 16-bit formats and function field bits for 32-bit instructions.] The register field packing also works against a simple extension to 64 registers, which might be useful for FP/SIMD. [The alternate encoding that I found would probably not improve decode efficiency significantly, but even trivial weaknesses bother me when I think I could do better.])
(I tend to disagree with some of the other design choices for RISC-V, but I do not feel I understand the trade-offs even as little as I understand the trade-offs for instruction encoding.)
While RISC-V may not be perfect (even for its design goals--I would be tempted to sacrifice some conceptual and implementation simplicity for other benefits), ISA fragmentation has significant costs (even with highly similar, RISCy ISAs). After watching the lack of progress in the OpenRISC 2k project, reading that RISC-V will be used outside of Berkeley sounds encouraging.
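For readers who have not seen the length-indication scheme mentioned above (the length bits sit in the first 16-bit parcel), here is a minimal sketch of how a decoder can find the instruction length. It follows the encoding in the published RISC-V base specification as I understand it; the draft being discussed in this thread may have differed, and the function name is hypothetical.

```python
def instruction_length_bytes(first_parcel):
    """Instruction length implied by the low bits of the first 16-bit parcel.

    Assumes the published RISC-V length encoding:
      bits[1:0] != 11             -> 16-bit (compressed) instruction
      bits[4:2] != 111            -> 32-bit instruction
      bits[5:0] == 011111         -> 48-bit instruction
      bits[6:0] == 0111111        -> 64-bit instruction
    Longer or reserved encodings are not handled in this sketch.
    """
    if first_parcel & 0b11 != 0b11:
        return 2
    if first_parcel & 0b11100 != 0b11100:
        return 4
    if first_parcel & 0b111111 == 0b011111:
        return 6
    if first_parcel & 0b1111111 == 0b0111111:
        return 8
    raise ValueError("longer or reserved encoding, not handled here")


if __name__ == "__main__":
    for parcel in (0x0001, 0x0013, 0x001F, 0x003F):
        print(hex(parcel), instruction_length_bytes(parcel))
```

Only the first parcel is inspected, so a fetch unit can mark instruction boundaries without looking at the register fields at all.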
1. VLE will be used but Berkeley pulled back the VLE encoding to improve the implementation.
2. We went through a massive exercise of comparing various ISAs and initially settled on the Power 2.07. But some of our collaborators had patent issues in the US, and by that time RISC-V was stable, so we switched. I can understand the concerns about the encoding but all extant ISAs have some tradeoff. I placed more emphasis on the superscalar performance that this encoding could achieve. I figure any minor ISA changes can be done in the course of 2014/2015. Actually the compiler folks in Cambridge also had significant inputs into the encoding. I also saw some extensive review feedback from MIT.
But overall, I tend not to worry too much about ISAs beyond a certain point. Doing a new ISA is out of the question. In any case compilers are lagging way behind in properly using existing ISAs as it is! But I would like to add your concerns to our internal mailing list, if I could trouble you to send me an email.
We do plan to add our own variants for the supervisor mode, SIMD/Vector and 128 bit variants.
For the simple 5-stage pipeline (in-order) we are able to do test synthesis on Synopsys DC at about 1.7 GHz on a 65 nm UMC library. So the decode penalty is pretty low. We do plan to justify our design decisions on the comp.arch list to see if our design choices hold up. But that is probably at least 6 months away.
3. This effort is actually a formal Govt. of India project to standardize an ISA for critical applications. So it is more than an alternate source for RISC-V; it is a massive effort to develop a new family of processors with commensurate staffing. For a lot of Indian applications, this ISA will probably get mandated. So it is going to be a huge market. You are talking smart cards, energy meters, POS terminals, critical servers. But unlike other attempts at setting national standards, this one will be open, royalty/patent free and with commercial grade reference RTL. I am hoping SoCs in future will be priced based just on silicon area!
4. Supervisor arch. is one area where I foresee having variants. For our microkernel OS, a std Linux-friendly MMU will not cut it. We need good support for zero-cost domain crossing: optimized protected call gate type mechanisms, perhaps?
5. Server variants will probably go directly to Hybrid Memory Cubes. So server CPUs will have just two main interfaces, SRIO and HMC. And if we can have the same SERDES/PHY for both (25 Gb/s per lane), we can have a single physical interface for all I/O and memory. Internal protocol engines can be switched to configure the ports as memory or I/O. We want to extend this further by shifting the MMU to the HMC die, so the CPU will have only virtual memory. Clusters of CPUs sharing an HMC cube in this configuration make for an interesting proposition. In such a system, the memory has to boot first and then allocate virtual memory chunks to different CPUs. We plan to prototype this with the new FPGAs that support HMC.
If there is anything in particular you would like to see implemented, do let me know.
Profiles here do not seem to provide email addresses. My gmail.com address is 'paaronclayton', so you could either email me there or post your mildly obfuscated (to avoid spam) email address in a reply here.
I did not realize that Berkeley "pulled back the VLE".
I am surprised that compiler people would have much input on encoding, though the commonness of JIT compilation could justify more software-friendly encoding. (For ahead-of-time compilation, binary generation overhead would not be important in complexity or speed for any somewhat reasonable encodings. Compiler people and software people in general might have useful input in what instructions are offered, which instructions are short, etc.; but where semantic fields are placed within an instruction would not seem to be a big deal for compilers. However, that statement may just express my ignorance.)
It does sound like an interesting project. It is a bit sad that government involvement is necessary, but "If men were angels, no government would be necessary."
I look forward to emailing you some of my thoughts (more than requested or desired most likely! [I have an affection for computer architecture, as evidenced by Internet activity--comp.arch newsgroup, Real World Technologies forum, Stack Exchange, etc.]) when I get your email address.
To All: I would like to join in with a different perspective. There are already soft cores available for FPGAs. Their reputations boil down to "too big, too slow"; the real point is that an FPGA is not well suited for RISC implementation. FPGAs are ideal for a programmable design. No, I am quite serious and have a lot of experience, so I will try to explain.
The "back end" of the compiler process where the intermediate language is mapped to a RISC architecture is the weak link. RISC uses many instructions with the assumption that clock speeds can be infinite. FPGAs have lower clock speeds than ASICs of the same generation. The solution is to reduce the number of cycles to execute HLL statements and to evaluate expressions.
Some strong points of FPGAs:
FPGAs have block memories with true dual port capability, practically unlimited interconnect, and 6 input LUTs that can evaluate incredibly complex Boolean expressions in a couple of levels of logic.
IBM used micro-code control for high end main-frames with great success and FPGAs have memory available.
Program control flow is done by evaluating relational expressions and choosing one of two execution paths. Very straight forward.
Expression evaluation operators require 2 operands that can be supplied from a dual port RAM if all variables and constants are kept in that RAM.
The cycle time per operator is about the same as the typical cycle for the technology.
The operands are not loaded into registers from memory with the result stored back into memory for expression evaluation. They are local.
The hardware design takes a couple of hundred LUTs and 3 block RAMs.
The software is C#. A parser and control word builder that generates content for the control word memory and code that generates the operand memory content.
There are so many ISAs and variations available that if there were an ISA appropriate for FPGAs, chances are that it would exist already. Going off to design still another one is probably just a waste of time and effort.
Notice that there is no cache. Cache is there to hide some of the external program memory access time. External memory is only for transient system data that can be accessed as needed, probably via DMA to local memory.
It is great to see so many expert comments out of extensive experiences!! What is your opinion about the recently launched SoCs? For example the Cyclone SoC from Altera (which has an ARM core built in) or the Zynq from Xilinx. Anybody tried those? Any pros/cons?
@Sanjib,A: If you go to the LinkedIn FPGA group, Steve Leibson of Xilinx marketing has several posts about the Zynq. Adam Taylor has an OS running on one core and bare metal on the other -- I have seen nothing about a real app. Since the hard core on Zynq is not as fast as an ASIC, I guess they threw in a second to see if anyone could use it.
They also have the standard mix of ARM interfaces on Zynq. That may enable usage of some existing MCU tools.
The ARM is still a RISC but not implemented as a soft core so not real clear except you get the baggage of memory controller and 2 levels of cache if you really want them.
For performance you probably go for an optimizing C compiler, but if there is a problem, it is probably not debuggable.
Altera Forums has an SoC category where you might find what issues the users have.
I haven't had a chance to check out Zynq, plus the eval boards are pretty expensive. There are some cheaper ones coming in 2014, so we'll see. Given that the Xcell Journal article in 2Q2011 claimed "a starting price below $15", the current chip price is still pretty high. I noticed at the zedboard.org site that the original Zedboard is still US$395 for up to 5, but higher quantities have an "anti-discount" which raises the price to US$495. The US$395 price includes "manufacturers' subsidies". This is JMO/YMMV, but this engineer who is leery of being an "early adopter" wonders whether a manufacturer is having yield problems.
I hope Xilinx learned lessons from the Virtex-II Pro, which had built-in PowerPC cores. We considered that chip at one time since we were using PowerPC SoCs, but we gagged on the price. We later went with an IBM/AMCC 405EP and a Spartan-IIE, with a 33 MHz 32-bit PCI bus between them. That was very cost-effective and worked out really well.
So I'd like to try Zynq some day, but I'm waiting for pricing to get closer to $15.
"I hope Xilinx learned lessons from the Virtex-II Pro, which had built-in PowerPC cores. We considered that chip at one time since we were using PowerPC SoCs, but we gagged on the price. We later went with an IBM/AMCC 405EP and a Spartan-IIE, with a 33 MHz 32-bit PCI bus between them."
We did a bunch of designs with the Virtex-4 FX with built-in PPC, and we had the same experience. The chips were more expensive than a standalone PPC and a smaller FPGA next to it, and the savings in board space going with the V4FX wasn't all that much. We didn't do PCI between the PPC and FPGA, we just did the processor local bus.
And all of that cost was part of the equation. The other part was the continually-"evolving" toolset. Let's go from OPB to PLB to AHB, making the customer rewrite the same custom cores THREE times. Let's provide REALLY AWFUL cores with the tools. And so on.
I have been writing on how to use the Zynq using the PlanAhead flow over on the now sadly defunct All Programmable Planet; this series of blogs will be reposted on EE Times starting this week, I hope. They cover everything from creating the project to adding peripherals, creating your own peripheral and adding an RTOS. I have also written a similar series for the Xilinx Xcell Journal starting with the January 2013 issue.
There is also a series of blogs over on the Xcell Daily blog that has focused upon using the Vivado flow. This again starts at project creation and adding peripherals, but has then looked at things like the interrupt structure, clocks and timers, bootloaders etc.
To be honest I find Vivado very easy to use, but then PlanAhead / ISE is too.
The writing I have been doing over on Xcell Daily has focused more on the software side at the moment, so interrupts, timers, watchdogs etc., which is SDK based, but it is important for people to understand how this all works and is integrated together.
I have just emailed you something which I hope you will find useful.
"To be honest, until recently I still had questions like 'Why should one consider FPGA technology when we have well-known microcontrollers available to us'?"
The answer is quite obvious, actually: it's because generally, FPGAs and microcontrollers/microprocessors solve different problems.
The question our blogger asked is generally asked by folks coming in from a host software background, or perhaps from an embedded micro background.
A processor presents a sequential instruction execution model, limited I/O, and a straightforward programming model which allows the engineer to write programs in C or other language. A microcontroller's peripherals and pin-outs are limited to what the chip designers thought would be a good balance between cost and flexibility.
An FPGA is a platform for general digital logic design. The engineer can implement whatever is necessary to meet the design goals. The designer isn't limited to what peripherals were chosen by the chip vendor, so if you need nine SPI ports, each with different word lengths, you design them and add them to the larger system. If you need a wacky ping-pong asymmetric memory interface to handle sensor data, you design exactly what you need. Pinouts are limited by the device choice; if you need more pins or more logic, you choose a larger device.
Of course, the flexibility of the FPGA comes at a cost, or at least with a requirement: you have to be a digital design engineer. It is not like writing firmware for a microcontroller. It is a different skill set. This needs to be understood.
Now of course the FPGA vendors are embracing High-Level Synthesis and "C to gates" methodologies as a way to "ease" the transition from high-level software and algorithm acceleration into hardware. But for mainstream logic design, where one really does need to be concerned about the cost of the FPGA device (as part of the overall BOM), HLS is still a non-starter.
And then there are the Systems-on-a-Chip, which embed a processor hard core with some amount of FPGA fabric. The intent here is a good one, since some applications require both a processor to handle stuff like an Ethernet or USB interface to a host, combined with the custom logic that is necessary to implement the design. Does every design require an SoC? Certainly not, but it's a useful tool for a designer to have (assuming the vendor design tools don't totally suck eggs).
If you need high algorithm performance or low latency, go with the FPGA.
If it's a complex function, go with the CPU.
A CPU is a cheap way to implement complex functionality, because the cost of functions is very low. So if your CPU is fast enough and meets your real-time latency requirements, go for the CPU. OTOH, the CPU can only do one operation at a time -- or a few if multiple cores and/or multiple issue, assuming your software can take advantage of them -- so performance is limited.
An FPGA can do many operations at the same time, so if you have a high-performance DSP algorithm (e.g., image processing or beam forming) or something similar, you can blast through a lot more data than a CPU, plus you can transfer data directly from block to block instead of having to transfer through memory. OTOH, you have to pay for those blocks so you don't want to do it that way unless it's really necessary.
The other great thing you get from an FPGA is extremely low latency, so if you have real-time data -- e.g., a specialized communications protocol -- you can respond immediately to your inputs. Most CPUs cannot respond quickly to inputs, especially if you're running an OS, so if you need low latency an FPGA can be a great way to do it. Often these applications don't need a lot of logic -- they just need the logic to be dedicated to the inputs. So you can get by with a cheap FPGA.
"1) the "core" voltage is some low nonstandard value which you may have to provide at a fairly high current depending on the size of the device"
The voltage and current requirements are what they need to be to ensure a balance between speed and power consumption.
Besides, have you looked at the power supply requirements for an Intel Core processor? Pro tip: it's not a 5V-only part.
Another Pro tip: there are these things called voltage regulators, which use simple resistive dividers to set the output voltage to pretty much whatever you want. The actual voltage required by the FPGA doesn't matter. Plop down the right regulator and move on.
"2) sockets for these devices either do not exist or are two orders of magnitude more expensive than the device itself! (Not that current MCUs and MPUs are devoid of this issue either, for reasons that I still find inexplicable.)"
SOCKET? Nobody ships a product with a socket. The last time I put a socket in something was when we still used UV-erasable EPROMs. There is no reason to use a socket in this age of BGA packages and in-system-programmable flash memory. (OK, if you develop with OTP FPGAs for space applications, you might put a socket on the prototype boards.)
"you could potentially add the VHDL to internally replicate an existing MCU for which compilers and assemblers already exist, but then you'd probably be in violation of someone's copyright"
If you embedded an 8051 in your FPGA, and didn't tell anyone, would anyone know?
"something like the JTAG standard that caught hold for some MCUs isn't all that commonly used for FPGAs,"
Are you kidding? I'm hard-pressed to think of a mainstream FPGA or CPLD family which doesn't have JTAG!
"If the hardware aspects of the tools were sufficiently standardized that you KNEW you could use the same interface over the next 3-5 projects, and could amortize the tool costs against that, it would help a lot."
All of the FPGA vendors sell a USB-to-JTAG programming dongle for at most a couple hundred bucks. I still use the same Xilinx dongle the company bought for me almost nine years ago when I started the job. (OK, I have two more of them.) If you're doing FPGA design for a living and you can't afford the dongle (and the rest of the tool set is $FREE), then I don't know what to say.
"I would NOT say that learning VHDL SHOULD be a problem since the benefits appear to outweigh the learning curve and it's very widely accepted (although I'm no expert in it yet either)."
Well, seeing as how VHDL (and Verilog) are the two primary FPGA design-entry methods (for both synthesis AND simulation -- you DO simulate and verify your designs, right?), then I'd say you MUST learn an HDL. It is simply not an option. Schematics? Don't go there.
JeffL_2 wrote: I'm looking at data sheets here for Atmel and Lattice devices, and I see no evidence that either incorporates support for JTAG.
I don't know what devices you're looking at. If they're "jellybean" PLDs like 22V10s, then yes, those don't have JTAG: it's C-as-in-complex-PLDs and FPGAs that have JTAG. FPGAs and CPLDs use JTAG for its traditional purpose -- boundary scan for testing boards and chips -- and for programming the FPGA or CPLD. Xilinx also lets you read and write register bits, and create your own internal scan paths using logic cells. As I said upstream, Xilinx provides you and third parties the information to do this, but I don't know if they provide any software.
Most people debug FPGA designs using simulation (if they don't need to run real-time) or by routing internal signals to spare pins connected to test points so they can be 'scoped in real time. I mostly use the latter, using actual prototype hardware. You can also get FPGA development boards in various price ranges which provide more convenient access to signals. You can also get modules with DIP and PLCC pinouts for prototyping or small-scale production, but I wouldn't call them cheap.
Xilinx has something called ChipScope which (if I understand it correctly) makes it easy to view internal signals. I don't know how it uses JTAG. I've never used ChipScope, mostly because it isn't included with the free WebPACK version of the software. So I just include test points in my Verilog source code and modify the test point equations to look at different signals. The important thing is to have large enough test point holes in the PC board so that your probes stay put while you're running the test.
"Xilinx has something called ChipScope which (if I understand it correctly) makes it easy to view internal signals. I don't know how it uses JTAG."
ChipScope is simply a logic analyzer that is embedded in the FPGA. It uses BRAMs for sample storage. You use the Core Inserter to select which signals you want to analyze, just like when you hooked up the pods of your HP1660 to a board. Those signals are sampled on the same clock which they run on; in other words, signals are sampled synchronously. You can set it to trigger on events and patterns and pretty much everything you'd do on your 1660.
JTAG is used as the transport between the FPGA and the host computer, because it's convenient and always available. The ChipScope analyzer software runs on the host PC and manages all of that, and basically it presents the logic-analyzer interface most of us are familiar with.
If you do FPGA design for a living, the cost of the ChipScope license (or the cost of Altera's equivalent) is well worth it.
It's not at all a substitute for simulation at the HDL level, but when something isn't working, it's good to have it.
There's some hefty intellectual argument going on here. As a low-level starter, I can tell you what started me out: Software Defined Radio [SDR], high speed ADCs and the USB interface.
High speed USB is rated at 480Mbps, but you're lucky to get 300Mbps. A 12-bit ADC running at 120MHz produces 1.44Gbps so a little massaging is in order, along with learning about DSP...
Downsampling? Take 12 samples, multiply each by a twiddle factor, and produce a 16-bit output sample? That should be 160Mbps and slow enough to transfer.
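The data-rate arithmetic in the two paragraphs above can be checked with a few lines. This is only a sketch of the numbers quoted in the post (12-bit samples at 120 MHz, 12:1 decimation to 16-bit outputs, roughly 300 Mbps of usable USB throughput); the constant and function names are made up for illustration.

```python
ADC_BITS = 12               # bits per input sample (from the post)
ADC_RATE_HZ = 120e6         # ADC sample rate (from the post)
DECIMATION = 12             # input samples consumed per output sample
OUT_BITS = 16               # bits per output sample
USB_PRACTICAL_BPS = 300e6   # rough usable high speed USB throughput quoted above


def raw_adc_rate_bps():
    """Bit rate straight out of the ADC."""
    return ADC_BITS * ADC_RATE_HZ


def decimated_rate_bps():
    """Bit rate after keeping one 16-bit result per DECIMATION input samples."""
    return OUT_BITS * ADC_RATE_HZ / DECIMATION


if __name__ == "__main__":
    print(f"raw ADC stream:   {raw_adc_rate_bps() / 1e6:.0f} Mbps")
    print(f"after decimation: {decimated_rate_bps() / 1e6:.0f} Mbps")
    print("fits on USB:", decimated_rate_bps() < USB_PRACTICAL_BPS)
```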
But - multiplying is hard. Taking those input samples and multiplying them takes time, and all the while new samples are coming in. A simple CPU is flat out, just multiplying and moving data around. Wouldn't it be nice if I could just write factors to 12 memory locations, have a DMA point the incoming data at another 12 locations - and a single value pops out after every 12 samples? I could call it a "decimating FIR filter"... and if I want one, I need a DSP chip or an FPGA.
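As a software model of the block described above, here is a minimal sketch: 12 stored factors (usually called coefficients or taps), 12 incoming samples, one output per block. It covers only the special case in the post, where the number of taps equals the decimation factor so the blocks do not overlap; a general decimating FIR can have more taps than that. The function name is hypothetical.

```python
def decimating_fir(samples, coeffs):
    """Produce one output per len(coeffs) input samples.

    Each output is the dot product of a non-overlapping block of samples
    with the coefficient list. A trailing partial block is dropped.
    """
    n = len(coeffs)
    if n == 0:
        raise ValueError("need at least one coefficient")
    outputs = []
    for start in range(0, len(samples) - n + 1, n):
        block = samples[start:start + n]
        outputs.append(sum(c * x for c, x in zip(coeffs, block)))
    return outputs


if __name__ == "__main__":
    taps = [1] * 12                      # placeholder taps: a plain block sum
    data = list(range(24))               # two blocks of 12 samples
    print(decimating_fir(data, taps))    # one value per block
```

In an FPGA the 12 multiplies inside the sum can be spread across hardware multipliers or a single multiply-accumulate running at the sample rate, which is the work the simple CPU in the post cannot keep up with.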
The worst [or best] part came when I read about using Fourier transforms [transform input, select a band of interest, translate to a lower frequency band, then do an inverse transform]. The first step in transforming is re-ordering samples. I looked at the code for this, and thought: "DMA transfer, use an incrementing counter, but flip the counter bits before they go into the DMA".
Even if it doesn't work [time will tell] the fact I'm thinking in terms of counters and bit order means the journey has begun. There's no turning back now.
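The "flip the counter bits" idea for the FFT reordering step can be modelled in software like this. It is a minimal sketch of bit-reversed addressing for a power-of-two block length, with hypothetical function names; in hardware the reversal is just wiring between the counter and the address bus.

```python
def bit_reverse(index, width):
    """Reverse the lowest `width` bits of index (e.g. 0b001 -> 0b100 for width 3)."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(samples):
    """Reorder a power-of-two-length block the way a radix-2 FFT input stage expects."""
    n = len(samples)
    width = n.bit_length() - 1
    if n == 0 or (1 << width) != n:
        raise ValueError("block length must be a power of two")
    # An incrementing counter i, with its bits flipped, selects the source sample.
    return [samples[bit_reverse(i, width)] for i in range(n)]


if __name__ == "__main__":
    print(bit_reversed_order(list(range(8))))
```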
@sa_penguin: Sounds like you could use embedded RAM blocks and do the whole thing in real time on the FPGA. This would not require external memory, controller, DMA, etc. And the multipliers on the FPGA are very fast.
@KarlS01: What - you see my first steps into adding an FPGA to a circuit, and you want me to do the whole thing in a single [larger] FPGA? Ewww, and Errrk.
Ewww: Remove a functional low pin-count CPU, with Hi speed USB support [Atmel SAM3U series] and add a 256+ pin BGA that costs more, needs to be configured, and needs a 4 [or 6] layer board just to get to all the pins?
Errk: Those big chips aren't cheap. Come to think of it, neither are 4+ layer boards.
No, I think us raw beginners need to stick with pre-made boards when possible. If you need to do something fancy [like my ADC] keep it as small and simple as possible, then feed it to an established system. Cheaper, easier to configure / faultfind. If I could get a Logi-bone [see kickstarter], strip out all the PMOD connectors and have even a single [fast] ADC channel like the Red Pitaya, I'd have plenty to play with - on a Beaglebone.
Side note: @anon7632755, yes high speed USB is always 480 mbps. IF the host controller isn't busy, [mouse, keyboard etc. all eat USB bandwidth] it might even all happen my one device. That still leaves the overhead of a packetized protocol; Header, CRC and SOF (Start Of Frame) data all eat into actual throughput. There are a few papers at IEEE talking about practical throughput, they give 300 - 305Mbps as a limit before you start losing data. Would you like me to look up the reference for you?
Penguin: "Side note: @anon7632755, yes high speed USB is always 480 mbps. IF the host controller isn't busy, [mouse, keyboard etc. all eat USB bandwidth] it might even all happen my one device. That still leaves the overhead of a packetized protocol; Header, CRC and SOF (Start Of Frame) data all eat into actual throughput. There are a few papers at IEEE talking about practical throughput, they give 300 - 305Mbps as a limit before you start losing data. Would you like me to look up the reference for you?"
I happen to have the USB 2.0 spec right here.
But again, you're missing the point. The wire speed is fixed. Bits toggle on it at 480 MHz.
The data throughput varies for a bunch of different reasons, notably bus utilization, different transaction types, host buffering and drivers and all of that. Even the hub type makes a difference (look up Single Transaction Translator vs Multiple Transaction Translator when merging multiple full speed endpoints onto a high speed bus in a hub).
And when you say "start losing data," that doesn't happen because of the wire rate. It happens because a host or device can't sink or source the data promised.
@anon7632755: OK, now I'm confused. Judging by some of the other posts here, there's at least one other person who's getting worked up, but I'll settle for "confused".
I never denied USB2 works at 480 MHz, or 480 Mbps. What I DID say, way back in my first post, is that I need a simple FPGA to downsample a 120MHz x14-bit ADC, in order to effectively use a USB2 connection. I also pointed out that the 480Mbps is an ideal - once you take overhead into consideration, a more achievable goal for ACTUAL DATA TRANSFER is 300Mbps.
When the data is downsampled, it goes to a "buffer" and is read by [whatever is handling the USB part]. If the downsampling is inadequate, you risk writing new data faster than old data gets cleared, resulting in buffer overflow [lost data].
The confusing thing is - you seem to agree with this. Yet you seem to be harping on about 480MHz, and I don't know why. You take my attempt to reduce the data bandwidth, and deride it as "you don't understand USB".
I have to wonder if I'm being baited to respond. If so, judging by the forum threads I'm not the only one. I hesitate to use the term "Troll" but you sir/madam, are making some confusing / inflammatory posts.
If you are doing a FIR with constant coëfficients in an FPGA, check out "distributed arithmetic". It replaces multipliers with look-up tables and adders. Amazing technique, "indistinguishable from magic".
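To take some of the magic out of it, here is a small Python model of distributed arithmetic. It is a sketch under stated assumptions: the function names are made up, inputs are signed integers that fit in `bits`-bit two's complement, and the loop over bit positions stands in for what hardware would do one bit per clock.

```python
def build_lut(coeffs):
    """LUT entry `addr` holds the sum of the coefficients whose bit is set in `addr`."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]

def da_dot(samples, lut, bits):
    """Dot product of samples with the LUT's coefficients, without multiplying.

    One LUT lookup and one shifted add per input bit position.
    """
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]      # two's-complement bit patterns
    acc = 0
    for b in range(bits):
        addr = 0
        for i, w in enumerate(words):        # gather bit b of every sample
            addr |= ((w >> b) & 1) << i
        term = lut[addr] << b
        if b == bits - 1:
            acc -= term                      # sign bit carries weight -2**(bits-1)
        else:
            acc += term
    return acc

coeffs = [3, -1, 4, 2]
samples = [5, -7, 2, -1]
da_result = da_dot(samples, build_lut(coeffs), bits=4)
print(da_result)
```

The LUT grows as 2^K for K taps, which is why long filters are usually split into several smaller tables whose outputs are added.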
If those constant coefficients are truly constant (not programmable), check out Canonic Signed Digit (CSD) arithmetic, which also replaces multipliers with adders & subtractors. You're essentially hard-wiring those look-up tables into the logic that decides whether to add, subtract or do nothing, based on the non-zero bits of the CSD data.
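A quick Python sketch of the CSD idea follows. The function names are invented for illustration, and the recoding shown is the usual non-adjacent form, in which no two neighbouring digits are both non-zero.

```python
def to_csd(value):
    """Return the CSD digits (-1, 0 or +1) of an integer, least significant first."""
    digits = []
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)   # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits

def csd_multiply(x, digits):
    """Multiply x by the constant encoded in `digits` using shifts, adds and subtracts."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc

seven = to_csd(7)                 # 7 = 8 - 1: one subtractor instead of two adders
product = csd_multiply(13, seven)
print(seven, product)
```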
And speaking of FIR filters, don't forget to exploit coefficient symmetry. That probably goes without saying, but that mistake still sometimes gets made and results in FIR designs doing twice as many multiplications as are actually necessary.
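The symmetry trick is easy to show in a few lines of Python. This is only a sketch with made-up function names, assuming a linear-phase filter whose coefficients read the same forwards and backwards.

```python
def fir_direct(window, coeffs):
    """One output sample the naive way: one multiply per tap."""
    return sum(c * x for c, x in zip(coeffs, window))

def fir_folded(window, coeffs):
    """Same output for symmetric coeffs: pre-add mirrored samples, then multiply once."""
    n = len(coeffs)
    half = n // 2
    acc = 0
    for i in range(half):
        acc += coeffs[i] * (window[i] + window[n - 1 - i])
    if n % 2:                       # odd length: the centre tap has no partner
        acc += coeffs[half] * window[half]
    return acc

coeffs = [1, 3, 5, 3, 1]            # symmetric, 5 taps
window = [2, -1, 4, 0, 7]
direct = fir_direct(window, coeffs) # 5 multiplies
folded = fir_folded(window, coeffs) # 3 multiplies
print(direct, folded)
```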
The Atmel devices are certainly not mainstream -- I think they bought someone else's deprecated line. Lattice FPGAs use JTAG, not their small PLDs. Xilinx, Altera, MicroSemi all use JTAG.
No, I don't have a bias towards Xilinx. I don't know why you think that.
"Also having a group of pins labeled 'JTAG' on a pinout doesn't REALLY mean the vendor really supports that interface for doing anything useful, and whatever it does represent it CERTAINLY doesn't hope to imply that a person could just plop down a few bucks on a Macraigor Wiggler or Demon or Raven and start debugging code, like you've been able to do for DECADES with standard embedded MPUs, in fact for FPGAs it doesn't really imply ANYTHING."
Honestly, you don't know what you're talking about. OK, so the FPGA vendors don't support the Wigglers and the software used by them, so you have to buy the NOT EXPENSIVE (as in "a few bucks") dongle from the vendor. Honestly, I don't see why this is a problem.
And considering that I use JTAG to program and "Debug" FPGAs every working day, I don't understand what you're on about. Maybe if your experience with the devices was practical and not theoretical, you'd understand. As for proto setups in the lab, in the Real World, where professionals do this for a living, part of the prototype is a board spin. And when we build prototypes, we don't build just one, we usually do at least three first-article boards. This way, if that lightning strike happens, we are not screwed.
And you mentioned "I'll agree over something close to 144 pins ..." which tells me you haven't built anything very complicated. Let me put it this way: when your design has 16 100-MHz/16-bit ADCs that need to be read and their data processed and then passed on to a host, then you'll quickly realize that a 144-pin QFP FPGA isn't an option.
And this product line doesn't have 100K-unit production runs, so I don't know why that's even relevant.
You keep reading the FPGA vendor literature and YES, they all push these super-huge devices of which they sell a dozen a year to specialist customers. And the reality, which we keep telling the FAEs, is that the Spartan 3AN-50 (which is in a QFP) and 200 are excellent jellybean devices for a whole bunch of varied applications. We don't need the big devices, we don't care about them, we don't want to hear about them.
Now one thing the small company can do to minimize costs is to settle on a jellybean FPGA. Is the 200AN "too big" for a lot of things? Probably. But if you can buy them by the tray instead of individual piece parts, the price becomes somewhat less of an issue.
"In the FPGA world you get maybe 20-30 pages to describe an entire FAMILY of parts, with lots of "weasel words" like "the lower end of the family doesn't support the entire complement of tools" and if you need to know more you just contact your friendly local tech representative."
That is most certainly NOT TRUE. Let's use Xilinx Spartan-6 devices as an example. Go here for the list of S6 user guides and other documents. There are THOUSANDS of pages of documentation for this device family. And it's conveniently broken down into functional sections, such as the clocking infrastructure, BRAMs, the I/O stuff including the SERDES, and the DSP block.
Really, the documentation is quite comprehensive. (I have some particular nits to pick but that's not relevant here.)
"In other words these guys figure they're mostly not just selling to but DOCUMENTING to a "Xilinx shop" or an "Altera shop", they're just telling you there's some parts on which you can't use the entire complement of tools YOU'VE PROBABLY ALREADY BOUGHT."
(Again for Xilinx; Altera and MicroSemi are similar) most users can get work done using the WebPack. The only parts not supported by WebPack are the super-duper big devices that the vast majority of users won't even consider.
Of course. What if I buy ARM tools from NXP? Those tools don't support a Silicon Labs device.
"Go back and look at the original premise of the article, it's to introduce the premise of development with FPGAs to people largely unfamiliar with them, and particularly to compare it with other types of development the reader may already be familiar with."
And the premise is somewhat misleading, because as I've noted elsewhere in this thread, FPGAs and processors are intended to solve different problems! The FPGA designer needs a serious background in digital logic design and doesn't need to know anything about C or assembly-language programming.
The processor firmware guy doesn't need to really know anything at all about digital electronics. He doesn't need to know about creating and meeting timing constraints. He doesn't need to know about crossing clock domains. He doesn't need to know about power consumption. He doesn't need to know about transmission lines on PCBs.
So, really, the two disciplines are for the most part entirely separate.
"For you to imply "no one who knows what they're doing could possibly think of using brand X in a design because it's not 'mainstream' enough for me to use" only reinforces this attitude and does nothing to clarify anything about the original issue."
That is not at all what I said. You simply don't understand what is meant by "mainstream." AGAIN, ALL of the FPGA vendors have settled on JTAG as a standard way of programming and debugging on their devices. Call your local Lattice guy and ask. Call the MicroSemi guy and ask. Or, better yet, spend five minutes on each vendor web site. It's all there.
FPGAs which are "outside of the mainstream" are the space-grade things from Aeroflex and MicroSemi, which I would guess you'll never use, and anyways if you're spending $15,000 on a single piece of FPGA, then you're playing in a different game. QuickLogic with their OTP devices are also in this category. They're "outside the mainstream" because they're intended for specialist applications.
"The very last project I worked on was certifying the software for a safety-critical subsystem containing 10 PowerPCs and 8 DSPs scheduled in 1-msec intervals which communicated over military Firewire in real-time. At least the hardware people who devised this monster had the presence of mind NOT to try and stuff all of this into a couple of SoCs! (The "glue logic" that held them together however WAS stuffed into FPGAs, although not by me.)"
So, in other words, you know a little bit about software, but you're not an actual developer (you "certify the software," which means you push paper), and you know nothing about FPGAs, and then you have the balls to say to people who've been doing FPGA design for 20 years that our opinions and experience are wrong. Please, give us all a break.
When you've got a handful of FPGA designs under your belt, where you've done the design, written the HDL, verified it in simulation, built boards and shipped product, then perhaps we'll take your opinions seriously.
Gentlemen (I assume you are all male, due to the high testosterone level vibes :)
I am new to electronics as a serious hobby, and would appreciate it if, when things get stormy, post-wise, you would take the acrimony offline. I have neither the time nor the inclination to wade through 'my ego is bigger than your ego' posts.
I want to apologize for the awful excesses to which you were exposed about which you were objecting on this blog. I was the first person to make a comment on getting started with FPGAs because it was a topic I was looking into myself, and I have certainly "done my share of embedded development" and specifically NOT using FPGAs (at least not at the current level of development) so I thought my "perspective" might possibly be useful in the discussion. It turned out that no one who was currently working at the state of the art in this field wanted to "waste their time with newbies and rookies" so they wouldn't even THINK of exchanging casual thoughts about the subject lest they reveal precious secrets about the internal workings of the "priesthood" to which they belong. You have to understand that in their world only the "blessed" are the ones to whom the arcane and mystical secrets of the inner workings of Vivado are revealed, and it is only the lesser of us mortals who are condemned to scratch out a subsistence existence with pathetic remnants of the art like ISE (whatever either of those are, I really couldn't care less). I would however point out about these "wizards" that I strongly feel that if the relationship in the following article by Martin Rowe is correct, they must certainly be working for free:
In deference to these folks and their expertise, to the extent it's possible I'll delete my previous posts on this subject, so in my absence we can see just how modest and unassuming these folks truly are. I apologize for thinking one of the "unanointed" like myself was worthy of commenting on this topic, so please accept my humble apologies for kicking off this blog. The calumny and viciousness you've observed would certainly not have occurred had it not been for my actions in the first place.
To start with: "If you've been doing the same job in the same industry for 20 years then you've proven you're not worth promoting." What, so you think that someone can't develop a valuable skill and continue doing it for most of a career? Why is that a bad thing?
And what do you mean, "same job in the same industry?" How do you know what I've done over the course of my career, and for whom?
I like what I do, and the boss values the work and we ship product to happy customers and everyone is happy. When previous bosses and companies no longer valued the contribution, I left and found employment where such work is valued. So, no, I haven't been locked up in the same lab for 20 years. (Believe me, I'm not your stereotypical engineer.)
So, really, you're just being silly. Or, have you not always been a compliance engineer?
Anyways: You're NOT a design engineer. You DON'T do FPGA design for a living. Nor do you do embedded firmware (which I do, too). You don't know what sort of stuff I work on, and to say that I'm "making up the specifications as I go along" is patently absurd.
This thread (and the entire "Programmable Planet" section here) is about FPGA design. It's about techniques and skills and tools. You have absolutely no experience with any of this. So how can you have opinions when you have no information upon which to base them? Go back. Re-read your posts. You've written a lot of stuff which is just plain wrong. You've been called out on it, and yet you can't say, "Uh, yeah, you're right, I was wrong."
So please, just stop, because you're embarrassing yourself. First rule of holes: when you find yourself in one, stop digging.
If you want to learn, then perhaps you should consider: less typing, more reading.
hi anon, as a fellow FPGA design engineer who just happened to pass by, I have to ask, why are you wasting your time arguing with an obvious amateur hobbyist? I'd have stopped after realizing he was complaining about JTAG and listing Atmel as an FPGA vendor (what is this, the 1990s??).
FYI I worked for Xilinx, and now work for a telecom equipment vendor as an FPGA engineer. I know "a little bit" about FPGAs.
I really hope you all had a good New Year's Day, things got a bit out of control last year - nothing a nice bottle of champagne would not fix.
Now, back to the original post topic.
I do software for a living. Embedded software, usually even BSP and other weird stuff most programmers never heard about. Many things to consider, like multicore, multi depth caches, odd MMU/MPU systems, CPU/Coprocessor bugs, bad documentation, really odd accelerators, and so on. Tough most of the time, but that is our job - to hide all of the HW peculiarities from higher level engineers and programmers, so they can focus on what they do best: implement their algorithms.
And this is exactly where the FPGA enters the show. Whether you implement a softcore CPU on the FPGA, or just have the FPGA implement accelerators (in a sense) and use an external CPU, they do present an advantage over other, hardwired, architectures:
1) they are field reprogrammable, so you can update them easily
2) they are relatively cheap
3) the design is faster than a CPU for specialized designs
4) some of them can be partially reprogrammed, even at runtime, which allows for efficient hw acceleration of different tasks
5) they are cool (not thermally cool, unfortunately :) )
If you read this closely, you can see that there is no definite answer to FPGA usage. All depends on your requirements, and your need of hw design update.
I personally love them, but only advise their usage in specific scenarios.
I cannot say the same for hobby use! I love them, and I'm halfway through my second CPU/SoC design. Why? Because I can, and because I enjoy doing so.
Side note for someone that said porting GNU tools is rather easy: do not believe them.
Good comments, Alvie, but I'm wondering about your subject line: "FPGAs are a commodity". According to Wiktionary, commodities are:
6. (marketing) Undifferentiated goods characterized by a low profit margin, as distinguished from branded products.
FPGA families each have their own strengths and quirks, making it hard to switch between vendors. The vendor tools are quite different, each with their own steep learning curves, which is another barrier for switching FPGA families. I don't know about the profit margins, but I suspect they're quite a bit higher than commodity MCUs.
In addition, FPGA design is still IMO regarded as a "black art", and many engineering organizations don't have in-house FPGA design capability. In one sense this is good for FPGA consultants, but it does mean that FPGAs are not designed in as often as they might be, which is bad for FPGA consultants. I sure hope they distribute magic wands and wizard hats at the upcoming EE Live! (formerly ESC) "FPGA boot camp".