After I had submitted this story, it was announced that there is a new Arduino running a 1GHz TI Sitara. It would be interesting to run these two in a head to head to see which one actually came out on top.
The new Sitara-based Arduino will actually consist of an AM335x Sitara running Linux plus an AVR8 for shields and such, very similar in concept to the Udoo (except that the Udoo uses an i.MX6 plus an Arduino Due compatible SAM3X).
In fact, Udoo probably takes the crown for fastest Arduino, since you can get it with a quad core 1GHz i.MX6.
The Galileo board doesn't use a co-processor, so maybe it's the fastest Arduino board that doesn't use a co-processor.
With these dual-chip Arduino-compatible boards, I need to look into whether the Arduino interface runs on the Cortex-A chip or on the other, lower-performance 32-bit chip. The new Arduino board seems interesting in that you can create your sketches in Linux running on the Cortex-A chip.
As I understand it, a few boards were handed out. I tried contacting Intel to see if I could get my hands on one early, but it does not seem like that is currently possible. They will first be distributing them to universities, and then they will begin selling them to the public starting on Nov 29. The Intel rep told me that they would notify me if they decide to distribute any before their initial sale date.
Their datasheets are very different from the datasheets that I am used to looking at. You can tell that this device comes from noble ancestry, though I did find it interesting that there was absolutely no reference to the term PWM anywhere in the datasheet; it was called a square wave output. Even beyond that, the terminology was very different from other datasheets. It will be interesting to see if Intel pushes further down into lower clock speed devices. If the Quark is successful, I could see them going after the Cortex M0+ market. I also asked the rep whether any partners would have hardware based on the Quark core coming out soon, but I did not get a response to that question.
As to the number of layers, I think that with BGA devices at this pitch, you can get the first two rows of pins out on the first layer, and then you add a layer for every extra row. I have not yet had the opportunity to do anything with a BGA device, though I am really itching to use the Freescale KL02 in a project. I think that I might be able to cheat the design guidelines enough at OSHpark to allow me to use it on a single layer.
I am not saying that you are incorrect, but I am wondering where you are getting your comparisons from. The Quark X1000 has been listed as using the Pentium instruction set. This takes it out of the 486 range, and I would imagine that it would quite handily beat a 486 architecture if there had ever been one that was clocked up to 400MHz. It was not until the PII that there was a 400MHz processor in the Intel line.
I am not saying that this chip is really going to perform well, but I have yet to see any information that says it will perform poorly. Even though quite a bit of information has been given, there is still a lot more information that is not yet out on the performance of this little device. Things like number of processing cycles for math operations or DMIPS/MIPS per MHz. Once that data starts flowing out, then we can make actual comparisons. Right now very few people even have a board in hand, let alone have the knowledge to do performance tests. I much prefer to make judgements on data rather than subjective information.
So it is literally the 24 year old 486 with a 16KB shared I&D cache. In terms of clock cycles per instruction, yes the 486 is pretty slow compared to a modern RISC. Besides that, it is also single-issue while pretty much all Cortex-A cores can execute 2 instructions per cycle. And the 16KB shared I&D cache is not going to perform well compared with the 32KB I + 32KB D-cache of the Sitara (which also happens to have a 256KB L2).
For cycle timings see section 12.3 in the first link. Look at the multiply timings for example - up to 42 cycles for 32x32 multiply, and compare with the single-cycle multiplier in ARM cores. Shifts take 2-3 cycles while they are effectively free on most ARM cores. Floating point is not any better, with 10-16 cycles for fadd/fmul, while most ARMs do either in a single cycle. And back are all the AGU stalls, the penalties for complex addressing modes and LEA's, the various stalls for unaligned accesses, and our favorite: the 4-cycle fxch.
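To put those cycle counts into wall-clock terms, here is a rough sketch that converts them to nanoseconds at the Quark's 400MHz clock. The cycle figures are the ones quoted above; the function name and the table are only for illustration, and the sketch ignores cache misses, stalls, and any overlap between instructions.

```python
CLOCK_HZ = 400e6  # Quark X1000 clock rate discussed in this thread


def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles / clock_hz * 1e9


# Cycle counts as quoted in the post above (illustrative table, not measured).
quoted_cycles = {
    "32x32 multiply (worst case)": 42,
    "shift (worst case)": 3,
    "fadd/fmul (best case)": 10,
    "fadd/fmul (worst case)": 16,
    "fxch": 4,
}

for name, cycles in quoted_cycles.items():
    print(f"{name}: {cycles} cycles, about {cycles_to_ns(cycles):.1f} ns")
```

A single-cycle operation at the same clock would take 2.5 ns, which is the baseline the ARM comparison above has in mind.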
So yes while we don't have absolute figures, it is easy to understand that a 24 year old CISC CPU is going to be very slow compared to a modern RISC.
Your numbers on the math side add up, but the claim that it is a 486 does not. As I understand it, the 486 never had out-of-order execution, which this device does (noted in section 1.2.2 of the second link). The other point is that this device specifically handles the Pentium instruction set, as noted in section 11 of the second document that you linked. If it is correct that this is a Pentium class device, then as I understand it, that would put it at around 1.9 DMIPS/MHz (sources conflict on this number, but most of them were close to it). This would compare favorably to the A7 and A8 devices.
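As a back-of-envelope illustration of what that figure would mean, the sketch below scales a DMIPS/MHz rating by clock speed. The 1.9 DMIPS/MHz value is the assumed Pentium class number from this post, not a measurement of the Quark, and the function name is made up for the example.

```python
def total_dmips(dmips_per_mhz, clock_mhz):
    """Scale a per-MHz Dhrystone rating by clock speed."""
    return dmips_per_mhz * clock_mhz


# Assumption from the post above: a Pentium class core at ~1.9 DMIPS/MHz.
ASSUMED_DMIPS_PER_MHZ = 1.9
QUARK_CLOCK_MHZ = 400

print(f"Estimated: {total_dmips(ASSUMED_DMIPS_PER_MHZ, QUARK_CLOCK_MHZ):.0f} DMIPS")
```

Plugging in a different DMIPS/MHz rating for another core gives a like-for-like comparison, though Dhrystone says little about memory-bound or floating point work.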
Perhaps one of the peripherals that Intel would invite others to integrate into branded chips would be a MAC, so as to reduce the clock counts for math ops. That seems a little deeper integration than I would expect. This could also just be a first generation device. At 1.9 DMIPS/MHz, it would be in a competitive range for performance; one would just have to see the power draw. From the TDP numbers that may be high as well, but I have a feeling that is due to supporting the PCIe connections. I think that the core numbers are lower than that, but I have not received any information on this from Intel.
Sorry, but you're clinging to some Intel marketing words and not actually reading the manuals. Neither the 486 nor the Pentium is out-of-order. Going out of order doesn't make any sense at all when you are aiming for small size and low power consumption.
Of course we'll have to run actual benchmarks on Quark to be sure, but the cycle counts and memory system simply do not look good on paper. Neither does the power consumption. For example, the SAMA5D3 I mentioned uses just 200mW at max frequency with all peripherals enabled. That's on a 65nm process...
Well, if it's not scanning for marketing terms, then I don't understand how you could think it is an out-of-order Pentium (i.e. a Pentium Pro!). For example, the word "out-of-order" only occurs in the section about the memory controller. It's normal for a memory controller to reorder memory requests. But that doesn't make a CPU out-of-order!
Please do not misunderstand. I have other areas where I am disappointed in the Quark chip. I just do not see where you are pulling out of the datasheet itself that this is a 486 chip. This combined with statements from Intel that this is in the performance class of a Pentium make me probe statements to the contrary a bit more.
The things that make me disappointed about the chip start with the peripherals, which are rather lackluster. There are few timers, and the ones that exist are missing a lot of features. The chip is also complex. Having 5 separate voltage requirements for an embedded chip is a bit ridiculous. Another area is the datasheet. To get data from it, you really have to dig compared to datasheets from other vendors. There is also little information about power consumption. This was something that Intel was touting, but all you find is a table that lists the max power supply requirements for all the different power rails. This is further complicated by the complex power states and the lack of information on normal power consumption levels. Lastly, the fact that there is no mention of any sort of benchmark comparison in the datasheet or other supporting documents is also disappointing.
I'm guessing that what Intel meant by "Pentium class performance" is that at 400MHz Quark achieves performance similar to that of the original Pentium. That's a reasonable statement, as 20 year old Pentiums were pretty slow, but it doesn't imply at all that Quark itself is a Pentium.
I agree about the other areas, it's a very complex CPU for what little it does.
I did take a look at the block diagrams, and to be honest it was all there, connected in roughly the same way. Some of the blocks have been physically shifted around to make it look prettier, but they were there. I started with the control unit and compared what was connected to it in each diagram. I was able to identify that each of the blocks was consistent, with slight nomenclature differences. Some showed more of a breakdown of the components inside each generic block. The Quark also has the 64-bit interunit transfer bus that is found on the Pentium architecture. The main difference that I see is that there is a single 5 stage pipeline as opposed to two 5 stage pipelines.
The single vs double pipeline is obviously the key difference between a 486 and a Pentium. Also note that the 486 and Quark have a single cache, while the Pentium has split code and data caches. Both 486 and Quark use a 128-bit fetch bus and a 32-bit data bus, while Pentium uses a 256-bit fetch bus and a 64-bit data bus. Finally the Pentium has a branch target buffer, which neither the 486 nor Quark have. So it's 100% certain that Quark is a 486, not a slimmed down Pentium.
I can confirm that the MSRP for the Galileo is to be $60. This information comes from an Intel PR representative. Vendors may sell it for more or less than this, but the price makes this attractive to check it out and see if I might add Intel to the list of chips that I have worked with in the embedded world.
There are a couple of things that make this device unique. The first, while it has nothing to do with performance, is that it is made by Intel. Intel may control the market for low volume, high margin processors, but as for overall devices shipped, Intel only has 2-3%. The other 90+% is in embedded controllers. Intel is starting to make a move for this space. This is a big strategic move. The impetus may be to increase the volume through the fabs to get more of a return on their machines.
The next thing that makes this unique is that from what I can tell, it is being made on a 32nm process node. This is smaller than all others out there for this market segment. I believe that the smallest process node that is being applied to the Cortex M devices is 65nm (TI is producing their Tiva line at this process node). For the low end Cortex A devices, I think that these are on a 45nm process node. Obviously higher end Cortex A devices are at a smaller process node, but this is not the space that the Quark is competing in. This should give it an inherent power advantage. Intel has indicated that this device would easily transition to 22nm. This all means that the Quark will be able to provide advanced computations all while providing lower power than a competing device. It would enable high speed FFT and filtering calculations right at the sensor itself instead of having to offload that to post processing. This would further enable software defined radios and other complex devices while consuming little power.
As for the Galileo dev board itself, you can now do some pretty advanced signal processing that was previously not available on other Arduino compatible boards. I would imagine that you should be able to do 1024-point FFT calculations in under 1ms, perhaps even much faster than that, in the 0.1ms range (this would depend on how well someone could write an optimized FFT routine in Arduino). Another thing: the board is compatible with Arduino, but it is also stated that it can be programmed with an open source version of C. I do not have too many details on this, but this to me is more exciting.
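A rough cycle budget for that FFT estimate can be sketched as follows. This assumes a textbook radix-2 FFT with 10 real floating point operations per butterfly and a 400MHz clock. The cycles-per-operation values passed in at the bottom are the 10-16 cycle fadd/fmul figures quoted earlier in this thread, not measurements, and the function names are only for illustration. Loads, stores, loop overhead, and fixed point alternatives are all ignored.

```python
CLOCK_HZ = 400e6  # Quark X1000 clock rate discussed in this thread


def radix2_fft_flops(n):
    """Real floating point ops in a textbook radix-2 FFT (n a power of two)."""
    stages = n.bit_length() - 1      # log2(n) for a power of two
    butterflies = (n // 2) * stages
    # Per butterfly: one complex multiply (4 mul + 2 add) plus
    # one complex add and one complex subtract (4 add) = 10 real ops.
    return butterflies * 10


def fft_time_ms(n, cycles_per_flop, clock_hz=CLOCK_HZ):
    """Estimated FFT time in ms if every real op costs cycles_per_flop."""
    return radix2_fft_flops(n) * cycles_per_flop / clock_hz * 1e3


def cycle_budget_per_flop(n, target_ms, clock_hz=CLOCK_HZ):
    """Cycles available per real op to finish within target_ms."""
    return clock_hz * target_ms / 1e3 / radix2_fft_flops(n)


N = 1024
print(f"Real ops for {N}-point FFT: {radix2_fft_flops(N)}")
for target in (1.0, 0.1):
    print(f"Budget for {target} ms: {cycle_budget_per_flop(N, target):.2f} cycles/op")
for cycles in (10, 16):
    print(f"At {cycles} cycles/op: {fft_time_ms(N, cycles):.2f} ms")
```

Under these assumptions the 1ms target needs an average of under 8 cycles per floating point operation, so whether it is reachable depends heavily on the real FPU timings and on how much of the work can be done in fixed point.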
I took a look at the datasheet for the NXP LPC4300, which is a ~200MHz Cortex M4F that also has an M0. That device consumes, with the M0 in reset and all peripherals turned off, 81.5mA at 3.3V. This is about 0.27W. There is no direct information on this in the Quark datasheet; the only figures given are the maximum allowables that the circuitry can handle, not the consumed amounts. So I derive my numbers from the statements made at the IDF. It was stated that it was 1/5 the size and 1/10th the power of the Atom. The highest power consumption I could find for an Atom was 6W with all peripherals enabled. This was for a multi-core server device. Even so, this would put the power consumption around 0.6W. This number may get higher once you enable certain peripherals, especially the DDR3 and PCIe. If you were to double the clock of the LPC4300, this would put it in the same power consumption range. I am guessing that my Quark estimates are conservative, so this would mean that the Quark chip is not all that bad. I will reach out to Intel to see if I can get some actual numbers and report back.
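The arithmetic behind that comparison can be written out as a short sketch. The inputs are the figures from the post: 81.5mA at 3.3V for the NXP part, 6W for the Atom, and the 1/10th power claim. The assumption that power scales linearly with clock when doubling the NXP part is a simplification, and the function and variable names are only for illustration.

```python
def power_watts(current_ma, volts):
    """Power in watts from a supply current in mA and a rail voltage."""
    return current_ma / 1000.0 * volts


# NXP Cortex M4F figures quoted in the post (M0 in reset, peripherals off).
nxp_power_w = power_watts(81.5, 3.3)

# Quark estimate: 1/10th of the highest Atom figure found (6W).
ATOM_POWER_W = 6.0
quark_estimate_w = ATOM_POWER_W / 10

# Simplifying assumption: power scales linearly with clock speed.
nxp_doubled_clock_w = 2 * nxp_power_w

print(f"NXP at ~200MHz:        {nxp_power_w:.3f} W")
print(f"NXP at doubled clock:  {nxp_doubled_clock_w:.3f} W")
print(f"Quark (1/10 of Atom):  {quark_estimate_w:.3f} W")
```

The doubled-clock NXP figure and the Quark estimate land within roughly 10% of each other, which is the sense in which the two are in the same range.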
I don't agree that this device competes with the 65nm node. I think it competes with devices like the QCA4004 (CPU + WiFi on a chip, no flash, 40nm, Qualcomm) that are about to come out.
Other competitors are Renesas, with its 40nm embedded flash process (which it wants to license to other firms), and I think there are other firms working on 40nm embedded flash, Spansion among them (working together with UMC).
I believe they use 40nm because it is currently cheaper than 28nm. But maybe costs at Intel are better? Maybe they have already paid for their 32nm fabs and have nothing better to do with them, so they can sell capacity cheaply?
Aero engineer, you also mention the Quark being made at 22nm: that is really interesting. One of the reasons many MCUs are made on 130nm-90nm is the much higher (1000x) sleep current of newer processes. Intel's 22nm tri-gate process reduces sleep current by orders of magnitude, according to Intel's claims, and should really help here. I wonder, though: wouldn't demonstrating this at 22nm be a much better sell?
You bring up some interesting points. I think that Intel has yet to really clarify what their intent is with this device. I think that it will yet be a few years before it can really evolve into what they want it to become. The other issue that may come about is some of the peripherals commonly found in embedded controllers perform better at the larger process nodes. I am not too familiar with the techniques that are required for designing the chips themselves, so we will have to wait and see what ends up evolving from this.
As to the target market, this too has yet to evolve. In looking over the datasheet, I can tell you that the power configurations were enough to make my head spin. It is laid out very differently from any embedded device that I have seen. The highest performance device that I have used is an STM32F407. I referenced these devices because they sell in the $5-10 range. From what I have read, this is the target price range for the Quarks. All that information is unsubstantiated, so do not quote me on it.
Price is quite interesting, and quite good.
Similar chips on the market with similar prices:
TI Sitara: $5, 400MHz, Cortex-A8, internal MCU for real-time stuff, a few hundred KB of SRAM, plenty of peripherals including good analog.
But you are right, this is a long game. My guess is that in a few years we will see similar hardware capabilities (for many embedded systems, the requirements are not that hard to achieve using good processes), and the fight will be on software and support.
And in a sense that's the more interesting part of the quark.
And if the move isn't quickly profitable for Intel, what will happen? Intel has been in the embedded business more times than I can remember, starting with the 8048 and 8051, with stops along the way with the 8096, the 80960, StrongARM, and embedded x86.
With companies like Microchip, Freescale, TI, Atmel, and Renesas, you know they are committed to the MCU market, and will have good long term availability. Intel and AMD have the opposite track record.
I very much would compare it to Cortex-A processors; it's probably more expensive than many, has similar requirements (external DRAM and flash, etc.), and I doubt it has much of a speed advantage, especially if you can use OpenCL on an ARM SoC's GPU.
Most Cortex-M* chips are MCUs with embedded flash and SRAM, and are available in QFP, not just BGA packages.
One of the interesting things is that most of the devices that you listed became rather successful. I was just reading that it was in 2007 that Intel stopped with the 8096 and its variants. Around the same time they sold off their ARM product line.
In this case, though, there are some different market conditions that may cause Intel to attempt to compete in this market. With ARM trying to challenge Intel in the mid level processors, and thereby challenging them in the upper level processors due to competition for different device markets, Intel has a reason to try and notch a few wins in ARM's core market (sorry for the bad pun). This may drive Intel to be more competitive in this market, as they can then amortize their fabs over a longer period and still have a lead in this space.
I believe that this board runs Linux. This implies that the Quark X1000 SoC has an MMU, so it's not a 'true' microcontroller but a microprocessor. Realistically, it shouldn't be compared to the Cortex-M4 MCU parts. A more relevant comparison could be made with the Cortex-A5/8/9 parts.
I hope that this board can do more than run Arduino code. The Arduino API, while very simple, effective, and easy to use for beginners in the embedded space, does not provide enough flexibility for more experienced embedded programmers and hobbyists. It would be a real shame and a waste of sophisticated hardware if the only thing that this board can do is run Arduino sketches.
The reason that I included the M4F is that the Quark is slightly higher in clock speed than that uC and slightly lower than the A8 series. It does share many more features with the A8 series than with the M4F. The question that I have, though, is whether Intel plans on going further down the line into M0 territory, or whether they plan to go slightly higher.
The board can indeed run more than Arduino code. There was mention in the many supporting documents that it could be programmed with GCC. I have not gone digging to see if I could find the version that supports the Quark. I would like to see some of the initial setup in C code. It seems that just the initialization could be a bit of a pain.