The interesting thing here is the sheer variety of Thunder options optimized for different workloads. Cavium seems to have drawn this from its experience with embedded parts. What I am confused about after reading this is how this will apply to servers.
Datacenter operators buy bulk servers for their fleet. They typically do not know a priori what workloads will run on them. Now they would have to choose the type of server for each workload? One thing that Intel got right was to simplify the product offerings. Their problem was cost and power.
@Tarra! Tarra! I may not have been clear about this. There are four families of products under the Thunder brand. One is specifically targeted at servers. Others target storage and security appliances and networking.
"There are four families of products under the Thunder brand. One is specifically targeted at servers. Others target storage and security appliances and networking."
This is what does not make sense. Volume servers in the datacenter run all those applications today. E.g., the same server could run map/reduce (Hadoop) and also run generic web-tier applications. It appears that Cavium is proposing to fragment the datacenter, have operators choose between its different offerings, and convert volume servers into appliances that can only run specific applications? That is a tall order.
Any indication from Cavium on what the power of the 48-core device would be? The article mentions the cores as out-of-order. Cavium has so far stuck with simple in-order designs for its CPUs. Is that a typo? If the core is out-of-order, then a 48-core Thunder would be over 150 W! How will it then compete with Intel?
@Rick. Cavium's Octeon designs are not out-of-order but in-order. Thunder is likely to be the same. All other server CPUs - Xeons, Opterons, even X-Gene - are fully out-of-order machines.
Actually X-Gene from Applied Micro is completely missing from your post. They showed a mini-datacenter running at Computex this week. Any reason you are not covering them? They seem to be shipping already. From the specs they seem to have everything that Thunder is claiming, and a few years ahead.
@Servernut: Applied definitely got out there early. I have written 3-4 stories about them so far. I am not at Computex, so would love to hear the latest. For a while they have been in Cavium's spot: we have been waiting for them to ship and report performance specs. Anyone have an update on that?
@Servernut: Going back to my notes I see Cavium left itself some quibble room, saying its core "supports optimized OOO."
Btw, Rick, what other points in your article are inaccurate "quibble room" from Cavium that you are merely repeating? I've been reading your articles for some time, and you are usually good about sniffing out marketing FUD.
It may be that MIPS and Arm are geared for different markets, but given that Cavium is making basically the same computer chip with MIPS cores and with Arm cores, it would be interesting to see benchmark results from the two processors.
It would be a real apples-to-apples comparison. Which core is faster?
Now I'd like to hear some reality about where we are and where we need to be in server software for ARM, if this broader initiative - of which Cavium is just one part - is going to get traction. Details, please!
1. It's a myth that ISA overhead is just in decode. There are many aspects of an ISA that affect the overall microarchitecture. Just to mention one example, x86 requires more load/store units due to having fewer registers and load+op instructions. x86 also uses a more complex memory ordering model.
2. Given they designed their own CPU it seems likely Cavium are aiming for better than Cortex-A57 performance, as otherwise they could have just licensed that (the same argument applies to X-Gene). A 3-way in-order is not completely implausible, but to get decent throughput it would need to be at least 2-way and ideally 4-way multithreaded.
4. If all else is equal, an identically performing x86 would use more power than ARM due to its more complex ISA. So the x86 ISA really is LESS efficient. Of course different processes, microarchitectures etc can mitigate this difference.
In any case there is no doubt a dedicated CPU can outperform a generic Xeon despite having a process disadvantage (as you say in point 5). Beating Xeon on single-threaded performance is much harder of course, but that is not something Cavium or X-Gene are attempting (at least with their current line-up). For many tasks, using more, slower cores is actually far more energy efficient.
Well it's obvious you've never looked in detail at the complexity of the x86 ISA. The overheads of x86 affect the whole microarchitecture. With an identical microarchitecture x86 would end up slower (and thus less power efficient). For x86 to achieve the same performance as a RISC, it needs a far more complex microarchitecture, increasing die size and power. You can compare die sizes for various ARM and x86 CPUs here: http://chip-architect.com/news/2013_core_sizes_768.jpg
The claim that x86 has a dense encoding is yet another myth. In fact the complex encoding means that x86 binaries are typically a little larger than ARM binaries, and significantly larger than Thumb-2. x64 is usually 15% larger than x86.
Yes I've read that paper and discussed it in detail on RWT. It is a badly written paper with most of the conclusions not supported by evidence. If you choose to compare wildly different and relatively ancient CPUs, an old compiler and completely ignore the memory system then of course the only possible conclusion is that microarchitecture matters the most! But that's only true if you make wild extrapolations and ignore or handwave at all other aspects. Let's hope this paper was a one-off mistake and doesn't reflect on the quality of papers coming from this university.
Note PPC is certainly not CISC. Neither is ARM or Thumb. PPC vs ARM is less interesting as their ISA features are nearly identical (not that there aren't differences but the differences tend to be insignificant details).
The debate on ISAs is interesting. I have designed x86 CPUs and other ISAs as well. It is a fact that x86 is inherently more complex than MIPS, ARM, or PowerPC, to varying degrees. There is certainly the CISC instruction-decode penalty, but there are other complex mechanisms that have been built into x86 over generations which still need to be supported by the latest x86 processors. All of these mechanisms take die size and/or complexity. Almost every implementation of an x86 CPU has a built-in microcode engine. This is like a programmable engine within the CPU to handle these complex tasks. Intel has continued to stress floating-point performance, and each generation adds additional instructions, adding transistors to the design.
So why is this relevant? This "overhead" becomes smaller in very high performance implementations - out-of-order, multi-threaded, large-cache designs. Here the overhead can be amortized over the performance gains of a complex CPU. This is why Intel has competed well at the very high end of compute but failed in the low-power, efficient designs that are required for mobile.
In less complex implementations, where the CPU has fewer transistors, this overhead starts to make a difference. This is why the mobile processors from Intel, even the Atom cores, have not competed so well.