1. I had actually discussed core and interconnect power separately. My position is that Intel cores and ARM cores are equivalent in efficiency at identical complexity. You can check out the Wisconsin paper, which shows that the x86 decode penalty is no longer significant. That said, interconnect power is significant at large core counts; according to a lot of research, it can approach 40% of the total. Most CPU designers these days focus on that, including me.
2. You seem to be confused about higher-layer functions and interconnect topologies. Functionally, buses, crossbars, rings and NoCs are identical. All can be cache coherent. A NoC is NOT inherently coherent or non-coherent by definition. You can run the AXI CC protocol over a NoC. It all depends on your SoC configuration. For a large number of heterogeneous blocks, a NoC is the best fit, hence its prevalence in mobile. Intel wants power efficiency with mostly homogeneous blocks, hence the ring.
NoCs can very well be made coherent. Latency tends to be higher, so they are used for CC traffic only at high core counts. The bus-enhanced NoC architecture is a good compromise.
I disagree that interconnect power is on a scale comparable to CPU power. CPUs are by far the highest-power circuits on the die. The NoC vs. ring discussion is again confusing things. NoCs are used to connect to peripheral interfaces and IP. NoCs are not coherent by definition. The rings that Intel uses are for the coherent CPU interconnect.
Reducing CPU power gives the biggest bang for the buck while still maintaining performance. Adding CPUs to the die increases core power linearly, but interconnect power grows less than linearly.
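To make the scaling argument concrete, here is a toy model, assuming core power grows linearly with core count n and interconnect power grows as n**alpha with alpha < 1. The constants P_CORE, P_LINK and ALPHA are made-up illustrative values, not measurements from any real chip; the point is only the trend in the interconnect's share of total power.

```python
# Toy power model (hypothetical): core power ~ n, interconnect power ~ n**alpha.
# P_CORE, P_LINK and ALPHA are assumed illustrative constants, not real data.

P_CORE = 1.0   # power per core, arbitrary units (assumed)
P_LINK = 1.0   # interconnect power coefficient, arbitrary units (assumed)
ALPHA = 0.5    # sublinear scaling exponent (assumed)


def interconnect_share(n: int, alpha: float = ALPHA) -> float:
    """Fraction of total power spent in the interconnect for n cores."""
    core_power = P_CORE * n
    interconnect_power = P_LINK * n ** alpha
    return interconnect_power / (core_power + interconnect_power)


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(f"{n:3d} cores: interconnect share = {interconnect_share(n):.1%}")
    # prints 33.3%, 20.0% and 11.1% for 4, 16 and 64 cores
```

The conclusion hinges entirely on the assumed exponent: with alpha > 1, the same function shows the interconnect share growing with core count instead of shrinking.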
You can optimize an NoC, but you cannot make it more efficient than a ring.
You alluded to it in your post: rings are static and hence more efficient.
The dynamic routing logic is the power hog.
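As an illustration of what "static" routing means on a ring: with a bidirectional ring, each stop can choose a direction from a single modular comparison of the source and destination indices, with no routing table and no adaptive per-hop decision. This is a hypothetical sketch; the function name and the clockwise tie-break rule are assumptions, not a description of any specific Intel ring stop.

```python
# Hypothetical sketch: on a bidirectional ring the route is a fixed function of
# (source, destination); no routing tables or adaptive decisions are needed.


def ring_route(src: int, dst: int, n_stops: int) -> tuple[str, int]:
    """Return (direction, hop_count) for the shortest path on a bidirectional ring."""
    if not (0 <= src < n_stops and 0 <= dst < n_stops):
        raise ValueError("ring stop index out of range")
    cw = (dst - src) % n_stops   # hops needed going clockwise
    ccw = (src - dst) % n_stops  # hops needed going counterclockwise
    # Assumed tie-break: equal distances go clockwise.
    if cw <= ccw:
        return ("clockwise", cw)
    return ("counterclockwise", ccw)


if __name__ == "__main__":
    print(ring_route(0, 3, 8))  # ('clockwise', 3)
    print(ring_route(0, 6, 8))  # ('counterclockwise', 2)
```

Because the decision depends only on fixed indices, the per-stop logic stays tiny; an adaptive NoC router instead has to track congestion or choose among multiple paths at each hop.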
The two NoC companies survive because of the mobile world. Mobile SoCs use a lot of IP, and NoCs let you change the topology and IP configuration easily. That is primarily why they are popular in mobile SoCs, not because of efficiency.
I had a chat yesterday with a couple of guys who used to work on the OMAP and SPARC designs, just to make sure! I am currently designing a family of processors. I'm using NoCs for the mobile variants and plan to use crossbar/hybrid NoCs for the server-class configuration. I will know better after a few months of simulations. If the University of Michigan Swizzle configurations work, crossbars are the way to go for homogeneous designs. Verification is easy too.
"the point is that pie is already divided and the ARM is going after a piece that is already covered by Intel, which is already on its 2nd-generation Atom microserver chip before ARM is even out of the gate."
That's an interesting rewrite of history... Calxeda has had its ARM servers out for well over a year now, and that was before Intel even announced Centerton, let alone shipped it! Note that Calxeda has its 2nd generation out as well.
I also don't agree that the x86 penalty is low. If that were true, why is AMD having such a hard time keeping up with Intel while a dozen small outfits can design fast and efficient ARM cores that are challenging Intel? Even Intel took a very long time to come up with an Atom replacement, and it ended up being a simple 2-way core (since 3- or 4-way is too power-hungry on x86).