It is hard to comment on A3Cube's latency claims, since their implementation is secret.
There's one latency and throughput factor that I don't see mentioned here. If you are transmitting small (<64-byte) frames, the per-packet overhead adds significantly to latency and cuts into throughput, and the fact is that PCIe packets carry more overhead than RapidIO packets. For small packets, there's no question which one is faster.
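To put rough numbers on the small-frame point: per-packet efficiency is payload / (payload + overhead), so a fixed overhead dominates when the payload is only a few bytes. The sketch below assumes about 20 bytes of PCIe overhead per memory-write TLP on a Gen1/Gen2 link (12-byte 3DW header, 2-byte sequence number, 4-byte LCRC, 2 framing bytes) and uses a hypothetical 12-byte figure for RapidIO purely for illustration. Neither number is taken from this thread, and line encoding and flow-control traffic are ignored.

```python
def wire_efficiency(payload_bytes: int, overhead_bytes: int) -> float:
    """Fraction of link bytes that carry payload for a single packet."""
    return payload_bytes / (payload_bytes + overhead_bytes)

# Assumed PCIe overhead for a 3DW-header memory write on a Gen1/Gen2 link:
# 12-byte TLP header + 2-byte sequence number + 4-byte LCRC + 2 framing bytes.
PCIE_OVERHEAD = 12 + 2 + 4 + 2  # 20 bytes, no ECRC

# Hypothetical RapidIO overhead, for illustration only; substitute the real
# value for your transport type and transaction from the RapidIO spec.
SRIO_OVERHEAD = 12

if __name__ == "__main__":
    for payload in (8, 16, 32, 64, 256):
        pcie = wire_efficiency(payload, PCIE_OVERHEAD)
        srio = wire_efficiency(payload, SRIO_OVERHEAD)
        print(f"{payload:4d} B payload: PCIe {pcie:.0%}  RapidIO {srio:.0%}")
```

Whatever the exact overheads, the gap between the two columns shrinks as the payload grows, which is why the difference matters most for small frames.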
My post on PCIe vs SRIO (or others) was not comparing the business side to the technical – I was explaining that PCIe was technically the best solution based on key metrics. The fact is that PCIe provides a universal connection – this is not really open to reasonable debate – and that this offers advantages – technical advantages – that no other interconnect can offer.
PCIe provides the largest universe of components with direct connection, which offers a clear advantage in power, cost, and latency, and this leads to a potential performance edge as well. And it allows the use of most devices – since they almost all have PCIe as a connection – so that you can build up a system that is tuned to your needs with off-the-shelf components. These are not business advantages, but technical advantages. And it is these technical advantages that lead to the business success, not the other way around.
I would encourage people to go to the page that I noted before (www.plxtech.com/expressfabric) to see that the PCIe solution that I was referencing goes beyond a simple PCIe network. It offers both DMA and RDMA, and in each case they are compatible with the vast number of applications that have been written for Ethernet and InfiniBand. You benefit from the advantages I mentioned, and you can use the same applications.
And it allows sharing of I/O devices among multiple hosts with existing devices and drivers. These features can all be had with a single, converged fabric.
I can think of two ARM SoCs with PCIe off the top of my head: Freescale i.MX6 and Xilinx Zynq. There are probably others, but most ARM SoCs are designed for mobile devices so they don't need high-speed wired connectivity. As more ARMs get designed for servers, I bet you'll see plenty of PCIe.
PCIe is quite common in the PowerPC space, such as Freescale PowerQUICC 3 and QorIQ, and AMCC (now Applied Micro).
What about the impact of ARM? How many ARM MPUs have direct PCIe interfaces? What happens if ARM vendors adopt SRIO? (Probably not likely, but could happen).
PCIe has achieved its success primarily because of Intel's dominance. If Intel loses its dominance (or, arguably, has already lost it outside of the traditional PC desktop/laptop world), then we could see changes.
The discussion in this thread seems to have migrated off topic, since the original question pertained to the viability of a PCIe fabric. But since this has turned into a discussion of the merits of SRIO, it makes sense to compare PCIe and SRIO for use as a general-purpose fabric. And that comparison comes out significantly in favor of PCIe.
For purposes of full disclosure here, I am an advocate of using PCIe as a rack-level fabric, and a PLX employee.
The major advantage of PCIe as a fabric is that almost every device -- CPUs, NICs, HBAs, HCAs, FPGAs, you name it -- has a PCIe port as at least one of its connection points, meaning you can eliminate all of the bridging devices necessary in other fabric types. This reduces power and cost, since you are eliminating a lot of bridges, and for that same reason it also reduces latency, so it seems obvious that this is the right approach.
There seems to be a series of assumptions here about how PCIe can be used as a fabric, and much of it is outdated. A PCIe-based fabric that connects directly to existing devices can be constructed, and it doesn't need to use non-transparent bridging (which seems to be an implicit assumption in this thread). PLX is doing just that, and you can go to www.plxtech.com/expressfabric for a more in-depth explanation.
SRIO has the same drawbacks as Ethernet and InfiniBand with regard to needing additional bridges in most cases, since the devices that have a direct SRIO interface can be easily counted – and that's not many. And SRIO doesn't even have the advantage of the nearly universal usage that Ethernet has, or the incumbency in HPC that InfiniBand has. So it has all of the disadvantages of the other alternatives, and none of their advantages.
This has nothing to do with the technical merits of SRIO at the SerDes or protocol level. It is a well-defined, high-performance interconnect. But it lost out to PCIe as the general-purpose connection back when it mattered, and the virtuous cycle that developed for PCIe made this a non-competition. You can argue about the speeds and feeds as much as you want, but as none other than the philosopher/actor Bill Murray once said, "It just doesn't matter."
The author of the article got the latency number wrong: we have a memory-to-memory latency of about 750 nanoseconds (including software latency).
The goal of the system is to extend the memory-mapping capability of PCIe to the fabric, avoiding any protocol encapsulation (as in Ethernet) so that latency stays as low as possible, and bypassing the operating system's kernel stack.
x86 servers have no native RapidIO interface, so a RapidIO bridge must also plug into a PCIe slot, and the same is true for InfiniBand. On an x86 server, then, the minimum latency you can experience is the root complex latency. That is why we extend the memory mapping of PCIe.
RONNIEE Express is a low-latency interconnect (or data plane) that uses shared memory mapping for communication between nodes, extending the memory mapping used by PCIe.
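One way to picture "extending the memory mapping" across nodes is a flat global address space in which each node exports a fixed-size window, so the upper bits of an address select the destination node and the rest is an offset into that node's memory. This is a minimal sketch under assumed parameters, not A3Cube's actual scheme (which is not public); `WINDOW_SIZE`, `Target`, `translate`, and `to_global` are hypothetical names.

```python
from typing import NamedTuple

# Hypothetical layout: every node exports one window of WINDOW_SIZE bytes,
# and the windows sit back to back in a single global address space.
WINDOW_SIZE = 1 << 30  # 1 GiB per node (assumed value)


class Target(NamedTuple):
    node: int    # destination node index
    offset: int  # byte offset inside that node's exported window


def translate(global_addr: int, num_nodes: int) -> Target:
    """Map a global fabric address to (destination node, local offset)."""
    if not 0 <= global_addr < num_nodes * WINDOW_SIZE:
        raise ValueError(f"address {global_addr:#x} is outside the fabric window")
    node, offset = divmod(global_addr, WINDOW_SIZE)
    return Target(node, offset)


def to_global(node: int, offset: int) -> int:
    """Inverse mapping: where a node's local offset appears globally."""
    if not 0 <= offset < WINDOW_SIZE:
        raise ValueError("offset exceeds the per-node window")
    return node * WINDOW_SIZE + offset


if __name__ == "__main__":
    addr = to_global(3, 0x1000)
    print(hex(addr), translate(addr, num_nodes=8))
```

Because a plain load or store to such an address is all the sender issues, there is no protocol header to build; the fabric hardware does the equivalent of `translate` when it routes the transaction.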
RONNIEE Express implements support for a low-latency 3D torus topology with flow control and traffic congestion management.
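For readers less familiar with the topology: in a 3D torus each node links to six neighbors, and the wraparound link on each axis means traffic can go either way around each ring. The sketch computes the minimal hop count between two nodes and one step of dimension-order (X, then Y, then Z) routing. Dimension-order routing is a common textbook choice assumed here; the post does not say how RONNIEE Express routes, and this sketch ignores the deadlock avoidance (e.g. virtual channels) that real torus networks need on wraparound links.

```python
from typing import Tuple

Coord = Tuple[int, int, int]


def ring_step(src: int, dst: int, size: int) -> int:
    """Signed shortest displacement from src to dst on a ring of `size` nodes."""
    forward = (dst - src) % size
    # Going backward is shorter when more than half the ring lies ahead.
    return forward if forward <= size // 2 else forward - size


def hop_count(src: Coord, dst: Coord, dims: Coord) -> int:
    """Minimal number of links between two nodes of a 3D torus."""
    return sum(abs(ring_step(s, d, n)) for s, d, n in zip(src, dst, dims))


def next_hop(src: Coord, dst: Coord, dims: Coord) -> Coord:
    """Dimension-order routing: fix X first, then Y, then Z, one link at a time."""
    cur = list(src)
    for axis in range(3):
        step = ring_step(src[axis], dst[axis], dims[axis])
        if step != 0:
            cur[axis] = (src[axis] + (1 if step > 0 else -1)) % dims[axis]
            return (cur[0], cur[1], cur[2])
    return src  # already at the destination


if __name__ == "__main__":
    dims = (4, 4, 4)
    node, dest = (0, 0, 0), (3, 2, 1)
    path = [node]
    while node != dest:
        node = next_hop(node, dest, dims)
        path.append(node)
    print(hop_count((0, 0, 0), dest, dims), path)
```

The wraparound is what keeps the worst-case hop count at about half of what a mesh of the same size would need.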
The most interesting thing about this interconnect is that it makes it possible to implement a TCP socket in memory that uses memory mapping for communication, bypassing the kernel, so that unmodified TCP applications run with 1-2 microseconds of latency.
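The core of an in-memory socket is typically a byte stream carried through shared memory: the sender copies bytes into a mapped ring and advances a write index, the receiver reads up to that index and advances a read index, and no system call is needed per message once the mapping exists. This is a minimal single-producer/single-consumer sketch under stated assumptions, not A3Cube's implementation: an anonymous `mmap` stands in for a fabric-mapped window, `ByteRing` and its header layout are hypothetical, and the memory barriers and socket-call redirection a real cross-node version needs are not shown.

```python
import mmap
import struct

# Hypothetical header: write index and read index as monotonic byte counts.
HEADER = struct.Struct("<QQ")


class ByteRing:
    """Single-producer/single-consumer byte stream over a shared mapping."""

    def __init__(self, buf: mmap.mmap, capacity: int) -> None:
        self.buf, self.capacity, self.base = buf, capacity, HEADER.size

    def send(self, data: bytes) -> int:
        """Copy as much of `data` as fits; return the number of bytes written."""
        write, read = HEADER.unpack_from(self.buf, 0)
        n = min(len(data), self.capacity - (write - read))
        start = write % self.capacity
        first = min(n, self.capacity - start)  # bytes before the wrap point
        self.buf[self.base + start:self.base + start + first] = data[:first]
        self.buf[self.base:self.base + n - first] = data[first:n]
        # Publish the new write index only after the payload is in place.
        struct.pack_into("<Q", self.buf, 0, write + n)
        return n

    def recv(self, max_bytes: int) -> bytes:
        """Return up to `max_bytes` of buffered data and free that space."""
        write, read = HEADER.unpack_from(self.buf, 0)
        n = min(max_bytes, write - read)
        start = read % self.capacity
        first = min(n, self.capacity - start)
        out = (self.buf[self.base + start:self.base + start + first]
               + self.buf[self.base:self.base + n - first])
        struct.pack_into("<Q", self.buf, 8, read + n)
        return out


if __name__ == "__main__":
    # Anonymous mappings start zero-filled, so both indices begin at 0.
    mem = mmap.mmap(-1, HEADER.size + 16)
    ring = ByteRing(mem, 16)
    ring.send(b"hello, ")
    ring.send(b"fabric")
    print(ring.recv(64))
```

Keeping the indices as ever-increasing counts (rather than wrapped positions) makes "full" and "empty" unambiguous without wasting a slot.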
The latency of our memory-mapped socket is one tenth that of RapidIO RIONET, so our memory approach is really powerful. It is not limited to PCIe; it opens a new way to use distributed memory for communication.
To understand it better, watch this video: http://www.youtube.com/watch?v=YIGKks78Cq8