What about the impact of ARM? How many ARM MPUs have direct PCIe interfaces? What happens if ARM vendors adopt SRIO? (Probably not likely, but could happen).
PCIe has achieved its success primarily because of Intel's dominance. If Intel loses its dominance (or, arguably, has already lost it outside the traditional PC desktop/laptop world), then we could see changes.
The discussion in this thread seems to have migrated off topic, since the original question pertained to the viability of a PCIe fabric. But since this has turned into a discussion of the merits of SRIO, it makes sense to compare PCIe and SRIO for use as a general-purpose fabric. And that comparison comes out significantly in favor of PCIe.
For purposes of full disclosure here, I am an advocate of using PCIe as a rack-level fabric, and a PLX employee.
The major advantage of PCIe as a fabric is that almost every device -- CPUs, NICs, HBAs, HCAs, FPGAs, you name it -- has a PCIe port as at least one of its connection points, meaning you can eliminate all of the bridging devices necessary in other fabric types. This reduces power and cost, since you are eliminating a lot of bridges, and for that same reason it also reduces latency, so it seems obvious that this is the right approach.
There seems to be a series of assumptions here about how PCIe can be used as a fabric, and much of it is outdated. A PCIe-based fabric that connects directly to existing devices can be constructed, and it doesn't need to use non-transparency (which seems to be an implicit assumption in this thread). PLX is doing just that, and you can go to www.plxtech.com/expressfabric for a more in-depth explanation.
SRIO has the same drawbacks as Ethernet and InfiniBand with regard to needing additional bridges in most cases, since the devices that have a direct SRIO interface can be easily counted, and there are not many. And SRIO doesn't even have the advantage of the nearly universal usage that Ethernet has, or the incumbency in HPC that InfiniBand has. So it has all of the disadvantages of the other alternatives, and none of their advantages.
This has nothing to do with the technical merits of SRIO at the SerDes or protocol level. It is a well-defined, high-performance interconnect. But it lost out to PCIe as the general-purpose connection back when it mattered, and the virtuous cycle that developed for PCIe made this a non-competition. You can argue about the speeds and feeds as much as you want, but as none other than the philosopher/actor Bill Murray once said, "It just doesn't matter."
Whoever wrote the article got the latency number wrong: we have a memory-to-memory latency of about 750 nanoseconds (including software latency).
The goal of the system is to extend the memory-mapping capability of PCIe across the fabric, avoiding any protocol encapsulation (such as Ethernet), so that latency stays as low as possible and the operating system kernel stack is bypassed.
An x86 server has no native RapidIO interface, so a RapidIO bridge must also plug into a PCIe slot, and the same goes for IB. In an x86 server, therefore, the minimum latency you can experience is the root complex latency. That is why we extend the PCIe memory mapping.
RONNIEE Express is a low-latency interconnect, or data plane, that uses shared memory mapping for communication between nodes, extending the memory mapping used by PCIe.
RONNIEE Express implements support for a low-latency 3D torus topology with flow control and traffic congestion management.
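For readers unfamiliar with the topology, here is a minimal sketch of 3D torus addressing. It only illustrates the generic wraparound structure; the function names and the 4x4x4 size are assumptions for illustration and say nothing about how RONNIEE Express actually routes.

```python
def torus_neighbors(node, dims):
    """Return the +/-1 neighbours of `node` along each axis, with wraparound."""
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


def torus_hops(a, b, dims):
    """Minimum hop count between a and b, going the shorter way round each ring."""
    return sum(min(abs(p - q), n - abs(p - q)) for p, q, n in zip(a, b, dims))


dims = (4, 4, 4)  # hypothetical 64-node torus
corner_neighbors = torus_neighbors((0, 0, 0), dims)
far_hops = torus_hops((0, 0, 0), (3, 3, 3), dims)
```

Because each axis is a ring, node (3, 3, 3) is only one hop per axis away from (0, 0, 0), which is why a torus keeps worst-case hop counts low without a central switch.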
The most interesting thing about this interconnect is that it lets you implement a TCP socket in memory, which uses the memory mapping for communication and bypasses the kernel, so that unmodified TCP applications can run with a latency of 1-2 microseconds.
The latency of our memory-mapped socket is 10 times lower than that of RapidIO RIONET, so our memory approach is really powerful. It is not limited to PCIe, but opens a new way to use distributed memory for communication.
To understand this better, watch this video: http://www.youtube.com/watch?v=YIGKks78Cq8
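To make the memory-mapped idea concrete, here is a rough single-process sketch of message passing through a mapped window using plain loads and stores instead of a socket call per message. Everything in it is an assumption for illustration: the anonymous `mmap` stands in for a fabric-mapped region, and `post_message` / `poll_message` are hypothetical names, not the RONNIEE Express API.

```python
import mmap
import struct

HEADER = struct.Struct("<I")  # 4-byte little-endian payload length


def post_message(window, payload):
    """Copy the payload into the window, then publish its length last."""
    capacity = len(window) - HEADER.size
    if not 0 < len(payload) <= capacity:
        raise ValueError("payload must be 1..%d bytes" % capacity)
    window[HEADER.size:HEADER.size + len(payload)] = payload
    # Real hardware would need a memory barrier before this store.
    window[:HEADER.size] = HEADER.pack(len(payload))


def poll_message(window):
    """Return the pending payload, or None if nothing has been posted."""
    (length,) = HEADER.unpack(window[:HEADER.size])
    if length == 0:
        return None
    payload = bytes(window[HEADER.size:HEADER.size + length])
    window[:HEADER.size] = HEADER.pack(0)  # mark the slot free again
    return payload


# Anonymous mapping: a stand-in for a region mapped across the fabric.
window = mmap.mmap(-1, 4096)
post_message(window, b"hello")
received = poll_message(window)
```

The point of the sketch is only that sender and receiver touch memory directly; there is no per-message system call or protocol encapsulation on the data path.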
1. I would hesitate to answer your question without knowing more about A3Cube's claims. I am not sure whether the latency is lower because of a better SW stack or an optimized silicon datapath.
I guess they feel there is a good PCIe interconnect market and that the best PCIe implementation will get them business. This is not bad reasoning, since to use IB or SRIO in an x86 system you add the extra PCIe latency anyway. So the latency advantages of SRIO or IB get wiped out. So even if you are only on par with SRIO or IB, there probably is good business.
2. It is also not clear as to what portion of the packet flow the 100ns latency refers to, host controller or switch. Tough to analyse given the paucity of data.
But as I pointed out, latency is not the only issue when doing interconnects; the interconnect should also support peer-to-peer topologies. PCIe does not natively do so, and hence you pay the penalty in switch latency and silicon cost.
I had actually described the latency figures in detail in another post. But the bottom line is that SRIO's 100 ns latency is the best today. There is nothing magical in that: keep the protocol simple and latency will be low. The KISS principle is applied well in SRIO.
PCIe is slightly worse. Having implemented both IPs, I can say there frankly is not that much theoretical latency difference, but SRIO, being lighter, will have less latency. Most of the latency actually gets swallowed in the PHY. Having said that, switch latencies for PCIe are higher than I would have thought. Just see the public datasheets of PCIe switches from IDT and PLX and of SRIO switches from IDT. I am not making it up.
Let me go through the latencies of our published IP. 1 cycle for the logical and transport layer. That works out to 0.5 ns at 2 GHz. Maybe 1 GHz is more typical, so 1 ns. The rest is digital + analog PHY.
Public SERDES figures are in the 15 ns range. The PCS/PMA layer seems to be sub-5 ns, but we have not finished coding yet. CRC itself seems to be 3 cycles. So 20-30 ns is a good target to aim for if attached directly to the bus. PCIe will be higher.
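The budget above can be added up in a few lines. This is only a back-of-the-envelope sketch using the figures quoted in the post; it assumes the CRC cycles run at the same 1 GHz clock as the logical layer and takes 5 ns as an upper bound for the unfinished PCS/PMA layer.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles / clock_ghz


clock_ghz = 1.0  # "maybe 1 GHz is more typical"

budget_ns = {
    "logical_transport": cycles_to_ns(1, clock_ghz),  # 1 cycle
    "crc": cycles_to_ns(3, clock_ghz),                # 3 cycles (clock assumed)
    "pcs_pma": 5.0,                                   # "sub 5 ns", upper bound
    "serdes": 15.0,                                   # public SERDES figures
}

total_ns = sum(budget_ns.values())
```

Under those assumptions the total comes to 24 ns, which sits inside the 20-30 ns target, with the SERDES accounting for well over half of it.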
It is amazing that there is not a single detailed technical analysis at this level comparing the following: PCIe, Ethernet, SRIO, IB, FC, Interlaken, QPI. If there were, we would not be having these discussions. Freescale, by the way, bought the SRIO IP from my previous company, and Cray got HT from us too. So I have had to do these analyses for a while now. I do wish these discussions would come to this level so that we can sort out issues at a technical level instead of having to rely on marketing FUD from vendors and trade bodies.
I basically got pulled into all this for two reasons: I occasionally teach Comp. Arch at a master's level at one of India's premier tech universities, and I had to select a standard interconnect for the India processor project and the supercomputer project. I settled on PCIe first, but it simply did not cut it technically. So after 6 months I decided to switch to SRIO.
After all the analysis, what I realized was that all standards were pretty much at the same speed, since they all had to use the same SERDES, which basically means 10, 14, 25/28, 32 and 50/56. Latency varies depending on the protocol. Eth is the worst, obviously. QPI is probably the best. I am trying to match QPI in our cache-coherent interconnect, which is built over SRIO's GSM functionality.
But as I pointed out, the issue with PCIe is its fundamentally flawed architecture model. Like our reptilian brain stem, it cannot get rid of its PCI ancestry! The attendant flaws are fixed in the genes. Remember Intel's aborted switch fabric standard over PCIe, ASI? We started on an IP for that too before quickly coming to the conclusion that it was a non-starter.
PCIe switch fabrics are like experiments with fully socialist governments. Every now and then somebody thinks it is a good idea and has a go at it. Then, quickly realizing the futility of the exercise, they give it up. Till someone comes along a few years later! Unless you are a glutton for punishment, why on earth would you try doing a fabric using non-transparent bridging?
Now, the latency claims you are talking about are, I think, purely in the SW/driver domain. Technically that has nothing to do with the standard. But if all standards implemented optimal drivers, then the standard's silicon latency would again be the determining factor. It is to alleviate this that in our experimental processors we are linking the SRIO endpoint to the processor core (exactly the way the Transputer did it eons ago; funny how things never change). So an SRIO message is just a single instruction of overhead. The message will appear either in a special buffer or in the cache of the remote CPU. That is the way to build a fast interconnect. You can bypass all the cache and MMU nonsense.
1. The site is misleading. The spec as published today supports 10 and 25G. Some of the features that are optional for 10G (mainly error-correction related) are mandatory for 25G. Since I am implementing a 25G solution today using Xilinx SERDES, I am pretty certain the spec supports 25G. The specs are free online; take a look. You will see a PHY only for 10G, since the 25G SERDES for Ethernet is not final. Once that is final, SRIO will specify that too. But you can implement preliminary 802.3bm if you want to go 4x25G optical now, which is what I am doing, using zQSFP modules (Intel MXC is the other option).
2. I do not know what you are referring to as shipping high-speed interconnects.
Ethernet is only 10G per lane now. 40G is 4 x 10. 100G is 10x10 and a proposed 4x25.
SRIO uses the same SERDES technology as Ethernet, so by definition it will track Ethernet in terms of speeds.
PCIe is only 8G today. The proposed higher-speed standard is not ready. How can you claim PCIe is shipping at speeds greater than 8G?
Only Infiniband in the interconnect space is faster (not including FC, since it is irrelevant in this space).
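The port speeds quoted above are just lane count times per-lane rate. A tiny sketch makes the distinction explicit, using only the lane configurations named in this post; the helper name is made up for illustration.

```python
def aggregate_gbps(lanes, per_lane_gbps):
    """Nominal port speed: number of lanes times the per-lane rate."""
    return lanes * per_lane_gbps


links = {
    "40G Ethernet (4 x 10)": aggregate_gbps(4, 10),
    "100G Ethernet (10 x 10)": aggregate_gbps(10, 10),
    "100G Ethernet (4 x 25, proposed)": aggregate_gbps(4, 25),
}
```

This is why a headline figure such as "100G" does not by itself say whether the SERDES runs at 10G or 25G per lane, which is the number that matters when comparing standards.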
Among other responsibilities, I am part of the official interconnect standards effort in India, so you can rest assured I track these on a daily basis! I also used to sell PCIe, SRIO and HT IP for a decade.
To sum up:
Ethernet is currently only spec'ed at 10G per lane.
PCIe is spec'ed only at 8G per lane.
Infiniband EDR (the only finalized variant of Infiniband) is spec'ed at 25G, same as SRIO. HDR IB will ship only in 2017.
If you think otherwise, please show me shipping Ethernet and PCIe parts that have speeds greater than 10G per lane, and IB parts at more than 25G per lane.
So of the lot only SRIO and IB are spec'ed at 25G. Granted SRIO is tracking IB by one year but that hardly makes it an antique interconnect.
By the way, there is nothing in the SRIO standard that limits it to 25G. The changes will come mainly in the PHY, since error correction becomes a major issue. As you can see from the spec, the encoding is conservatively spec'ed at 67/64, since 10GbE had problems with 66/64. Interlaken is similarly conservative.
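To put a number on what the more conservative encoding costs, here is a small sketch. It reads "67/64" as 64 payload bits carried in every 67 transmitted bits (and likewise for 66/64), and uses the 10.3125 Gbaud lane rate from the spec text quoted in this thread; the function name is an assumption.

```python
def payload_gbps(baud_gbaud, data_bits, coded_bits):
    """Payload rate of one lane after line-encoding overhead."""
    return baud_gbaud * data_bits / coded_bits


lane_gbaud = 10.3125  # per-lane rate from the quoted 10xN spec text

rate_64b66b = payload_gbps(lane_gbaud, 64, 66)  # 66/64, as in 10GbE
rate_64b67b = payload_gbps(lane_gbaud, 64, 67)  # 67/64, as in the SRIO spec

efficiency_66 = 64 / 66
efficiency_67 = 64 / 67
```

On these assumptions the extra bit per block costs roughly 1.5% of payload bandwidth (about 95.5% efficiency versus about 97%), which is a small price if it buys more robust framing.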
"The 10xN specification, backward compatible with RapidIO Gen1 and Gen2 systems, supports 10.3125 Gbaud per serial lane"
The 25G story is:
"RapidIO specifications are under development to support 25...."
Other interconnects are shipping 25 Gbaud now. And yes, IP has been available for years, but never full cores for RapidIO (just thin PHY layers), whereas complete cores are available for FPGAs for interconnects like PCIe and Ethernet.
You may have access to information about roadmaps for Gen 3 RapidIO parts, but the reality is that Infiniband, PCIe and Ethernet are shipping at these speeds, and have been for some time in volume.
In niche applications second sourcing may not be an issue, but in volume it is.
Freescale and TI support both PCIe and Ethernet (and older Gen2 sRIO).
Per the RapidIO Product showcase, the last Freescale product for sRIO was in 2008.