As kids, we were taught that sharing is good. The semiconductor industry seems to have forgotten the spirit of that lesson, but one technology that reminds us of what our parents taught us is PCI Express (PCIe). Multiple vendors have tried to use this ubiquitous interconnect technology to enable the sharing of I/O endpoints, thereby lowering system cost, power requirements, and maintenance and upgrade needs. PCIe-based sharing of I/O endpoints is expected to make a huge difference in the multi-billion-dollar datacenter market.
Traditional systems currently being deployed in volume have several interconnect technologies that need to be supported.
As Figure 1 shows, InfiniBand, Fibre Channel and Ethernet are a few examples of these interconnects.
Figure 1: Example of a traditional I/O system in use today
This architecture has several limitations, including:
- Existence of multiple I/O interconnect technologies
- Low utilization rates of I/O endpoints
- High power and cost of the system due to the need for multiple I/O endpoints
- I/O is fixed at the time of architecture and build, with no flexibility to change it later
- Management software must handle multiple I/O protocols, adding overhead
The biggest drawback of this architecture is its use of multiple I/O interconnect technologies, which increases latency, cost, board space, and power.
The architecture would be at least partially justified if every endpoint were fully utilized.
However, more often than not, the endpoints are under-utilized.
Customers pay the full overhead for limited use of the endpoints.
Latency increases because the PCIe interface native to the processors in these systems must be converted to multiple protocols.
Designers can reduce system latency by using the PCIe interface native to the processors and converging all endpoints onto PCIe.
Clearly, sharing I/O endpoints (see Figure 2) is the solution to these limitations.
This concept appeals to system makers because it lowers cost and power, improves performance and utilization, and simplifies design.
With so many advantages, it is no surprise that many companies have tried to achieve this; the PCI-SIG, in fact, published the Multi-Root I/O Virtualization (MR-IOV) specification to achieve this goal.
However, due to a combination of technical and business factors, MR-IOV as a specification hasn’t really taken off, even though it has been more than five years since it was released.
Figure 2: Example of a traditional I/O system with shared I/O
Additional advantages of shared I/O are:
- As I/O speeds increase, the only additional investment needed is to change the I/O adapter cards. In earlier deployments, when multiple I/O technologies existed on the same card, designers had to redesign the entire system; in the shared-I/O model, they can simply replace an existing card with a new one when one particular I/O technology needs an upgrade.
- Since multiple I/O endpoints don’t need to exist on the same cards, designers can either manufacture smaller cards to further reduce cost and power, or choose to retain the existing form factor and differentiate their products by adding multiple CPUs, memory and/or other endpoints in the space saved by eliminating multiple I/O endpoints from the card.
- Designers can reduce the number of cables that crisscross a system. Multiple interconnect technologies require different (and multiple) cables to carry their bandwidth and protocols. By simplifying the design and narrowing the range of I/O interconnect technologies, shared I/O also reduces the number of cables the system needs, lowering design complexity in addition to delivering cost savings.
Implementing shared I/O in a PCIe switch is the key enabler of the architecture depicted in Figure 2.
As mentioned earlier, MR-IOV technology hasn’t quite taken off and a prevailing opinion is that it probably never will.
To the rescue comes Single-Root I/O Virtualization (SR-IOV) technology, which implements I/O virtualization in hardware for improved performance and makes use of hardware-based security and Quality of Service (QoS) features within a single physical server.
SR-IOV also allows an I/O device to be shared by multiple guest OSes running on the same server.
In 2007, the PCI-SIG released the SR-IOV specification, which enables one physical PCIe device (NIC/HBA/HCA) to be divided into multiple virtual functions.
Each virtual function can then be used by a virtual machine, allowing one physical device to be shared by many virtual machines and their guest OSes.
This approach requires I/O vendors to develop devices that support SR-IOV, but it is the simplest way to share resources or I/O devices among different applications. Most endpoint vendors already support SR-IOV, and more are expected to follow.
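On Linux hosts, virtual functions on an SR-IOV-capable device are typically enabled through the kernel's sysfs attributes `sriov_totalvfs` and `sriov_numvfs`. The sketch below assumes that Linux-specific interface, which the article itself does not discuss. The helper name `enable_vfs` is hypothetical, writing to real sysfs requires root and an SR-IOV-aware driver, and the device directory is passed in as a parameter so the logic can be tried against an ordinary directory.

```python
from pathlib import Path


def enable_vfs(device_dir: str, num_vfs: int) -> int:
    """Set the number of SR-IOV virtual functions on one physical function.

    Assumes the Linux sysfs layout, e.g. /sys/bus/pci/devices/<BDF>/.
    Returns the number of VFs configured.
    """
    dev = Path(device_dir)
    total = int((dev / "sriov_totalvfs").read_text())  # max VFs the device supports
    if not 0 <= num_vfs <= total:
        raise ValueError(f"requested {num_vfs} VFs; device supports 0..{total}")

    numvfs = dev / "sriov_numvfs"
    current = int(numvfs.read_text())
    if current == num_vfs:
        return current  # already configured; nothing to do
    if current != 0:
        # The kernel refuses to change a nonzero VF count directly,
        # so all VFs are disabled first.
        numvfs.write_text("0")
    if num_vfs:
        numvfs.write_text(str(num_vfs))
    return num_vfs
```

For example, `enable_vfs("/sys/bus/pci/devices/0000:03:00.0", 8)` would request eight VFs on a device at that (hypothetical) PCI address; each VF can then be passed through to a separate guest.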
Adding to its many advantages referenced here, PCIe is also a lossless fabric.
The PCIe specification defines a robust, credit-based flow-control mechanism that prevents packets from being dropped.
Every transaction-layer packet is acknowledged by the data link layer at every hop, ensuring successful transmission.
In the event of a transmission error, the packet is replayed in hardware, without any involvement of upper layers.
Data loss and corruption in PCIe-based storage systems, therefore, are highly unlikely.
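To make the flow-control and ACK/NAK replay mechanisms more concrete, here is a toy model of the transmit side of a data link layer. It is a simplification under stated assumptions, not the specification's state machine: real PCIe uses 12-bit sequence numbers that wrap, separate credit types for posted, non-posted, and completion traffic, and replay timers, none of which are modeled. The class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Tlp:
    seq: int        # sequence number (unbounded here; 12-bit and wrapping in real PCIe)
    payload: bytes


class LinkTransmitter:
    """Toy model of a PCIe data link layer transmitter."""

    def __init__(self, credits: int) -> None:
        self.credits = credits   # receiver buffer space, simplified to one credit pool
        self.next_seq = 0
        self.replay = deque()    # TLPs sent but not yet acknowledged

    def send(self, payload: bytes, wire: list) -> bool:
        """Transmit a new TLP only if the receiver has advertised buffer credits."""
        if self.credits == 0:
            return False         # hold the packet back instead of dropping it
        tlp = Tlp(self.next_seq, payload)
        self.next_seq += 1
        self.credits -= 1
        self.replay.append(tlp)  # keep a copy until it is ACKed
        wire.append(tlp)
        return True

    def on_ack(self, seq: int) -> None:
        """An ACK acknowledges every TLP up to and including seq."""
        while self.replay and self.replay[0].seq <= seq:
            self.replay.popleft()

    def on_nak(self, seq: int, wire: list) -> None:
        """A NAK names the last good TLP; everything after it is replayed in order."""
        self.on_ack(seq)
        wire.extend(self.replay)  # replayed TLPs consume no new credits

    def on_credit_return(self, n: int) -> None:
        """The receiver has freed buffer space and returns credits."""
        self.credits += n
```

For example, if TLPs 0 to 2 are sent and the receiver detects a bad CRC on TLP 1, it returns a NAK carrying sequence number 0; `on_nak(0, wire)` then retransmits TLPs 1 and 2, and a later ACK for 2 empties the replay buffer. When credits run out, `send` returns `False` and the packet waits rather than being discarded.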
PCIe offers a simplified solution by allowing all I/O adapters (10GbE, FC, or others) to be moved outside the server.
With a PCIe switch fabric providing virtualization support, each adapter can be shared across multiple servers and at the same time provide each server with a logical adapter.
The servers (or the VMs on each server) continue to have direct access to their own set of hardware resources on the shared adapter.
The resulting virtualization allows for better scalability wherein the I/O and the servers can be scaled independently of each other.
I/O virtualization avoids over-provisioning the servers or the I/O resources, thus leading to cost and power reduction.
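A back-of-the-envelope sketch shows why pooling avoids over-provisioning. It assumes hypothetical per-server demand samples (in Gbps) and a hypothetical 10Gbps adapter capacity. With dedicated adapters, each server must be sized for its own peak; a shared pool only has to cover the peak of the combined demand. Practical limits such as the number of functions each adapter exposes are ignored.

```python
import math


def adapters_needed(samples, capacity_gbps):
    """Compare dedicated vs. shared adapter counts.

    samples: list of time samples; each sample lists the demand (Gbps)
    of every server at that moment.
    Returns (dedicated_count, shared_count).
    """
    num_servers = len(samples[0])
    # Dedicated: every server is provisioned for its own peak demand.
    dedicated = sum(
        math.ceil(max(s[i] for s in samples) / capacity_gbps)
        for i in range(num_servers)
    )
    # Shared: the pool is provisioned for the peak of the aggregate demand.
    shared = math.ceil(max(sum(s) for s in samples) / capacity_gbps)
    return dedicated, shared


# Hypothetical workload: four servers whose busy periods do not overlap.
samples = [
    [6, 1, 1, 1],
    [1, 6, 1, 1],
    [1, 1, 6, 1],
    [1, 1, 1, 6],
]
print(adapters_needed(samples, capacity_gbps=10))  # (4, 1)
```

In this made-up case, four dedicated adapters never exceed 60% utilization, while one shared adapter peaks at 90%. Real savings depend on how far apart the servers' peaks fall.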
The newest incarnation, PCIe Gen3, signals at 8GT/s per lane (just under 8Gbps of usable bandwidth per direction after 128b/130b encoding), and its use in high-performance embedded systems has increased dramatically.
A 96-lane PCIe Gen3 switch, such as the PLX ExpressLane PEX8796, can support up to 1.5Tbps of bidirectional data transfers.
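The 1.5Tbps figure follows from simple arithmetic: lanes × 8GT/s × the 128b/130b encoding efficiency × two directions. The helper below is a sketch of that calculation; it ignores packet-level overhead (TLP headers, DLLPs), so real payload throughput is lower. The Gen4 rate of 16GT/s is included for comparison, and the function name is illustrative.

```python
# Per-lane signaling rate in GT/s; Gen3 and Gen4 both use 128b/130b encoding.
GT_PER_S = {3: 8.0, 4: 16.0}
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbps(lanes: int, gen: int = 3, bidirectional: bool = True) -> float:
    """Post-encoding bandwidth of `lanes` PCIe lanes in Gbps (no packet overhead)."""
    per_direction = lanes * GT_PER_S[gen] * ENCODING_EFFICIENCY
    return 2 * per_direction if bidirectional else per_direction


print(f"{link_bandwidth_gbps(96) / 1000:.2f} Tbps")                # 1.51 Tbps
print(f"{link_bandwidth_gbps(1, bidirectional=False):.2f} Gbps")   # 7.88 Gbps
```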
Designers can scale using multi-stage fabrics and enable various new topologies.
With multiple choices of link width (for example, x4, x8, and x16), PCIe gives designers the flexibility to configure their systems according to actual usage.
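When choosing link widths, a first-order check is whether the planned ports fit within the switch's lane count. The sketch below assumes a 96-lane switch and treats every port as independently sizable; real switches group lanes into stations with device-specific width and port-count rules, which are not modeled here. The function name and the port plan in the example are hypothetical.

```python
VALID_WIDTHS = {1, 2, 4, 8, 16}  # common PCIe link widths; per-device support varies


def check_port_plan(widths, total_lanes=96):
    """Return the number of unused lanes, or raise if the plan cannot fit."""
    invalid = [w for w in widths if w not in VALID_WIDTHS]
    if invalid:
        raise ValueError(f"unsupported link widths: {invalid}")
    used = sum(widths)
    if used > total_lanes:
        raise ValueError(f"plan needs {used} lanes; switch has {total_lanes}")
    return total_lanes - used


# Hypothetical plan: an x16 upstream port to the CPU plus eight x8 adapter ports.
plan = [16] + [8] * 8
print(check_port_plan(plan))  # 16
```

The 16 spare lanes in this example could, for instance, host four additional x4 adapter ports.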
Such PCIe switches also need to support advanced features such as non-transparency (NT), Direct Memory Access (DMA), spread spectrum clock (SSC) isolation, link-layer and end-to-end cyclic redundancy check, lossless fabric and congestion management.
With PCIe becoming native on more and more processors from major vendors, designers can benefit from the lower latency realized by not having to use any components between a CPU and a PCIe switch.
With this new generation of CPUs, designers can place a PCIe switch directly off the CPU, thereby reducing latency and component cost.
PCIe technology has become ubiquitous, and its Gen3 incarnation is more than capable of supporting shared I/O and clustering, providing system designers with an unparalleled tool for making their designs optimally efficient.
To satisfy the requirements of the shared-I/O and clustering market segments, vendors such as PLX Technology are bringing to market high-performance, flexible, and power- and space-efficient devices.
These switches have been architected to fit into the full range of applications cited above.
Looking forward, PCIe Gen4, with speeds of up to 16GT/s per lane, will only help accelerate and expand the adoption of PCIe technology into newer market segments, while making it easier and more economical to design and use.
About the Author
Krishna Mallampati is a product marketing director for PCIe switches at PLX Technology, Sunnyvale, Calif. He can be reached at firstname.lastname@example.org.