Effectively consolidating network-attached storage (NAS), SANs, and other emerging storage solutions into a single storage infrastructure will require an intelligent storage networking switch with data path processing performance in the tens of gigabits per second.
Achieving these performance levels for data path processing functions is no easy task. The array of possible solutions, including ASICs, processors, and programmable logic, offers various tradeoffs in the areas of flexibility, cost, development time, and ease of design.
Custom ASICs have the best chance of meeting the performance needs for data path processing in storage applications. In addition to implementing high-speed digital functions, ASICs can meet the demand for I/O transport interfaces such as Fibre Channel, InfiniBand, and iSCSI, and for physical chip-to-chip I/O technologies such as memory interfaces and SPI-4. And they provide the lowest per-piece component cost.
However, ASICs suffer from long development times and a high cost of entry, including tool costs, non-recurring engineering (NRE) charges, and commitments to purchase high volumes. They are also inflexible: they cannot be modified to meet changing needs unless they were designed to do so at the outset of development, a very difficult and costly proposition. It is this lack of flexibility, combined with the long development cycle, that hinders the usefulness of ASICs for many applications, since they cannot be changed "on the fly" to respond to rapidly evolving standards.
Network processors are another choice for data path processing. They are very flexible in their processing operations and have much shorter development cycles than ASICs, since they can be used off the shelf, which suits today's evolving standards. The disadvantages: they deliver lower performance at higher cost than custom ASICs, and they are inflexible in the I/O interfaces they support. Also, the programmer must have detailed architectural knowledge to produce fully optimized code for the microengines.
Programmable logic devices (PLDs) also offer an option for data path processing. Although they are not as fast as custom ASICs, they provide flexibility similar to network processors in terms of processing capability. And, they support both evolving and current leading-edge interface standards.
Development times are much shorter for PLDs than for custom ASICs: They are available off-the-shelf, and they do not require NREs. Developers of leading-edge storage networking products are increasingly relying on PLDs to implement their data path processing functions. The use of PLDs does not rule out the use of other hardware options; it is possible to use some combination of the three (ASICs, network processors, and PLDs), but generally a programmable logic component is required to address the need for flexibility. PLDs can also supplement systems that use network processors with co-processing functions in the event that the network processors cannot meet the sustained network performance requirements for the product.
In packet-oriented storage systems, data path processing functions generally consist of packet-handling operations, including header inspection and analysis, checking for packet integrity, packet forwarding, and payload processing. The control and traffic management functions for handling the flows of packets through the system are also good candidates for implementation in PLDs. To reside directly in the data path of these applications, PLDs require support for high-speed interfaces. This support spans both the logical requirements of the emerging standards and the I/O technology itself: low-voltage differential signaling (LVDS) for source-synchronous interfaces, clock-data recovery (CDR) technology for serial interfaces such as Fibre Channel and InfiniBand, and high-speed SDRAM and SRAM interfaces such as DDR SDRAM and QDR SRAM.
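To make these stages concrete, here is a minimal software model of the packet-handling flow just described. The 8-byte header layout and all names are invented for illustration; in a PLD these stages would run as parallel pipeline hardware rather than sequential code:

/* Illustrative model of the data-path stages: header inspection,
   integrity check, forwarding decision. All fields and names are
   hypothetical, not from any vendor design. */
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

typedef struct {
    uint8_t  dst_port;   /* byte 0: destination port        */
    uint8_t  proto;      /* byte 1: payload protocol        */
    uint16_t len;        /* bytes 2-3: payload length       */
    uint32_t crc;        /* bytes 4-7: payload check value  */
} hdr_t;

/* Header inspection: validate fields and extract the output port. */
static bool inspect(const hdr_t *h, size_t payload_len, uint8_t *port)
{
    if (h->len != payload_len)
        return false;              /* malformed: length mismatch */
    *port = h->dst_port;
    return true;
}

/* Integrity check: a trivial additive check standing in for a CRC. */
static bool intact(const hdr_t *h, const uint8_t *payload)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < h->len; i++)
        sum += payload[i];
    return sum == h->crc;
}

/* Forwarding decision: returns the egress port, or -1 to drop. */
int data_path(const hdr_t *h, const uint8_t *payload, size_t payload_len)
{
    uint8_t port;
    if (!inspect(h, payload_len, &port)) return -1;
    if (!intact(h, payload))             return -1;
    return port;
}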
PLDs equipped with high-speed transceiver functions can interface to the backplane and perform traffic management functions. Embedded-processor PLD solutions such as Excalibur can take on the packet processing tasks.
One pivotal role for PLDs is the interface to the backplane in a storage switch. PLDs are now available with built-in CDR and serializer/deserializer (SERDES) functions, reducing component count by eliminating the need for discrete devices to perform these tasks.
The built-in transceiver functions also increase system performance by reducing chip-to-chip delays. Generous amounts of on-chip RAM can aid in managing the queuing of packets; for example, Altera Stratix devices feature 512-Kbit MegaRAM blocks, which are well suited to this task. External memory may still be required, and to support this the PLD must be able to interface with high-speed SDRAMs and SRAMs. The traffic management PLD performs intelligent DMA functions, which can be managed by an on-board controller such as the Nios embedded processor.
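As an illustration of that queuing role, the following sketch models a packet-descriptor ring of the kind a traffic manager might keep in on-chip memory, with the data path as producer and the DMA engine (or a Nios-class controller) as consumer. The sizes and names are assumptions, not Stratix or Nios specifics:

/* Hypothetical packet-descriptor ring held in embedded RAM. */
#include <stdint.h>
#include <stdbool.h>

#define RING_SLOTS 256  /* must be a power of two */

typedef struct {
    uint32_t addr;  /* buffer address of the queued packet */
    uint16_t len;   /* packet length in bytes */
} slot_t;

typedef struct {
    slot_t   slot[RING_SLOTS];
    uint32_t head;  /* next slot the producer writes */
    uint32_t tail;  /* next slot the consumer reads  */
} ring_t;

bool ring_push(ring_t *r, uint32_t addr, uint16_t len)
{
    if (r->head - r->tail == RING_SLOTS)
        return false;                       /* queue full: back-pressure */
    slot_t *s = &r->slot[r->head % RING_SLOTS];
    s->addr = addr;
    s->len  = len;
    r->head++;
    return true;
}

bool ring_pop(ring_t *r, slot_t *out)
{
    if (r->head == r->tail)
        return false;                       /* queue empty */
    *out = r->slot[r->tail % RING_SLOTS];
    r->tail++;
    return true;
}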
Data integrity is very important in storage applications, and PLDs can aid in this capacity as well. In the case of our Pirus PSX-1000 Storage Utility Switch, APEX 20KE PLDs are used to perform high-speed checksums in each node of the system, which are then appended to each packet.
The upper layers use the results of this constant checksumming, providing "end-to-end" packet integrity. To meet even higher levels of data integrity, the APEX 20KE PLD design can be modified to perform different checksum algorithms or proprietary integrity functions to meet the needs of specific end users.
Generating and checking these checksums at speed is critical to high throughput. System developers can apply PLD logic resources in parallel-processing fashion to achieve the desired performance, unlike the sequential operation that limits processors in this application.
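The parallel advantage is easy to see in miniature. The 16-bit one's-complement (Internet) checksum is associative, so a packet can be summed in independent lanes and the partial sums folded at the end. The sketch below models in software what a PLD does with parallel adders; the function names are ours, not taken from the Pirus design:

/* Software analogue of wide parallel checksum hardware. */
#include <stdint.h>
#include <stddef.h>

#define LANES 4  /* a PLD might use far more parallel adders */

/* Fold carries back into the low 16 bits (one's-complement add). */
static uint16_t fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

uint16_t checksum_parallel(const uint16_t *words, size_t nwords)
{
    uint32_t lane[LANES] = {0};

    /* Each lane sums every LANES-th word; in hardware these adders
       run concurrently, giving LANES-fold throughput. */
    for (size_t i = 0; i < nwords; i++)
        lane[i % LANES] += words[i];

    uint32_t total = 0;
    for (int l = 0; l < LANES; l++)
        total += lane[l];

    return (uint16_t)~fold32(total);
}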
Packet pre-processing begins in the traffic management PLD, where header information can be stripped and stored in the high-speed memories before being routed to the device that actually performs the packet processing. PLDs handle this function effectively because the amount of header information needed varies from protocol to protocol, and changes to emerging protocols such as iSCSI can force design modifications. The actual header processing is generally performed by a processor, and with the advent of devices that integrate processors and PLDs, such as Altera's Excalibur family, all header processing can be consolidated into fewer devices, reducing cost, power, and board real estate.
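In software terms, that pre-processing split looks like the following sketch, assuming a hypothetical fixed header size and invented names: the header is peeled off into fast memory for the processor to examine while the payload continues down the data path.

/* Hypothetical header-strip stage; HDR_BYTES would in practice vary
   by protocol, which is exactly why a PLD suits this job. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define HDR_BYTES 24  /* illustrative; protocol-dependent */

typedef struct {
    uint8_t hdr[HDR_BYTES];   /* copy parked in high-speed memory */
    const uint8_t *payload;   /* payload left in place            */
    size_t payload_len;
} split_t;

/* Split an incoming frame into a header copy plus payload reference. */
int strip_header(const uint8_t *frame, size_t len, split_t *out)
{
    if (len < HDR_BYTES)
        return -1;                          /* runt frame: reject */
    memcpy(out->hdr, frame, HDR_BYTES);     /* park header for CPU */
    out->payload     = frame + HDR_BYTES;
    out->payload_len = len - HDR_BYTES;
    return 0;
}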
Packet processing, including header and payload processing as well as packet forwarding, can be achieved with a combination of separate PLDs and processors, or with devices that integrate both. Header processing typically includes implementing OSI Layer 1-2 functions such as header inspection and analysis, packet integrity checking, and packet forwarding.
Packets are formatted and forwarded as a result of header inspection. Packet forwarding includes insertion of the destination address, frame construction, and forwarding to the correct output port. In the case of the PSX-1000, we used PLDs to achieve a "data mirror" packet-forwarding feature that allows data to be written simultaneously to multiple locations. Called "multicasting," this feature avoids the sequential copying of data that would have been required had it been implemented by a processor. Instead, the PLDs perform the operation in parallel, much faster, eliminating a potential bottleneck.
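The essence of the data-mirror approach can be sketched as descriptor replication: rather than copying the payload once per destination, as a processor would, each output port receives a small descriptor pointing at the single shared buffer. The types and names below are hypothetical, for illustration only:

/* Fan one packet out to several ports without duplicating the data. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    const uint8_t *buf;   /* shared payload, written once */
    size_t         len;
    uint16_t       port;  /* egress port for this copy */
} desc_t;

size_t multicast(const uint8_t *buf, size_t len,
                 const uint16_t *ports, size_t nports,
                 desc_t *out)
{
    for (size_t i = 0; i < nports; i++) {
        out[i].buf  = buf;    /* all descriptors share the buffer */
        out[i].len  = len;
        out[i].port = ports[i];
    }
    return nports;            /* hardware enqueues these in parallel */
}

int main(void)
{
    uint8_t payload[64] = {0};
    uint16_t ports[3] = {1, 4, 7};
    desc_t descs[3];

    size_t n = multicast(payload, sizeof payload, ports, 3, descs);
    for (size_t i = 0; i < n; i++)
        printf("queued %zu bytes to port %u\n",
               descs[i].len, (unsigned)descs[i].port);
    return 0;
}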
Integrating both header processing and payload processing functions into a single programmable device simplifies the design and eliminates chip-to-chip delays. Altera's engineers were able to integrate Layer 2 functions along with some Layer 3 and 4 functions, including TCP/IP checksum and next-hop address determination, in a single PLD. By offloading these functions from a processor, the Pirus team was able to increase the performance of the system and meet its speed targets.
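For the next-hop address determination mentioned above, the logic amounts to a longest-prefix match against a route table. The sketch below shows the lookup in its simplest linear form, with invented names; a real PLD design would implement it as a CAM or pipelined structure rather than a loop:

/* Hypothetical longest-prefix-match route lookup. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t prefix;    /* network address (host byte order) */
    uint32_t mask;      /* e.g. 0xFFFFFF00 for a /24         */
    uint32_t next_hop;  /* where to forward matching packets */
} route_t;

/* Return the next hop for dst, or 0 if no route matches. */
uint32_t lookup_next_hop(const route_t *tbl, size_t n, uint32_t dst)
{
    uint32_t best_hop = 0;
    uint32_t best_mask = 0;

    for (size_t i = 0; i < n; i++) {
        if ((dst & tbl[i].mask) == tbl[i].prefix &&
            tbl[i].mask >= best_mask) {      /* longer prefix wins */
            best_mask = tbl[i].mask;
            best_hop  = tbl[i].next_hop;
        }
    }
    return best_hop;
}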