For more on Serial RapidIO, see The RapidIO High-Speed Interconnect: A Technical Overview, Using Serial RapidIO for FPGA co-processing, and Partitioning video across multiple DSPs and FPGAs.
Multi-processor DSP systems place tough demands on their I/O systems. The traditional high-performance I/O found in many DSPs falls short on reliability, bandwidth, and scalability. Serial RapidIO (sRIO) overcomes these limitations by providing a high-performance, packet-switched interconnect technology. Unlike its predecessors, sRIO does not require sharing the interface with memory, and it can work as either a master or a slave. It also offers long physical reach, hardware-level error detection/correction, status/acknowledgement feedback, and in-band interrupts and signaling.
Advanced DSPs such as Texas Instruments Incorporated's TMS320C6455 now incorporate sRIO interfaces designed for efficiency. The interfaces connect directly to the DSP's DMA engine, using transaction proxy registers to keep control overhead low. Data can be prioritized for efficient handling by the DMA system, and the interface can queue multiple transactions.
sRIO in complex system topologies
First, it is important to understand the role sRIO plays in complex system topologies, and how it offers increased flexibility when implementing a physical system. sRIO provides chip-to-chip and board-to-board communications at performance levels scaling to 20 Gb/s and beyond. sRIO provides bidirectional links at 1.25, 2.5, or 3.125 Gbaud in 1X and 4X widths, for a payload throughput of up to 10 Gb/s each way.
With sRIO, the designer can determine how to best connect multiple devices. DSPs can be connected directly in mesh, ring and star topologies. In addition, multiple DSPs can be connected through a switch with or without local connections to each other. Example system configurations include:
- A simple system with two DSPs connected via a 4X link.
- A more complex system consisting of five DSPs, each connected directly to the others via 1X links (Figure 1).
- Five DSPs connected to a central switch via 4X links for better I/O bandwidth.
- A dozen DSPs connected via 4X links to a multi-switch fabric, delivering the ultimate in computational power and I/O bandwidth.
sRIO can also be used to connect DSPs, FPGAs and ASICs together. This flexibility allows designers to arrange the components in any way that suits the application data flow, rather than compromising system design to deal with interface or protocol limitations.
Figure 1. In this example, sRIO allows the flexibility to completely connect five DSPs.
sRIO-enabled systems can achieve a significant overall performance increase by taking advantage of these features. Consider wireless base stations. In current designs, an FPGA or ASIC handles 24 to 48 antenna streams that consume a total of 3 to 6 Gbits/s. These data rates are far beyond the capabilities of traditional DSP I/O, but they can be easily handled by sRIO. Thus, sRIO makes it possible to replace the FPGA or ASIC with a far less expensive DSP.
User data, on the other hand, is typically processed on a DSP at approximately 19 Mbits/s per user channel. The increased core speed of the latest generation DSPs, faster sRIO I/O, and freed external memory bandwidth allow for increased channel density—up to 128 user channels of 19 Mbits/s each per DSP, totaling approximately 2.5 Gbits/s of user data.
The sRIO interface is an efficient way to implement message passing between DSPs. For example, messages sent through the sRIO interconnect are given higher priority than data buffers, because messages generally carry control information, which must take precedence over payload data.
Software developers can build sRIO-based applications using either a low-level direct I/O approach or high-level message passing. In the low-level direct I/O approach, the programmer must specify the target processor and the remote address. This approach offers the best performance, and it is appropriate for applications where the target buffering scheme is known at design time and the application partitioning is fixed. The downside is that the developer must know the physical memory maps of remote processors, which makes third-party integration harder.
The high-level message-passing approach offers a more abstract way to communicate without extensive low-level device programming. It is optimal for applications where the target buffering scheme is unknown and the application partitioning is unknown or flexible, and it greatly reduces the time required to scale an application up or down to a different number of processors.