With OC-192 systems hitting the street, members of the design community are now laying the groundwork for the development of OC-768 and other 40-Gbps optical networking systems. As their 10-Gb counterparts did before them, 40-Gb system designers now face key interface decisions.
As with the 2.5- and 10-Gb markets, the Optical Internetworking Forum (OIF; www.oiforum.com) is again stepping up to address important interface issues. Specifically, the forum is creating a set of interface specs that are designed to accelerate data movement and eliminate system bottlenecks.
System Packet Interface Level 5 (SPI-5) is OIF's first venture in the 40-Gb arena. This recently ratified spec takes on one of the biggest I/O challenges: the connection between the physical layer (PHY) and link layer devices.
While SPI-5 has been under development for over a year, it is only now starting to hit the market. To help speed implementation, here's a tutorial that provides you with insights into the inner workings of the spec and the impact it will have on optical networking architectures.
SPI-5 transfers packets from multiple channels by interleaving fragments, called bursts, from each channel over the datapath. The flow of data from these channels is regulated by an out-of-band reverse flow control interface.
As can be seen in Figure 1, the SPI-5 interface is symmetric, defining the same signals and behavior in both the transmit (link layer to PHY) and receive (PHY to link layer) directions.
Click here for Figure 1
Figure 1: Block diagram showing the SPI-5 interface in both the ingress and egress paths. A, B, C and D are reference points used in the companion SxI-5 electrical I/O spec (to be ratified by OIF soon).
On the datapath, the interface signals consist of 16 parallel data lanes (TDAT, RDAT) together with a clock (TDCLK, RDCLK) and a control signal (TCTL, RCTL). In order to control transition density, the control and data signals are independently scrambled by an x^11 + x^9 + 1 linear feedback shift register (LFSR) stream cipher.
The clock is source-synchronous (originating together with the data at the source end) and runs at one quarter the baud rate of the data. In-band signaling is used on the data path, in which control signals may be inserted between data transfers. The control line is used to distinguish between periods of data and control information; it is low during data transfer and high otherwise. In-band control information includes the means for indicating start-of-packet (SOP), end-of-packet (EOP), the destination port address of the transfer, and an error-detection code.
The SPI-5 interface can be operated at data rates from 2.5 to 3.125 Gbps per lane to transport various payloads, including ATM, packet over SONET/SDH (POS), and Ethernet frames.
Flow control information is sent out-of-band in a serial status line (TSTAT, RSTAT) that runs at the same bit rate as the datapath. These status lines are also scrambled with the same scheme employed in the datapath.
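Both the datapath signals and the status lines, then, pass through the same x^11 + x^9 + 1 scrambler. Here's a rough Python sketch of how such an LFSR stream cipher works; the seed value and bit ordering are illustrative assumptions, not values taken from the spec:

```python
def lfsr_stream(seed, nbits):
    # 11-bit Fibonacci LFSR with feedback polynomial x^11 + x^9 + 1.
    # The seed and bit alignment here are assumptions for illustration;
    # the spec fixes the actual initialization.
    state = seed & 0x7FF  # 11-bit state; must start nonzero
    bits = []
    for _ in range(nbits):
        bit = (state >> 10) & 1                   # emit MSB as keystream
        fb = ((state >> 10) ^ (state >> 8)) & 1   # taps at x^11 and x^9
        state = ((state << 1) | fb) & 0x7FF
        bits.append(bit)
    return bits

def scramble(bits, seed=0x7FF):
    # XOR data with the keystream; running the same function again
    # with the same seed descrambles.
    keystream = lfsr_stream(seed, len(bits))
    return [b ^ k for b, k in zip(bits, keystream)]
```

Because scrambling is a plain XOR against a deterministic keystream, applying the same operation twice recovers the original data.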
Out-of-band flow control has the advantage of keeping the transmit and receive interfaces independent of each other. If in-band flow control were used, then the flow control for one datapath would be carried in the data path in the other direction. This would be of no major consequence if both transmit and receive functions could be integrated in one link layer device.
With the present state of IC technology, however, it is not possible to achieve any meaningful integration of both functions into one device. Hence, most link layer implementations will consist of separate, unidirectional devices. Consequently, an in-band flow control scheme would have required yet another interface between those two devices to relay flow control information onto the opposite data path. So while it might appear "cleaner" to have all control information sent in band, practical realities demand that flow control be sent out-of-band.
In SPI-5, data is transferred in uninterrupted bursts that terminate after a multiple of 32 bytes has been sent or upon end-of-packet. There is no restriction or dependence on the data format or protocol of the packets being sent over the interface. Different packet formats may even be interleaved. For example, a transfer containing a POS segment can be sent to one port, followed by an ATM cell to another, and an Ethernet frame to yet another. Burst transfers that comprise a given packet need not have the same length.
Control words are inserted between transfers. Idle control words are inserted when there is no data to send. A payload control word must be inserted before the start of each data transfer. Back-to-back data transfers are therefore separated only by insertion of a payload control word.
Both idle and payload control words share a similar 16-b format, consisting of a 2-b type field, a 2-b EOP status field, 8 bits of port address, and a 4-b diagonal interleaved parity (DIP)-4 error detection code. The DIP-4 code covers the control word and any of the data sent after the prior control word; it is an odd-parity code that is computed "diagonally" across the 16 data lanes, as shown in Figure 2. Note that the control line is not protected by the DIP-4 code. Spurious logic high levels in the control line will, however, trigger a DIP-4 error check as if a control word were actually inserted and generally flag a parity error.
Click here for Figure 2
Figure 2: The DIP-4 code is computed diagonally across 16 data lines.
Diagonal parity does not perform better than conventional parity in the presence of random errors. However, it spreads errors isolated in one data lane (resulting, for example, from a marginal PC-board trace or a defective I/O) across more than one parity bit, thereby increasing the likelihood of detecting those errors.
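One way to picture diagonal parity is as a running 16-bit accumulator that rotates one bit between words, so an error pinned to a single lane lands on a different parity bit with each new word. The Python sketch below captures that idea; the rotation direction, the folding to 4 bits, and the odd-parity inversion are assumed conventions for illustration, not the normative DIP-4 procedure:

```python
def dip4(words):
    # Accumulate 16-bit words into a register that rotates left one
    # bit between words; this "diagonal" spreads a stuck lane across
    # different parity bits. Folding and inversion details below are
    # illustrative assumptions, not the spec's exact algorithm.
    acc = 0
    for w in words:
        acc = ((acc << 1) | (acc >> 15)) & 0xFFFF  # rotate left by 1
        acc ^= w & 0xFFFF
    nib = (acc ^ (acc >> 4) ^ (acc >> 8) ^ (acc >> 12)) & 0xF
    return nib ^ 0xF  # invert so an all-zero stream yields nonzero parity
```

Note how the same single-bit error produces a different code depending on which word it lands in, which is exactly the property that helps catch lane-isolated faults.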
The EOP status field indicates four possible states: not an EOP, EOP with two bytes in the last transfer valid, EOP with one byte valid (the packet has an odd number of bytes), and an EOP with an Abort condition (triggered by error conditions that are application-specific). Two values in the type field are reserved for the payload control word (one to indicate SOP, and another to indicate continuation of a packet transfer).
The 8-b port address allows up to 256 ports to be supported. If needed, SPI-5 has an option for extending the number of address bits to support a much larger address space.
To carry the additional address bits, an address control word (ACW) followed by zero or more address data words (ADW) may be inserted before the payload control word (PCW). The ACW has a format similar to a PCW and can carry 8 address bits.
Address data words each contain 16 bits of address. The entire address is transferred as follows: the least significant byte is carried in the ACW. If needed, the next two more significant bytes are carried in an ADW. Additional ADWs may be inserted to transfer the next set of more significant bytes. Finally, the PCW is sent prior to data transfer, containing the most significant byte.
The total address may be up to 18 bytes long. Figure 3 shows how all these control words fit together for a given burst transfer.
Click here for Figure 3
Figure 3: Diagram of an SPI-5 burst transfer.
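The byte ordering described above (least significant byte in the ACW, middle bytes in 16-bit ADWs, most significant byte in the PCW) can be sketched as follows. The packing of address bytes within an ADW, and the helper name itself, are assumptions for illustration; field positions inside each control word are omitted:

```python
def split_address(addr_bytes):
    # addr_bytes: the full address as a list of bytes, least
    # significant first. Returns the byte carried in the ACW, the
    # list of 16-bit ADWs (least significant first), and the byte
    # carried in the PCW.
    assert 2 <= len(addr_bytes) <= 18 and len(addr_bytes) % 2 == 0
    acw_byte = addr_bytes[0]    # least significant byte -> ACW
    pcw_byte = addr_bytes[-1]   # most significant byte -> PCW
    middle = addr_bytes[1:-1]   # each pair of bytes -> one ADW
    adws = [(middle[i + 1] << 8) | middle[i]
            for i in range(0, len(middle), 2)]
    return acw_byte, adws, pcw_byte
```

With a 4-byte address, for instance, one ADW is inserted between the ACW and the PCW; a plain 2-byte address needs no ADWs at all.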
SPI-5 supports a simple hierarchy of addresses, as shown in Figure 4. The entire address field can be conceptually divided into an extended address and a physical address.
Click here for Figure 4
Figure 4: The SPI-5 interface supports a hierarchy of addresses, including extended and physical addresses.
The physical address corresponds to unique ports within the sink device. The configured length of the physical address is contained in a device parameter called PADDR_LEN.
The subset of physical addresses that is subject to flow control under SPI-5 is referred to as a pool, and its extent in the address hierarchy is given by the device parameter POOL_LEN. By defining a pool, it is possible to simplify flow control by restricting it to a smaller address space in applications where it may not be required to use flow control over the entire physical address space. For example, physical ports sharing the same block of memory may have a common pool address. Usage and interpretation of the extended address is application-specific. For example, the extended address could contain connection identifiers sharing the same physical port (such as PPP connections statistically multiplexed onto an STS-3c port in a 256 x STS-3 application).
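One plausible reading of this hierarchy is that the pool identifier occupies the upper bits of the physical address, so consecutive ports sharing a resource map to the same pool. The sketch below illustrates that reading with example parameter values; the spec defines the exact mapping:

```python
PADDR_LEN = 8  # configured physical-address length in bits (example)
POOL_LEN = 4   # configured pool-address length in bits (example)

def pool_of(phys_addr):
    # Take the upper POOL_LEN bits of the physical address as the
    # pool identifier, so that blocks of ports (for example, those
    # sharing one buffer memory) fall into the same flow-control pool.
    return phys_addr >> (PADDR_LEN - POOL_LEN)
```

With these example values, ports 0x30 through 0x3F would all share pool 0x3 and be flow-controlled as one entity.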
Handling Short Bursts
Under the SPI-5 specification, burst transfers can be of any length, and may even be shorter than 32 bytes. To ease the implementer's task of dealing with a large number of extremely short back-to-back bursts, a mechanism is provided to rate-limit the arrival of data bursts.
Even though long streams of short bursts are unlikely to happen, designers typically engineer for the worst valid case permitted by a given interface. To remove this difficulty, SPI-5 defines a burst admission procedure (BAP), which provides a means for controlling the flow of short transfers over the interface.
BAP uses a token bucket algorithm, in which a token is generated at a fixed rate up to a finite maximum amount, and is consumed by each block of 16 or fewer (16-b) payload data words or each block of 8 or fewer (16-b) address data words. A token must be available for the payload data (and another for the address data, if implemented) in order for a transfer to proceed.
Transfers shorter than 16 data words nevertheless each consume a full token. With the appropriate parameters selected for the token bucket algorithm, BAP operates such that long transfers are not throttled back, while long stretches of short bursts will eventually consume all the tokens and force the source side of the datapath to pause momentarily.
BAP is implemented only on the source side. It therefore does not require any handshake mechanism from the sink end of the datapath.
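The token-bucket behavior described above can be sketched in a few lines of Python. The class name, the tick granularity, and the rate/depth parameters are illustrative assumptions; only the one-token-per-16-word-block rule comes from the description above:

```python
class BurstAdmission:
    # Token-bucket sketch of the burst admission procedure (BAP).
    # Tokens accrue at a fixed rate up to a cap; each block of up
    # to 16 payload data words consumes one token.
    def __init__(self, rate, depth):
        self.rate = rate      # tokens added per tick
        self.depth = depth    # bucket capacity
        self.tokens = depth

    def tick(self):
        # Periodic token replenishment, saturating at the cap.
        self.tokens = min(self.depth, self.tokens + self.rate)

    def admit(self, n_words):
        # One token per block of 16 or fewer 16-b data words;
        # a short transfer still costs a whole token.
        need = max(1, -(-n_words // 16))  # ceiling division
        if self.tokens >= need:
            self.tokens -= need
            return True
        return False  # source must pause until tokens accrue
```

Long transfers draw tokens in proportion to their length and so are never starved, while a stream of tiny bursts pays one full token apiece and eventually drains the bucket.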
Aligning the Lanes
Since SPI-5 is a high data rate interface, correcting for skew between different I/O paths is a key concern. To solve this problem the specification includes a mechanism that de-skews and aligns multiple parallel lanes of the datapath.
Under the SPI-5 spec, parallel lanes are aligned by means of a predefined pattern (called a training sequence) that is sent on the bus. The training sequence consists of one or more training patterns, each of which in turn consists of 16 training control words followed by 16 training data words.
The training control and training data words are designed to be bitwise complements of each other: any given bit position in the training control word is the inverse of the corresponding bit in the training data word. To the sink end of the interface, the training pattern therefore appears as a long square wave (16 baud periods high, 16 low) on each of the 16 data lanes and the accompanying control signal.
At the boundary between the training control and training data words, all 1s change to 0, and all 0s change to 1 simultaneously in the absence of skew. By measuring the differences in transition time, the receiver can determine and compensate for the delay variations among the data lanes and control signal.
A training sequence must be scheduled to be sent at least once after a configured number of cycles has elapsed since the last sequence was sent. In practice, the maximum interval between training sequences is very large and has minimal impact on bus efficiency. Training sequences may also be sent in lieu of idle control words at the discretion of the source side of the interface.
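The structure of one training pattern, and the square wave each lane observes, can be sketched as below. The training control word value is a placeholder (the spec fixes the actual pattern); what matters is that the data words are the bitwise complement of the control words:

```python
def training_pattern(control_word=0x0F0F):
    # One training pattern: 16 repetitions of a training control
    # word followed by 16 repetitions of its bitwise complement.
    # The control-word value here is a placeholder, not the spec's.
    data_word = control_word ^ 0xFFFF
    return [control_word] * 16 + [data_word] * 16

def lane_waveform(words, lane):
    # The bit stream a single lane observes: 16 baud periods at one
    # level, then 16 at the other, giving one clean, simultaneous
    # transition against which skew can be measured.
    return [(w >> lane) & 1 for w in words]
```

Every lane sees its one transition at the same nominal instant, so the receiver can compare measured transition times across lanes to compute per-lane delay corrections.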
Credit-based Flow Control
As specified in SPI-5, the device at the sink end of the datapath can limit the flow of data from the source end by exercising flow control in the reverse direction. Each data path has a corresponding status channel for sending flow control information. SPI-5 uses a credit-based scheme, in which credits are granted and consumed on a per-pool basis.
There may be a one-to-one correspondence between pools and the constituent ports in a networking device. Where sets of ports share common resources (buffer memory for example), it may be sufficient to group multiple ports into a pool and collectively exert flow control upon them as one entity.
The datapath sink grants credits on the basis of its ability to receive more data without overflowing its buffers. Credits are consumed by the datapath source whenever payload data or address data words are sent. A credit is consumed for each 32-B block of data transferred; blocks shorter than 32 B nevertheless consume a full credit.
Flow control information for the credit pools is sent in a sequence of 2-b status words. The corresponding pools in the sequence are defined by a calendar, which is configured on both sides of the interface. It is possible for a pool to occupy more than one position in the calendar. For example, the number of positions occupied by a given pool may depend on the corresponding data path bandwidth represented by the pool.
The status word for each pool conveys one of three possible states: satisfied ("1 0"), hungry ("0 1"), and starving ("0 0"). If the starving state is reported, the FIFO of the corresponding sink pool is almost empty, and the credit count for the source pool is set to MAXBURST1 32-B blocks.
If the hungry state is reported, the FIFO is partially filled. In turn, the credit count is set to the greater of MAXBURST2 blocks or the remainder of what was already granted.
If the satisfied state is reported, the FIFO of the corresponding sink pool is almost full, and no further credits are granted. Credits granted from the last hungry or starving status report remain available however. The MAXBURST parameters are configured on start-up.
The only pattern not mentioned above is the "1 1" pattern. This pattern is reserved as a framing word that divides one status frame from another.
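The per-pool credit update rules just described can be condensed into a short sketch. The MAXBURST values shown are example configurations, and the function name is illustrative; the status encodings and update rules follow the description above:

```python
MAXBURST1 = 16  # example settings, in 32-byte blocks,
MAXBURST2 = 8   # configured on start-up

def update_credits(status, remaining):
    # status: the 2-b word reported for a pool on the status channel;
    # remaining: the source's unspent credit balance for that pool.
    if status == 0b00:                  # starving: sink FIFO near empty
        return MAXBURST1
    if status == 0b01:                  # hungry: FIFO partially filled
        return max(MAXBURST2, remaining)
    if status == 0b10:                  # satisfied: no new credits,
        return remaining                # but leftover grants stay usable
    raise ValueError('"1 1" is the framing word, not a pool status')
```

Note that a satisfied report does not revoke credits already granted; it merely stops new ones from being issued.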
As shown in Figure 5, the status frame begins with the "1 1" framing word, followed by one or more calendar sequences of credit pool status information, a 2-b bus parameter switch control word (BPSCW) for switching calendars, and a 2-b DIP-2 error detection code. The DIP-2 code operates in a manner analogous to that of the DIP-4 code used in the datapath. Frame overhead is reduced by repeating the calendar sequence.
Click here for Figure 5
Figure 5: Status frame format.
In some applications, such as the link capacity adjustment scheme (LCAS), pool bandwidths can change during the course of normal operation to the extent of requiring a new calendar configuration. In order to switch calendars, a new calendar must first be configured on both sides of the interface.
The sink side of the datapath then proceeds to send a predefined sequence of five 2-b words on the BPSCW (over five status frames), and switches over afterwards to the new calendar. The source side of the datapath will also switch to the new calendar upon receiving the sequence over the status channel. This scheme for switching calendars is designed to be robust to the extent that any 2-b error can be corrected and any 3-b error can be detected.
A training sequence is sent occasionally on the status channel to allow the receiving end to properly synchronize with the 2-b status words. This training sequence consists of one or more training patterns, each of which consists of 8 "0 0" words followed by 8 "1 1" words.
That wraps up our tutorial on the SPI-5 interface. Further details are contained in the specification itself, which, along with all other approved OIF implementation agreements, is available to the general public at www.oiforum.com (go to the Technical Work link).
About the Author
Richard Cam is manager of PMC-Sierra's corporate standards. Richard has served as the technical editor of both the SPI-3 and SPI-4 Phase 2 specs at the OIF. He obtained a Ph.D. in Electrical Engineering from the University of British Columbia in 1994, and can be reached at firstname.lastname@example.org.