Design Article
Cut to the Core of Optimal MPLS Router Design
Chaoping Wu
7/1/2002 4:55 AM EDT
Multiprotocol label-switching (MPLS) technology is about to take off. As the most widely used MPLS routers, label edge routers (LERs) require a distributed architecture for high performance, proper utilization of network processors and ASICs for the flexibility to satisfy the ever-growing MPLS applications, and modular MPLS software to support up-to-date IETF specifications. When combined with the need to integrate advanced features such as virtual private networking (VPN), Internet Protocol security (IPSec), firewalls and network address translation (NAT), the design of these LERs becomes especially challenging under ever-present time-to-market pressures.
To meet these challenges, the designer must take into account every detail of MPLS edge-router design. Particular attention must be paid to the selection of MPLS implementation models and router function-distribution and control-plane software architectures, as well as the intricacies of wire-speed label switching with forwarding information databases implemented in network processors or in ASICs.
The first step of the MPLS LER design is choosing the right MPLS implementation model for targeted applications. A well-designed MPLS router may support more than one MPLS implementation model, which may be necessary because a wide range of MPLS applications is becoming popular. Choices include MPLS over asynchronous transfer mode, frame relay, Point-to-Point Protocol and Ethernet, as well as Ethernet/frame relay/ATM over MPLS and MPLS VPNs. These implementation models may have different MPLS signaling protocols, extended dynamic routing protocols, IP stacks, Layer 2 encapsulation methods, label stacks and hierarchical networks.
MPLS stack levels vary
Simpler MPLS applications may use only one level of label stack. For this model, we can upgrade some original ATM/frame relay switches to LERs. As shown in Figure 1, ATM-LERs use virtual path identifiers (VPIs) and virtual channel identifiers (VCIs) as labels; in frame-relay-based LERs, data-link connection identifiers (DLCIs) are used as labels. The control planes of these devices are enhanced with routing protocols and MPLS signaling protocols.
The limitations in these upgraded routers are usually associated with the packet-label header-processing capabilities, since these devices may use some kind of dedicated hardware that can only operate on a fixed format of packet headers such as ATM and Ethernet. To illustrate various MPLS packet headers, let's take a look at Figure 1, which also indicates formats of packet headers in different types of MPLS applications.
We can see that slightly more complicated MPLS applications use more than one level of label stack. To accommodate label stacking, we may need newer hardware platforms. For instance, Ethernet over MPLS uses two levels of label stack as specified in the Martini Internet draft from the IETF. This usually requires some new Ethernet chips, depending on the original chips' flexibility to support new headers.
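To make label stacking concrete, here is a minimal sketch in Python of how a two-level stack could be packed and unpacked. It assumes the generic 32-bit shim-header layout (20-bit label, 3-bit experimental field, 1-bit bottom-of-stack flag, 8-bit TTL) used over links such as PPP and Ethernet, not the VPI/VCI encoding of ATM-LERs. The function names and the label values 40 and 80 are illustrative only.

```python
import struct

def encode_label_stack(entries):
    """Pack (label, exp, ttl) tuples, top of stack first, into shim headers."""
    out = b""
    for i, (label, exp, ttl) in enumerate(entries):
        if not 0 <= label < (1 << 20):
            raise ValueError("label must fit in 20 bits")
        # The bottom-of-stack (S) bit is set on the last entry only.
        bottom = 1 if i == len(entries) - 1 else 0
        word = (label << 12) | ((exp & 0x7) << 9) | (bottom << 8) | (ttl & 0xFF)
        out += struct.pack("!I", word)
    return out

def decode_label_stack(data):
    """Read shim headers until the S bit is seen; return (entries, payload)."""
    entries = []
    offset = 0
    while True:
        if offset + 4 > len(data):
            raise ValueError("truncated label stack")
        (word,) = struct.unpack_from("!I", data, offset)
        offset += 4
        entries.append((word >> 12, (word >> 9) & 0x7, word & 0xFF))
        if word & 0x100:  # bottom of stack reached
            break
    return entries, data[offset:]

# Two-level stack: an outer (tunnel) label on top of an inner label.
stack = encode_label_stack([(40, 0, 64), (80, 0, 64)])
labels, payload = decode_label_stack(stack + b"frame")
```

Hardware that can only parse a fixed header format has no equivalent of the loop in decode_label_stack, which is one way to see why deeper stacks may call for newer chips.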
One of the most popular MPLS applications is the MPLS/Border Gateway Protocol VPN, based on RFC2547bis. To support this application, edge routers adopt a model using VPN routing/forwarding tables. Here, one edge router supports many VPNs for different customers connected to interface ports with overlapping private IP addresses, as well as Internet access for customers. This kind of router requires multiple VPN routing/forwarding tables and extended MP-BGP4 features.
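The idea of per-VPN routing/forwarding tables can be sketched as follows. This is a simplified illustration, not an RFC2547bis implementation: the class name, the port-to-VPN binding and the next-hop names are all hypothetical, and route distribution via MP-BGP is left out. It only shows why overlapping private addresses do not collide when each customer port selects its own table.

```python
import ipaddress

class VrfRouter:
    """Toy edge router holding one routing table per VPN (hypothetical API)."""

    def __init__(self):
        self.port_to_vrf = {}  # ingress port -> VPN table name
        self.vrfs = {}         # VPN table name -> list of (network, next_hop)

    def bind_port(self, port, vrf):
        self.port_to_vrf[port] = vrf
        self.vrfs.setdefault(vrf, [])

    def add_route(self, vrf, prefix, next_hop):
        network = ipaddress.ip_network(prefix)
        self.vrfs.setdefault(vrf, []).append((network, next_hop))

    def lookup(self, port, dst):
        """Longest-prefix match inside the table bound to the ingress port."""
        table = self.vrfs[self.port_to_vrf[port]]
        addr = ipaddress.ip_address(dst)
        matches = [(net, nh) for net, nh in table if addr in net]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0].prefixlen)[1]

router = VrfRouter()
router.bind_port(1, "cust-a")
router.bind_port(2, "cust-b")
router.add_route("cust-a", "10.0.0.0/8", "pe-east")
router.add_route("cust-a", "10.1.0.0/16", "pe-west")
router.add_route("cust-b", "10.0.0.0/8", "pe-north")  # same prefix, other VPN
```

The same destination address arriving on ports 1 and 2 resolves to different next hops because the lookup never crosses table boundaries.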
Another very scalable model of MPLS VPN application uses virtual routers (VR). In this case, a physical router consists of multiple VRs that use virtualized TCP/IP stacks and routing protocols. Each customer uses its designated VRs for intra-VPN communications. This kind of router may be configured with a large number of customer VRs and a limited number of backbone VRs, as shown in Figure 2.
Backbone VRs have an overlay routing relationship with the customer VRs. They are responsible for building proper routes/label-switched paths (LSPs) in the core IP network and for tunneling customer traffic through, but they do not participate in internal customer routing. As indicated in Figure 2, the routing protocols among backbone VRs are independent of those run in customer VRs. Administrators can provision multiple backbone VRs and allow different Internet service providers to use separate backbone VRs for their respective customers.
Once we select the MPLS models that are required to support a product, we enter our router architecture design stage. As an example, let's investigate a chassis-based ATM LER system with a distributed label-switching architecture and network-processor-based line cards with ASICs.
The main components of different cards inside the chassis, including interface cards, control cards and fabric cards, are illustrated in Figure 3. This sample router describes a leading-edge design, fitting into the more-complicated MPLS implementation model described previously that supports VPNing and uses label stacking. It also offers advanced IP services such as IPSec, firewalls and NAT with wire speed up to OC-12 (622 Mbits/second).
The routing and MPLS signaling protocols may run in a centralized management card, with an active backup management card in the chassis for high-availability systems. Advanced IP services are performed in customer-side line cards via network processors, ASICs, encryption processors and so on. The uplink card has the highest throughput requirement, so it may need more processing power. This can be achieved via separate network processors/ASICs for transmission and receive data paths as shown. To guarantee internal message delivery among different cards in the chassis, we can use a serial control bus, based on either 10/100-Mbit/s Ethernet or PCI.
Plane distributions
During the architectural design stage, an important issue that must be addressed involves the distribution of the management plane, control plane and data plane. From a data-processing perspective, the construction of the management data path, fast data path and slow data path must also be given due consideration.
Various system functions in these distinctive planes can be distributed into different interface cards and control cards. For edge routers, it is typical to have the data plane fully distributed. The control plane, running different IP/MPLS protocols, may be distributed as well, to offload a central management card and to provide more resiliency. In addition, all functions in the switch fabric card can be distributed into line cards. The crossbar-based fabric card, as shown in Figure 3, may be replaced by direct interconnections between every pair of line cards. In these distributed systems, we avoid performance bottlenecks as well as operational failures at many single points.
After introducing the plane distribution in hardware, we can move on to examine another major characteristic of a highly distributed system: the software modules. These are distributed in processors that reside in the same or different cards. A control-plane software module in one CPU is required to communicate with different network processors in the same interface card in case multiple network processors are used for transmit and receive processing. We can also put management-plane software modules in control CPUs of different interface cards. The management software modules in one card then exchange information with their distributed peers in other cards. They should be able to establish dynamic communication links when the remote peer is available and remove the links if the remote peer fails. In fault-tolerant systems, software-based heartbeat messaging is used to detect the health status of the remote peer.
Wire-speed LERing
After careful consideration of the distribution issues, we must now investigate the various data paths of wire-speed MPLS edge routers. In general, data paths can be classified into the management data path (used by system-management traffic such as the Simple Network Management Protocol or Telnet), the control data path (used by routing and signaling protocols like Open Shortest Path First and LDP) and the user data path. The first two are less time-critical than the user data path.
We can further divide the user data path into a fast data path and a slow data path. The slow data path handles exception user traffic like IP options and fragmentation.
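A fast-path/slow-path split of this kind can be sketched as a single header check. The example below assumes that "exception traffic" means IPv4 packets carrying options or packets that are fragments; a real design would likely add further conditions. The function names are hypothetical, and make_header exists only to build sample input.

```python
import struct

def needs_slow_path(ipv4_header: bytes) -> bool:
    """Return True if the packet should be punted to the slow data path."""
    if len(ipv4_header) < 20:
        return True  # truncated header: treat as an exception
    version = ipv4_header[0] >> 4
    ihl = ipv4_header[0] & 0x0F  # header length in 32-bit words
    if version != 4:
        return True
    (flags_frag,) = struct.unpack_from("!H", ipv4_header, 6)
    more_fragments = bool(flags_frag & 0x2000)
    fragment_offset = flags_frag & 0x1FFF
    has_options = ihl > 5
    return has_options or more_fragments or fragment_offset != 0

def make_header(ihl=5, flags_frag=0):
    """Build a 20-byte sample IPv4 header (checksum left at zero)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | ihl, 0, 20, 0, flags_frag, 64, 6, 0,
        bytes([40, 1, 1, 1]), bytes([80, 1, 1, 1]),
    )

plain = needs_slow_path(make_header())
with_options = needs_slow_path(make_header(ihl=6))
fragment = needs_slow_path(make_header(flags_frag=0x2000))
```

Keeping the test this cheap matters because it runs on every packet before the fast path commits to forwarding.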
For high-throughput designs, we focus on the critical fast data path. Here, we can use network processors, encryption processors and ASICs for label switching, IP routing, IPSec, firewalls, NATs, etc. Figure 4 shows a line card that can perform advanced IP services.
Using network processors for cutting-edge design has particular flexibility advantages in the early stage of MPLS applications. As the MPLS applications and specifications expand, the label in packet headers may take different formats, as shown in Figure 1. The header-processing capability in router data paths becomes a vital factor in supporting new MPLS features, so a future-proof platform is extremely important for good MPLS router designs.
In network processor software, flexible forwarding information base (FIB) tables can be programmed easily. The FIBs mainly include a next-hop label-forwarding entry (NHLFE) table, a forwarding equivalence class (FEC)-to-NHLFE (FTN) table and an incoming label-mapping (ILM) table. As shown in Figure 5, the NHLFE is the basis of all FIB tables. It contains the destination IP address, subnet mask, the next-hop IP address with its metrics, egress label, label-stack operation (such as push and pop), egress interface, related LSP information and so on.
Both the FTN and ILM are associated with one or more NHLFEs. They use an NHLFE to retrieve outgoing label and port information. In the case of load balancing or some other policy-based routing, a specific NHLFE has to be retrieved from a series of NHLFEs that are associated with an FTN or ILM based on predefined rules.
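One possible "predefined rule" for picking an NHLFE out of several candidates is a per-flow hash, sketched below. This is only one option and is not taken from the article: the Nhlfe fields are a reduced subset of those listed earlier, the CRC-based hash is an arbitrary choice, and all addresses and labels are made up.

```python
import zlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Nhlfe:
    next_hop: str
    out_label: int
    operation: str  # "push", "swap" or "pop"
    out_port: int

def select_nhlfe(candidates, src_ip, dst_ip):
    """Pick one NHLFE per flow so packets of a flow stay on the same path."""
    key = f"{src_ip}>{dst_ip}".encode()
    return candidates[zlib.crc32(key) % len(candidates)]

# Two equal-cost entries associated with the same FTN or ILM entry.
paths = [
    Nhlfe("40.1.10.1", 40, "push", 1),
    Nhlfe("40.1.20.1", 41, "push", 2),
]
chosen = select_nhlfe(paths, "10.0.0.5", "40.1.1.1")
```

Hashing on the flow rather than rotating per packet avoids reordering packets that belong to the same conversation.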
The FTN is used when an unlabeled packet (e.g., one destined for the IP address 40.1.1.1 in Figure 5) is received by the LER. The LER may use any predefined rule to map an IP packet to a FEC. A simple rule is to associate IP addresses directly to FECs. Other rules include classification based on any combination of incoming port number, ingress VPI/VCI, source and destination IP addresses, protocol type, source and destination TCP/User Datagram Protocol (UDP) port numbers and so on. When an unlabeled packet is classified into a FEC entry in the FTN table, the corresponding NHLFE is consulted for the outgoing label and port. Thus, the packet can be labeled and transmitted to the next-hop LSR (for example, the IP address of 40.1.10.1 with label 40 that is shown in Figure 5).
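The simple rule of mapping destination addresses directly to FECs can be sketched with a prefix table. The addresses and label 40 echo the Figure 5 example mentioned in the text; the /24 prefix length, the port number and the function name are assumptions, since the figure itself is not reproduced here.

```python
import ipaddress

# FTN: FEC (here simply a destination prefix) -> index into the NHLFE table.
FTN = {ipaddress.ip_network("40.1.1.0/24"): 0}  # prefix length is assumed

# NHLFE table: next hop, outgoing label and egress port for each entry.
NHLFE = [{"next_hop": "40.1.10.1", "out_label": 40, "out_port": 1}]

def label_unlabeled_packet(dst_ip):
    """Classify an unlabeled packet to a FEC and return its labeling result."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [net for net in FTN if addr in net]
    if not matches:
        return None  # no FEC: the packet would be routed as plain IP
    fec = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    entry = NHLFE[FTN[fec]]
    return {
        "labels": [entry["out_label"]],
        "next_hop": entry["next_hop"],
        "out_port": entry["out_port"],
    }

result = label_unlabeled_packet("40.1.1.1")
```

Richer classification rules would replace the prefix match with a key built from ports, protocol type and the other fields listed above, while the FTN-to-NHLFE indirection stays the same.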
The ILM is used when a labeled packet is received. At this time, the LER verifies the label first (e.g., label 80 with destination IP address of 80.1.1.1). Packets with invalid incoming labels in the ILM table or without any entry in the ILM are always dropped. The label may be used as a direct key in the ILM and the ILM provides a link to the proper NHLFE entry (e.g., NHLFE entry #0 in Figure 5) for egress-port information such as Layer 2 encapsulation formats and values, if any.
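Using the label as a direct key can be sketched as a plain array index, which is what makes the lookup attractive at wire speed. Label 80 mapping to NHLFE entry #0 follows the example in the text; the table size, the outgoing label 90, the swap operation, the port and the encapsulation name are invented for illustration.

```python
MAX_LABEL = 1024  # hypothetical size of the directly indexed ILM

# ILM: position = incoming label, value = NHLFE index (None = no entry).
ilm = [None] * MAX_LABEL

# NHLFE entry #0 with made-up egress details.
nhlfe_table = [
    {"out_label": 90, "operation": "swap", "out_port": 2, "l2_encap": "ppp"},
]

ilm[80] = 0  # incoming label 80 -> NHLFE entry #0

def lookup_incoming_label(label):
    """Return the NHLFE for a labeled packet, or None if it must be dropped."""
    if not 0 <= label < MAX_LABEL or ilm[label] is None:
        return None  # invalid label or no ILM entry: drop
    return nhlfe_table[ilm[label]]

hit = lookup_incoming_label(80)
miss = lookup_incoming_label(81)
```

A directly indexed table trades memory for a constant-time lookup, which is usually acceptable because each interface's label space can be kept small.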
For throughputs over 1 Gbit per second, label switching may be executed in properly designed ASICs as well. One or several ASICs can be used to form a routing and label-switching hardware base, together with associated SRAMs for relevant tables and DRAMs for packet buffers. Usually, the hardware comprises a few functional blocks such as a routing engine, a flow-classification engine and a label-switching engine. The FTN and ILM tables may be implemented in ASIC internal memories. The NHLFE table may be located in an external SRAM area. Labeled packets use the flow-classification and label engines and unlabeled packets go through all of these engines during their processing. Only the headers of the processed packets are required to be moved and manipulated from one engine to another, while the rest of the packet may stay in the buffer area in DRAMs.
Whether a packet is labeled is determined as the packet is received, according to its link-layer format and value. For an unlabeled packet, the classification engine maps it to a FEC and uses the FTN table to get an index to the NHLFE. If the NHLFE indicates a valid LSP for the packet, the packet is then passed to the label engine, which pushes a label and transmits the packet. If the NHLFE indicates no proper LSP for the packet, the packet is sent to the routing engine for regular routing. For a labeled packet, the classification engine uses its ILM to find the NHLFE index and sends the packet to the label engine. There, the label engine may pop, push or replace the label according to the value of the label operational field in the NHLFE.
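The hand-off between the engines described above can be modeled in software as follows. This is a behavioral sketch only, not a description of any particular ASIC: the dictionary-based tables, the "lsp_valid" flag, the pre-classified "fec" key and the returned action names are all assumptions.

```python
def label_engine(labels, nhlfe):
    """Apply the NHLFE's label operation; labels is a list with the top first."""
    operation = nhlfe["operation"]
    if operation == "push":
        return [nhlfe["out_label"]] + labels
    if operation == "swap":  # replace the top label
        return [nhlfe["out_label"]] + labels[1:]
    if operation == "pop":
        return labels[1:]
    raise ValueError(f"unknown label operation: {operation}")

def process_packet(packet, ftn, ilm, nhlfe_table):
    """Return (action, labels) where action is forward, route or drop."""
    labels = packet["labels"]
    if labels:  # labeled packet: the ILM gives the NHLFE index
        index = ilm.get(labels[0])
        if index is None:
            return ("drop", labels)
    else:       # unlabeled packet: the FTN gives the NHLFE index
        index = ftn.get(packet["fec"])
        if index is None or not nhlfe_table[index]["lsp_valid"]:
            return ("route", labels)  # hand over to the routing engine
    return ("forward", label_engine(labels, nhlfe_table[index]))

nhlfe_table = [
    {"operation": "push", "out_label": 40, "lsp_valid": True},
    {"operation": "swap", "out_label": 90, "lsp_valid": True},
    {"operation": "push", "out_label": 0, "lsp_valid": False},
]
ftn = {"fec-a": 0, "fec-b": 2}
ilm = {80: 1}
```

Only the label list is touched here, mirroring the point that headers move between engines while the packet body stays in the DRAM buffer.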
Control-plane software
Finally, let's discuss another integral part of the MPLS router design: control-protocol software. As routers, LERs have regular control-plane software modules that handle route updates via OSPF, BGP, etc. To perform label switching, these software modules are enhanced to provide interfaces with MPLS signaling protocols, which retrieve route information for LSP establishment and management. In addition, we need to add MPLS signaling protocol software, which is responsible for LSR/LER peer discovery, label assignment and distribution, and label-retention management.
A control-software architecture in an MPLS router is depicted in Figure 6, with a detailed implementation view inside the Label Distribution Protocol (LDP) as an example. The MPLS signaling protocols (LDP, Resource Reservation Protocol-Traffic Engineering) share similarities with other routing protocols when implemented in software modules.
Using some abstraction, then, we can find that these protocols usually consist of the following software blocks, as indicated in Figure 6: a timer-triggered event generator, a peer-discovery (Hello messaging) mechanism, an information database, a protocol message parser and possibly a policy manager.
The LDP uses UDP port number 646 to send Hello messages periodically to a multicast IP address that reaches all routers in the subnet, indicating the existence of an LER/LSR. After verifying the Hello messages, two neighboring routers establish a TCP session on TCP port 646 for further reliable message exchange. The first message from each side is usually an initialization message. After that, both sides are ready to start exchanging label advertisement messages.
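The sequence of Hello discovery, TCP session setup, initialization and label advertisement can be viewed as a small state machine. The sketch below is deliberately simplified: the state and event names are invented for illustration and do not match the state names in the LDP specification, and no packets are actually sent.

```python
from enum import Enum, auto

LDP_PORT = 646  # UDP for Hello messages, TCP for the session (from the text)

class State(Enum):
    DISCOVERY = auto()       # sending and listening for multicast Hellos
    TCP_CONNECTING = auto()  # Hello verified, opening the TCP session
    INIT_EXCHANGE = auto()   # exchanging initialization messages
    OPERATIONAL = auto()     # ready for label advertisement messages

# (current state, event) -> next state; anything else is ignored.
TRANSITIONS = {
    (State.DISCOVERY, "hello_verified"): State.TCP_CONNECTING,
    (State.TCP_CONNECTING, "tcp_established"): State.INIT_EXCHANGE,
    (State.INIT_EXCHANGE, "init_exchanged"): State.OPERATIONAL,
}

class LdpPeer:
    """Tracks session progress toward one neighboring LER/LSR."""

    def __init__(self):
        self.state = State.DISCOVERY

    def on_event(self, event):
        if event == "peer_lost":  # e.g. Hellos stopped or TCP closed
            self.state = State.DISCOVERY
        else:
            self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state

peer = LdpPeer()
for event in ("hello_verified", "tcp_established", "init_exchanged"):
    peer.on_event(event)
```

The timer-triggered event generator and message parser mentioned earlier would be the components feeding events into on_event.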
Meanwhile, routing protocols such as OSPF also start up by multicasting Hello messages to establish adjacencies. Adjacent routers exchange database description messages, and send link-state requests and updates to synchronize routing tables.
In topology-driven LSP establishment, the LDP requests a label from the next-hop MPLS-peer router for each route entry in the routing table. The next-hop (downstream) router allocates the label from its label space. In case the router is running in an independent label-distribution mode, the LDP sends a label-mapping message back to distribute the label to the requesting (upstream) router. The upstream router then updates its label-forwarding information database and the LSP is then considered established.
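The request, allocation and mapping steps can be sketched with two in-memory routers. Method calls stand in for LDP messages, and the downstream router answers immediately, in the spirit of the independent mode described above. The class and method names and the router names are hypothetical; the starting label of 16 reflects the fact that label values 0 through 15 are reserved.

```python
class Lsr:
    """Toy router that allocates labels and records learned bindings."""

    def __init__(self, name, first_label=16):
        self.name = name
        self.next_label = first_label  # label values 0-15 are reserved
        self.bindings = {}  # prefix -> label this router allocated
        self.lfib = {}      # prefix -> (next hop, outgoing label)

    def handle_label_request(self, prefix):
        """Downstream side: allocate a label; the return value stands in
        for the label-mapping message sent back upstream."""
        if prefix not in self.bindings:
            self.bindings[prefix] = self.next_label
            self.next_label += 1
        return self.bindings[prefix]

    def establish_lsps(self, routing_table, peers):
        """Upstream side: request a label for every route whose next hop
        is an MPLS peer, then update the label-forwarding database."""
        for prefix, next_hop in routing_table.items():
            downstream = peers.get(next_hop)
            if downstream is None:
                continue  # next hop does not run MPLS: no LSP
            label = downstream.handle_label_request(prefix)
            self.lfib[prefix] = (next_hop, label)

upstream = Lsr("ler-a")
downstream = Lsr("lsr-b")
routes = {"40.1.1.0/24": "lsr-b", "50.1.1.0/24": "plain-router"}
upstream.establish_lsps(routes, {"lsr-b": downstream})
```

Because the trigger is the routing table itself, LSPs exist before any traffic arrives, which is what distinguishes topology-driven from traffic-driven establishment.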
A clear understanding of the steps involved in the design process of MPLS routers, as described above, will go a long way toward the development of a solid end product.
Chaoping Wu is a senior engineer at Vpacket Communications. As an experienced designer of routers over the past decade, Chaoping has worked at senior engineer positions in companies such as Nortel and AccessLan Communications. He received an MS degree from the University of Arkansas and a BS from Shanghai Jiao Tong University in China. Chaoping can be reached at chaopingwu@attbi.com.


