United Business Media EE Times



 


Mesh nets weave path protection/restoration schemes

EE Times


With the advent of intelligent optical switching, vendors and service providers alike have started to evaluate the advantages of moving away from conventional ring-based network architectures. Instead, they are looking at mesh-based network architectures, which offer restoration using intelligent optical switches. Intelligent optical switches can create a distributed control process, where each switch is aware of the available capacity on the network and can, in the event of a failure, automatically determine the optimal restoration path.

Mesh restoration, a departure from more typical protection schemes, requires considerably less bandwidth than conventional Synchronous Optical Network/Synchronous Digital Hierarchy (Sonet/SDH) protection mechanisms because of its greater flexibility in the sharing and pooling of protection bandwidth. While mesh clearly provides several benefits, the fairly large differences between mesh and ring protection options can be intimidating to network operators.

Mesh-based protection schemes use networking software built on standards-based Internet Protocol (IP) routing and signaling protocols, such as Open Shortest Path First (OSPF) for routing and Multiprotocol Label Switching (MPLS) for signaling. Industry-leading switches can accommodate these protection mechanisms alongside other mesh and ring-protection schemes. This approach gives network operators a pragmatic migration plan: they can introduce mesh-protection mechanisms where and when required while continuing to work within existing networks and methodologies.

One of the key benefits of a mesh architecture is its flexibility. An optically switched mesh architecture offers a variety of protection schemes, some of which provide rough equivalents of conventional ring-based protection schemes. Advanced intelligent optical switches can also simultaneously support conventional ring-protection schemes alongside mesh-restoration options.

Most ring-based protection options have an equivalent in mesh networks (see accompanying table). Shared-path restoration, however, is an entirely new approach with no equivalent in standard Sonet/SDH protection schemes. To better understand shared-path restoration, consider its mechanism for failure detection and route calculation and setup, as well as the factors that determine restoration times.

Sonet/SDH provides an extensive set of failure-detection mechanisms. The choice of which failure to react to can be independent of the choice of protection mechanism. Typical failures include hard failures such as fiber cuts, and transponder and node failures, as well as "soft" failures such as excessive bit errors. Although mesh-restoration schemes are not standardized in Sonet/SDH specifications, most of them nevertheless employ the specification's failure-detection mechanisms.

Once a failure is detected, a protect route needs to be determined. In some restoration schemes, protect paths are determined in advance. Even then, periodic recalculation is required when more circuits are added or when failures are detected.

In one scheme that calculates a restoration path after failure detection, the path calculation must be performed by some controller, either a network management system (NMS) in a centralized scheme or an individual optical switch in a distributed scheme. The controller has a map of the entire network topology and a database of bandwidth usage on the various spans in the network, which it uses to perform the path calculation.

The network topology and the bandwidth usage are updated periodically using industry-standard routing protocols such as OSPF or Intermediate System to Intermediate System (IS-IS). The controller incorporates the newest failure information, and then uses the updated map to calculate a new path for the failed circuit.

Depending on the level of detail in the network topology and bandwidth usage database, the path may be only a sequence of nodes, or it may specify detailed choices of exact spans or even exact timeslots. There are many choices for the path-calculation (routing) algorithm itself, but typically they are modified versions of a shortest-path algorithm. Advantages of using shortest-path algorithms include speed and simplicity.
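As a rough illustration of such a modified shortest-path calculation, the following Python sketch prunes failed spans and spans without enough spare bandwidth, then runs an ordinary Dijkstra search with hop count as the cost. The function name, the span dictionary layout and the sample topology are assumptions made for this example; they do not describe any particular vendor's algorithm.

```python
import heapq

def restoration_path(spans, source, dest, demand, failed=frozenset()):
    """Return the fewest-hop node sequence with enough spare bandwidth, or None.

    spans:  {(node_a, node_b): spare_bandwidth_units} (hypothetical layout)
    failed: set of frozenset({node_a, node_b}) spans known to be down
    """
    # The "modification": keep only spans that are up and have enough spare capacity.
    adjacency = {}
    for (a, b), spare in spans.items():
        if frozenset((a, b)) in failed or spare < demand:
            continue
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    # Ordinary Dijkstra search with a cost of one per span (hop count).
    best = {source: 0}
    previous = {}
    queue = [(0, source)]
    while queue:
        hops, node = heapq.heappop(queue)
        if node == dest:
            break
        if hops > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor in adjacency.get(node, []):
            candidate = hops + 1
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))

    if dest not in best:
        return None  # no path with sufficient bandwidth
    path = [dest]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]

# Hypothetical six-span network; the direct span A-D has failed.
SPANS = {("A", "B"): 4, ("B", "C"): 4, ("C", "D"): 4,
         ("A", "E"): 1, ("E", "D"): 1, ("A", "D"): 4}
FAILED = {frozenset(("A", "D"))}
print(restoration_path(SPANS, "A", "D", demand=2, failed=FAILED))
```

This version returns only a sequence of nodes, which corresponds to the coarsest level of detail mentioned above; choosing exact spans or timeslots would need a richer database than this sketch assumes.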

Occasionally, a restoration path of sufficient bandwidth cannot be found. This usually means that the network has been overloaded with more circuits than it is meant to support with restoration, or that there are concurrent, multiple failures in the network. In either case, the non-restored circuit will remain in a known "down" state, and typically the controller will periodically retry to find a new path.

After a new path is found for the failed circuit, a signaling scheme such as MPLS is used to set up the new path. In the typical hop-by-hop signaling scheme, the source node first sends a request message downstream, hop-by-hop, along the new path. When the destination node receives this message, it sends a confirmation message upstream, again hop-by-hop. Although hop-by-hop signaling is typical, it is also possible for the controller to notify all nodes on the new path in parallel as well as receive confirmation in parallel.

During the downstream request process, each node performs resource allocation and cross-connect initiation. For resource allocation, one design choice is to have the path-calculation algorithm return a path that specifies only the nodes. To avoid excessive OSPF flooding, this scheme does not specify the exact spans and timeslots on each hop; those are chosen during path setup, with the decisions made locally between the two nodes of a hop.
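The local decision between the two nodes of a hop could look something like the Python sketch below. The article does not say how the timeslot is chosen, so the lowest-numbered-common-slot rule, the function name and the set-based inputs are all assumptions made purely for illustration.

```python
def pick_timeslot(upstream_free, downstream_free):
    """Pick a timeslot that both ends of a hop consider free, or None.

    Assumed policy: take the lowest-numbered slot free at both ends.
    The actual selection rule is not given in the text.
    """
    common = set(upstream_free) & set(downstream_free)
    return min(common) if common else None

# Hypothetical free-slot views held by the two nodes of one hop.
print(pick_timeslot({1, 3, 5}, {3, 4, 5}))
```

Because only the two adjacent nodes need this level of detail, nothing about individual timeslots has to be advertised network-wide.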

When a node knows the exact spans and timeslots upstream and downstream, it initiates a cross-connect action. To optimize speed, a node can forward the signaling message downstream without waiting for the cross-connect to finish. The actual cross-connect involves internal routing and hardware write operations (on the line cards), which proceed in parallel with the rest of the signaling.

During the upstream confirmation process, each hop simply makes sure that the cross-connect is established before forwarding the confirmation message upstream. Therefore, when the confirmation process is complete, traffic is guaranteed to be flowing on the new end-to-end restoration path.
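The request and confirmation passes can be sketched as a small timing simulation in Python. This is a simplified model under stated assumptions: a single fixed message-forwarding delay per hop and a single fixed cross-connect duration per node, both hypothetical values supplied by the caller. Real switches would see variable delays.

```python
def simulate_setup(path, forward_ms, crossconnect_ms):
    """Simulate hop-by-hop setup along `path`; return (events, finish_ms).

    forward_ms and crossconnect_ms are hypothetical, uniform delays.
    """
    events = []
    ready_at = {}  # node -> time at which its cross-connect is established
    clock = 0.0

    # Downstream request: each node starts its cross-connect, then forwards
    # the request immediately without waiting for the cross-connect to finish.
    for i, node in enumerate(path):
        ready_at[node] = clock + crossconnect_ms
        events.append((clock, node, "request: cross-connect started"))
        if i < len(path) - 1:
            clock += forward_ms

    # Upstream confirmation: each node forwards the confirmation only once
    # its own cross-connect is established.
    for i, node in enumerate(reversed(path)):
        clock = max(clock, ready_at[node])
        events.append((clock, node, "confirm: cross-connect established"))
        if i < len(path) - 1:
            clock += forward_ms

    return events, clock

events, finish_ms = simulate_setup(["A", "B", "C", "D"],
                                   forward_ms=2.0, crossconnect_ms=5.0)
for time_ms, node, what in events:
    print(f"{time_ms:5.1f} ms  {node}  {what}")
print("confirmation reaches the source at", finish_ms, "ms")
```

In this model the cross-connects overlap with message forwarding, so only the destination node's cross-connect delays the confirmation; the upstream nodes have already finished by the time the confirmation reaches them.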

Experience shows that the new-path setup process dominates restoration time, for two reasons. First, fault detection complies with Sonet/SDH standards and takes only a few milliseconds for typical hard failures. Second, software profiling shows that shortest-path calculation (for example, with a modified Dijkstra algorithm) also takes only a few milliseconds. Characterizing new-path setup time therefore gives a good indication of overall restoration time.

If there is only one circuit being restored in the entire network, then the restoration time (RT) should be of the form RT = A + (B * P), where A and B are constants (measured in milliseconds), and P is the hop-count of the new path. The per-hop coefficient, B, is the time needed for per-node processing plus message forwarding, while the constant coefficient, A, is the time for any special processing at the two end nodes.
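The linear form can be written as a one-line Python function. The constants used below are hypothetical placeholders chosen only to show how the result grows with hop count; the article gives no measured values for A or B.

```python
def restoration_time_ms(a_ms, b_ms, hops):
    """Single-circuit restoration time RT = A + (B * P), in milliseconds."""
    if hops < 1:
        raise ValueError("a restoration path needs at least one hop")
    return a_ms + b_ms * hops

# Hypothetical constants, not measurements.
A_MS, B_MS = 20.0, 10.0
for hops in (2, 4, 8):
    print(hops, "hops:", restoration_time_ms(A_MS, B_MS, hops), "ms")
```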

Based on numerous experiments with various networks, actual restoration time generally follows this form. However, the formula is not exact because of a number of variables, mostly related to software processing priority and OS task scheduling.
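One way to check measurements against this linear form is an ordinary least-squares fit of A and B, sketched below in plain Python. The sample values are synthetic and invented for this example; they are not results from the experiments mentioned above.

```python
def fit_linear(samples):
    """Least-squares estimate of (A, B) in RT = A + (B * P).

    samples: iterable of (hop_count, restoration_time_ms) pairs.
    """
    samples = list(samples)
    n = len(samples)
    mean_p = sum(p for p, _ in samples) / n
    mean_rt = sum(rt for _, rt in samples) / n
    sxx = sum((p - mean_p) ** 2 for p, _ in samples)
    if sxx == 0:
        raise ValueError("need at least two distinct hop counts")
    sxy = sum((p - mean_p) * (rt - mean_rt) for p, rt in samples)
    b = sxy / sxx
    a = mean_rt - b * mean_p
    return a, b

# Synthetic (hop count, milliseconds) pairs with some scatter, for illustration only.
SYNTHETIC = [(2, 41.0), (3, 49.5), (4, 61.0), (6, 79.0)]
a_est, b_est = fit_linear(SYNTHETIC)
print(f"A is about {a_est:.1f} ms, B is about {b_est:.1f} ms per hop")
```

The scatter around the fitted line is where effects such as processing priority and task scheduling would show up.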

This linear form assumes hop-by-hop signaling. If a parallel signaling scheme is used, the formula could be different, depending on the exact level of parallelism achieved in both the per-node processing and the internode communication. For example, the controller might send a separate request message to each node on the restoration path, or it might send a single broadcast/multicast message.

If a single failure affects multiple circuits (such as when a fiber contains multiple wavelengths), the total restoration time for all affected circuits depends greatly on the degree of parallelism and pipelining achieved between the concurrent restoration efforts.

Major factors associated with parallelism include the number of controllers (for example, centralized NMS vs. distributed switches) and location of the new protect paths (whether they share many nodes in common, which would increase the processing load at those common nodes).

Experience shows that with distributed controllers the total restoration time is much shorter than the upper bound (the perfect serialization scenario) of (A + (B * P)) * N, where N is the number of circuits. In certain topologies it can approach the lower bound (the perfect parallelism scenario) of A + (B * P), where P is the largest hop count among the N new protect paths.
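The two bounds can be computed directly, as in the Python sketch below. It assumes that P means the largest hop count in both expressions, which is one reading of the text, and it uses hypothetical values for A, B and the hop counts.

```python
def restoration_bounds_ms(a_ms, b_ms, hop_counts):
    """Return (lower, upper) bounds on total restoration time in milliseconds.

    lower: perfect parallelism,   A + (B * P)
    upper: perfect serialization, (A + (B * P)) * N
    Assumption: P is the largest hop count among the N protect paths.
    """
    if not hop_counts:
        raise ValueError("at least one circuit is required")
    longest = max(hop_counts)
    single = a_ms + b_ms * longest
    return single, single * len(hop_counts)

# Hypothetical constants and three protect paths of 2, 3 and 5 hops.
lower, upper = restoration_bounds_ms(20.0, 10.0, [2, 3, 5])
print("lower bound:", lower, "ms; upper bound:", upper, "ms")
```

The gap between the two bounds widens linearly with the number of circuits, which is why the degree of parallelism matters most when one failure affects many circuits.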

Anthony Kam is a principal architect at Sycamore Networks Inc. (Chelmsford, Mass.).

See related chart






All materials on this site Copyright © 2009 TechInsights, a Division of United Business Media LLC All rights reserved.