Current servers, and the failover techniques used to achieve the appropriate level of reliability, have many limitations that force compromises when applied to telecom services. Most of those limitations, and the problems in using commercial off-the-shelf technology, can be overcome through the use of a parallel-processing architecture.
In the commercial world, parallel processing has seen limited deployment, since many applications involve working on one or a small number of large tasks at a time and are therefore largely sequential in nature. Most telecom and datacom applications, however, are inherently "fine-grained": they involve the concurrent processing of voice and data from hundreds or thousands of separate, independent sessions, and they therefore lend themselves nicely to parallel processing. Although many commercial applications could be written to exploit parallel processing more extensively, the technique remains underutilized.
The most appropriate hardware/software configuration in this environment is an "N+k" cluster configuration in which all nodes are active. The cluster may be heterogeneous, populated with different generations of CPUs and operating systems, or even with different CPU architectures (Sun, Motorola, Intel) working together in a single physical cluster or across geographically separated locations.
To program in this environment, GNP's engineers felt that the best parallel-programming methodology would be one based on Linda, a coordination language in which a clear separation is established between the components of the computation and their interaction in the overall program or system. The language consists of a set of primitives that are used to access a shared data space. The primitives are implemented as library routines called from a host language, such as C.
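Linda's classic primitives deposit, read and withdraw tuples from a shared data space, with matching done by field value and type. The sketch below is an illustrative, in-process Python rendering of those primitives (`out`, `rd` and `in`, the last renamed `in_` because `in` is a Python keyword); it is an assumption-laden teaching model, not GNP's implementation.

```python
import threading

class TupleSpace:
    """Minimal in-process sketch of a Linda-style tuple space."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        """Linda 'out': deposit a tuple into the space."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def _match(self, tup, pattern):
        # A pattern field that is a type matches any value of that type;
        # any other field must equal the tuple's value exactly.
        if len(tup) != len(pattern):
            return False
        return all(isinstance(v, p) if isinstance(p, type) else v == p
                   for v, p in zip(tup, pattern))

    def in_(self, *pattern):
        """Linda 'in': block until a matching tuple exists, then remove and return it."""
        with self._cond:
            while True:
                for t in self._tuples:
                    if self._match(t, pattern):
                        self._tuples.remove(t)
                        return t
                self._cond.wait()

    def rd(self, *pattern):
        """Linda 'rd': like 'in', but leaves the tuple in the space."""
        with self._cond:
            while True:
                for t in self._tuples:
                    if self._match(t, pattern):
                        return t
                self._cond.wait()

ts = TupleSpace()
ts.out("count", 7)
print(ts.rd("count", int))   # ('count', 7) -- tuple stays in the space
print(ts.in_("count", int))  # ('count', 7) -- tuple is removed
```

In a real Linda system these primitives are library calls from a host language such as C, and the space is distributed across nodes rather than held in one process.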
The program segments run as independent processes or threads on any number of nodes within a cluster. The most common state for a cluster system is to have a number of processes, called workers, that are ready and waiting for work, so that the work pool is typically empty or has only one pending work item.
To support such a programming environment, the cluster approach we developed uses a "pool of work" or "bag of tasks" model, in which data (the work or tasks) is placed into a virtual shared memory space (the pool or bag). A processor pulls the data out of the shared memory, processes the unit of work and, if necessary, puts the result back into shared memory. It then begins the whole process over again.
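The pull-process-put cycle above can be sketched with a shared queue standing in for the bag; the task shapes and worker logic here are illustrative assumptions, not the article's actual workload.

```python
import queue
import threading

bag = queue.Queue()      # the "bag of tasks" (stand-in for shared memory)
results = queue.Queue()  # where processed results are put back

def worker():
    while True:
        task = bag.get()         # pull a unit of work out of the bag
        if task is None:         # sentinel: no more work
            break
        kind, payload = task
        if kind == "square":     # process the unit of work...
            results.put(("result", payload * payload))  # ...and put the result back

for n in range(5):
    bag.put(("square", n))

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for _ in threads:
    bag.put(None)                # one shutdown sentinel per worker
for t in threads:
    t.join()

print(sorted(results.queue))     # [('result', 0), ('result', 1), ('result', 4), ('result', 9), ('result', 16)]
```

Note that no worker is told which task to take: each simply pulls the next available item, which is the property the article's cluster model relies on.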
The approach is simple and robust, since interprocessor communication is asynchronous and anonymous: there is no need for processes or processors to identify each other for interaction. It has been successfully applied in such applications as processing high-energy physics data. However, it was optimized for high throughput and computational accuracy but not necessarily high availability, and it is therefore not directly suitable for telco apps.
To make this model work better in the telecom server environment, GNP developed a set of natural clustering technology (NCT) extensions to Linda. With those extensions in place, each processor in a server cluster allocates a portion of its memory to a shared virtual memory space. This virtual shared memory is generally referred to as tuple space. Each tuple (defined as an ordered set of data fields), or piece of work, contains a sequence of typed data elements that may take various forms, including integers, floats, characters and arrays of data elements.
Message-processing applications such as SS7, H.323, the Session Initiation Protocol or the Media Gateway Control Protocol are decomposed into a stream of transactions that consist of input data, a unit of work and resulting output data. An individual call, therefore, is broken down into its message transactions and processed by multiple nodes simultaneously. A message not handled in a defined period of time "reappears" in the shared memory space and is available for another node to pick up.
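The "reappearing" message behavior can be modeled as a lease: a taken message carries a deadline, and if the taking node does not confirm completion before the deadline, the message returns to the pool for another node. The class and names below (`LeasedPool`, `lease_seconds`, the SS7/SIP-flavored message string) are illustrative assumptions, not the NCT API.

```python
import threading
import time

class LeasedPool:
    """Sketch of timeout-based message reappearance (not GNP's implementation)."""

    def __init__(self, lease_seconds):
        self.lease = lease_seconds
        self.pending = []       # messages waiting for a node
        self.in_flight = {}     # message id -> (message, lease deadline)
        self.lock = threading.Lock()
        self.next_id = 0

    def put(self, msg):
        with self.lock:
            self.pending.append(msg)

    def take(self):
        """Hand a message to a node and start its lease timer."""
        with self.lock:
            self._requeue_expired()
            if not self.pending:
                return None, None
            msg = self.pending.pop(0)
            self.next_id += 1
            self.in_flight[self.next_id] = (msg, time.monotonic() + self.lease)
            return self.next_id, msg

    def ack(self, msg_id):
        """The node finished in time; the message will not reappear."""
        with self.lock:
            self.in_flight.pop(msg_id, None)

    def _requeue_expired(self):
        now = time.monotonic()
        for mid, (msg, deadline) in list(self.in_flight.items()):
            if now > deadline:   # lease expired: the message "reappears"
                del self.in_flight[mid]
                self.pending.append(msg)

pool = LeasedPool(lease_seconds=0.05)
pool.put("INVITE call-1")
mid, msg = pool.take()   # node A takes the message, then (say) crashes
time.sleep(0.1)          # the lease expires
mid2, msg2 = pool.take() # node B sees the message reappear
print(msg2)              # INVITE call-1
```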
At any given time, tuple space contains data associated with the activation of a new application or intermediate products of applications already in process. Any processor can read a tuple from tuple space, process it and return the results. Any available processor can then read the result, process it, write that result back into tuple space and so on. The tuple space therefore represents a pool of work that is available to any processor. Executing an application consists of progressively processing and updating the tuples.
No node management
Depending on the cluster configuration, the tuple space resides on one or more processors, and a copy is maintained on a backup server. We have developed an NCT "transaction" application programming interface that, in the event of a node failure, performs a rollback and allows another node to process the information.
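The rollback behavior can be sketched as a transactional take-process-write cycle: a failure before commit undoes the take, so the work item is visible again for another node. The `TransactionalSpace`/`Transaction` names and methods below are hypothetical stand-ins for the NCT transaction API, which the article does not detail.

```python
class TransactionalSpace:
    """Sketch: a tuple pool with transactional take/write (illustrative only)."""

    def __init__(self):
        self.tuples = []

    def begin(self):
        return Transaction(self)

class Transaction:
    def __init__(self, space):
        self.space = space
        self.taken = []     # tuples removed by this transaction
        self.written = []   # tuples produced but not yet committed

    def take(self):
        t = self.space.tuples.pop(0)
        self.taken.append(t)
        return t

    def write(self, t):
        self.written.append(t)

    def commit(self):
        # Takes become permanent; written tuples become visible.
        self.space.tuples.extend(self.written)
        self.taken.clear()

    def rollback(self):
        # Undo the takes; nothing written becomes visible.
        self.space.tuples[:0] = self.taken
        self.taken.clear()
        self.written.clear()

space = TransactionalSpace()
space.tuples.append(("msg", "setup call-7"))

txn = space.begin()
work = txn.take()
try:
    raise RuntimeError("node failure")  # simulated crash mid-transaction
    txn.write(("done", work[1]))
    txn.commit()
except RuntimeError:
    txn.rollback()                      # the work item returns to the pool

print(space.tuples)                     # [('msg', 'setup call-7')]
```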
Using the finer-grained parallel programming possible with Linda and the NCT extensions yields a number of advantages. Each worker or node pulls its own work from tuple space; there is no need for a management process or layer of software to assign each piece of work to each node. Also, every node in the cluster is active and continuously pulling its own work, so there is no failover for client processes. If one of the nodes fails, it simply stops taking work, and the work flow continues across the other nodes. The amount of work handled by the other nodes may rise, but each one works up to its capacity.
The result is a natural balancing of work across the cluster.
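This natural balancing can be demonstrated in miniature: two workers with different processing speeds pull from one shared bag, nothing assigns work to either, and the faster worker simply absorbs more items. The worker names and delays are made up for the demonstration.

```python
import collections
import queue
import threading
import time

bag = queue.Queue()
done = collections.Counter()

def worker(name, delay):
    while True:
        try:
            item = bag.get_nowait()
        except queue.Empty:
            return               # no work left: the worker just stops pulling
        time.sleep(delay)        # simulate this node's per-item processing time
        done[name] += 1

for i in range(40):
    bag.put(i)

threads = [threading.Thread(target=worker, args=("fast", 0.001)),
           threading.Thread(target=worker, args=("slow", 0.01))]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(done["fast"] > done["slow"])   # the faster node took more of the work
```

A failed node behaves exactly like a worker that has stopped pulling: the remaining workers drain the bag without any reassignment step.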
See related chart