While most embedded-networking designers would agree on the intended benefits of network processors (NPs), there has been no such agreement on how they should be programmed. The lack of an installed software base, coupled with the pursuit of diverse hardware architectural approaches, has led to a variety of programming models and languages, with no clear evaluation criteria.
Yet, if NPs are to deliver the promised time-to-market benefit, software costs will likely dominate hardware costs over the life of deployed systems. Clearly, it is important to understand the trade-offs between NP programming model and language alternatives and how those trade-offs influence lifetime software and system costs in the context of five basic alternatives:
- Text based vs. GUI based;
- Sequential vs. explicitly parallel/multithreaded/pipelined;
- Application optimized vs. general purpose;
- Very high-level language vs. conventional vs. very low level; and
- Descriptive vs. imperative.
The programming model for an NP defines the programmer's view of the NP and specifies four basic elements. The first is the set of functions or portions of functions provided by the NP that software can control, including where and how control and data are passed between nonprogrammable hardware functions and software. The second is the set of abstractions that software can manipulate. The third is the set of operations that can be explicitly and implicitly performed on those abstractions. The fourth is how these abstractions and operations can be composed, including a model of operation that lets the programmer predict the behavior of the resulting system.
The programming model must therefore cover a much broader scope than the set of programming languages supported. It includes all programmer-visible aspects, such as entry points, APIs, threading, parallelism, pipelining, preemption, mutual exclusion, synchronization, interrupts and packet ordering, which are typically beyond the scope of most programming-language definitions.
Clearly the programming environment must accurately model the relevant characteristics of the underlying hardware for which the software is being developed. Ideally, it will also support all the other line-card components (e.g., framers, fabric) as well as whole line and fabric cards within the same framework to allow full system-level development and simulation.
And since the primary motivation for using NPs is economic, total-cost-of-ownership considerations should strongly influence the choice of programming language and model. To that end, the following factors should be considered: development cost and interval; complexity; performance; programming environment support; reliability; maintenance/evolution; and reusability.
However, one cannot simply wrap glitzy software around poor hardware, a flawed programming model or a poor development methodology and call it a good programming environment. The test is whether a typical user can easily and economically create and maintain reliable, high-performance software.
Given the preceding criteria, every network-processing unit (NPU) should be evaluated against the five basic programming model/language alternatives mentioned earlier.
In a text-based language model, the programmer writes text in one or more programming languages to specify the functions under software control. This typically allows coding using any text editor and supports many options for source-code control. In a graphical user interface (GUI)-based language model, the programmer manipulates icons or other objects in a GUI to specify the functions under software control. In theory, the two are duals of each other. The GUI approach may be more intuitive (if executed well) but may be less expressive than a text-based approach, especially if it offers only configuration rather than comprehensive programmability.
Almost all NP hardware architectures employ parallel processing, multithreading and/or pipelining. These are artifacts of the implementation, not of the application problem being solved. A key distinction among NP programming models is the extent to which the details of the underlying hardware are exposed to and explicitly managed by the NP application programmer.
When the programmer explicitly manages these details, the resulting software can be very resource-efficient, but it is also very complex to develop, debug and maintain. For example, most networking applications require or at least prefer that packets be switched/routed in the order in which they were received.
Assigning packets to parallel engines means that they may complete processing in a different order from the one in which they were received. Similarly, state maintenance often requires updating state concurrently with reading it.
If separate processing engines, pipeline stages or threads do the updating and the reading, their accesses to the state memory must be synchronized. Furthermore, software developed this way is difficult to reuse on a subsequent-generation NP that has, for example, a different number of engines or a different pipeline structure.
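The ordering problem can be made concrete with a minimal sketch. It assumes each packet is tagged with a sequence number on arrival. The `ReorderBuffer` class and `simulate` helper are hypothetical names used for illustration, not part of any NP tool suite. Packets that finish early are held until every earlier packet has also finished.

```python
class ReorderBuffer:
    """Releases packets in arrival order even when engines finish out of order."""

    def __init__(self):
        self.next_seq = 0   # sequence number of the next packet allowed to leave
        self.pending = {}   # finished packets still waiting for earlier ones

    def complete(self, seq, packet):
        """Record that an engine finished `packet`; return packets now releasable."""
        self.pending[seq] = packet
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released


def simulate(completion_order):
    """Feed packets to the buffer in the order the engines happen to finish them."""
    buf = ReorderBuffer()
    output = []
    for seq in completion_order:
        output.extend(buf.complete(seq, f"pkt{seq}"))
    return output


# Engines finish packets 2 and 1 before packet 0; the output order is still 0..4.
print(simulate([2, 1, 0, 4, 3]))
```

In an explicitly parallel model, bookkeeping of this kind falls to the application programmer, and on real hardware the buffer itself becomes shared state that needs synchronized access.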
Explicit parallelism is a major source of accidental programming complexity, fraught with race conditions and other reliability headaches. Explicit software pipelining between processing engines is less onerous but still requires tedious manual balancing of work among pipeline stages, and the resulting partition of processing into stages is brittle in the face of changes to application requirements.
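A rough arithmetic sketch shows why a hand-balanced pipeline is brittle. It assumes a simple model in which every stage handles one packet at a time, so the slowest stage sets the packet rate. The clock rate and cycle counts are made-up numbers chosen only for illustration.

```python
def pipeline_throughput(stage_cycles, clock_hz):
    """Packets per second: the slowest stage limits the whole pipeline."""
    return clock_hz / max(stage_cycles)


def ideal_throughput(stage_cycles, clock_hz):
    """Upper bound if the same total work were spread evenly over the stages."""
    return clock_hz / (sum(stage_cycles) / len(stage_cycles))


CLOCK_HZ = 200_000_000               # hypothetical engine clock
balanced = [100, 100, 100, 100]      # cycles per packet in each stage
after_change = [100, 160, 100, 100]  # a new requirement adds 60 cycles to stage 2

print(pipeline_throughput(balanced, CLOCK_HZ))      # 2000000.0
print(pipeline_throughput(after_change, CLOCK_HZ))  # 1250000.0
print(ideal_throughput(after_change, CLOCK_HZ))     # bound after perfect rebalancing
```

Under these assumptions total work grows by 15 percent, yet throughput falls by 37.5 percent until someone repartitions the code across stages by hand.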
It is clearly preferable to hide the details of the underlying organization of processors, pipeline stages and associated memories from the application software by providing a sequential programming model. With the details of resource allocation pushed off to the programming environment, the key challenge becomes implementing that model efficiently on the underlying hardware with acceptable performance.
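As a loose illustration of that division of labor, the sketch below has the programmer write only a sequential per-packet function, while a stand-in for the programming environment decides how many workers run it. The names `handle_packet` and `run` and the packet fields are hypothetical. A Python thread pool is used here purely as an analogy for NP engines, not as a model of their performance.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_packet(packet):
    """Per-packet logic written as plain sequential code (hypothetical example)."""
    ttl = packet["ttl"] - 1
    action = "forward" if ttl > 0 else "drop"
    return {"id": packet["id"], "ttl": ttl, "action": action}


def run(packets, engines):
    """Stand-in for the programming environment: spreads packets over workers.

    Executor.map returns results in input order, so packet order is preserved
    regardless of how many workers are used.
    """
    with ThreadPoolExecutor(max_workers=engines) as pool:
        return list(pool.map(handle_packet, packets))


packets = [{"id": i, "ttl": i % 3} for i in range(8)]

# The application code is unchanged whether one engine or four are available.
print(run(packets, engines=1) == run(packets, engines=4))  # True
```

The application source does not mention the engine count at all, which is what would let it carry over to hardware with a different number of engines.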
An application-optimized or domain-specific programming language builds problem-domain knowledge into the language. Agere's Functional Programming Language (FPL), Intel's Network Classification Language (NCL) and Solidum Systems' PAX Packet Description Language, for example, are domain-specific languages designed for packet classification.
Typically, application-optimized languages will be very high-level and descriptive rather than imperative (though other combinations are possible). This combination provides significant software-engineering benefits in the form of lower development complexity and improved maintenance and reusability. Likewise, a well-executed application-oriented language is potentially much more amenable to automatic optimization than C or low-level software. As above, a key challenge with this approach is to implement it efficiently on the underlying hardware with acceptable performance.
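To give a flavor of what building domain knowledge into the notation can mean, the sketch below describes a header layout as data and derives the field extraction from it. This is an invented mini-notation written in Python, not the syntax of FPL, NCL or PAX. The `IPV4_PREFIX` table and `parse` function are hypothetical names, and only the first four bytes of an IPv4 header are described.

```python
# Hypothetical declarative description: (field name, width in bits), in wire order.
IPV4_PREFIX = [
    ("version", 4),
    ("ihl", 4),
    ("tos", 8),
    ("total_length", 16),
]


def parse(layout, data):
    """Generic field extractor driven entirely by the layout description."""
    total_bits = sum(width for _, width in layout)
    nbytes = (total_bits + 7) // 8
    if len(data) < nbytes:
        raise ValueError("packet shorter than described header")
    value = int.from_bytes(data[:nbytes], "big")
    value >>= nbytes * 8 - total_bits   # discard padding bits, if any
    fields = {}
    remaining = total_bits
    for name, width in layout:
        remaining -= width
        fields[name] = (value >> remaining) & ((1 << width) - 1)
    return fields


header = bytes([0x45, 0x00, 0x00, 0x54])
print(parse(IPV4_PREFIX, header))
# {'version': 4, 'ihl': 5, 'tos': 0, 'total_length': 84}
```

Because the layout is data rather than hand-written shifts and masks, a tool chain is free to generate whatever extraction code best suits the target hardware.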
A very low-level language in the hands of a highly capable and knowledgeable software developer can potentially provide compact and efficient code that fully exploits the capabilities of the underlying hardware. However, as with programmer-managed parallelism/pipelining/threading, the resulting software can be very complex to develop, debug and maintain, and it becomes tied to specific hardware. The complexities include managing memory latencies and register allocation by hand.
Although the software community has understood the disadvantages of low-level programming for some time, it is often viewed as the price to be paid to achieve acceptable performance.
In contrast, a very high-level language can support rapid software development and maintenance intervals with higher reliability (again, with the caveat that it must be efficiently implemented on the underlying hardware to be useful).
Using a conventional language such as C or C++ has its own set of trade-offs. If the network processor has conventional RISC processor cores, existing tool suites such as GNU gcc and gdb can be used, which may reduce the amount of learning required for a programmer new to the NP. However, conventional RISC processor cores typically won't have adequate performance for many networking functions.
If the RISC processors are augmented with networking-specific instructions and/or specialized coprocessors, it is nontrivial for the programming environment to invoke those capabilities automatically by analyzing the code. Likewise, if the full C or C++ language is supported, including features such as pointer aliasing, the ability to automatically optimize code can be limited.
Because of those issues, an ad hoc extension of C that includes the basic primitive operations and data types of the underlying hardware is sometimes used. Alternatively, a set of APIs might be used to allow access to these specialized hardware resources. Likewise, the C-like language used may actually be a subset of C in some areas. For example, arbitrary pointer use, which standard C permits, might be restricted. While those approaches have advantages, programmers must nonetheless understand exactly how they differ from generic C or C++.
When evaluating whether an NP solution provides C or higher-level language programmability with adequate performance, it is revealing to look at what performance benchmarks or test results are offered.
An imperative programming language provides a sequence of commands (where each command typically changes the state of memory) specifying how to solve a problem. The algorithms must be supplied by the programmer or by procedures/functions previously written.
In contrast, a descriptive programming language describes the problem to be solved rather than the algorithm or series of operations to be used to solve it. By focusing on the underlying problem to be solved, descriptive approaches inherently provide a degree of hardware abstraction. Again, the key challenge is the extent to which the language can be efficiently implemented on the underlying hardware.
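The contrast can be sketched with a toy classifier written both ways. The rule set, field names and class labels are invented for illustration. `build_classifier` stands in for a tool chain and happens to build a hash table here, on the assumption that every rule matches on the same fields. A real implementation might choose a quite different strategy for its hardware.

```python
def classify_imperative(pkt):
    """Imperative: the programmer spells out which tests run and in what order."""
    if pkt["proto"] == "tcp":
        if pkt["dst_port"] == 80:
            return "web"
        if pkt["dst_port"] == 22:
            return "admin"
    if pkt["proto"] == "udp" and pkt["dst_port"] == 53:
        return "dns"
    return "default"


# Descriptive: state which packets belong to which class; no control flow.
RULES = [
    ({"proto": "tcp", "dst_port": 80}, "web"),
    ({"proto": "tcp", "dst_port": 22}, "admin"),
    ({"proto": "udp", "dst_port": 53}, "dns"),
]


def build_classifier(rules, default="default"):
    """Stand-in for the tool chain; assumes all rules use the same fields."""
    fields = sorted({name for match, _ in rules for name in match})
    table = {tuple(match[f] for f in fields): cls for match, cls in rules}

    def classify(pkt):
        return table.get(tuple(pkt[f] for f in fields), default)

    return classify


classify_descriptive = build_classifier(RULES)
samples = [
    {"proto": "tcp", "dst_port": 80},
    {"proto": "udp", "dst_port": 53},
    {"proto": "udp", "dst_port": 80},
]
for pkt in samples:
    print(classify_imperative(pkt), classify_descriptive(pkt))
```

Both versions give the same answers, but only the rule table leaves the matching strategy open, which is the hardware abstraction the descriptive approach offers.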
See related chart