With demand for bandwidth and data rates rising throughout the networking world, many silicon vendors are developing highly programmable network processing solutions in various configurations. The design problems that tools in this environment must solve are tough, and solving them is vital to the correct operation of the network processing hardware.
On the one hand, the tools need to tackle the challenges of very fast line rates and deep packet inspection with both parallel and run-to-completion architectural models. At the same time, they need to give the system designer the power and simplicity to map applications accurately onto the chip. A complete architectural design tool that simplifies this process can make all the difference in the success of a processing solution.
The process of selecting a network processor begins with an investigation based on identified performance goals. Issues such as maximum memory bandwidth utilization, media switch fabric interface bus contention, and efficient multi-threading among parallel processors must be considered carefully. With some high-end network processors today offering dozens of programmable threads, as well as configurable media interfaces and multiple channels of SRAM and DRAM, this is a formidable task.
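The kind of arithmetic behind such a performance goal can be sketched in a few lines. The example below is only an illustration: the 10-Gbit/s line rate, the 1.4-GHz clock and the 16 engines are assumed figures, not the specification of any particular network processor. The 20 bytes of per-frame overhead are the Ethernet preamble and inter-packet gap.

```python
def packets_per_second(line_rate_bps, frame_bytes, overhead_bytes=20):
    """Worst-case arrival rate for back-to-back frames of one size."""
    return line_rate_bps / (8 * (frame_bytes + overhead_bytes))


def cycle_budget(clock_hz, engines, pps):
    """Engine cycles available per packet, summed over all engines."""
    return clock_hz * engines / pps


# Assumed figures, chosen only for illustration.
pps = packets_per_second(10e9, 64)      # minimum-size Ethernet frames
budget = cycle_budget(1.4e9, 16, pps)   # hypothetical clock and engine count
print(f"{pps / 1e6:.2f} Mpps, {budget:.0f} cycles per packet")
```

Every instruction, memory stall and bus wait spent on a packet has to fit inside that budget, which is why the selection process starts from the performance goals.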
Next-generation tools must provide advanced features in order to enable whole-product development. For example, a visual editor is needed to map tasks onto resources while illustrating the data flow and recalculating bus utilization statistics in real time. The need for advanced development tools is the direct result of the designer's need to incorporate more services at higher levels of performance, with time-to-market always a key consideration.
Balancing ease of development with performance is critical in a development tool. The designer must be able to learn the tool quickly. It is equally important to maximize the amount of cycle-accurate simulation that the tool can provide. A tool that offers the right balance of performance and programmability will be the most helpful to the design process.
The software development process for packet processing algorithms is highly iterative and constrained by the typically small control store within each processing engine. A few kilobytes of instructions is all that is available to the programmer. However, with multiple engines available in most hardware solutions, the specific functional tasks can be broken down and spread across the available resources.
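One simple way to picture spreading tasks across control stores is a first-fit-decreasing placement, sketched below. This is a generic bin-packing heuristic used here for illustration, not the method of any particular tool, and it ignores pipeline ordering. The task names, instruction counts and the 4,096-instruction store size are all hypothetical.

```python
def spread_tasks(task_sizes, store_size, engines):
    """First-fit decreasing: place the largest task first, each into
    the first engine whose control store still has room for it."""
    free = [store_size] * engines
    placement = {}
    by_size = sorted(task_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for task, size in by_size:
        for engine in range(engines):
            if size <= free[engine]:
                free[engine] -= size
                placement[task] = engine
                break
        else:
            # No engine had enough free control store for this task.
            raise ValueError(f"{task} ({size} instructions) does not fit")
    return placement, free


# Hypothetical task sizes, in instructions.
tasks = {"parse": 1500, "classify": 2600, "lookup": 3000, "forward": 1200}
placement, free = spread_tasks(tasks, store_size=4096, engines=3)
print(placement, free)
```

A task that does not fit in any single engine raises an error, which is the signal to break that task down further.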
This kind of parallelism is often used to solve high-bandwidth problems, while a pipeline of run-to-completion tasks preserves dependent relationships. Insight into this allocation is increasingly available through highly graphical "drawing boards" similar to the floor-planning tools that the hardware design industry has used for years. A self-documenting scheme of functional tasks, mapped to specific hardware resources, allows a larger development team to comprehend the data flow within the application.
Previously, immature tools hid the overall map of functions, and only a handful of software engineers could develop code at the same time. Although software for the fast-path processing of packets is relatively small, this code can now be understood by a broader group of developers, enabling future reuse of concepts and architectures across product platforms. With advanced tools that can map and self-document the architecture of the software application, a clearer understanding of the design is possible throughout all portions of the project.
It is becoming more and more common to methodically break down applications into re-usable, independent blocks, especially as the underlying silicon becomes more programmable and more powerful. Advanced tool suites that can take advantage of this trend offer important advantages over simple libraries. The suite of capabilities should include four elements.
First, there should be a means to assemble the functional blocks while keeping tabs on cycle budgets and on memory and bus utilization. This eliminates much guesswork, because virtual prototypes of applications can be quickly generated to ensure a "good fit" of the application onto the network processor.
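A "good fit" check of this kind might look roughly like the sketch below. The block names, per-packet costs and hardware limits are hypothetical, and a real tool would draw its estimates from simulation rather than from hand-entered numbers.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    cycles: int       # estimated compute cycles per packet
    sram_refs: int    # SRAM accesses per packet
    dram_bytes: int   # DRAM bytes moved per packet


def fit_report(blocks, cycle_budget, pps, sram_refs_per_sec, dram_bytes_per_sec):
    """Return utilization (1.0 = fully used) for each budgeted resource."""
    report = {
        "cycles": sum(b.cycles for b in blocks) / cycle_budget,
        "sram": sum(b.sram_refs for b in blocks) * pps / sram_refs_per_sec,
        "dram": sum(b.dram_bytes for b in blocks) * pps / dram_bytes_per_sec,
    }
    # The application fits only if no resource is over-subscribed.
    report["fits"] = all(u <= 1.0 for u in report.values())
    return report


# Hypothetical functional blocks and hardware limits.
pipeline = [
    Block("classify", cycles=300, sram_refs=2, dram_bytes=0),
    Block("lookup", cycles=450, sram_refs=4, dram_bytes=0),
    Block("forward", cycles=250, sram_refs=1, dram_bytes=128),
]
report = fit_report(pipeline, cycle_budget=1500, pps=10e6,
                    sram_refs_per_sec=200e6, dram_bytes_per_sec=2e9)
print(report)
```

Swapping one block for another and rerunning the report is the virtual-prototype loop in miniature.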
Second, since no two applications are the same, it is important to provide a wide variety of pre-programmed libraries with dynamic parameters. Third, the engineer should be able to view a task-flow diagram that accurately summarizes the pipelines and includes all dependencies. This helps avoid over-subscribing resources and helps organize tasks in a logical manner. Fourth, as in most well-supported embedded projects, documentation and support should include example reference designs and hardware development platforms.
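The dependency information behind a task-flow diagram can be represented as a simple graph. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to group tasks into pipeline stages; the task names and dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (hypothetical names).
deps = {
    "receive": set(),
    "classify": {"receive"},
    "route_lookup": {"classify"},
    "meter": {"classify"},
    "queue": {"route_lookup", "meter"},
    "transmit": {"queue"},
}


def pipeline_stages(deps):
    """Group tasks into stages; tasks in one stage can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on a circular dependency
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = pipeline_stages(deps)
for number, stage in enumerate(stages):
    print(number, stage)
```

Tasks that land in the same stage have no dependency on one another, so they are candidates for separate engines or threads.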
Future portability of code across multiple platforms, and even across multiple families of network processors, is possible using this division of functionality at the "block" level. Clearly defining which functions are performed at which levels, as well as spelling out a standard interface for passing data, helps to build an underlying framework that encourages logical software development. Since the framework is modular, the tool set should be able to decompose the task and allocate it to the hardware resources that yield the best performance.
Spreadsheets are often used in network processing design to keep track of internal and external bus bandwidths, utilization of memory channels, and overall headroom within the packet processing engines. Knowing this, a tool designer with prior experience developing networking applications can implement algorithms similar to those found in these primitive spreadsheets, algorithms that relate back to the functional pipelines initially laid out. As the application developer makes changes, and the tool updates the results in real time, the developer is able to see the impact of each change and route the data flow effectively onto the available hardware.
This process of analyzing statistical counters to assess the impact of code changes, and subsequently hand-tweaking the code to maximize efficiency and performance, is called "back annotation." While this level of tuning might not be an issue at lower speeds, high-performance applications routinely call for such scrutiny to ensure maximum return.
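The spreadsheet-style bookkeeping described here can be approximated with a small model that recomputes channel utilization whenever a task is placed or moved. The sketch below is a minimal illustration; the channel names, capacities and task demands are assumed values.

```python
class BandwidthSheet:
    """Tracks per-channel memory utilization as tasks are placed or moved."""

    def __init__(self, capacities):
        self.capacities = dict(capacities)  # channel -> bytes per second
        self.placement = {}                 # task -> (channel, demand)

    def place(self, task, channel, demand):
        """Place or move a task, then return the updated utilization."""
        if channel not in self.capacities:
            raise KeyError(f"unknown channel: {channel}")
        self.placement[task] = (channel, demand)
        return self.utilization()

    def utilization(self):
        load = {channel: 0.0 for channel in self.capacities}
        for channel, demand in self.placement.values():
            load[channel] += demand
        return {ch: load[ch] / cap for ch, cap in self.capacities.items()}


# Hypothetical channels and demands, in bytes per second.
sheet = BandwidthSheet({"sram0": 1.6e9, "sram1": 1.6e9})
sheet.place("flow_table", "sram0", 1.2e9)
before = sheet.place("counters", "sram0", 0.8e9)  # sram0 over-subscribed
after = sheet.place("counters", "sram1", 0.8e9)   # moved to the idle channel
print(before, after)
```

Here the first placement of the counters over-subscribes one channel, and moving them to the idle channel brings both channels back under their limits.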
A new breed of tools now allows for this precise visualization and performance monitoring of the hardware, as long as the required interfaces are designed into the hardware. Expensive and complicated test equipment is no longer needed to measure real-world performance and line rates. Evaluating the performance of code in real time saves countless iterations of the application by continually offering best-fit feedback from the statistics generated. More and more, first-generation code that is not only functional but also within budget will become the norm.
Mapping defined functions to hardware is a key requirement for writing efficient network processor code. A certain number of processing engines or threads must be reserved for prioritized tasks. This allows the application developer to keep track of services that need to be provided at desired performance levels. It is extremely useful to have this information readily available in a simple graphical interface: initial resource estimates are more accurate, and the final application fits the hardware much better.
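Reserving threads for prioritized tasks can be sketched as a simple allocation in priority order. The thread count, task names and priorities below are hypothetical, and a real tool would also weigh performance targets for each service.

```python
def reserve_threads(total_threads, requests):
    """Grant threads in priority order (a lower number is more important).

    requests is a list of (task, priority, threads_needed) tuples.
    Returns the per-task allocation and the number of threads left over.
    """
    remaining = total_threads
    allocation = {}
    for task, _priority, needed in sorted(requests, key=lambda r: r[1]):
        granted = min(needed, remaining)
        allocation[task] = granted
        remaining -= granted
    return allocation, remaining


# Hypothetical services competing for 64 hardware threads.
requests = [
    ("statistics", 3, 16),
    ("fast_path_forwarding", 1, 40),
    ("qos_scheduling", 2, 16),
]
allocation, spare = reserve_threads(64, requests)
print(allocation, spare)
```

A low-priority service that receives fewer threads than it asked for is an early warning that the application is over-subscribed.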