This article continues from our previous two. As networking line rates pass the 10G and 100G milestones and head towards aggregate rates of 200G and higher, the memory interface has become a bottleneck, requiring fundamental changes in memory design and system implementation. Part I outlined the requirements for high performance buffering applications, with emphasis on efficient data storage and transfer. Part II described the advantages of a high access rate memory subsystem.
That article contrasted a massively parallel memory architecture, which mirrors the growing shift from multi-core to many-core processors, with a brute-force low-latency memory device, which achieves performance at the expense of power and capacity.
This article, Part III, explores the advantages of adding intelligence to the memory device in order to offload memory-intensive operations from the host. Throughout this article we use 4x100G as the reference performance point; the same solutions adapt readily to 12x40G and 48x10G. The intelligent memory device acts as a complete subsystem: a carrier class, ECC protected, high performance offload accelerator that executes and retires instructions while maintaining data records.
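To put these reference rates in perspective, the sketch below works out the aggregate bandwidth and worst-case packet rate for each port configuration. It assumes minimum-size 64-byte Ethernet frames plus 8 bytes of preamble/SFD and a 12-byte inter-frame gap (84 bytes on the wire); the function names are illustrative only.

```python
# Worst-case packet-rate arithmetic for the port configurations named above.
# Assumption (not from the article): back-to-back minimum-size 64-byte frames,
# each carrying 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire.
WIRE_BYTES_MIN_FRAME = 64 + 8 + 12  # 84 bytes per minimum-size frame


def aggregate_gbps(ports: int, gbps_per_port: int) -> int:
    """Total line rate in Gb/s for a given port configuration."""
    return ports * gbps_per_port


def worst_case_mpps(ports: int, gbps_per_port: int) -> float:
    """Maximum packet rate (millions of packets/s) with minimum-size frames."""
    bits_per_frame = WIRE_BYTES_MIN_FRAME * 8  # 672 bits on the wire
    return aggregate_gbps(ports, gbps_per_port) * 1e9 / bits_per_frame / 1e6


if __name__ == "__main__":
    for ports, rate in [(4, 100), (12, 40), (48, 10)]:
        print(f"{ports}x{rate}G: {aggregate_gbps(ports, rate)} Gb/s, "
              f"{worst_case_mpps(ports, rate):.1f} Mpps")
```

Under these assumptions 4x100G works out to about 595 Mpps, while 12x40G and 48x10G both total 480 Gb/s, or about 714 Mpps. That is the packet rate a memory subsystem must sustain if every packet touches it.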
The applications for an intelligent memory device go beyond transferring network traffic and processing its headers. Intelligent networks gather and track information about performance and usage, which can be used for the Operations, Administration and Management (OAM) of networking services and equipment. OAM is a broad class of services for monitoring and managing usage: maintaining Quality of Service (QoS), ensuring compliance with Service Level Agreements (SLAs), and supporting billing and accounting.
Advanced Network Management
Statistics, metering and shaping functions are traditionally associated with carrier class Ethernet, which is distinguished from Local Area Network (LAN) Ethernet by Standardized Services, Scalability, Reliability, and Service Management. These attributes make it possible to extend Ethernet beyond the LAN for deployment in service provider Metro and Wide Area Networks.
With the emergence of data center and cloud services, these networks also require carrier grade performance in order to deliver an engaging and robust user experience.
Carrier class Ethernet delivers traffic between locations, with each traffic type mapped to a different class of service. Each class of service has its own quality guarantees and is marked so that it can be distinguished during transport, the goal being to deliver committed levels of performance. These commitments can be as coarse as the bandwidth delivered over a particular sampling period; as network applications become more sophisticated, operators are also looking to guarantee more granular, real-time measures such as frame delay, jitter and frame loss.
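As an illustration of these finer-grained measures, the sketch below derives mean frame delay, jitter and frame loss ratio for one class of service from matched send and receive timestamps. Defining jitter as the mean absolute difference between consecutive frame delays is one common choice and an assumption here; carrier OAM standards specify their own measurement procedures. The function and field names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class ServiceMeasures:
    mean_delay_us: float
    mean_jitter_us: float
    frame_loss_ratio: float


def measure(tx_times_us: Sequence[float], rx_times_us: Sequence[float],
            frames_sent: int) -> ServiceMeasures:
    """Derive delay, jitter and loss for one class of service.

    tx_times_us[i] and rx_times_us[i] are the send and receive timestamps of
    the i-th *delivered* frame; lost frames are absent from both lists.
    """
    if len(tx_times_us) != len(rx_times_us):
        raise ValueError("timestamp lists must be the same length")
    delays = [rx - tx for tx, rx in zip(tx_times_us, rx_times_us)]
    # Jitter (assumed definition): mean |difference| between consecutive delays.
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    delivered = len(delays)
    return ServiceMeasures(
        mean_delay_us=float(mean(delays)) if delays else 0.0,
        mean_jitter_us=float(mean(diffs)) if diffs else 0.0,
        frame_loss_ratio=(frames_sent - delivered) / frames_sent if frames_sent else 0.0,
    )
```

Each of these measures depends on per-frame data collected at line rate, even if the statistics themselves are computed later.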
The processing required for the flow of packets through a carrier class stack is illustrated in Figure 2.
Whether these advanced functions are implemented as background processes or as part of the active management of network traffic, the underlying data collection, like packet header processing, must keep up with the aggregate processing rate. In this article we consider two examples: offloading statistics counters, and a two-rate/three-color token bucket used for accounting, metering and shaping of network traffic.
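A minimal software model of a two-rate, three-color marker is sketched below. It follows the color-blind trTCM behavior described in RFC 2698: a committed bucket refilled at the committed rate and a peak bucket refilled at the peak rate, with rates in bytes per second and burst sizes in bytes. The class name and units are assumptions for illustration, not the hardware design discussed in this series.

```python
class TwoRateThreeColorMarker:
    """Color-blind two-rate, three-color marker in the style of RFC 2698."""

    def __init__(self, cir: float, cbs: float, pir: float, pbs: float) -> None:
        if pir < cir:
            raise ValueError("PIR must be >= CIR")
        self.cir, self.cbs = cir, cbs   # committed information rate and burst size
        self.pir, self.pbs = pir, pbs   # peak information rate and burst size
        self.tc = cbs                   # committed bucket starts full
        self.tp = pbs                   # peak bucket starts full
        self.last_t = 0.0               # time of the previous update, in seconds

    def _refill(self, now: float) -> None:
        """Add tokens for the elapsed time, capped at each bucket's burst size."""
        elapsed = now - self.last_t
        self.last_t = now
        self.tc = min(self.cbs, self.tc + self.cir * elapsed)
        self.tp = min(self.pbs, self.tp + self.pir * elapsed)

    def mark(self, now: float, size: int) -> str:
        """Color a packet of `size` bytes arriving at time `now` (seconds)."""
        self._refill(now)
        if self.tp < size:
            return "red"      # exceeds the peak rate; no tokens consumed
        if self.tc < size:
            self.tp -= size
            return "yellow"   # within the peak rate but above the committed rate
        self.tp -= size
        self.tc -= size
        return "green"        # within the committed rate
```

Every marking decision reads both bucket states, updates them and writes them back, so a hardware metering engine performs the same kind of per-packet read-modify-write as a statistics counter.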
APPLICATION: Statistics Counters
As a packet passes through many decision processes, it is common to keep track of which actions were taken. Managing network performance may also require multiple counters per line interface, or per flow in a flow-based architecture. Counters can be implemented with a range of capabilities depending on the needs of the application.
The most basic information tracked can be as simple as packet and byte counts per flow.
More complex systems, such as high speed carrier switch hardware with more sophisticated flow management, can use 8 or more counters per flow.
The most complex firewall systems can use 26 or more counters.
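The counter configurations above can be sketched as a per-flow record with a configurable number of counters, where the first two are always packets and bytes. The 64-bit counter width, the wrap-around behavior and the class and function names are assumptions for illustration.

```python
from collections import defaultdict

COUNTER_BITS = 64                          # assumed counter width
COUNTER_MASK = (1 << COUNTER_BITS) - 1     # counters wrap at this width
PKTS, BYTES = 0, 1                         # the two basic counters kept for every flow


class FlowCounterTable:
    """Per-flow counter records with a configurable number of counters per flow."""

    def __init__(self, counters_per_flow: int = 2) -> None:
        if counters_per_flow < 2:
            raise ValueError("need at least packet and byte counters")
        self.counters_per_flow = counters_per_flow
        # A flow's record is created (all zeros) the first time it is seen.
        self.table = defaultdict(lambda: [0] * self.counters_per_flow)

    def count(self, flow_id: int, frame_bytes: int, *extra: int) -> None:
        """Bump packet and byte counters, plus each extra counter index this packet hit."""
        for idx in extra:  # validate before touching the record
            if not 2 <= idx < self.counters_per_flow:
                raise IndexError(f"counter index {idx} out of range")
        rec = self.table[flow_id]
        rec[PKTS] = (rec[PKTS] + 1) & COUNTER_MASK
        rec[BYTES] = (rec[BYTES] + frame_bytes) & COUNTER_MASK
        for idx in extra:
            rec[idx] = (rec[idx] + 1) & COUNTER_MASK


def table_bytes(flows: int, counters_per_flow: int, bits: int = COUNTER_BITS) -> int:
    """Total storage in bytes for a table of `flows` counter records."""
    return flows * counters_per_flow * bits // 8
```

With a hypothetical table of one million flows and 64-bit counters, `table_bytes` gives 16 MB for 2 counters per flow, 64 MB for 8, and 208 MB for 26, which shows how quickly counter storage grows with per-flow detail.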
Counters are implemented with a read-modify-write memory operation, which can be accelerated by using memories with lower latency, faster cycle time, or dual ports.
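The read-modify-write at the heart of every counter update, and the memory access rate it implies, can be sketched as follows. The helper names are hypothetical, and the model assumes each counter update costs one read and one write to memory, with no caching or merging of updates.

```python
def rmw_increment(memory: list[int], addr: int, delta: int,
                  mask: int = (1 << 64) - 1) -> int:
    """One counter update: read the word, modify it, write it back."""
    value = memory[addr]              # read
    value = (value + delta) & mask    # modify (wrap at the counter width)
    memory[addr] = value              # write
    return value


def rmw_accesses_per_second(packets_per_second: float, counters_per_packet: int) -> float:
    """Memory accesses per second when every counter update is a read plus a write."""
    return packets_per_second * counters_per_packet * 2
```

For example, assuming 4x100G of minimum-size frames (about 595 Mpps), two counters per packet already require about 2.4 billion accesses per second, and eight counters about 9.5 billion. Back-to-back updates to the same counter also cannot overlap, because the next read must see the previous write; this is why lower latency and faster cycle time help.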