Design Article
Embedding TCP/IP: Working through implementation challenges—Part II
Christian Legare, Micrium
3/20/2013 2:56 PM EDT
3-2 Ethernet Controller Interface
Other important factors influencing the performance of an embedded system include the system's ability to receive Ethernet frames into network buffers for later processing by the upper protocol layers, and to place data into network buffers for transmission. The two predominant methods for moving Ethernet frames between the Ethernet controller and the system's main memory are a software copy (using a function such as memcopy(), which copies every byte from one location to another) and Direct Memory Access (DMA).
With memcopy(), the CPU must copy every byte from one memory location to another, which makes it the slower of the two methods. This holds even when memcopy() is written in highly optimized assembly language, because the CPU remains occupied for the whole transfer. When a software copy is the only option, an optimized memcopy() is needed; in μC/TCP-IP, this function is located in the μC/LIB module.
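To make the cost of a software copy concrete, the following is a minimal sketch of two copy loops: a byte-at-a-time loop and a word-at-a-time loop for buffers that are 32-bit aligned and a whole number of words long. The names copy_bytes() and copy_words() are hypothetical and are not the μC/LIB implementation; either way, the CPU executes every load and store itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive copy: one load and one store per byte. */
static void copy_bytes(uint8_t *dst, const uint8_t *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    while (len--) {
        *dst++ = *src++;
    }
}

/* Word copy: one load and one store per 32-bit word.
 * Assumes both buffers are declared as uint32_t arrays, so they are
 * suitably aligned and padded to a whole number of words. */
static void copy_words(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    assert(dst != NULL && src != NULL);
    while (nwords--) {
        *dst++ = *src++;
    }
}
```

The word version performs a quarter of the loop iterations for the same amount of data, which is the kind of gain an optimized copy routine aims for, but the CPU is still busy for the whole transfer.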
DMA support for the Ethernet controller is a means to improve packet processing. When frames are transferred quickly to and from the TCP/IP stack, network performance improves. DMA also relieves the CPU of the transfer task, allowing it to perform additional protocol processing. The most common CPU-to-Ethernet-controller configurations are shown in Figure 3-1.
Moving Ethernet frames between an Ethernet controller and network buffers often depends upon specific Ethernet controller and microprocessor/microcontroller capabilities.

F3-1(2) Represents a CPU with an integrated MAC, but with dedicated memory. When a frame is received, the MAC initiates a DMA transfer into this dedicated memory. Most configurations of type 2 allow for transmission from main memory while reserving dedicated memory for either receive or transmit operations. Both the MAC and the CPU can read and write dedicated memory, so the TCP/IP stack can process frames directly from it. Porting to this architecture is generally not difficult and it provides excellent performance. However, performance may be limited by the size of the dedicated memory, especially in cases where transmit and receive operations share the dedicated memory space.
F3-1(3) Represents a cooperative DMA solution whereby both the CPU and MAC take part in the DMA operation. This configuration is generally found on external devices that are either connected directly to the processor bus or connected via the Industry Standard Architecture (ISA) or Peripheral Component Interconnect (PCI) standards. Method 3 requires that the CPU contain a DMA peripheral that can be configured to work within the architectural limitations of the external device. This method is more difficult to port, but generally offers excellent performance.
F3-1(4) Illustrates an external device attached via the CPU’s external bus. Data is moved to and from main memory and the external device’s internal memory via CPU read and write cycles. This method thus requires additional CPU intervention in order to copy all of the data to and from the device when necessary. This method is generally easy to port and it offers average performance.
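The CPU-driven transfer of configuration 4 can be sketched as follows. This is only an illustration under stated assumptions: sim_nic_t, nic_read_data_port() and nic_read_frame() are hypothetical names, the device is assumed to present its internal memory as a 16-bit data port with the low byte first, and the port is simulated with a plain array so the code runs on a host. On real hardware the port would be a volatile memory-mapped register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated external controller: its internal frame memory and a read index. */
typedef struct {
    const uint16_t *fifo;    /* stands in for device-internal memory  */
    size_t          rd_idx;  /* next word the data port will present  */
} sim_nic_t;

/* Stands in for one CPU read cycle on the device's 16-bit data port. */
static uint16_t nic_read_data_port(sim_nic_t *nic)
{
    return nic->fifo[nic->rd_idx++];
}

/* Copy one received frame from the device into a buffer in main memory.
 * Every word costs a CPU bus read plus two byte stores. */
static void nic_read_frame(sim_nic_t *nic, uint8_t *buf, size_t len)
{
    size_t i;

    assert(nic != NULL && buf != NULL);
    for (i = 0; i + 1 < len; i += 2) {
        uint16_t w = nic_read_data_port(nic);
        buf[i]     = (uint8_t)(w & 0xFFu);   /* low byte first (assumed) */
        buf[i + 1] = (uint8_t)(w >> 8);
    }
    if (i < len) {                           /* odd trailing byte */
        buf[i] = (uint8_t)(nic_read_data_port(nic) & 0xFFu);
    }
}
```

The loop shows why this configuration needs additional CPU intervention: the processor is involved in every word moved, in both directions.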
It is very important to understand that TCP/IP stack vendors may not use all of the Ethernet Controller capabilities, and will often implement a Memory Copy mechanism between the Ethernet Controller and the system's Main Memory. Memory Copy operations are substantially slower than DMA operations, and therefore have a major negative impact on performance.
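By contrast with a Memory Copy mechanism, a DMA-capable controller can hand frames to the driver by passing ownership of buffers rather than moving bytes. The sketch below assumes a generic descriptor ring; the layout of rx_desc_t, the DESC_OWN_MAC flag and all function names are hypothetical and do not describe any particular controller or the μC/TCP-IP driver API. The function sim_mac_dma_complete() is a host-side stand-in for the hardware; a real driver would also declare the descriptors volatile and handle cache coherency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_RING_SIZE 4u
#define RX_BUF_SIZE  1536u  /* room for a maximum-size Ethernet frame */
#define DESC_OWN_MAC 0x1u   /* hypothetical flag: MAC owns the descriptor */

typedef struct {
    uint8_t  *buf;    /* buffer the DMA engine writes into */
    uint16_t  len;    /* received frame length             */
    uint16_t  flags;
} rx_desc_t;

static uint8_t   rx_bufs[RX_RING_SIZE][RX_BUF_SIZE];
static rx_desc_t rx_ring[RX_RING_SIZE];
static unsigned  rx_next;

/* Attach one buffer to each descriptor and give them all to the MAC. */
static void rx_ring_init(void)
{
    for (unsigned i = 0; i < RX_RING_SIZE; i++) {
        rx_ring[i].buf   = rx_bufs[i];
        rx_ring[i].len   = 0;
        rx_ring[i].flags = DESC_OWN_MAC;
    }
    rx_next = 0;
}

/* Return the next completed descriptor, or NULL if the MAC still owns it.
 * No frame data is copied: the caller reads the buffer the DMA filled. */
static rx_desc_t *rx_ring_get(void)
{
    rx_desc_t *d = &rx_ring[rx_next];

    if (d->flags & DESC_OWN_MAC) {
        return NULL;
    }
    rx_next = (rx_next + 1u) % RX_RING_SIZE;
    return d;
}

/* Hand the descriptor back to the MAC once the stack is done with it. */
static void rx_ring_release(rx_desc_t *d)
{
    assert(d != NULL);
    d->len    = 0;
    d->flags |= DESC_OWN_MAC;
}

/* Host-side stand-in for the MAC's DMA engine completing a frame. */
static void sim_mac_dma_complete(unsigned idx, const uint8_t *frame, uint16_t len)
{
    assert(idx < RX_RING_SIZE && len <= RX_BUF_SIZE);
    for (uint16_t i = 0; i < len; i++) {
        rx_ring[idx].buf[i] = frame[i];
    }
    rx_ring[idx].len    = len;
    rx_ring[idx].flags &= (uint16_t)~DESC_OWN_MAC;
}
```

In this arrangement the only work left to the CPU on receive is checking a flag and advancing an index, which is where the performance difference relative to a memory copy comes from.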
Another important issue, especially for an embedded system design, is how the NIC driver (i.e., software) interfaces to the NIC. Certain TCP/IP stacks accomplish the task via polling (checking the NIC in a loop to see what needs to be done). This is not the best technique for an embedded system, where every CPU cycle counts. The best interface mechanism is to use interrupts and have the NIC raise an interrupt when CPU attention is required. The μC/TCP-IP Driver Architecture is interrupt-driven. Driver development and porting are described in Chapter 14, “Network Device Drivers” on page 301.
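The difference between the two interface styles can be sketched with a simulated controller. Everything here is an assumption made for illustration: the status bits, the sim_nic_status variable standing in for a device register, and the event counters standing in for whatever signalling the stack uses (in an RTOS-based design the handler would more likely post a semaphore or queue entry). It is not the μC/TCP-IP driver interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bits of a simulated controller. */
#define NIC_STAT_RX_DONE 0x01u
#define NIC_STAT_TX_DONE 0x02u

static volatile uint8_t  sim_nic_status; /* stands in for a device status register */
static volatile unsigned rx_events;      /* receive events signalled to the stack  */
static volatile unsigned tx_events;      /* transmit-complete events               */

/* Interrupt handler: runs only when the controller needs attention.
 * It reads and clears the cause, then records the event for the stack. */
static void nic_isr(void)
{
    uint8_t stat = sim_nic_status;

    sim_nic_status = 0;                  /* acknowledge the interrupt cause */
    if (stat & NIC_STAT_RX_DONE) {
        rx_events++;
    }
    if (stat & NIC_STAT_TX_DONE) {
        tx_events++;
    }
}

/* Host-side stand-in for the hardware raising its interrupt line. */
static void sim_nic_raise_irq(uint8_t cause)
{
    sim_nic_status |= cause;
    nic_isr();
}

/* Polled alternative, for comparison: each call costs CPU cycles
 * even when the controller has nothing to report. */
static bool nic_poll(void)
{
    if (sim_nic_status == 0) {
        return false;
    }
    nic_isr();
    return true;
}
```

With the interrupt-driven path, nic_isr() consumes cycles only when there is work; nic_poll() has to be called repeatedly and mostly returns false on a lightly loaded link.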
3-2-1 ZERO COPY
TCP/IP stack vendors may qualify their stack as a zero-copy stack. A true zero-copy architecture refers to data in the memory buffers at every layer instead of moving the data between layers. Zero copy enables the network interface card to transfer data directly to or from TCP/IP stack network buffers. The availability of zero copy greatly increases application performance. It is easy to see that using a CPU that is capable of complex operations just to make copies of data is a waste of resources and time.
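One common way to refer to data in place at every layer is to keep the frame in a single buffer and have each layer adjust a pointer and length instead of copying the payload. The following is a minimal sketch of that idea; net_buf_t, net_buf_pull(), eth_rx() and ip_rx() are hypothetical names, and the IPv4 header is assumed to carry no options. It does not reproduce the μC/TCP-IP buffer structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN 14u  /* Ethernet header: two MAC addresses + EtherType */
#define IP_HDR_LEN  20u  /* IPv4 header without options (assumed)          */

/* A view onto a frame that stays where the controller placed it. */
typedef struct {
    uint8_t *data;  /* start of the current layer's view */
    size_t   len;   /* bytes remaining in that view      */
} net_buf_t;

/* Strip hdr_len bytes by moving the view forward; no bytes are copied.
 * Returns 0 on success, -1 if the buffer is shorter than the header. */
static int net_buf_pull(net_buf_t *nb, size_t hdr_len)
{
    assert(nb != NULL);
    if (hdr_len > nb->len) {
        return -1;
    }
    nb->data += hdr_len;
    nb->len  -= hdr_len;
    return 0;
}

/* Each layer consumes its own header and passes the same buffer upward. */
static int eth_rx(net_buf_t *nb) { return net_buf_pull(nb, ETH_HDR_LEN); }
static int ip_rx(net_buf_t *nb)  { return net_buf_pull(nb, IP_HDR_LEN); }
```

After eth_rx() and ip_rx() have run, the data pointer addresses the transport header inside the original frame buffer, so no layer has spent CPU cycles duplicating the payload.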
Techniques for implementing zero-copy capabilities include the use of DMA-based copying and memory mapping through a Memory Management Unit (MMU). These features require specific hardware support, not always present in microprocessors or microcontrollers used in embedded systems, and they often involve memory alignment requirements.
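The alignment requirement usually shows up when network buffers are allocated: a DMA engine may only accept buffer addresses that are a multiple of some power of two. As a sketch under that assumption, the helpers below round an address up to a given alignment and carve an aligned buffer out of a raw memory pool. The names align_up() and pool_carve() are hypothetical, and the actual alignment value depends on the controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1u)) == 0);
    return (addr + (align - 1u)) & ~(align - 1u);
}

/* Return an aligned buffer of buf_size bytes inside pool,
 * or NULL if the padding plus the buffer does not fit. */
static uint8_t *pool_carve(uint8_t *pool, size_t pool_size,
                           size_t buf_size, size_t align)
{
    uintptr_t base  = (uintptr_t)pool;
    uintptr_t start = align_up(base, (uintptr_t)align);
    size_t    pad   = (size_t)(start - base);

    if (pad > pool_size || buf_size > pool_size - pad) {
        return NULL;
    }
    return pool + pad;
}
```

The padding lost to alignment is one reason buffer pools for zero-copy designs tend to be sized with some slack.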
Use care when selecting a Commercial Off-the-Shelf (COTS) TCP/IP stack. Vendors may use the zero-copy qualifier for stacks that do not copy data between layers within the stack, but still perform a memcopy() between the stack and the Ethernet controller. Optimum performance can only be achieved if zero copy is used all the way down to the Data Link layer. Micriμm's μC/TCP-IP is an example of a stack that is zero-copy from the Data Link layer to the Transport layer. The interface between the Transport layer and the Application layer in μC/TCP-IP is currently not a zero-copy interface.

