AF_PACKET V4 and PACKET_ZEROCOPY

Speakers

Magnus Karlsson, Björn Töpel, John Fastabend

Session Type

Talk

Contents

Description

This talk presents and discusses the RFC for AF_PACKET V4 and PACKET_ZEROCOPY. AF_PACKET V4 is a proposed new interface optimized for high-performance packet processing. It supports zero-copy semantics, which remove expensive memcpy operations, as well as optimized memcpy semantics for cases where the hardware cannot support zero-copy. When a V4 socket is created without the PACKET_ZEROCOPY option, each packet is delivered to the Linux stack and a copy of it is sent to user space, so V4 behaves the same way as V2 and V3.
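For reference, the existing AF_PACKET versions are selected per socket with the PACKET_VERSION socket option. The sketch below shows that mechanism for V2 and V3 only, using constants from mainline headers. How the RFC exposes V4 and PACKET_ZEROCOPY is not shown, because those definitions are not assumed to be present in installed headers. The helper name open_packet_socket is hypothetical, and opening a packet socket normally requires CAP_NET_RAW.

```c
#include <arpa/inet.h>        /* htons */
#include <errno.h>
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <linux/if_packet.h>  /* PACKET_VERSION, TPACKET_V2, TPACKET_V3 */
#include <sys/socket.h>       /* socket, setsockopt, SOL_PACKET */
#include <unistd.h>           /* close */

/*
 * Hypothetical helper: open a raw AF_PACKET socket and select the ring
 * format version (for example TPACKET_V2 or TPACKET_V3).
 * Returns the socket fd on success, or -1 with errno set on failure.
 */
int open_packet_socket(int version)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0)
        return -1;  /* typically EPERM without CAP_NET_RAW */

    if (setsockopt(fd, SOL_PACKET, PACKET_VERSION,
                   &version, sizeof(version)) < 0) {
        int saved = errno;  /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A caller would invoke open_packet_socket(TPACKET_V3) and then set up rings as usual. Under the proposal, a V4 socket would presumably be requested through the same kind of per-socket option, but that is an assumption rather than something stated above.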

We then introduce a new optional setsockopt option called PACKET_ZEROCOPY, which enables zero-copy and zero-syscall semantics on the socket while still maintaining Linux security and isolation properties. This is achieved by mapping the NIC packet buffers into the address space of the user-space process, while the HW descriptors are mapped only into kernel space. User space sees only HW-agnostic virtual descriptors, and it is the kernel's responsibility to translate between these virtual descriptors and the HW descriptors. By default in this mode, a packet destined for this socket goes only to user space. Any packet destined for the kernel stack is either copied out of the packet buffer (no longer zero-copy) or filtered by the NIC into another ring (still zero-copy) before it ever reaches the user-space packet buffer. This way, user space can neither see nor manipulate kernel data. The packet destination is determined by programming flow director with tc or RSS with ethtool. With some possible future extensions, XDP is also a great, flexible candidate for determining the destination.
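The descriptor translation described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the structures virt_desc, hw_desc, and pkt_area, the flag HW_DESC_EOP, and the function virt_to_hw are all hypothetical and are not taken from the RFC or from any driver. The point it illustrates is that the kernel validates every field of an untrusted virtual descriptor before deriving a hardware address from it, so user space never supplies DMA addresses directly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical HW-agnostic descriptor, as user space might see it. */
struct virt_desc {
    uint32_t buf_idx;  /* index of a fixed-size buffer in the shared packet area */
    uint32_t len;      /* number of valid bytes in that buffer */
};

/* Hypothetical NIC descriptor, visible only to the kernel in this model. */
struct hw_desc {
    uint64_t dma_addr; /* device-visible address of the packet data */
    uint16_t len;      /* packet length in bytes */
    uint16_t flags;    /* device-specific flags */
};

/* Hypothetical description of the packet buffer area mapped to user space. */
struct pkt_area {
    uint64_t dma_base; /* device-visible address of buffer 0 */
    uint32_t buf_size; /* size of each buffer in bytes */
    uint32_t nbufs;    /* number of buffers in the area */
};

#define HW_DESC_EOP 0x1u /* hypothetical "end of packet" flag */

/*
 * Translate an untrusted virtual descriptor into a HW descriptor.
 * Returns false, leaving *hd untouched, if the virtual descriptor
 * refers to memory outside the packet area or has an invalid length.
 */
bool virt_to_hw(const struct pkt_area *area,
                const struct virt_desc *vd,
                struct hw_desc *hd)
{
    if (vd->buf_idx >= area->nbufs)
        return false;
    if (vd->len == 0 || vd->len > area->buf_size || vd->len > UINT16_MAX)
        return false;

    hd->dma_addr = area->dma_base + (uint64_t)vd->buf_idx * area->buf_size;
    hd->len = (uint16_t)vd->len;
    hd->flags = HW_DESC_EOP;
    return true;
}
```

Using a buffer index rather than a pointer in the virtual descriptor is a design choice made for this sketch: an index can be range-checked with a single comparison, which keeps the isolation argument simple.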

The approach provides a performance increase of one to two orders of magnitude for some microbenchmarks compared to previous AF_PACKET versions. We will also present some initial case studies from commercial application vendors. To illustrate the approach, we have implemented support on i40e NICs and veth, and it should be straightforward to port to other NICs and virtual devices as well. The approach plugs into the existing XDP infrastructure even when PACKET_ZEROCOPY is enabled. When a packet is destined for a V4 process with PACKET_ZEROCOPY enabled, XDP programs are executed on a page supplied from the V4 packet buffer, and XDP_PASS passes the packet to the V4 user-space packet buffer without any copies.