Session

Shared Memory Pool for Representors

Speakers

William Tu
Michal Swiatkowski
Yossi Kuperman

Label

Nuts and Bolts

Session Type

Talk

Contents

Description

In switchdev mode, representors are slow-path ports that handle miss traffic, i.e., traffic that is not forwarded by the hardware. Representor ports are regular netdevices, each with multiple channels that consume DMA memory. Memory consumption of the representor ports’ RX buffers can grow to several GB when scaling to 1k VF representors. For example, in the mlx5 driver, each RQ, with a typical 1K descriptors, consumes 3MB of DMA memory for the packet buffers its descriptors point to; with four channels per representor and 1024 representors, that is 4 * 3MB * 1024 = 12GB of memory. Since representor ports carry only slow-path traffic, most of this RX DMA memory sits idle and is wasted [1] when flows are forwarded directly in hardware to VFs.
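The arithmetic above can be reproduced with a short sketch. It uses only the figures quoted in the description (3MB per RQ, four channels, 1024 representors) and assumes one RQ per channel; the function and constant names are illustrative and do not come from the driver.

```python
# Back-of-the-envelope estimate of dedicated RX DMA memory for representors.
# Figures are the ones quoted in the description; names are illustrative.
MB_PER_RQ = 3            # DMA memory per RQ with ~1K descriptors (mlx5 example)
CHANNELS_PER_REP = 4     # assumption: one RQ per channel
NUM_REPRESENTORS = 1024  # "1k" VF representors


def dedicated_rx_memory_mb(num_reps, channels=CHANNELS_PER_REP, mb_per_rq=MB_PER_RQ):
    """Total RX buffer memory in MB when every RQ owns its own ring."""
    return num_reps * channels * mb_per_rq


total_mb = dedicated_rx_memory_mb(NUM_REPRESENTORS)
print(f"{total_mb} MB = {total_mb / 1024:.0f} GB")
```

The result grows linearly with the number of representors and channels, which is why the cost becomes significant at 1k VFs even though each individual RQ is small.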

A network device driver exposes several channels, and each channel represents a NAPI context and a set of queues. The driver receives packets by setting up receive queues (RQs), and each RQ pre-allocates a dedicated set of RX ring descriptors, with each descriptor pointing to a memory buffer. The shared memory pool is a descriptor and buffer sharing mechanism: it allows multiple RQs to use RX ring descriptors from a shared descriptor pool. In other words, an RQ no longer has its own dedicated RX ring descriptors, which might sit idle when there is no traffic; instead, it consumes descriptors from the shared pool only when packets arrive.
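The sharing idea can be illustrated with a toy model. This is only a conceptual sketch, not the mlx5 implementation: the class names, the pool size, and the drop-on-exhaustion behavior are assumptions made for illustration.

```python
# Toy model of RQs drawing descriptors from one shared pool instead of
# each owning a pre-allocated ring. All names here are hypothetical.
class SharedDescriptorPool:
    def __init__(self, size):
        # Each integer stands in for a descriptor plus its packet buffer.
        self.free = list(range(size))

    def take(self):
        """Hand out a free descriptor, or None if the pool is exhausted."""
        return self.free.pop() if self.free else None

    def give_back(self, desc):
        self.free.append(desc)


class RxQueue:
    def __init__(self, name, pool):
        self.name = name
        self.pool = pool
        self.in_flight = []  # descriptors holding unprocessed packets

    def on_packet(self):
        """Consume a descriptor only when a packet actually arrives."""
        desc = self.pool.take()
        if desc is None:
            return False  # assumption: packet is dropped when the pool is empty
        self.in_flight.append(desc)
        return True

    def complete_all(self):
        """Return descriptors to the pool after the packets are processed."""
        while self.in_flight:
            self.pool.give_back(self.in_flight.pop())


pool = SharedDescriptorPool(size=4)
busy = RxQueue("rep0", pool)
idle = RxQueue("rep1", pool)

for _ in range(3):
    busy.on_packet()
print(len(pool.free))  # one descriptor left; the idle queue holds none
```

In this model an idle representor holds no memory at all, while a busy one can temporarily use most of the pool. The trade-off is that queues now compete for descriptors, which is the reason some representors may want a non-shared ring.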

We propose a new devlink eswitch attribute named “spool_size”, which lets all representors share the memory pool of the uplink representor port. In the future, we want to explore a more fine-grained API for the shared memory pool by introducing a new devlink API called devlink-sd [4] (devlink shared descriptor). Devlink-sd allows users to differentiate representors: important representors can keep their own non-shared RX rings for performance or latency reasons.

[1] https://people.kernel.org/kuba/nic-memory-reserve
[2] https://lore.kernel.org/netdev/20240306231253.8100-1-witu@nvidia.com/
[3] https://lore.kernel.org/netdev/39dbf7f6-76e0-4319-97d8-24b54e788435@nvidia.com/
[4] https://lore.kernel.org/netdev/20240125223617.7298-1-witu@nvidia.com/