Session

NIC offloads at Hyperscale: experience, new offloads and validation

Speakers

Willem de Bruijn

Label

Nuts and Bolts

Session Type

Talk

Contents

Description

NIC offloads are essential at current link speeds, but they can be hard to deploy at scale. Issues seen when deploying them in hyperscale data centers include inconsistency between devices, scalability limits, incompatibility with real production workloads, and plain hardware bugs. Much of this stems from a lack of clear operating conditions, vague feature definitions and a dearth of publicly available tests.

In this talk we review critical offloads, from established features to novel ongoing work such as pacing and inline crypto. We give examples of how they can fail in practice. It turns out that even checksum offload is not as well defined as it seems once you look closely.
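One well-known corner of checksum handling, given here only as an illustration and not as an example taken from the talk, is the zero case in UDP: the Internet checksum is a one's complement sum, and a computed UDP checksum of 0x0000 must be transmitted as 0xFFFF because 0x0000 on the wire means "no checksum". The minimal sketch below assumes the caller has already concatenated the pseudo-header and the datagram; the function names are hypothetical.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum over big-endian words (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold carries back into the low 16 bits until none remain.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def udp_checksum_field(pseudo_header_and_datagram: bytes) -> int:
    """Value to place in the UDP checksum field.

    Assumption: the input already holds pseudo-header + UDP header
    (checksum field zeroed) + payload. A computed 0x0000 is sent as 0xFFFF.
    """
    csum = internet_checksum(pseudo_header_and_datagram)
    return 0xFFFF if csum == 0 else csum


# Sample words from the RFC 1071 numerical example.
sample = bytes.fromhex("0001f203f4f5f6f7")
print(hex(ones_complement_sum(sample)))  # 0xddf2
print(hex(internet_checksum(sample)))    # 0x220d
```

A device or driver that skips the final substitution produces packets that a receiver interprets as carrying no checksum at all, which is the kind of behaviour a precise definition and a shared test can pin down.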

To address this class of problems, we now define these offloads precisely, in a way that scales, works well with other offloads and avoids common bugs. We also present an open source test suite that anyone can use to validate their hardware. We have contributed tests for RSS, HW-GRO, TSO and others. Many of these already ship with the Linux kernel source tree.

Finally, we describe how we are collecting all this information in one place, combining expertise from vendors and users and sharing it with everyone in an open and unencumbered format. The OCP NIC Core Offloads specification defines a common core feature set, a comprehensive test suite and a concise checklist to enable self-validation and to make it easy to compare and incorporate devices. This is an ongoing effort intended to capture expertise from across the industry. We encourage everyone to contribute text, feedback and open source tests.