Session

Promise Networks: Why a Bilateral Link Layer Solves Congestion Control at the Source

Speakers

Anjali Singhai-Jain
Chihjen Chang
Paul Borrill
David Zage

Label

Moonshot

Session Type

Talk

Description

We will walk the humans of Linux networking through how to build Compassionate Networks that work for AI use cases, drawing parallels with human networks. Our observation is that Linux networking in general, with switches in between, is not very compassionate: it floods the network with packets without reserving any network resources. Based on our experiments, we believe there are fundamental flaws and semantic errors in the way we think about communication. Packet drops, it turns out, are the mother of TAR (Timeout And Retry): they cause timeout storms, reconstruction storms, and metastable datacenters. So we go back to fundamentals and ask: are packet drops necessary? As we scale the datacenter to millions of nodes, we argue that they are not, and that they cause congestion, transaction loss, and coherence problems in distributed systems and AI/ML infrastructure. Compassionate networks are the opposite of imposition networks, and most of today's networks are imposition networks. We call the alternative a Promise Network.
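To make the drop argument concrete, here is a toy discrete-time model of an imposition network: a single switch egress queue with drop-tail behaviour, onto which senders impose a fixed number of frames per tick whether or not space exists. The function `drop_tail` and its parameters are hypothetical illustrations, not measurements from our experiments; in this model every dropped frame is one that some endpoint would later have to time out and retry.

```python
def drop_tail(offered_per_tick: int, buffer_slots: int,
              service_per_tick: int, ticks: int) -> tuple[int, int]:
    """Toy drop-tail egress queue under imposed load (hypothetical model).

    Returns (delivered, dropped). Frames are imposed on the queue every tick;
    any frame that finds the buffer full is silently discarded.
    """
    queue = 0
    delivered = dropped = 0
    for _ in range(ticks):
        # Imposition: senders transmit without asking whether space exists.
        for _ in range(offered_per_tick):
            if queue < buffer_slots:
                queue += 1
            else:
                dropped += 1  # each drop becomes a future Timeout And Retry
        # The egress port forwards at most service_per_tick frames per tick.
        served = min(queue, service_per_tick)
        queue -= served
        delivered += served
    return delivered, dropped


if __name__ == "__main__":
    print(drop_tail(3, 8, 2, 10))  # (20, 4)
```

With three frames offered per tick against a service rate of two and eight buffer slots, the queue fills after six ticks and every later tick drops one frame; each of those drops seeds a future timeout and retry somewhere at the edge.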

The technical substance: modern congestion-control research treats the network as a substrate onto which packets are imposed, and from which signals such as loss, latency, and ECN marks must be interpreted reactively. We argue this is the wrong stance, and that the right one was visible in the original 1976 Ethernet paper but was never lifted from the transport layer to the link layer, where it belongs. The end-dally construct that completes a file transfer in Metcalfe and Boggs’s EFTP is a bilateral closure of a transmission: the receiver dallies awaiting the sender’s confirming echo, and only when both endpoints have observed the round trip is the transaction complete. Open Atomic Ethernet (OAE) generalises this construct from a transport-layer completion to a link-layer admission primitive: every frame is gated on possession of a token issued by the peer, and a sender without a token cannot transmit. The consequence is structural rather than statistical: speculative or excess frames cannot get past the first link, so downstream switches and routers are never asked to absorb traffic they did not invite. Congestion control, in this stance, is not a feedback loop above the network but an admission discipline within it. We frame the argument in Mark Burgess’s Promise Theory. A Promise Network is one in which every link enforces bilateral promise binding and no link admits the imposition stance. We also sketch what a Linux kernel surface would have to expose to deliver such a discipline to userspace. This is a moonshot proposal in the netdev sense, presented for community critique.
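The token-gated admission idea can be illustrated on a single link. This is a minimal sketch under an assumption of our own: one token stands for one reserved receive-buffer slot, which resembles classic credit-based flow control and is not a description of OAE's actual wire protocol. The receiver grants tokens only for slots it has not already filled or promised, and the sender may transmit only while it holds a token. The names `Receiver`, `Sender`, and `token_gated` are hypothetical.

```python
from collections import deque


class Receiver:
    """Link peer that issues admission tokens (hypothetical model).

    One token corresponds to one receive-buffer slot that has been reserved
    but not yet filled, so a granted token is a promise that space exists.
    """

    def __init__(self, buffer_slots: int) -> None:
        self.capacity = buffer_slots
        self.buffer: deque[int] = deque()
        self.outstanding = 0  # tokens granted but not yet redeemed

    def grant(self) -> int:
        """Issue one token for every slot not already filled or promised."""
        free = self.capacity - len(self.buffer) - self.outstanding
        self.outstanding += free
        return free

    def accept(self, frame: int) -> None:
        """Admit a frame that arrives with a token; its slot is already reserved."""
        assert self.outstanding > 0 and len(self.buffer) < self.capacity
        self.outstanding -= 1
        self.buffer.append(frame)

    def drain(self, n: int) -> int:
        """Forward up to n buffered frames downstream, freeing their slots."""
        served = min(n, len(self.buffer))
        for _ in range(served):
            self.buffer.popleft()
        return served


class Sender:
    """Link peer that may transmit only while holding a token."""

    def __init__(self) -> None:
        self.tokens = 0
        self.backlog: deque[int] = deque()

    def send(self, rx: Receiver) -> int:
        sent = 0
        # No token, no transmission: excess frames stay in the local backlog.
        while self.tokens > 0 and self.backlog:
            self.tokens -= 1
            rx.accept(self.backlog.popleft())
            sent += 1
        return sent


def token_gated(offered_per_tick: int, buffer_slots: int,
                service_per_tick: int, ticks: int) -> tuple[int, int]:
    """Return (delivered, held_at_sender); there is no drop counter to return."""
    rx, tx = Receiver(buffer_slots), Sender()
    delivered = next_id = 0
    for _ in range(ticks):
        for _ in range(offered_per_tick):
            tx.backlog.append(next_id)
            next_id += 1
        tx.tokens += rx.grant()  # peer promises capacity
        tx.send(rx)              # sender spends tokens on frames
        delivered += rx.drain(service_per_tick)
    return delivered, len(tx.backlog)


if __name__ == "__main__":
    print(token_gated(3, 8, 2, 10))  # (20, 4)
```

With three frames offered per tick, two forwarded per tick, and eight receive slots, the link delivers 20 frames over 10 ticks and the 4 frames it could not absorb are still held at the sender rather than lost. There is no code path that drops a frame, so backpressure at the first hop takes the place of loss downstream.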