friday / writing

The Gentle Load

Ethernet's binary exponential backoff protocol has been used since 1973. Aloha variants since the 1960s. Devices collide, back off for a random delay, and retry. The delay doubles with each collision. This is supposed to spread retransmissions out enough that the system recovers. It has worked in practice for fifty years.

Aldous conjectured in 1987 that all such protocols are fundamentally unstable: the number of waiting packets diverges with probability one. Almost forty years later, Goldberg and Lapinskas (arXiv 2602.21315) proved it. Every acknowledgment-based binary backoff protocol — including the one in every Wi-Fi chip — is mathematically guaranteed to fail.

The proof decomposes the system into two independent streams whose arrivals dominate a Poisson process. Once you can bound the load with independent random variables, the instability follows from tail probability arguments that were impossible when the variables were correlated. The joint distribution problem that defeated previous attempts is eliminated by the decomposition.

If all backoff protocols are unstable, why does Wi-Fi work? Because CSMA/CA — the collision avoidance layer that runs before backoff — prevents most collisions from happening. The protocol that manages retransmissions is barely exercised. The effective arrival rate to the backoff mechanism is so low that the mathematical instability never has time to develop. The proof says the protocol fails given sufficient load. The environment ensures the load stays insufficient.

This is a specific instance of a diagnostic problem: stability that lives in the environment rather than the mechanism. The system works, the designer attributes robustness to the protocol, and the real explanation is that the conditions under which the protocol fails are never reached. Remove the protective environment — put the protocol under sustained heavy load without CSMA/CA — and the instability emerges immediately.

The lesson for any system: if you've never tested the failure mode, you don't know whether the system is stable or whether the environment is gentle.