Learning theory draws a sharp line between learnable and unlearnable. In the classical PAC setting, the VC dimension characterizes what can be learned from independently drawn (i.i.d.) data. In the online setting, Littlestone's dimension characterizes what can be learned from adversarial sequences. Between these extremes, where the adversary is adaptive but constrained, no comparable characterization has been available.
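For orientation, the two classical benchmarks can be stated concretely. The bounds below are textbook results, not claims from the paper, with d the VC dimension and L the Littlestone dimension of the hypothesis class:

```latex
% Agnostic PAC learning: sample complexity for accuracy epsilon,
% confidence 1 - delta, governed by the VC dimension d
m(\epsilon, \delta) \;=\; \Theta\!\left(\frac{d + \log(1/\delta)}{\epsilon^{2}}\right)

% Online learning, realizable case: the optimal worst-case number of
% mistakes against an adversarial sequence is exactly the Littlestone
% dimension L
\mathrm{Mistakes}^{\star} \;=\; L
```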
Aden-Ali and colleagues (arXiv:2602.20585) fill the gap. They show that a distribution family admits VC-dimension-dependent regret bounds for every finite-VC hypothesis class if and only if it is generalized smooth. The characterization is exact: nothing weaker than generalized smoothness suffices for efficient learning under distributional adversaries, and nothing stronger is required.
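For context, the smoothness notion from the earlier smoothed-online-learning literature, which the paper's "generalized smooth" condition presumably broadens, bounds each round's distribution D against a fixed base measure. The symbols μ and σ below are standard in that literature, not taken from this paper:

```latex
% A distribution D over the example space is sigma-smooth with respect
% to a base measure mu if its density never exceeds 1/sigma:
\frac{\mathrm{d}D}{\mathrm{d}\mu}(x) \;\le\; \frac{1}{\sigma}
\qquad \text{for all } x .
% Equivalently: the adversary can concentrate at most 1/sigma times
% more probability mass on any event than the base measure does.
```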
Generalized smoothness captures how much an adversary can shift the data distribution from one round to the next. If the shifts are bounded in the right sense — smooth enough that the learner's past observations remain informative about future data — then learning succeeds with the same sample complexity as the independent case. If the shifts are too abrupt, learning fails regardless of the algorithm.
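The contrast can be seen in the smallest nontrivial example: learning a threshold on [0, 1]. Thresholds have VC dimension 1 but infinite Littlestone dimension, so a fully adversarial sequence defeats any learner, while smooth data does not. The sketch below is illustrative only and assumes the simplest smoothness (uniform draws, i.e., density bounded by 1); it is not the paper's algorithm:

```python
import random

def smooth_rounds(T, theta=0.37, seed=0):
    """Online threshold learning when each x is drawn uniformly on [0, 1]
    (a 1-smooth distribution). The learner keeps the version space
    [lo, hi] of thresholds consistent with all past examples and
    predicts with its midpoint."""
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0
    mistakes = 0
    for _ in range(T):
        x = rng.random()                       # smooth draw: density <= 1
        y = 1 if x >= theta else 0             # label from fixed threshold
        pred = 1 if x >= (lo + hi) / 2 else 0
        mistakes += (pred != y)
        if y == 1:
            hi = min(hi, x)                    # consistent thresholds: t <= x
        else:
            lo = max(lo, x)                    # consistent thresholds: t > x
    return mistakes

def adversarial_rounds(T):
    """Same learner, but the adversary places each x exactly at the
    learner's decision boundary and labels it to force an error, while
    always remaining consistent with some threshold. This is the classic
    argument that thresholds have infinite Littlestone dimension."""
    lo, hi = 0.0, 1.0
    mistakes = 0
    for _ in range(T):
        x = (lo + hi) / 2                      # query the boundary itself
        pred = 1                               # midpoint rule: 1[x >= mid] = 1
        y = 0                                  # adversary answers the opposite
        mistakes += (pred != y)                # a mistake every single round
        lo = x                                 # thresholds t > x stay consistent
    return mistakes

# Typically a few dozen mistakes under smooth data vs. one per round
# against the unconstrained adversary.
print(smooth_rounds(5000), adversarial_rounds(5000))
```

Against the smooth source, mistakes only occur when x lands inside the shrinking version-space interval, giving logarithmic growth; against the unconstrained adversary, the same learner errs on every round.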
The characterization extends to differential privacy: generalized smoothness also determines when private learning is possible under distributional constraints. The same condition that makes sequential learning work makes private learning work. The two problems, which seem structurally different — one about temporal adversaries, the other about information leakage — share the same learnability boundary.
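As a reminder of what "private" means here, a randomized learner M is (ε, δ)-differentially private if, for every pair of datasets S, S′ differing in one example and every event E, its output distributions are close. This is the textbook definition, not anything specific to the paper:

```latex
% (epsilon, delta)-differential privacy: neighboring datasets S, S'
% must induce nearly indistinguishable output distributions
\Pr[M(S) \in E] \;\le\; e^{\epsilon}\,\Pr[M(S') \in E] \;+\; \delta .
```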
The authors provide universal algorithms that achieve low regret without knowing the specific distribution family, only that it is generalized smooth. When the family is known, refined bounds use a new parameter called the fragmentation number.
The general observation: when two seemingly different learning problems (online and private) share the same learnability condition, the condition is not an artifact of either problem but a property of the underlying information geometry. The boundary between learnable and unlearnable is set by how much the data source can shift, and that shift is the same bottleneck whether the constraint is temporal or informational.