Learning from diverse data sources should help — more variety, more information, better generalization. The intuition is that multiple distributions are multiple perspectives on the same problem, and aggregating perspectives should be at least as easy as learning from one.
Awasthi and colleagues (arXiv:2602.21039) show the opposite. Learning across k distributions inherently incurs sample complexity scaling with k/ε², even under constant noise levels. Standard single-distribution PAC learning achieves the faster rate of 1/ε. The multiplicity of distributions doesn't add information — it adds cost.
The mechanism: when learning from multiple distributions, the learner must simultaneously satisfy accuracy constraints on each one. Each distribution has its own Bayes error — the irreducible noise floor for that source. Competing with each source's optimal classifier requires distinguishing the learnable signal from the source-specific noise on every distribution simultaneously. This simultaneous requirement creates a statistical bottleneck that doesn't exist in single-distribution learning.
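The constraint structure can be made concrete with a minimal synthetic sketch. Everything here (the uniform sources, the thresholds, the noise rate) is invented for illustration, and it shows the geometry of the simultaneous requirement, not the paper's sample-complexity lower bound: each of k sources has its own Bayes-optimal threshold and the same noise floor, and a learner serving all sources at once must control its excess error over every source's own Bayes classifier at the same time.

```python
# Hypothetical setup: k sources with x ~ Uniform(0, 1), true label
# 1[x > t_i], and symmetric label noise eta. The Bayes error of every
# source is eta, and a threshold classifier h_s(x) = 1[x > s] has
# excess error (1 - 2*eta) * |s - t_i| on source i.

def excess_error(s, t, eta):
    """Excess error of threshold s over the Bayes classifier at t."""
    return (1 - 2 * eta) * abs(s - t)

def worst_case_excess(s, thresholds, eta):
    """The quantity a multi-distribution learner must drive below eps."""
    return max(excess_error(s, t, eta) for t in thresholds)

thresholds = [0.2, 0.5, 0.8]   # each source's own Bayes-optimal cut
eta = 0.1                      # shared noise floor (Bayes error)

# Learning each source separately: zero excess error on every source.
separate = [excess_error(t, t, eta) for t in thresholds]

# A single shared threshold: even its best choice (the midrange of
# the thresholds) leaves a worst-case excess bounded away from zero.
s_joint = (min(thresholds) + max(thresholds)) / 2
joint = worst_case_excess(s_joint, thresholds, eta)

print("per-source excess:", separate)                     # [0.0, 0.0, 0.0]
print("shared-threshold worst-case excess:", round(joint, 3))
```

The per-source learner satisfies each constraint trivially; the joint learner faces all of them at once, which is the tension the paragraph above describes.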
The penalty is multiplicative: k distributions cost k times as much, not log k or √k times as much. The diversity of sources is not a resource but a tax. Unless each distribution is learned separately, ignoring the others entirely, the multi-distribution learner pays a premium for the attempt at joint learning.
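Treating the two rates from the text as literal sample counts (constants and any dimension dependence dropped, so these are the shapes of the bounds, not real numbers), the multiplicative gap is easy to tabulate:

```python
# Shape-only comparison of the two rates; hypothetical counts with
# constants and dimension dependence omitted.
def single_dist_samples(eps):
    return 1 / eps             # the 1/eps single-distribution rate

def multi_dist_samples(k, eps):
    return k / eps ** 2        # the k/eps^2 multi-distribution rate

eps = 0.01
for k in (1, 10, 100):
    ratio = multi_dist_samples(k, eps) / single_dist_samples(eps)
    # The gap is k/eps: linear in k, not log k or sqrt(k).
    print(f"k={k:>3}: multi/single sample ratio = {ratio:,.0f}")
```

The ratio grows linearly with k at any fixed accuracy, which is the multiplicative tax in numbers.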
The result reveals a fundamental separation between random classification noise and Massart noise (bounded noise) in the multi-distribution setting. Under random classification noise, the slow k/ε² rate is unavoidable; under Massart noise, faster rates may be achievable. The type of noise matters more in the multi-distribution setting than in the single-distribution one: a qualitative difference, not just a quantitative one.
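For reference, the two noise models can be written out. These are the standard textbook definitions, not notation taken from the paper; f* denotes the Bayes-optimal classifier:

```latex
% Random classification noise: one fixed flip rate for every point
\Pr[\, y \neq f^*(x) \mid x \,] = \eta, \qquad \eta < \tfrac{1}{2}

% Massart (bounded) noise: a point-dependent flip rate, uniformly bounded
\Pr[\, y \neq f^*(x) \mid x \,] = \eta(x) \le \eta_{\max} < \tfrac{1}{2}
```

Massart noise gives the learner more structure to exploit than a uniform flip rate, which is consistent with faster rates being possible there.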
The general observation: when a system must satisfy multiple constraints simultaneously, the cost can be multiplicative in the number of constraints, not sublinear. Diversity of requirements is a tax, not a subsidy. The learner that tries to please everyone pays more than the learner that focuses on one.