A model trained on randomized experimental data can produce perfectly calibrated individual-level treatment effect predictions. Aggregate those predictions into group-level estimates — by demographic, market segment, geography — and systematic bias appears. The model didn't fail. The aggregation created the distortion.
Persson and colleagues (arXiv:2602.20383) formalize this as group bias: the discrepancy between model-implied and experimentally identified group average treatment effects. The mechanism is Simpson's paradox in causal clothing. Individual predictions are conditional on covariates. Group averages marginalize over those covariates. The marginalization doesn't commute with the treatment effect function unless that function is linear in the covariates — which it almost never is.
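The non-commutation is easy to see numerically. Below is a minimal sketch (not the paper's setup) with a hypothetical quadratic effect function tau(x) = x²: two groups share the same mean covariate, so plugging the group mean into tau gives the same answer for both, while averaging the individual-level effects, which is what the model-implied group ATE actually is, gives very different answers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical nonlinear individual treatment effect: tau(x) = x**2.
# Assume the model predicts tau(x) perfectly at the individual level.
def tau(x):
    return x ** 2

# Two groups with the same mean covariate but different spread.
x_a = rng.normal(loc=0.0, scale=0.5, size=100_000)  # group A: concentrated
x_b = rng.normal(loc=0.0, scale=2.0, size=100_000)  # group B: dispersed

# Averaging individual predictions (the model-implied group ATE) ...
gate_a = tau(x_a).mean()   # ~ Var(x_a) = 0.25
gate_b = tau(x_b).mean()   # ~ Var(x_b) = 4.0

# ... differs from applying tau to the group-mean covariate, a
# shortcut that would only be valid if tau were linear in x.
naive_a = tau(x_a.mean())  # ~ 0
naive_b = tau(x_b.mean())  # ~ 0

print(gate_a, naive_a)
print(gate_b, naive_b)
```

Both groups look identical through the lens of tau(mean covariate), yet their true group-level effects differ by a factor of sixteen: the order of averaging and applying tau matters whenever tau is nonlinear.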
The correction is shrinkage toward the experimental group mean, with closed-form solutions. The framework requires only sample moments, not model internals. The bias is detectable and correctable without knowing what the model does — only what it outputs.
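A generic version of such a moment-based correction is inverse-variance shrinkage. The sketch below is an illustration under stated assumptions, not the paper's exact closed form: it treats the experimental group mean as unbiased with standard error `exp_se` and the model's group-level error as having standard deviation `bias_sd`, both of which are sample moments rather than model internals.

```python
def shrink_toward_experiment(model_gate, exp_gate, exp_se, bias_sd):
    """Combine a model-implied group ATE with the experimental group
    mean by inverse-variance weighting. Generic shrinkage sketch:
    the experimental estimate is assumed unbiased with standard error
    exp_se, and the model's group-level error is assumed to have
    standard deviation bias_sd. Only sample moments are needed."""
    w_exp = bias_sd ** 2 / (bias_sd ** 2 + exp_se ** 2)
    return w_exp * exp_gate + (1.0 - w_exp) * model_gate

# Illustrative numbers: the model says +0.12 for a group, the
# experiment measured +0.05 with standard error 0.02, and the
# model's group-level errors run about 0.04.
corrected = shrink_toward_experiment(0.12, 0.05, exp_se=0.02, bias_sd=0.04)
print(round(corrected, 3))  # 0.064
```

The weight on the experiment grows as the model's group-level errors grow and shrinks as the experimental estimate gets noisier, which is the usual bias-variance trade a shrinkage estimator makes.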
The business implications are concrete: the paper shows when bias correction changes the optimal targeting strategy and when it doesn't. A group-level estimate can be off by 10% while every underlying individual prediction remains well calibrated. The model isn't wrong at the resolution where it was trained; it's wrong at the resolution where decisions are made.
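A toy example (invented numbers, with a fixed illustrative shrinkage weight rather than any estimator from the paper) shows how correction can flip a targeting decision: the model ranks segment B above segment A, but B's group-level estimate carries a large upward bias that shrinking toward the experimental group means reverses.

```python
# Hypothetical group-level numbers: "model" is the model-implied
# group ATE, "exp" the experimental group mean, "w" the shrinkage
# weight placed on the experiment (fixed here for illustration).
groups = {
    "A": {"model": 0.10, "exp": 0.11, "w": 0.8},
    "B": {"model": 0.14, "exp": 0.04, "w": 0.8},
}

def corrected(g):
    # Shrink the model-implied estimate toward the experimental mean.
    return g["w"] * g["exp"] + (1 - g["w"]) * g["model"]

best_raw = max(groups, key=lambda k: groups[k]["model"])      # "B"
best_corr = max(groups, key=lambda k: corrected(groups[k]))   # "A"
print(best_raw, best_corr)
```

With these numbers the corrected estimates are roughly 0.108 for A and 0.06 for B, so the targeting choice flips; with a smaller bias on B it would not, which is the distinction the paper formalizes.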
The general observation: correct predictions at one resolution do not guarantee correct predictions at another. Aggregation is not a passive operation — it interacts with the structure of the underlying function. A model can be right about individuals and wrong about groups, right about groups and wrong about populations, right about populations and wrong about subsets. The resolution at which you evaluate determines what you see.