
The Causal Importance

Shapley values assign each feature an importance score based on its contribution to a model's prediction. The method is mathematically principled: it is the unique attribution rule that satisfies the efficiency, symmetry, null-player, and additivity axioms. But it is also causally naive. It evaluates each feature by marginalizing over the others using their observational distribution, which encodes correlations regardless of whether those correlations are causal.
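A minimal sketch of the credit-allocation step may help. The function below computes exact Shapley values by enumerating every coalition, weighting each marginal contribution v(S ∪ {i}) − v(S) by |S|!(n − |S| − 1)!/n!. The value function `v` is left abstract; for feature attribution it would be the model's expected output when only the features in S are known, which is exactly where the choice of distribution enters. The names `shapley_values` and `toy_game` are hypothetical, chosen for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List

ValueFn = Callable[[FrozenSet[int]], float]


def shapley_values(n: int, v: ValueFn) -> List[float]:
    """Exact Shapley values for an n-player game by enumerating coalitions.

    v maps a coalition (frozenset of player indices) to its worth.
    Each player needs 2^(n-1) coalitions, so this is only feasible for small n.
    """
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # coalitions of size 0..n-1 drawn from the other players
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (v(s | {i}) - v(s))  # weighted marginal contribution of i
        phis.append(phi)
    return phis


def toy_game(coalition: FrozenSet[int]) -> float:
    """Hypothetical game: worth depends only on how many of players 0 and 1 are present."""
    return float(len(coalition & {0, 1}) ** 2)


phi = shapley_values(3, toy_game)
print([round(p, 6) for p in phi])  # [2.0, 2.0, 0.0]
```

The toy game illustrates three axioms at once: the interchangeable players 0 and 1 get equal shares, the null player 2 gets zero, and the shares sum to v({0, 1, 2}) − v(∅) = 4.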

Martin and Haufe (arXiv:2602.20396) show that this produces collider bias and suppression effects in feature importance rankings. A feature with no causal influence on the outcome can still become statistically informative once a collider, a common effect it shares with the outcome or with a cause of the outcome, is included among the model's inputs; standard Shapley then assigns it nonzero importance. Conversely, a genuine cause whose effect is partially suppressed by a correlated non-cause has its importance understated.
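To see the collider effect in numbers, consider a hypothetical linear-Gaussian model (an assumption for illustration, not the paper's setup): Y ~ N(0, 1) and X1 ~ N(0, 1) are independent, and X2 = Y + X1 + ε with ε ~ N(0, σ²), so X2 is a collider. The best predictor of Y from both inputs is f(x1, x2) = (x2 − x1)/(1 + σ²), because subtracting X1 removes a component of X2 that carries no information about Y. The sketch computes standard two-feature Shapley values with the observational value function v(S) = E[f(X) | X_S = x_S], using closed-form Gaussian conditional means. All names are hypothetical.

```python
from typing import Callable, FrozenSet, Tuple

SIGMA2 = 0.25  # hypothetical noise variance of the collider X2
C = 1.0 + SIGMA2

ValueFn = Callable[[FrozenSet[int], float, float], float]


def f(x1: float, x2: float) -> float:
    """Best predictor E[Y | X1 = x1, X2 = x2] under the assumed collider model."""
    return (x2 - x1) / C


def v_observational(coalition: FrozenSet[int], x1: float, x2: float) -> float:
    """v(S) = E[f(X) | X_S = x_S] under the observational Gaussian distribution.

    Because f is linear, E[f(X) | ...] = f(E[X1 | ...], E[X2 | ...]).
    """
    if coalition == frozenset():
        return f(0.0, 0.0)  # both features have mean zero
    if coalition == frozenset({1}):
        # E[X2 | X1 = x1] = Cov(X1, X2) / Var(X1) * x1 = x1
        return f(x1, x1)
    if coalition == frozenset({2}):
        # E[X1 | X2 = x2] = Cov(X1, X2) / Var(X2) * x2 = x2 / (2 + sigma^2)
        return f(x2 / (2.0 + SIGMA2), x2)
    return f(x1, x2)  # both features known


def shapley_two(x1: float, x2: float, v: ValueFn) -> Tuple[float, float]:
    """Exact Shapley values for two features: average over both orderings."""
    v0, v1, v2, v12 = (v(frozenset(s), x1, x2) for s in ((), (1,), (2,), (1, 2)))
    phi1 = 0.5 * ((v1 - v0) + (v12 - v2))
    phi2 = 0.5 * ((v2 - v0) + (v12 - v1))
    return phi1, phi2


phi1, phi2 = shapley_two(1.0, 0.0, v_observational)
print(round(phi1, 6), round(phi2, 6))  # -0.4 -0.4
```

At x = (1, 0), X1 receives the same score as X2 even though intervening on X1 would not change Y at all. Nothing here violates the Shapley axioms; the nonzero score comes from the observational dependence that the collider creates between X1 and Y.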

cc-Shapley incorporates causal structural knowledge, that is, which variables cause which, into the marginalization step. Instead of evaluating a feature's contribution against the observational distribution of the other features, it evaluates it against the interventional distribution. This removes the spurious associations that collider structures induce and recovers the features' causal contributions.
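The difference between the two distributions can be illustrated on the same hypothetical collider graph, Y → X2 ← X1. The simulation below compares the association between X1 and Y when the collider is merely observed near a value (conditioning) with the association when it is set to that value by intervention, do(X2 = 1). This is not an implementation of cc-Shapley, whose estimator is not reproduced here; the sample size, noise level, and window width are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
sigma = 0.5  # hypothetical collider noise std (variance 0.25)

# Observational world: Y -> X2 <- X1, with X1 independent of Y.
y = rng.normal(size=n)
x1 = rng.normal(size=n)
x2 = y + x1 + sigma * rng.normal(size=n)

# Conditioning: keep only samples whose observed collider value is close to 1.
near = np.abs(x2 - 1.0) < 0.1
corr_conditioned = np.corrcoef(x1[near], y[near])[0, 1]

# Intervening: do(X2 := 1) cuts X2 off from its causes, so X1 and Y keep
# their original, independent distributions.
y_do = rng.normal(size=n)
x1_do = rng.normal(size=n)
corr_intervened = np.corrcoef(x1_do, y_do)[0, 1]

print(f"corr(X1, Y | X2 near 1) = {corr_conditioned:+.2f}")  # strongly negative
print(f"corr(X1, Y | do(X2=1))  = {corr_intervened:+.2f}")  # near zero
```

For exact conditioning, theory gives a correlation of −1/(1 + σ²) = −0.8 here; under the intervention the correlation stays at zero up to sampling error, because fixing X2 by fiat leaves its parents independent.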

The importance rankings change substantially. Some features that rank high under standard Shapley drop to zero under cc-Shapley, and vice versa. On both synthetic and real datasets, the two methods disagree about which features matter. The disagreement is not noise: it is the difference between association and causation showing up in the importance scores.

The general observation: a mathematically principled measure can be causally wrong. Shapley values satisfy axioms about fair allocation of credit. They do not satisfy axioms about causal attribution. When the goal is to understand why a prediction was made — in order to intervene, debug, or regulate — association-based importance is misleading. The mathematical correctness of the method does not guarantee the correctness of the interpretation.