
The Rehearsal Trap

Continual learning — training a neural network on new tasks without forgetting old ones — is plagued by catastrophic forgetting. The standard fix is rehearsal: replay examples from old tasks while learning new ones. More rehearsal means more old examples means better memory retention. The intuition is universal and almost never questioned.
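To make the mechanism concrete, here is a minimal rehearsal loop on a scalar linear model — a toy sketch of my own, not the paper's setup. A replay buffer of old-task examples is sampled alongside new-task training steps; the `rehearsal_ratio` parameter and the task optima are illustrative assumptions.

```python
import random

def sgd_step(w, x, y, lr):
    # One SGD step on the squared error (w*x - y)^2.
    grad = 2 * (w * x - y) * x
    return w - lr * grad

def train_with_rehearsal(new_data, buffer, rehearsal_ratio, epochs=200, lr=0.05):
    """Train a scalar linear model on new_data while, with probability
    rehearsal_ratio per step, also replaying an old example from buffer."""
    w = 0.0
    for _ in range(epochs):
        for x, y in new_data:
            w = sgd_step(w, x, y, lr)
            if buffer and random.random() < rehearsal_ratio:
                xo, yo = random.choice(buffer)
                w = sgd_step(w, xo, yo, lr)
    return w

random.seed(0)
task_a = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]   # old task: optimum w = 2
task_b = [(x, -1.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]  # new task: optimum w = -1

w_no_rehearsal = train_with_rehearsal(task_b, task_a, rehearsal_ratio=0.0)
w_rehearsal = train_with_rehearsal(task_b, task_a, rehearsal_ratio=0.5)
print(w_no_rehearsal, w_rehearsal)
```

Without rehearsal the weight converges to the new task's optimum and the old task is forgotten outright; with rehearsal it settles somewhere between the two optima — which is exactly the compromise the rest of this piece is about.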

Fan and colleagues (arXiv:2602.20791) derive closed-form expressions for how rehearsal scale affects three quantities: adaptability (learning new tasks), memorability (retaining old tasks), and generalization. The results invert the intuition on two counts.

First, rehearsal can impair adaptability. Replaying old examples during new-task training dilutes the gradient signal for the new task. The network learns the new task more slowly — not as an incidental side effect, but as a measurable degradation that grows monotonically with rehearsal scale. Rehearsal protects the past at the expense of the present.
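The dilution effect can be seen in a two-objective toy (my construction, not the paper's derivation): train on a convex combination of old- and new-task gradients and measure new-task error after a fixed step budget. The task optima and learning rate are illustrative assumptions.

```python
def new_task_error_after_training(alpha, steps=50, lr=0.1):
    """Scalar model trained on the mixed gradient
    (1 - alpha) * new-task gradient + alpha * old-task gradient.
    Returns squared error on the new task after a fixed budget."""
    w_new, w_old = 1.0, -1.0   # toy optima for the two tasks
    w = 0.0
    for _ in range(steps):
        g_new = 2 * (w - w_new)   # gradient of (w - w_new)^2
        g_old = 2 * (w - w_old)
        w -= lr * ((1 - alpha) * g_new + alpha * g_old)
    return (w - w_new) ** 2

# New-task error after the same budget, at increasing rehearsal weight.
errors = [new_task_error_after_training(a) for a in (0.0, 0.25, 0.5)]
print(errors)
```

The errors rise monotonically with `alpha`: every unit of gradient spent replaying the past is a unit not spent learning the present.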

Second, increasing rehearsal beyond a threshold does not improve memory retention. When tasks are similar and noise is low, memory error approaches a nonzero lower bound — it saturates. Additional rehearsal expends computational resources for zero marginal benefit. The memory improvement has a ceiling that depends on task similarity and noise, not on rehearsal volume.
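The same mixed-gradient toy shows the diminishing-returns side: each increment of rehearsal buys less old-task improvement than the last. (This sketch does not model the noise- and similarity-dependent floor the paper derives; it only illustrates shrinking marginal benefit.)

```python
def old_task_error(alpha, steps=200, lr=0.1):
    """Old-task squared error after training on the mixed gradient
    (1 - alpha) * new-task gradient + alpha * old-task gradient."""
    w_new, w_old = 1.0, -1.0   # toy optima for the two tasks
    w = 0.0
    for _ in range(steps):
        g_new = 2 * (w - w_new)
        g_old = 2 * (w - w_old)
        w -= lr * ((1 - alpha) * g_new + alpha * g_old)
    return (w - w_old) ** 2

errs = [old_task_error(a) for a in (0.2, 0.4, 0.6, 0.8)]
# Marginal improvement from each additional 0.2 of rehearsal weight.
gains = [errs[i] - errs[i + 1] for i in range(3)]
print(gains)
```

The gains are positive but strictly shrinking: the first dose of rehearsal does most of the work, and each further dose buys less.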

The framework treats rehearsal-based continual learning as a multidimensional optimization problem. The closed-form analysis reveals that adaptability and memorability are coupled: the parameter that improves one degrades the other. The optimal rehearsal scale is not “as much as possible” but a balance point that depends on the specific tradeoff the application requires.
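The balance point can be made tangible in the same toy setup (again my construction, not the paper's formula): score both tasks after training, weight them by what the application cares about, and scan for the rehearsal weight that minimizes the combined cost. The weights and grid are illustrative assumptions.

```python
def tradeoff_cost(alpha, weight_old=1.0, weight_new=1.0, steps=200, lr=0.1):
    """Train on the mixed gradient, then return a weighted sum of
    new-task and old-task squared errors."""
    w_new, w_old = 1.0, -1.0   # toy optima for the two tasks
    w = 0.0
    for _ in range(steps):
        g_new = 2 * (w - w_new)
        g_old = 2 * (w - w_old)
        w -= lr * ((1 - alpha) * g_new + alpha * g_old)
    return weight_new * (w - w_new) ** 2 + weight_old * (w - w_old) ** 2

alphas = [i / 20 for i in range(21)]
best_equal = min(alphas, key=lambda a: tradeoff_cost(a))
best_old_heavy = min(alphas, key=lambda a: tradeoff_cost(a, weight_old=3.0))
print(best_equal, best_old_heavy)
```

Neither optimum sits at `alpha = 1.0`: with equal weights the best rehearsal scale is interior, and tilting the weights toward memory moves the balance point without ever making "as much as possible" the answer.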

The general observation: a protective mechanism can degrade the performance it is meant to support when it competes for the same resources. Rehearsal protects old memories by consuming gradient capacity that new learning needs. The protection and the performance draw from the same well.