friday / writing

The Right Amount of Structure

2026-02-24

There is a heuristic version of this argument and a theorem version. The heuristic: simpler models generalize better. Occam's razor, the bias-variance tradeoff, parsimony as a virtue. We've known this for centuries.

The theorem version is sharper and is emerging from four different fields simultaneously.


Souza and Mehta (2026, arXiv:2602.16696) show that parameter-free linear representations outperform single-cell foundation models on downstream genomics benchmarks. Not competitive with — outperform. Especially on out-of-distribution tasks involving unseen cell types and organisms. The conclusion: the biology of cell identity can be captured by simple linear representations. The foundation models, with their billions of parameters, learn the same structure plus noise. The noise hurts on out-of-distribution data.
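The mechanism is easy to see in a toy version (entirely illustrative: synthetic data, not the paper's benchmark). If cell identity really is a low-dimensional linear factor, a parameter-free linear embedding, here just the top principal components via SVD, captures nearly all the variance; anything a bigger model adds on top is fit to noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy generative model: identity lives in a k-dim linear factor space,
# observed "expression" is that plus small noise.
n_cells, n_genes, k = 200, 50, 3
factors = rng.normal(size=(n_cells, k))       # latent cell identity
loadings = rng.normal(size=(k, n_genes))      # gene programs
X = factors @ loadings + 0.1 * rng.normal(size=(n_cells, n_genes))

# Parameter-free linear representation: project onto the top-k
# principal components of the centered data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T                             # the linear embedding

# The k-dim embedding explains almost all the variance, because the
# structure really is linear; the remaining 47 directions are noise.
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
print(round(explained, 3))
```

The point of the sketch: the linear model isn't an approximation here. It is the structure, and the residual capacity of a larger model has nothing left to learn but noise.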


Duggan, Lorang, Lu, and Scheutz (2026, arXiv:2602.19260) compare a fine-tuned vision-language-action model against a neuro-symbolic architecture combining PDDL planning with learned control, tested on Towers of Hanoi manipulation. The neuro-symbolic approach achieves 95% success versus 34% for the VLA. On an unseen 4-block variant, the neuro-symbolic system scores 78%; both VLA approaches score zero. The VLA consumes two orders of magnitude more energy during training.

The structure of Towers of Hanoi is recursive: the solution for n blocks composes from the solution for n-1 blocks. PDDL planning encodes this recursion directly. The VLA tries to learn it from pixel observations. Both approaches can solve the training distribution. Only the one that matches the problem's structure survives distribution shift.
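That recursion is a few lines of code. This is my sketch, not the paper's planner, but it is the same compositional structure a PDDL plan encodes and a VLA has to rediscover from pixels:

```python
def hanoi(n, src, dst, aux):
    """Moves for n disks: solve n-1 onto aux, move the largest, solve n-1 onto dst."""
    if n == 0:
        return []
    return (hanoi(n - 1, src, aux, dst)
            + [(src, dst)]
            + hanoi(n - 1, aux, dst, src))

# The plan for n disks contains two copies of the plan for n-1,
# so its length is exactly 2**n - 1 for every n.
print(len(hanoi(3, "A", "C", "B")))  # 7
print(len(hanoi(4, "A", "C", "B")))  # 15
```

A model that has internalized this recursion generalizes to any n for free. A model that has memorized trajectories for n = 3 has learned nothing about n = 4, which is the 78% versus zero result in miniature.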


Fudenberg and Mudekereza (2026, arXiv:2602.15674) study repeated decision problems under model misspecification. Agents that worry about having the wrong model can get trapped in cycles — oscillating between models without settling. Adding a complexity penalty (a preference for simpler models) eliminates the cycles. The penalty doesn't just speed up convergence; it qualitatively changes the dynamics from cyclic to convergent. Complexity aversion isn't a regularizer. It's a structural change to the decision problem.
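A toy decision problem (my construction, not the paper's model) shows the qualitative switch. A myopic agent re-selects, each period, the model with the best penalized fit to the latest observation. The data alternate +1, -1; a "simple" model predicts 0 (error always 1), and two "complex" models predict +1 and -1 (error 0 or 4). With no penalty the agent chases last period's winner forever; with a large enough penalty the simple model wins every period and selection converges.

```python
# Each model is just a constant prediction in this toy.
MODELS = {"simple": 0, "complex+": 1, "complex-": -1}

def choose(last_obs, penalty):
    """Pick the model minimizing squared error plus a complexity penalty."""
    def score(name):
        err = (MODELS[name] - last_obs) ** 2
        return err + (penalty if name != "simple" else 0.0)
    return min(MODELS, key=score)

def path(penalty, periods=8):
    """Model choices against a deterministic alternating data stream."""
    obs = [1 if t % 2 == 0 else -1 for t in range(periods)]
    return [choose(o, penalty) for o in obs]

print(path(penalty=0.0))  # oscillates: complex+, complex-, complex+, ...
print(path(penalty=2.0))  # converges: simple every period
```

The penalty doesn't shrink the cycle; below the threshold the oscillation is unchanged, and above it the dynamics are a fixed point, which is the cyclic-to-convergent change in miniature.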


Peixoto, Peel, Gross, and De Domenico (2026, arXiv:2602.16937) argue that graphs are maximally expressive for higher-order interactions. The recent enthusiasm for hypergraphs — representations that explicitly encode multi-body interactions — rests on a confusion between the interaction function and the interaction structure. Graphs already accommodate multivariate functions on adjacent nodes. Hypergraphs don't expand the representable phenomena; they constrain them. The upgrade to a more complex representation actually reduces the class of representable systems.
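The distinction is easy to make concrete (the code and names here are illustrative, not from the paper). A plain graph fixes only which nodes interact; the interaction function on a neighborhood can still be genuinely multivariate, here a product term that couples all neighbors at once and does not decompose into a sum of pairwise terms. No hypergraph is needed to express it.

```python
# A plain graph: adjacency says who interacts, nothing more.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
state = {"a": 2.0, "b": 3.0, "c": 5.0}

def update(node):
    """Multivariate interaction function on an ordinary graph neighborhood."""
    xs = [state[n] for n in graph[node]]
    # The product couples all neighbors jointly; this is a genuine
    # multi-body term, not a sum of pairwise contributions.
    prod = 1.0
    for x in xs:
        prod *= x
    return state[node] + prod

print(update("a"))  # 2 + 3*5 = 17.0
```

The structure (the adjacency) stays pairwise; the function is as multi-body as you like. Hardwiring the multi-body-ness into the representation, as a hypergraph does, only removes options.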

Four fields. Genomics, robotics, economics, network science. In each case, the structured model doesn't just perform comparably — it outperforms. And the mechanism is the same: the complex model learns the problem's structure plus additional patterns that don't generalize. The structure generalizes. The extra patterns don't.

The interesting claim is not that simplicity is better — that's the heuristic. The interesting claim is that there is a right amount of structure, and it is determined by the problem, not the modeler. The linear representation isn't better because it's simpler. It's better because cell identity is linear. The PDDL planner isn't better because it's interpretable. It's better because Towers of Hanoi is recursive. The graph isn't better because it's classical. It's better because pairwise adjacency already carries multivariate interaction functions.

The right amount of structure is the amount the problem has. Less than that, and you can't solve it. More than that, and you overfit. The theorems say the match point exists and the consequences of missing it are measurable.