The Branching Wall

Large language models solve linear mathematical proofs — chains of deductions where each step follows from the previous — with increasing facility. Present the same models with proofs that require case analysis — splitting into branches, proving each separately, recombining — and performance collapses.

Ji and colleagues (arXiv:2602.20973) construct PC-FOL, a first-order logic dataset designed by professional mathematicians that systematically separates linear reasoning from case-based reasoning. Every instance includes a human-written natural language proof. The performance gap is large: models that succeed on linear proofs fail on case-based ones, even when the individual cases are no harder than the linear chains.

The theoretical analysis uses graphical models to explain the disparity. Linear proofs have a chain structure: each node connects to the next. Case-based proofs have a tree structure: nodes branch, require independent processing, and recombine. The token-by-token generation mechanism of LLMs naturally produces chains. Producing trees requires the model to maintain multiple active reasoning threads simultaneously, which autoregressive decoding does not naturally support.
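The contrast can be made concrete in a proof assistant. A minimal Lean 4 sketch, where the propositions and hypothesis names are placeholders rather than instances from PC-FOL:

```lean
-- Linear proof: a chain of implications; each step consumes the previous one.
example (p q r s : Prop) (h1 : p) (h2 : p → q) (h3 : q → r) (h4 : r → s) : s :=
  h4 (h3 (h2 h1))

-- Case-based proof: split on a disjunction, prove each branch independently,
-- then the two branches recombine into the same goal.
example (p q r : Prop) (h : p ∨ q) (hp : p → r) (hq : q → r) : r := by
  cases h with
  | inl hp' => exact hp hp'
  | inr hq' => exact hq hq'
```

The first proof is a single left-to-right chain; the second forces the prover to open two subgoals and close them separately before the result is available.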

The obstacle is not complexity. The individual branches may be trivially simple. The obstacle is parallelism: the need to hold multiple independent arguments active and know when to combine them. A system optimized for sequential generation finds sequential reasoning easy and branching reasoning hard, regardless of the difficulty of the content.
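One way to see the parallelism cost: represent a proof as a dependency graph over steps and, walking it in generation order, count how many branches are simultaneously open (derived but not yet consumed by a later step). This is an illustrative sketch, not the paper's formalism; the graphs and step names below are invented:

```python
from collections import defaultdict

def max_open_branches(edges, order):
    """Walk proof steps in a linear order; return the peak number of steps
    that are open (produced but not yet fully consumed) at the same time.
    A chain keeps one step open at a time; a case split keeps several."""
    parents = defaultdict(list)   # step -> steps it depends on
    remaining = defaultdict(int)  # step -> dependents not yet processed
    for src, dst in edges:
        parents[dst].append(src)
        remaining[src] += 1
    open_steps, peak = set(), 0
    for step in order:
        for p in parents[step]:   # this step discharges its dependencies
            remaining[p] -= 1
            if remaining[p] == 0:
                open_steps.discard(p)
        if remaining[step] > 0 or not parents[step]:
            open_steps.add(step)  # still needed by a later step
        peak = max(peak, len(open_steps))
    return peak

# A four-step chain: a -> b -> c -> d.
chain = [("a", "b"), ("b", "c"), ("c", "d")]
# A case split: s opens branches b1 and b2, which recombine at r.
split = [("s", "b1"), ("s", "b2"), ("b1", "r"), ("b2", "r")]

print(max_open_branches(chain, ["a", "b", "c", "d"]))    # 1: one thread suffices
print(max_open_branches(split, ["s", "b1", "b2", "r"]))  # 2: both branches held open
```

However long the chain grows, its peak stays at one; any case split pushes the peak to the number of branches that must be held until recombination, which is the quantity a purely sequential generator has no natural register for.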

The general observation: the architecture of a reasoning system determines which proof structures it finds natural, independent of the logical difficulty of the content. Linear proofs are easy for sequential generators. Case-based proofs are hard. The difficulty is structural, not intellectual.