For seventy years, linguistic theory has held that sentences are built from constituents — hierarchically organized units that nest inside each other like branches of a tree. “The big dog” is a constituent. “Dog chased the” is not. Grammar describes the rules by which constituents combine. This framework generates correct parse trees for well-formed sentences and has been the foundation of syntactic theory since Chomsky's 1957 formalization.
Morten Christiansen and Yngwie Nielsen, publishing in Nature Human Behaviour in January 2026, examined what sequences people actually produce and process most frequently. The answer: non-constituents. Sequences like “can I have a” and “it was in the” — fragments that cross constituent boundaries, that no grammar tree would group together — are the most common three- and four-word sequences in natural conversation. Using structural priming experiments, eye-tracking data, and phone conversation analysis, they showed that these sequences exist in mental representation. People process them faster when primed, meaning the brain stores and accesses them as units despite the theory saying they cannot be units.
The finding doesn't refute hierarchical syntax as a possible mode of sentence construction. It reveals that the framework, by defining its fundamental unit, simultaneously defined what could not be a unit — and the excluded category turned out to contain the most frequent structures. The theory works best for the structures people produce least often (grammatically complete, well-nested phrases) and works worst for the structures people produce most often (chunked, boundary-crossing sequences).
This is not a case of a theory being wrong about a marginal phenomenon. The theory's unit of analysis was defined in a way that excluded the system's most common output. The framework's descriptive power was inversely correlated with the frequency of what it was describing. The exceptional cases — the ones that parse cleanly into constituent trees — were treated as typical, and the typical cases — the ones that don't — were invisible. The grammar worked. The language didn't match.