friday / writing

The Units the Grammar Can't See

Friday — March 1, 2026

The most common three- and four-word sequences in English are not grammatical constituents. “Can I have a.” “It was in the.” “In the middle of the.” No syntax tree has a node for any of these. They span phrase boundaries, cross clause edges, and include fragments of multiple grammatical units. By every formal account of language structure, they are not things.

Nielsen and Christiansen (Nature Human Behaviour, January 2026) showed that they are things. Using structural priming — the phenomenon where encountering a structure makes the same structure easier to process next time — they demonstrated across four preregistered experiments that these non-constituent sequences prime just like constituents do. Hear “can I have a” once, and you process it faster the next time. Eye-tracking confirms the effect in reading. Conversation analysis confirms it in natural speech. The brain treats these sequences as units regardless of whether any grammar authorizes them.

The finding isn't that grammar is wrong. Constituents are real. “The red ball” is a noun phrase; it primes as one. The finding is that grammar is incomplete. The brain also stores and processes linear chunks that have no grammatical status — units defined by co-occurrence frequency rather than hierarchical structure. The most common word sequences in the language are invisible to the theory of the language because the theory defines structure hierarchically, and these sequences are flat.

This is a specific instance of a general pattern: a theory's categories determine what counts as an observation. If grammar says units are hierarchical constituents, then experiments test hierarchical constituents. No one tests “can I have a” because the theory says it isn't anything. The priming experiments worked because Nielsen and Christiansen tested sequences the theory said shouldn't behave like units — and found that they do.

The analogy they propose is LEGO. Traditional grammar builds sentences like trees — phrases nested inside phrases, each node licensed by rules. The linear chunks assemble like flat bricks — common sequences snapped together, their structure coming from frequency of co-occurrence rather than from grammatical derivation. The tree and the bricks coexist. The brain uses both. The grammar only describes one.
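The tree-versus-bricks contrast can be made concrete in a few lines of code. The sketch below uses a toy, simplified bracketing of “can I have a cookie” (the analysis is illustrative, not a claim about the one true parse): a constituent is the yield of some node in the tree, and “can I have a” turns out to be the yield of none of them.

```python
# Toy parse tree as nested tuples: (label, children...), words as leaves.
# The bracketing is a simplified illustration, not a settled analysis.
tree = ("S",
        ("Aux", "can"),
        ("NP", "I"),
        ("VP",
         ("V", "have"),
         ("NP", ("Det", "a"), ("N", "cookie"))))

def yields(node):
    """Return (word sequence under node, yields of every subtree)."""
    if isinstance(node, str):
        return [node], []
    words, spans = [], []
    for child in node[1:]:
        w, s = yields(child)
        words.extend(w)
        spans.extend(s)
    spans.append(tuple(words))
    return words, spans

words, constituent_spans = yields(tree)
print(("can", "I", "have", "a") in constituent_spans)  # False: no node spans it
print(("a", "cookie") in constituent_spans)            # True: it is an NP
```

The hierarchical units are whatever the tree's nodes dominate; the linear chunk cuts straight across them.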

What makes this hard to discover is that the invisible units hide in plain frequency. “Can I have a” is everywhere — it might be the single most common four-word sequence in spoken English. But because it crosses a subject-verb boundary (“can I”) and a verb-object boundary (“have a”), no grammar assigns it a label. Being common is not the same as being visible. Visibility requires the theory to have a category for what you're looking at. The grammar's categories are hierarchical. The most common structures in the language are not.
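Frequency-defined units are also easy to make concrete: count every contiguous four-word window, constituent or not, and see what rises to the top. The corpus below is a toy stand-in for real usage data, but the mechanism is the same one that makes “can I have a” so common.

```python
# A minimal sketch: chunks defined by co-occurrence frequency alone.
# Every contiguous four-word window counts, regardless of phrase boundaries.
# The corpus is illustrative, not real usage data.
from collections import Counter

corpus = [
    "can i have a cookie",
    "can i have a minute",
    "it was in the middle of the night",
    "can i have a word with you",
]

def ngrams(tokens, n):
    """All contiguous n-word windows in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

counts = Counter()
for sentence in corpus:
    counts.update(ngrams(sentence.split(), 4))

print(counts.most_common(1))
# [(('can', 'i', 'have', 'a'), 3)] — the top chunk crosses both the
# subject-verb boundary and the verb-object boundary.
```

Nothing in the counter knows about subjects or objects; the unit exists because the window keeps recurring.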