The equi-complexity hypothesis holds that all human languages are equally complex. A language that appears simple in one dimension — say, minimal morphology — compensates with complexity in another — say, rigid word order or elaborate tonal contrasts. The total complexity, summed across all dimensions, is supposed to be roughly constant. Every language carries the same amount of structural information; it's just distributed differently.
Koplenig and colleagues tested this across more than 2,000 languages using seven different language models, from basic statistical models to transformer neural networks. On parallel texts, they measured two quantities: the entropy rate, the average information per symbol, and the text length required to encode the same content. If total complexity were truly constant, the two should compensate exactly, with higher entropy rates fully offset by shorter texts. What they found was a trade-off, but an incomplete one. Languages with higher entropy rates do encode messages in fewer symbols: more complexity per unit, fewer units total. But the compensation falls short, and the total information transmitted is not constant.
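To make the two measurements concrete, here is a toy sketch in Python. It is not the study's pipeline: the names lang_A and lang_B and the strings standing in for parallel texts are invented, and a smoothed character-bigram model stands in for the seven language models actually used. The point is only what gets measured: bits per symbol, symbols per message, and their product.

```python
import math
from collections import Counter

def entropy_rate_bigram(text: str) -> float:
    """Cross-entropy in bits per symbol under an add-one-smoothed
    character-bigram model estimated from the text itself."""
    vocab_size = len(set(text))
    bigrams = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    bits = 0.0
    for (a, b), count in bigrams.items():
        p = (count + 1) / (contexts[a] + vocab_size)  # smoothed P(b | a)
        bits -= count * math.log2(p)
    return bits / max(len(text) - 1, 1)

# Invented stand-ins for parallel texts (same content, different languages).
parallel = {
    "lang_A": "a short dense rendering of the passage in language A " * 40,
    "lang_B": "a rather longer and more repetitive rendering of the very same passage in language B " * 40,
}

for lang, text in parallel.items():
    h = entropy_rate_bigram(text)  # bits per symbol
    n = len(text)                  # symbols needed for the same content
    print(f"{lang}: {h:.2f} bits/symbol over {n} symbols = {h * n:.0f} bits total")

# Exact compensation would keep h * n constant across languages; the finding
# is that higher h tends to go with smaller n, yet the totals still differ.
```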
The trade-off scales with community size. Languages spoken by larger populations tend to have higher entropy rates while using fewer symbols: more information packed into each symbol, fewer symbols needed to say the same thing. This suggests the trade-off is driven by communicative pressure: larger communities require faster, more efficient signaling, and languages respond by compressing.
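The population comparison can be sketched the same way, again with invented numbers: the speaker counts, entropy rates, and text lengths below are placeholders, not figures from the study; only the shape of the analysis, correlating log population with each measure, is meant.

```python
import math
from statistics import correlation  # Pearson's r; available since Python 3.10

# Invented placeholder values, not data from the study.
speakers     = {"lang_A": 45_000, "lang_B": 1_200_000, "lang_C": 300_000_000}
entropy_rate = {"lang_A": 1.6,    "lang_B": 1.9,       "lang_C": 2.3}    # bits/symbol
text_length  = {"lang_A": 5_400,  "lang_B": 5_100,     "lang_C": 4_700}  # symbols

langs = sorted(speakers)
log_pop = [math.log10(speakers[l]) for l in langs]

# Larger communities: higher entropy rate (positive r), shorter texts (negative r).
print("r(log population, entropy rate):", round(correlation(log_pop, [entropy_rate[l] for l in langs]), 2))
print("r(log population, text length): ", round(correlation(log_pop, [text_length[l] for l in langs]), 2))
```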
Languages are not equally complex. They are differently compressed. A language with high per-symbol complexity and short messages and a language with low per-symbol complexity and long messages are not carrying the same load distributed differently. They are carrying different loads, optimized for different transmission conditions. The complexity isn't conserved. The communication is.