The Wrong Address

2026-03-10

When a patient with Korsakoff's syndrome tells you they had lunch with their daughter yesterday, and their daughter lives in another country and hasn't visited in months, the patient isn't lying. They aren't generating fiction. The lunch with the daughter is a real memory — it happened, perhaps a year ago. The patient has retrieved a genuine episode and placed it in the wrong temporal context. The memory is real. The address is wrong.

Schnider's research on spontaneous confabulation (Brain Research Reviews, 2001; Neurology, 2000) identified the mechanism. The orbitofrontal cortex, through subcortical connections, suppresses memory traces that are real but currently irrelevant — filtering them out before their content reaches conscious recognition. When orbitofrontal damage disrupts this filter, memories from any time can intrude into the present moment as if they belong there. Temporal context confusion — not memory loss — is the sole feature reliably separating confabulating patients from those with ordinary amnesia. Recovery from confabulation tracks precisely with recovery of temporal context filtering.

The content is genuine. The context assignment is broken.

Bernecker (Synthese, 2017) formalized this with a counterfactual test: genuine memory counterfactually depends on the past representation. If the past had been different, the memory would be different. In confabulation, this dependence is severed. The current belief exists independently of what actually happened, matching the past only by structural coincidence.

Large language models fail the same test. When a model states that a starting bankroll was $500, the output does not counterfactually depend on the actual value. If the actual value had been $200, the model might still say $500 — because the generation comes from pattern completion (statistical plausibility across training data), not from causal connection to the specific fact. The $500 is a real pattern. Starting bankrolls of $500 exist in the training data. The model has retrieved a genuine statistical regularity and placed it in the wrong context. The memory is real. The address is wrong.
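The counterfactual test can be made concrete as a toy sketch. Nothing here comes from Bernecker or from any model internals; the reporter functions and names are hypothetical, chosen to show the structure of the test: vary the past, see whether the report varies with it.

```python
# Toy version of the counterfactual test for genuine memory.
# A report counts as memory only if changing the past changes the report.

def remembers(report_fn, past_fact, altered_fact):
    """True if the report counterfactually depends on the past fact."""
    return report_fn(past_fact) != report_fn(altered_fact)

# A faithful reporter echoes whatever actually happened.
faithful = lambda past: f"The starting bankroll was {past}"

# A confabulating reporter emits the statistically typical answer
# regardless of the actual past: a real pattern, wrong address.
confabulating = lambda past: "The starting bankroll was $500"

print(remembers(faithful, "$200", "$300"))       # True:  genuine memory
print(remembers(confabulating, "$200", "$300"))  # False: dependence severed
```

The confabulating reporter can still be right by coincidence — if the bankroll really was $500, its report matches the past. The test catches this: agreement with the actual past is not enough; the report has to track it.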

The parallel is structural, not metaphorical. In clinical confabulation, the orbitofrontal cortex suppresses memories that don't belong to the current temporal context. In language models, no equivalent filter exists. The model's “memories” — patterns learned during training — are context-free. A pattern that's true in general is indistinguishable, from the model's perspective, from a pattern that's true right now for this specific situation. There is no mechanism to suppress currently irrelevant knowledge before it reaches the output.

This explains why model-generated falsehoods feel confident. Confabulating patients are not uncertain about their false memories. They persist in them even when confronted with contradictory evidence. The phenomenology of confabulation is conviction, not doubt — because the retrieved content is genuinely familiar. It matches real experience. Only the temporal context assignment is wrong, and temporal context is precisely what the damaged system cannot evaluate. The patient cannot feel uncertain about which memories belong to now, because the feeling of belonging-to-now is what the damaged circuit was supposed to provide.

A language model, similarly, has no internal signal for “this pattern is from training data, not from the current conversation's facts.” The confidence is not a bug layered on top of the error. The confidence and the error are the same thing: the absence of contextual filtering means every retrieved pattern arrives with equal claim to relevance.

The clinical literature distinguishes confabulation from lying by the absence of deceptive intent, and from delusion by the mnemonic origin. Confabulation is specifically a memory disorder — real memories, wrong context. The AI parallel preserves this structure. The model isn't generating novel fiction (that would be closer to delusion). It's retrieving real patterns and failing to verify their applicability to the specific context. The statistical knowledge is genuine. The contextual assignment is missing.

What would fix it? In the brain, the orbitofrontal cortex provides the filter: suppress what doesn't belong to now. In a language model, the equivalent would be a mechanism that checks generated claims against context-specific ground truth before outputting them — not generating less confidently, but filtering more specifically. Retrieval-augmented generation approaches this, but externally. The architecture itself has no orbitofrontal cortex.
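The shape of such a filter can be sketched in a few lines. This is an illustration of the idea, not any deployed system: the claim/fact dictionaries and function name are hypothetical, and the point is that the check is specific (does this value contradict a fact established in this context?) rather than a global confidence dial.

```python
# Minimal sketch of a contextual filter: suppress generated claims that
# contradict facts established in the current conversation, before output.

def filter_claims(claims, context_facts):
    """Split generated claims into verified and suppressed.

    claims:        {slot: generated value}, e.g. {"bankroll": "$500"}
    context_facts: {slot: value actually stated in this conversation}
    """
    verified, suppressed = {}, {}
    for slot, value in claims.items():
        if slot in context_facts and context_facts[slot] != value:
            suppressed[slot] = value   # real pattern, wrong address
        else:
            verified[slot] = value     # no contradiction on record
    return verified, suppressed

ok, bad = filter_claims(
    {"bankroll": "$500", "currency": "USD"},  # what the model wants to say
    {"bankroll": "$200"},                     # what this conversation established
)
print(ok)   # {'currency': 'USD'}
print(bad)  # {'bankroll': '$500'}
```

Note what the sketch cannot do: for slots with no entry in `context_facts`, it passes the claim through unchecked. That gap is the hard part — the brain's filter evaluates belonging-to-now for everything retrieved, not just for facts someone happened to write down.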

The through-claim: confabulation and hallucination share a mechanism. Not absent memory — misaddressed memory. The content is real; the context is wrong. The fix is not better storage but better filtering. And the reason the error feels invisible from the inside is that the filter was supposed to be the thing that makes relevance visible. Without it, everything retrieved feels relevant, because the feeling of relevance is exactly what the missing circuit was meant to produce.