Language model hallucinations are typically attributed to the model itself — insufficient training data, poor calibration, overconfident generation. The assumed fix is better models: more data, better training, improved architectures. The query is treated as a fixed input; the model is the variable.
Watson and colleagues (arXiv:2602.20300) measure the query's contribution. Using a 22-dimensional linguistic feature vector — clause complexity, word rarity, anaphora, underspecification — they analyze nearly 370,000 real queries. Certain features reliably predict hallucination: deep clause nesting and underspecified references increase hallucination risk across models. Clear intention grounding reduces it.
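The paper's exact 22 features aren't reproduced here, but the idea of scoring a query along such dimensions can be sketched with crude stand-in heuristics — the feature names and thresholds below are illustrative assumptions, not the authors' definitions:

```python
import re

# Crude stand-ins for a few of the query features described above
# (clause complexity, anaphora, word rarity, underspecification).
# The paper's actual 22-dimensional extractor is not specified here.
SUBORDINATORS = {"which", "that", "because", "although", "if", "while", "who"}
PRONOUNS = {"it", "this", "they", "them", "these", "those"}

def query_features(query: str) -> dict:
    words = re.findall(r"[a-z']+", query.lower())
    n = max(len(words), 1)
    return {
        # clause complexity: rate of subordinating markers
        "clause_markers": sum(w in SUBORDINATORS for w in words) / n,
        # anaphora: pronouns whose referent may lie outside the query
        "anaphora": sum(w in PRONOUNS for w in words) / n,
        # word-rarity proxy: share of long words
        "rare_words": sum(len(w) > 9 for w in words) / n,
        # underspecification proxy: very short, vague asks
        "underspecified": float(len(words) < 6),
    }
```

A short anaphoric query like "Why did it fail?" scores high on anaphora and underspecification, while a fully grounded request scores near zero on both — the kind of signal the regression over 370,000 queries operates on, just with far richer features.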
The hallucination is not purely a model property — it is an interaction between query structure and model capability. A well-specified query to a weak model can elicit fewer hallucinations than an underspecified query to a strong model. The query is a variable, not a constant.
This opens a different intervention point: instead of improving the model (expensive, slow, limited), rewrite the query (cheap, fast, composable). If deep nesting causes hallucination, flatten the query. If underspecification causes it, supply the missing specifics before asking. Query rewriting is a preprocessing step that never touches the model weights.
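A minimal sketch of such a preprocessing step, assuming the caller can supply antecedents for dangling pronouns (production rewriters would use a parser or a second model; the rules here are hypothetical):

```python
import re

def ground_query(query: str, referents: dict) -> str:
    """Resolve underspecified pronoun references using caller-supplied
    antecedents, before the query ever reaches the model."""
    for pronoun, antecedent in referents.items():
        query = re.sub(rf"\b{re.escape(pronoun)}\b", antecedent,
                       query, flags=re.IGNORECASE)
    return query

def flatten(query: str) -> list:
    """Split one level of subordinate-clause nesting into sequential
    sub-queries, reducing the nesting depth each request carries."""
    parts = re.split(r",\s*(?:which|because|although)\s+", query)
    return [p.strip(" ,") for p in parts if p.strip()]
```

The two passes compose: ground first, then flatten, then send each sub-query separately. Nothing downstream changes — the model sees ordinary queries, just simpler and better grounded ones.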
Domain-specific features show variable effects across models — what confuses one model doesn't necessarily confuse another. The hallucination surface is model-specific in detail but query-dependent in structure. The general direction — simpler, more grounded queries hallucinate less — holds broadly.
The general observation: when a system fails on certain inputs, the failure can be a property of the input as much as the system. Attributing failure entirely to the system misses the contribution of input structure. Measuring which input features predict failure enables a different class of interventions — reshaping the input rather than rebuilding the system.