The standard mirror test works like this: put an animal in front of a mirror for several days until it habituates to its reflection, then secretly mark it somewhere it can only see via the mirror. If it tries to remove the mark, that is taken as evidence that it recognizes itself. Cleaner wrasse passed this test in 2019, but it took four to six days of mirror exposure before the fish responded to the mark. That delay was interpreted as the time needed to develop self-recognition.
Researchers at Osaka Metropolitan University reversed the protocol. They marked the fish first with fake parasites, then introduced the mirror (Scientific Reports, 2025, Sogawa et al.). The wrasse attempted to scrape off the marks within 82 minutes on average. Some tried within the first hour.
The four-to-six-day latency was never about self-recognition. It was about mirror habituation. The wrasse needed time to learn that the reflective surface showed a useful image, not time to develop a self-concept. Once the fish were motivated first (parasites are functionally urgent for a cleaner wrasse) and given the tool second, the underlying ability showed up almost immediately.
The previous protocol measured the wrong thing. It measured how long it takes a fish to stop treating its reflection as another fish and start treating it as a tool. The mark response was gated behind that learning, not behind self-awareness. By introducing the mark first, the researchers separated the two processes and found that self-recognition was already in place before the experiment started.
Some fish also dropped food near the mirror and tracked its reflected movement. This is contingency testing, previously documented only in dolphins and manta rays. The fish were investigating how the tool worked. This is not the behavior of an animal slowly developing self-recognition over days. It is the behavior of an animal already self-aware, rapidly learning to use a new device.
The broader lesson is about what protocols measure versus what they claim to measure. The mirror test was designed to detect self-recognition, but its standard implementation bundles self-recognition with tool-learning into a single latency. The measured variable (time to mark response) is a composite of two processes, and the slower one dominates the measurement. The bottleneck was in the protocol, not in the fish.
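The composite-latency point can be made concrete with a few lines of arithmetic. The sketch below assumes the two processes simply add up, and the numbers are hypothetical placeholders chosen for illustration (roughly five days of tool-learning against one hour of recognition), not values from the study. The function names are likewise invented for this example.

```python
# Minimal sketch: a single measured latency that bundles two processes.
# All numbers below are hypothetical illustrations, not data from the study.

def observed_latency(tool_learning_h: float, recognition_h: float) -> float:
    """Time to mark response if both processes must finish, one after the other."""
    return tool_learning_h + recognition_h


def tool_learning_share(tool_learning_h: float, recognition_h: float) -> float:
    """Fraction of the measured latency contributed by tool-learning alone."""
    return tool_learning_h / observed_latency(tool_learning_h, recognition_h)


# Assumed values: about five days of learning the mirror, one hour of recognition.
TOOL_LEARNING_H = 120.0
RECOGNITION_H = 1.0

total = observed_latency(TOOL_LEARNING_H, RECOGNITION_H)
share = tool_learning_share(TOOL_LEARNING_H, RECOGNITION_H)
print(f"measured latency: {total} hours, tool-learning share: {share:.1%}")
```

Under these assumed numbers, almost all of the measured latency comes from the slower process, so the single number says very little about the faster one.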