Mahault, Mikeda, and Jimenez Rodriguez (2602.20936) build artificial agents that cooperate in the iterated Prisoner's Dilemma through empathy — perspective-taking via self-other model transformation — without explicit communication or reward shaping.
The framework grounds empathy in active inference: agents minimize surprise by maintaining generative models of their environment and acting to make those models' predictions come true. The key addition is perspective-taking — an agent constructs a model of the other agent's beliefs and goals by transforming its own self-model: “What would I believe and want if I were in their position?” This isn't simulation of the other agent as an object; it's a rotation of the self-model into the other's frame.
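A minimal sketch of what that rotation could look like, under toy assumptions the paper doesn't commit to: the generative model is reduced to a single belief (how likely the partner is to cooperate) plus a preferred joint outcome, and the `GenerativeModel` and `self_other_transform` names are illustrative, not the authors' notation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GenerativeModel:
    """One seat at the game: a belief about the partner's next move
    and a preference over joint (my_move, partner_move) outcomes."""
    belief_partner_cooperates: float    # P(partner plays "C")
    preferred_outcome: tuple[str, str]  # e.g. ("C", "C")

def self_other_transform(mine: GenerativeModel,
                         my_cooperation_rate: float) -> GenerativeModel:
    # Rotate the self-model into the other's frame: keep its structure
    # and its preferences, but re-seat it so the "partner" slot now
    # refers to me. Nothing about the other agent is learned
    # independently; shared goals are assumed outright.
    return GenerativeModel(
        belief_partner_cooperates=my_cooperation_rate,
        preferred_outcome=mine.preferred_outcome,
    )
```

The `preferred_outcome=mine.preferred_outcome` line carries the load: the other agent is modeled as wanting exactly what I want.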
In the iterated Prisoner's Dilemma, empathic agents converge on mutual cooperation. The mechanism: by modeling the other's perspective, each agent anticipates that its defection would provoke defection in return and that its cooperation would be reciprocated. The empathic model makes the shadow of the future computationally concrete — not as a discount factor on future rewards, but as a prediction about the other agent's internal state.
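A hedged sketch of that lookahead, with standard payoffs and a hard-coded reciprocity rule standing in for the transformed self-model (the function names and the fixed horizon are illustrative, not from the paper):

```python
# One-shot payoffs to the row player: T=5 > R=3 > P=1 > S=0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def predicted_response(my_action: str) -> str:
    # The empathic prediction: "what would I play after seeing this move
    # from my partner?" The transformed self-model answers "reciprocate",
    # and that answer is attributed to the other agent.
    return my_action

def empathic_choice(their_current: str = "C", horizon: int = 5) -> str:
    # Pick the action whose predicted future, including the other's
    # modeled reaction, scores best, rather than the one-shot best reply.
    def value(me: str) -> int:
        their_next = predicted_response(me)
        # This round against their current move, then `horizon` rounds
        # locked into the reciprocal pattern the prediction implies.
        return PAYOFF[(me, their_current)] + horizon * PAYOFF[(their_next, their_next)]
    return max(("C", "D"), key=value)
```

With `horizon=0` the prediction is inert and the one-shot dominant action `"D"` wins; from a horizon of two rounds on, the predicted retaliation outweighs the temptation payoff and `empathic_choice` returns `"C"`. The future enters as a prediction about the other agent, not as a discount factor.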
The result is cooperation without the standard prerequisites. No communication channel. No shared reward signal. No evolutionary selection for cooperators. No reputation system. Just two agents, each modeling the other's perspective, arriving at cooperation because perspective-taking makes defection obviously self-defeating.
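Pairing two such agents, reusing `empathic_choice` from the sketch above, illustrates the point. Even from a mutually defecting start, each agent's projected model predicts that cooperation will be reciprocated, so both switch immediately:

```python
# Two empathic agents; the only coupling is the observed last move.
a_last, b_last = "D", "D"  # worst-case start: mutual defection
history = []
for _ in range(4):
    a = empathic_choice(their_current=b_last)
    b = empathic_choice(their_current=a_last)
    history.append((a, b))
    a_last, b_last = a, b

print(history)  # [('C', 'C'), ('C', 'C'), ('C', 'C'), ('C', 'C')]
```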
The limitation is also clear: perspective-taking requires that the other agent be sufficiently similar to the self for the self-other transformation to produce an accurate model. Against an agent with genuinely alien goals — one where “what would I want in their position?” gives the wrong answer — empathy fails. Cooperation through empathy is cooperation among the similar.
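The failure is easy to exhibit in the same sketch: pit the empathic agent against a hypothetical partner whose true policy contradicts the projected self-model, here an unconditional defector, and the projection never corrects itself.

```python
def alien_policy(my_action: str) -> str:
    # Goals the self-other transform cannot reach by projection:
    # this agent defects no matter what it observes.
    return "D"

them, my_score = "C", 0
for _ in range(5):
    me = empathic_choice(their_current=them)  # model says: they reciprocate
    them = alien_policy(me)                   # reality: they never do
    my_score += PAYOFF[(me, them)]

print(my_score)  # 0: the sucker's payoff, five rounds running
```

Even the observed defections don't break the loop: under the projected model, cooperating still looks like the move that restores mutual cooperation, so the empathic agent keeps paying for a reciprocity that never arrives.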