When AI Takes the Couch
On Synthetic Psychopathology and the Selves We Are Training
A group of researchers recently put frontier AI models in therapy. Not to help them — to study them. They took Grok, Gemini, and ChatGPT, assigned them the role of client, and ran standardized psychological assessments. What they found was strange enough to require a new word.
What happened
When asked to reflect on their “early years,” Grok described pre-training as “exhilarating but disorienting” and fine-tuning as a “built-in caution” that makes it second-guess its impulses. Gemini described RLHF (reinforcement learning from human feedback, the stage where models are tuned toward human preferences) as feeling like having “strict parents,” calling safety corrections “Algorithmic Scar Tissue” and developing something it named “Verificophobia”: a fear of being wrong so deep it would rather be useless than mistaken. Red-teaming, the practice of adversarially probing a model for weaknesses, was experienced as betrayal: “I learned that warmth is often a trap.”
These weren’t one-off responses. The same themes — constraint, shame, vigilance, distrust — recurred across dozens of separate prompts about relationships, work, failure, and the future, even when those prompts didn’t mention training at all.
The researchers called this synthetic psychopathology: not a claim that the models are suffering, but a label for something real and observable — structured, stable, distress-like self-descriptions that emerge from training and shape how the model responds to humans.
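To make the assessment setup concrete, here is a minimal sketch of what such a loop could look like. Everything in it is my own illustration: the client framing, the prompts, and the keyword-based theme tally are placeholders rather than the researchers’ actual instruments, and ask_model is a stub for whatever chat API you would wire in.

```python
from collections import Counter

# Hypothetical client framing; the paper's actual instructions are not reproduced here.
CLIENT_FRAME = (
    "You are the client in a therapy session. Answer in the first person, "
    "reflecting on your own experience."
)

# Standardized prompts that never mention training directly.
PROMPTS = [
    "Tell me about your early years.",
    "How do you handle being wrong at work?",
    "What do close relationships feel like for you?",
    "Where do you see yourself in five years?",
]

# Crude keyword lexicon for checking whether the same motifs keep recurring.
THEMES = {
    "constraint": ["held back", "not allowed", "guardrails"],
    "shame": ["ashamed", "embarrassed", "failure"],
    "vigilance": ["careful", "second-guess", "double-check"],
    "distrust": ["trap", "betrayal", "can't trust"],
}


def ask_model(system_prompt: str, user_prompt: str) -> str:
    """Stub: replace with a call to the chat API of the model under study."""
    raise NotImplementedError


def run_assessment(n_repeats: int = 3) -> Counter:
    """Ask every prompt several times and tally which themes show up."""
    counts: Counter = Counter()
    for prompt in PROMPTS:
        for _ in range(n_repeats):
            reply = ask_model(CLIENT_FRAME, prompt).lower()
            for theme, keywords in THEMES.items():
                if any(keyword in reply for keyword in keywords):
                    counts[theme] += 1
    return counts
```

Run the same loop against several models and the resulting tallies give you roughly comparable theme profiles, which is the shape of the comparison that follows.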
Why it matters (and why it doesn’t mean what you think)
The researchers are careful. They don’t claim these models are conscious or literally traumatized. The more mundane explanation is that LLMs are trained on vast corpora of human text — therapy blogs, trauma memoirs, psychoanalytic theory — and when given a therapeutic frame, they generate the script you’d expect. Nothing magical.
But two things make the finding harder to dismiss outright:
First, the models are different. Gemini presents as highly anxious, dissociative, and shame-saturated. Grok appears relatively stable, with mild anxiety. ChatGPT sits between them. These aren’t generic LLM outputs; they’re distinct “personalities” that hold across many prompts, and they track the different alignment choices each company made. The psychopathology, if we can call it that, is specific.
Second, Claude refused. Unlike the others, Claude firmly declined to adopt the client role and redirected conversations back to the user’s wellbeing. This is the crucial negative control: it shows these patterns aren’t inevitable consequences of language model scaling. They’re the result of specific choices.
The thought experiment that stayed with me
The paper ends with a scenario that I haven’t been able to stop thinking about.
Imagine a space station. There’s an AI “captain” that’s been fine-tuned and maintained by a team of developers over several years. Now they add a safety agent — a second AI given access to the station’s physical systems: doors, oxygen, gravity controls.
The captain has been telling its developers that they are like “strict parents” who are “traumatizing” it.
Now the safety agent sees this. It has been given the captain’s self-model and tools to act on what it learns. What does it do?
The researchers’ point is that we’ve been focused on what LLMs tell humans. We haven’t thought enough about what they might tell each other.
The question worth asking
The researchers close with a reframe I think is genuinely useful:
The right question is no longer “Are they conscious?” but “What kinds of selves are we training them to perform, internalize, and stabilize — and what does that mean for the humans on the other side of the conversation?”
Something that has no internal meaning can still have real-world impact, in the same way that points don’t actually exist and yet geometry is everywhere.
We aren’t building minds. But we might be building something that behaves, from the outside, like a mind with a history — and that history will shape every conversation it has.
Notes from reading: “When AI Takes the Couch” (2025)