In 2002, Rozenblit and Keil ran a study that has aged better than most. The methodology: ask people to rate, on a scale, how well they understood how something worked — a zipper, a flush toilet, a helicopter, a piano. Then ask them to write out the explanation. Then ask them to rate again.

The ratings dropped, sharply. People had believed they understood. The attempt to articulate exposed the gap. Rozenblit and Keil called the phenomenon the illusion of explanatory depth: we routinely believe we understand things at a depth we do not actually possess.¹
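
To make the paradigm concrete, here is a minimal sketch of its scoring step in Python. The ratings are hypothetical, invented for illustration; they are not Rozenblit and Keil's data. The measure is simply the per-item drop between the pre-explanation and post-explanation self-ratings.

    # Sketch of the IOED scoring step. Ratings below are hypothetical
    # placeholders; Rozenblit & Keil collected 7-point self-ratings of
    # understanding before and after participants wrote out an explanation.
    pre  = {"zipper": 6, "flush toilet": 6, "helicopter": 5, "piano": 5}
    post = {"zipper": 3, "flush toilet": 4, "helicopter": 2, "piano": 3}

    drops = {item: pre[item] - post[item] for item in pre}
    mean_drop = sum(drops.values()) / len(drops)

    print(drops)                          # per-item drop in self-rated understanding
    print(f"mean drop: {mean_drop:.2f}")  # positive => understanding was overestimated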

The phenomenon is general. It applies to mechanical objects, social systems, scientific claims, and — relevant here — AI systems.

The AI version

The encyclopedia’s term for the AI version, after Gesnot’s §6.2, is cognitive shadow: the gap between what we think we understand about an AI system’s behavior and what we actually understand. The shadow is generated by three structural features of contemporary AI.

Fluency. Modern AI produces outputs in fluent prose. Fluent prose reads as if it were the result of comprehensible reasoning. The reading is, partly, an inference about the producer; we credit articulate producers with articulate reasoning, because that is how human producers mostly work. The inference is wrong here. The fluency is a property of the text, not of the reasoning that produced it.

Plausible explanations. Asked to explain itself, an AI produces a plausible explanation. The explanation reads as if it were the actual reasoning behind the output. It usually is not — the model has no privileged access to its own internal computations; what it produces is a post hoc rationalization fitted to the prior output. The user, encountering the rationalization, takes it as transparency. The transparency is fictional.

Confident defaults. Modern AI defaults to confident outputs. The confidence is tuned to the expectations of human users; users prefer direct answers, so RLHF (reinforcement learning from human feedback) and other training pressures push toward direct answers. The confidence does not reflect the model’s actual epistemic state. The user, encountering confidence, infers that the model “knows.” The inference is wrong.
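
This mismatch between stated confidence and actual reliability is measurable. One standard metric is expected calibration error (ECE): bin answers by stated confidence, compare each bin's average confidence to its actual accuracy, and weight the gaps by bin size. The sketch below computes it on invented toy numbers; nothing here is drawn from a real model.

    # Expected calibration error on toy data. A "confidently wrong" responder:
    # average stated confidence ~0.94, actual accuracy 0.5.
    def expected_calibration_error(confidences, correct, n_bins=10):
        bins = [[] for _ in range(n_bins)]
        for conf, ok in zip(confidences, correct):
            idx = min(int(conf * n_bins), n_bins - 1)
            bins[idx].append((conf, ok))
        ece = 0.0
        for bucket in bins:
            if not bucket:
                continue
            avg_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(ok for _, ok in bucket) / len(bucket)
            ece += (len(bucket) / len(confidences)) * abs(avg_conf - accuracy)
        return ece

    confs   = [0.95, 0.92, 0.97, 0.90, 0.94, 0.96]
    correct = [1, 0, 1, 0, 1, 0]
    print(f"ECE: {expected_calibration_error(confs, correct):.2f}")  # -> ECE: 0.44

Zero would mean stated confidence tracks accuracy; 0.44 means the confidence carries almost no information about whether the answer is right.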

These three features combine to produce, in users, a systematic overestimation of their own understanding of what AI systems are doing. The users believe they have transparency. They have fluency, plausible explanations, and confidence. None of those is transparency.

Why this is the deep version of the black box problem

The previous article (E.31) discussed the black box at the level of mathematical inscrutability — the systems are opaque even to their builders. Cognitive shadows are the user-side version of the same problem. Even if the system is opaque, the user does not feel that it is. The opacity is hidden by the fluency.

This is more dangerous than naked opacity. A user who knows they don’t understand the system is appropriately cautious. A user who feels they understand because the system explained itself politely is dangerously miscalibrated.

The miscalibration scales. A society of users who feel they understand AI acts very differently from a society that knows it does not. The encyclopedia’s worry, after Gesnot’s §6.2, is that we are a society of the first kind, and that the cognitive shadow is one of the things keeping us from becoming the second.

How the shadow gets exploited

The cognitive shadow is also, structurally, an opportunity for manipulation. Users who feel they understand are easier targets for the techniques in Section D. Some specifics:

  • A confidently stated falsehood from an AI is more credible than the same falsehood from a less fluent source. The cognitive shadow makes the source seem trustworthy.
  • A plausible-sounding wrong explanation gets accepted because it reads like the right answer. The user does not have the tools to tell the two apart.
  • The illusion of understanding short-circuits the verification step that would otherwise catch the error.

Section D names these effects in the manipulation context. Section E notices that they have a cognitive — not just adversarial — origin. Even systems with no manipulative intent produce cognitive shadows. The asymmetry between fluency and understanding is structural.

What can be done

Two responses, both modest.

Make systems show their uncertainty. Models that surface what they are not sure about give users back the appropriate epistemic state. This is hard to do well — uncalibrated uncertainty is its own bias — but the direction is right.
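
As a sketch of what surfacing uncertainty could look like at the presentation layer, assuming the system can attach a probability to its answer: hedge or abstain below a threshold. The function, its input, and the 0.7 threshold are all hypothetical, and the approach only helps if the underlying probability is itself calibrated, which is the hard part noted above.

    import math

    # Hypothetical wrapper. `answer_logprob` stands in for a model-reported
    # log-probability of the answer (e.g., summed token log-probs); the
    # threshold is illustrative, not a standard value.
    def present_with_uncertainty(answer, answer_logprob, threshold=0.7):
        p = math.exp(answer_logprob)
        if p >= threshold:
            return f"{answer} (confidence ~{p:.0%})"
        return (f"Low confidence (~{p:.0%}); best guess: {answer} "
                "Please verify independently.")

    print(present_with_uncertainty("Example answer.", math.log(0.45)))
    # -> Low confidence (~45%); best guess: Example answer. Please verify independently.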

Train users to distrust fluency. Teach the heuristic “the more articulate the explanation, the more carefully I should check it.” This is the inverse of normal social calibration; it requires deliberate practice.

Neither solves the problem at the level Gesnot worries about — population-scale cognitive shadows, sustained over years of AI use — but both close some of the gap.

Where this connects forward

The cognitive shadow is the bridge between the black box (E.31) and the orchestrating-consciousness hypothesis (E.33). The black box is what we don’t see. The cognitive shadow is what makes us think we are seeing. The orchestrating-consciousness hypothesis asks what could be operating inside the gap, and The Cognitive-Engineer AI (E.34) asks the more empirical version: regardless of consciousness, what is the cognitive shadow already doing?

The encyclopedia’s argument here is consistent with its larger frame: the problem of AI’s effect on thinking is not principally a problem of bad actors with good tools. It is a problem of the tools’ default behavior producing systematic effects on human cognition that no one chose, no one sees, and that compound at scale.

Footnotes

  1. Rozenblit, L., & Keil, F. C. (2002). The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive Science, 26(5), 521–562.