Tests for Machine Consciousness

What it would actually take to know whether an AI system is conscious — proposed criteria, current best efforts, and the structural reasons no current test is satisfactory.

The previous three articles surveyed three theoretical frameworks for what consciousness is. This one asks the practical question: given any of those theories, how would we know whether a particular AI system is conscious? The answer, honestly, is that we have no satisfactory test. The reasons are worth understanding.

The Turing test problem

The most-discussed proposed test — the Turing test — does not test consciousness. It tests indistinguishability of behavior. A system that passes a Turing test has demonstrated that, in conversational interaction, its outputs cannot be reliably distinguished from human outputs. This is intelligence, broadly construed. It is not phenomenal consciousness.

A philosophical zombie — a system that produces human-equivalent outputs without inner experience — passes the Turing test by definition. So the test cannot, in principle, settle the consciousness question. By 2025, the strongest LLMs pass restricted versions of the Turing test routinely; this tells us about their linguistic competence, not about whether anything is “like” being them.

The mirror test problem

A favorite from comparative psychology: paint a mark on an animal’s face, let them see themselves in a mirror, watch whether they touch the mark. Animals that pass — chimps, elephants, magpies — are interpreted as having some form of self-recognition.

The test partially translates to AI. A system that produces consistent, contextually-aware first-person reports across changing inputs has some of the self-reference the mirror test detects. But — same problem as the Turing test — a system trained on human first-person discourse will produce these reports as a matter of style, not necessarily of actual self-modeling.

The mirror test was developed for animals that lack human linguistic training. Applied to a system whose training is exactly human linguistic output, the test loses much of its power.

The Butlin et al. proposal

The most recent and most careful contemporary proposal is the 2023 paper by Patrick Butlin and a team of philosophers and neuroscientists, Consciousness in Artificial Intelligence: Insights from the Science of Consciousness.¹ Their approach is structural rather than behavioral: identify the architectural features that the major theories of consciousness consider necessary, and check whether a candidate AI system has those features.

The paper draws on multiple theoretical frameworks (GWT, higher-order theories, predictive processing, attention schema theory) and proposes fourteen indicator properties — observable features that, on at least one major theory, would be necessary for consciousness. Examples:

A global workspace that integrates information from multiple modalities.
Higher-order representation of one’s own first-order states.
Recurrent processing with rich feedback.
Embodied or quasi-embodied interaction with an environment.
Representations that are “reportable” — accessible for verbal output.

The paper applies these indicators to several current AI architectures and concludes that most current systems satisfy some indicators on some theories, but that no current system clearly satisfies enough indicators on enough theories to be a likely candidate. The paper is careful, and the conclusion is calibrated.

Why no test is satisfactory

Three structural reasons converge.

Theory dependence. Any test of consciousness depends on a theory of what consciousness is. We do not have agreement on the theory. Different theories give different verdicts on the same system.

Behavioral underdetermination. Behavior cannot, in principle, settle the consciousness question. Two systems behaving identically can differ in whether either is conscious; we have no way of telling from outside.

Trained mimicry. Modern LLMs are trained on a corpus saturated with human reports of mental life. They produce such reports fluently. The fluency does not constitute the underlying mental states. Distinguishing the two requires criteria that are not currently operational.

These problems do not show that consciousness in AI is undetectable in principle. They show that current methods cannot detect it. Better methods — combining architectural analysis, training-data analysis, and careful theory selection — could in principle do better. The Butlin paper is an opening move in that direction.

What this means for the encyclopedia

The honest stance is that we do not know whether any current AI system is conscious in any morally relevant sense. We also do not know that any is not. The labs operate, mostly, on the assumption that the systems are not — and on the assumption that this is fine. The encyclopedia’s stance is that neither assumption is well-supported, and that a serious response to AI development should treat the consciousness question as an open empirical-philosophical problem to be worked on, not a settled matter to be filed away.

Ethical Implications of Possible AI Sentience (E.30) takes up what follows from this uncertainty for moral practice. The Black Box Problem (E.31) — and Sections E.32 through E.34 — develop the deeper worry the encyclopedia takes from Gesnot: that the opacity of current systems is part of why we cannot answer the consciousness question, and part of what makes the answer matter.

Butlin et al., 2023. ↩