honnibal.dev

It doesn't matter whether AI is conscious

2026-02-18 · 11 minute read

There’s obviously no shortage of things to talk about with AI models, and no matter where you stand on them, you’re probably not that satisfied with the discourse overall. Look, I’ll speak plainly, it’s a shitshow. We can all see that. For those of us at least trying to think and argue carefully about this stuff, I have a small suggestion. We don’t need this term “consciousness”, or even the general idea it provokes.

The concept of consciousness sets up a framing where moral personhood is a likely or perhaps even inevitable stop on the road we’re travelling towards more advanced capabilities. There’s actually little reason to believe this. When we think about our own minds, we see a capacity for long-term goal-directed behaviour; for “agency”. We also see the capacity to experience joy and suffering. And the concept of “consciousness” is vague, but the only thing generally agreed about it is that we’re supposed to have it. When we look at early AI models, we see none of these things, so the implication is that we’re headed towards models that exhibit all three. This is false. It only feels tempting because the concept of consciousness is so murky.

This will be a moderately long journey through some tricky topics, so here’s a little map. I’ll first make the case that the concept of consciousness in itself is unproductive — I think actually in general, but in particular for AI. I’ll mostly lean on my previous post for discussion of agentic, goal-directed capabilities in LLMs, but I’ll give a quick recap here as well. Finally, I’ll make the case that no matter how committed you are as a utilitarian, nothing forces you to widen the circle to AI models as they are designed today. Many non-human animals have emotional experiences undeniably analogous to our own, but this simply isn’t true of AI models as we’re building them, and it won’t become accidentally true simply by virtue of the model becoming more performant.

People divide discussion of consciousness into two parts: access consciousness and phenomenal consciousness, often associated with the terms “easy problem” and “hard problem” respectively. The “easy” problems of consciousness are the ones that involve finding things out: investigations into neuroanatomy, neurochemistry, imaging, evolutionary biology, pathology, psychology, and I would argue also computer science. The “hard” problem of consciousness involves defining certain words in certain ways. To make progress on the easy problems you have to do a bunch of science, and if you succeed you get insights, technology etc. To make progress on the hard problem you have to convince other people to use your definitions, and if you succeed you get nothing.

Lots of people are working on the “easy” problems, and let’s agree to leave them to it. So let’s talk about the “hard” problem. This is the question of whether and how an entirely physical system could give rise to whatever it is you identify as subjective experience. The question by construction excludes the relevance of any observation or analysis about the physical system, or even psychology insofar as it’s causally linked to the physical system. That’s all easy mode. All we get are words. I like words well enough, but if you set that constraint and you end up with a conclusion that feels like it matters, somewhere along the way there’s been some fundamental breach of containment. Causality works both ways. If no observations can interact causally with the answer, the answer can’t interact causally with any observations.

The hard problem of consciousness is defined such that it says “Irrelevant to any scientific question” in clear bold letters. We should just read the label and move on. One objection could be that a concept like phenomenal consciousness can be a construct in a symbolic argument, and yes, lots of thought experiments or hypotheticals can work like that, but I really don’t think it applies here. A symbolic argument is like a little simulation. You construct an analogue of the real relationships, reason within the toy model, and then try to map the conclusion back across to the real thing. If at some point arguments about phenomenal consciousness suggest something that psychology or another science can make use of, hey, I’d be all for that. But I don’t think the “hard problem” was designed to be a humble scaffold to aid progress on the “easy problem”, and I don’t think it’s accidentally good at that task either.

Are we just jumping on one definition and calling “gotcha!”? The argument we’ve just made is that a term defined in that particular way can’t be useful, but that’s an argument against the definition, not the concept. Surely people are getting at something when they talk about consciousness, right? Whatever that is, is it a useful thing to think about when we think about AI? I say mostly no. If we try to approach the same sort of idea another way, for instance by asking about subjective experiences or an inner life or qualia, we end up back at roughly the same problem. You’re either asking a science question about how it all works or you’re not, and if you’re asking a science question, well, hit the books. I actually think Chalmers was being very fair in how he characterised the question. The concept of consciousness really is getting at something that won’t have a scientific answer. It’s just that, having recognised that, we should go one step further and realise it can’t be a useful thing to ask.

When we think about AI models, we should keep some grounded connection to how the model actually works — what it does as a computational device (both the prediction process and the objective function). The consciousness framing breaks that connection, and this is really the crux of the concept. None of the famous philosophy-of-mind thought experiments are puzzling if we interpret them as mechanistic questions. For instance, could it be that what I see as red, you see as blue, and vice versa? As a physical question, we can answer concretely. We definitely know that the term “red” is pointing to the same part of the spectrum for both of us — that’s super easy to check, and part of the thought experiment is that we’re supposed to agree which things are red and blue. We can also easily check whether that bit of the spectrum excites the same cone cells in our eyes and results in the same patterns of neurons firing in our brains. But that’s not what the question asks. It asks whether there’s something separate from this processing story that makes up our subjective experience.

The physicalist answer — which I emphatically cosign — is that our mental experiences are wholly caused by physical events, so if you could somehow ensure we had physically identical brains, and you lit them up in exactly the same way, there could be no way for that to result in a different subjective experience.

Many, many people report some character of their subjective experience that they interpret as incompatible with a physicalist account of the mind. Whatever that sensation or intuition is, I don’t share it. Perhaps philosophical zombies are possible after all, and I’m one of them? My claim is actually that we all are, but if the rest of you are confident in your qualia, I hope you’ll at least agree not to persecute me. If I really were the only philosophical zombie, should I have fewer rights, or less moral personhood?

The loose assumption that people make is that AI models are not currently “conscious”, and when they have this property “consciousness” we’ll have to worry about whether they’re having a mostly nice time or experiencing hellish torture. I think as a category of question, wondering whether we’re inflicting hellish torture is usually a good thing to do, so yes let’s talk about this. We just have to sharpen it up a little. Instead of “consciousness”, we should ask whether the model experiences ethically relevant suffering.

In principle it shouldn’t matter what terminology we use: we could have the same debate regardless, so long as we’re careful to specify what we mean by “conscious”. The problem is that the battle lines around the term “consciousness” are already well drawn, and people with a dog in that fight will be understandably reluctant to let us borrow the term and use it to mean something else, because this will result in a bunch of sentences flying around that superficially appear to contradict their stance.

By avoiding the term “consciousness” and the debate around it, we can keep things much more concrete, and look at the question in terms of how our brains work and how current and hypothetical AI models work. The first thing to note is that in our brains the capacity for joy or suffering is not some emergent phenomenon that arose because we’re too smart for it not to. Evolutionarily, the limbic system is older than flowering plants. It’s not some accident.

It’s intuitively obvious that our mental experiences are in no way similar to Claude or ChatGPT, but the consciousness framing undersells it. It’s not about being smarter or doing particular tasks better. Our brains consist of many distinct systems that all interact. It may feel a bit like a science-fiction cliché to say, “Oh but the robots don’t have emotions”, but they don’t! We happen to have the feels because over hundreds of millions of years there were a bunch of creatures that had more descendants because they were a bit less apathetic than their cousins. The robots don’t have the feels because we didn’t build anything that would create them. I have no idea how we could, but assuming it’s somehow possible, why should we?

Some people are tempted to interpret “reward signals” — or more concretely, gradient updates — as analogous to pleasure or pain. This is a mistake, and we can see that clearly if we think about the mechanics of what’s actually happening, instead of viewing it only as an abstraction. First, few utilitarians would interpret plants’ tropic responses, such as growing towards light, as pleasure or pain, but suppose you’d go even that far. Is a rock being weathered by erosion “suffering” — or perhaps “happy”? Nothing about the rock “wants” to be in one state or another; there’s just this mechanical force acting on it. Similarly, the gradient updates act on the model weights. The model’s weights are a list of many floating point numbers, and the update is another list of floats of the same length. You add those together, and the result is the new weights. Nothing about the model “wants” its weights to be some value and not another. The only thing you could see as somewhat analogous to “wanting” is the objective function, which is basically the selectional pressure the weights are subject to during training — that’s the thing you’re using to calculate the update vectors. The model also doesn’t in any sense experience the weight update. The part you can interpret as cognition is running the model — that’s when it receives inputs and performs computations. For us, pain is a sensory input. We’re aware of it. Gradient updates aren’t.
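The mechanics above can be sketched in a few lines. This is a deliberately toy illustration (the weight and update values are invented, and in a real system the update vector would be computed from the objective function via backpropagation), but it shows that the update step itself is plain arithmetic on two lists of floats, applied while the model isn’t running:

```python
# Toy illustration: a "model" whose weights are just a flat list of floats.
# The values are made up; in a real system the update vector would come
# from backpropagating the objective function through the model.
weights = [0.5, -1.2, 3.0]
update = [0.01, -0.03, 0.002]

# The entire weight update: element-wise addition, nothing more.
# The model isn't running when this happens; no forward computation,
# and hence nothing you could call "cognition", is involved.
new_weights = [w + u for w, u in zip(weights, update)]
```

The point isn’t that the real thing is this simple end to end; it’s that the update, however computed, is an operation performed *on* the weights, not an input the model processes.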

I realise the previous paragraph covered a lot of ground, and this is maybe the most important part I want to convince you of, so I’ll recap a slightly different way. We’ve evolved to prefer some mental states over others. If you view us abstractly, from a reinforcement learning perspective, our “reward signal” is experiences like pleasure or pain. This doesn’t mean that all reward signals should be viewed as experiences like pleasure or pain. A reward signal is a way to understand some erosive force in the context of a particular sort of algorithm. Many living systems, such as plants, respond to inputs, but I would not describe an aversive response as “suffering”. There are people who do, but the only way you get there is by viewing life itself as somehow special, because otherwise you’d have to describe physical forces such as erosion the same way.

Could an AI model ever experience something I’d recognise as suffering? In principle yes, but I really want to stress the “in principle”. We have to go deep into thought experiment territory to get there. My brain is a physical system, as are the brains of other non-human animals, a great many of which I also view as having ethically relevant experiences. In theory a non-biological system that I would recognise as doing something functionally equivalent could be built. But nobody’s even trying to build anything like that, I have no idea how they would, and I can’t see what purpose it would serve.