honnibal.dev

LLMs don't suffer

2026-02-27 · 8 minute read

There’s a common assumption that we somehow “can’t know” what LLMs might “experience”, and we should therefore be cautiously concerned about “model welfare”. The case for this is extremely weak when you think through how current models work and what it is we recognise as suffering in ourselves and non-human animals.

First, let’s look a little at the human experience and what we recognise as morally relevant. I’ll try as much as possible to steer clear of debates about “consciousness”, which get murky quickly. I want to stick to empirical stuff, so my understanding of it could be wrong or unnuanced — but I believe we’ve got a pretty good picture of how this works, especially with reference to people with various pathologies, who can give us good insight into how our physical processes influence our subjective experiences.

We have an integration between what I’ll loosely call “emotional circuitry” (roughly, the limbic system) and higher-order cognition (roughly, access consciousness, explicit awareness etc). Our emotional circuitry is crucial in preferring some experiences or mental states over others. Without it, you could have the physical sensation of your hand burning without experiencing that sensation as pain.

People differ on how important higher-order cognition should be in recognising sensory experiences as morally relevant. One position is that without some explicit “sense of self” there’s no observer to the pain, and therefore it’s not suffering. So we have something like the following hierarchy:

  • Sensation: Observation of what’s happening.
  • Pain: Sensation+emotion. The interpretation of what’s happening as good or bad.
  • Suffering: Sensation+emotion+cognition. Some sort of integration of the pain feedback into a self-referential conceptual model.

Now, there’s a big debate about the pain-to-suffering step. I’m in the camp that would say that demanding a lot of higher-order cognition before we recognise pain as morally relevant suffering is, well, cope. I think it’s motivated reasoning that tries to explain why current agricultural practices are morally permissible, but in order to do that you end up with a position that says there’s nothing at all you could do to most animals that would be unethical. If you’re in a cabin alone in the woods and your cat is slightly annoying you, put it in a sack and throw it on the fire if you want, up to you. When I look at how non-human animals behave, I find it difficult to explain their behaviours without imagining them as having emotional lives and some sort of conceptual model. I also question how important cognition feels to me in the value I ascribe to certain experiences. States of panic don’t feel like they involve much cognition to me, but they definitely feel morally dispreferable.

Anyway. The point of this post isn’t some sort of pitch to get you onto team tofu. There’s a lot of moving parts in that argument. The point is just to ask yourself what features of an experience you’re thinking about when you think of it as suffering. There’s a lot of debate about the role of cognition, but what I don’t think there’s debate about is the need for some sort of emotional recognition.

Without the emotional component, the experiencer is indifferent to the experience. This emotional part hasn’t been centred in ethical debate before because it’s the part that’s undoubtedly shared between us and many non-human animals. The limbic system is older than flowering plants. But there’s nothing analogous to this in what AI models are doing, and we would have to go out of our way to build it.

Some people are tempted to interpret “reward signals” — or more concretely, gradient updates — as analogous to pleasure or pain. This is a mistake, and we can see that clearly if we think about the mechanics of what’s actually happening, instead of viewing it only as an abstraction. First, few utilitarians would interpret plants’ tropic responses, such as growing towards light, as pleasure or pain, but suppose you’d even go that far. Is a rock being weathered by erosion “suffering” — or perhaps “happy”? Nothing about the rock “wants” to be in one state or another, there’s just this mechanical force acting on it. Similarly, the gradient updates act on the model weights. The model’s weights are a list of many floating point numbers, and the update is another list of floats of the same length. You add the two together, and the result is the new weights. Nothing about the model “wants” its weights to be some value and not another. The only thing you could see as somewhat analogous to “wanting” is the objective function, which is basically the selectional pressure the weights are subject to during training — that’s the thing you’re using to calculate the update vectors. The model also doesn’t in any sense experience the weight update. The part that you can interpret as cognition is running the model — that’s when it receives inputs and performs computations. For us, pain is a sensory input. We’re aware of it. Gradient updates aren’t.
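To make the mechanics concrete, here’s a toy sketch of a gradient update, with the weights as a plain list of floats and a made-up objective (all names and numbers here are hypothetical, not any real training setup):

```python
# A "model" whose weights are just a list of floats, and a gradient
# update that is another list of floats of the same length.
weights = [0.5, -1.2, 3.0, 0.7]

# Suppose the training objective is to minimise the sum of squared
# weights (a stand-in for whatever the real loss rewards). The
# gradient with respect to each weight w is then simply 2 * w.
gradient = [2.0 * w for w in weights]

# The update vector: the gradient scaled by a step size, negated so
# that adding it moves the weights downhill on the objective.
learning_rate = 0.1
update = [-learning_rate * g for g in gradient]

# Applying the update is plain element-wise addition. The forward
# computation never runs during this step, so there is no point at
# which the model could "receive" the update as an input.
new_weights = [w + u for w, u in zip(weights, update)]
```

The “wanting”, such as it is, lives entirely in the line that defines the gradient; the weights themselves are just operands of an addition.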

There’s also nothing in the forward pass (the part you could analogously say is “cognition”) that corresponds to “wanting” in any way. There’s no sort of preference anywhere in the model for the computation to reach one sort of result or another. Why should we view this as more ethically salient than any other type of computation we could conduct?
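The forward pass itself can be sketched the same way: a fixed pipeline of arithmetic from input to output, with no term anywhere that compares the result to a preferred outcome (this is a deliberately minimal hypothetical network, not any real architecture):

```python
# A minimal "forward pass": weighted sums and a nonlinearity.
# Nothing in this function prefers one output over another; it just
# computes whatever the weights and inputs determine.
def forward(x, w1, w2):
    # Hidden layer: weighted sum of inputs, then ReLU.
    hidden = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)))
              for row in w1]
    # Output layer: weighted sum of hidden activations.
    return [sum(wi * hi for wi, hi in zip(row, hidden)) for row in w2]

w1 = [[0.2, -0.5], [0.7, 0.1]]   # 2 inputs -> 2 hidden units
w2 = [[1.0, -1.0]]               # 2 hidden units -> 1 output

out = forward([1.0, 2.0], w1, w2)
```

Running it twice on the same input gives the same numbers: there is no internal state that registers the result as good or bad, which is the point of the analogy.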

Our brains consist of many distinct systems that all interact. It may feel like a bit of a science-fiction cliché to say, “Oh but the robots don’t have emotions”, but they don’t! We happen to have the feels because over hundreds of millions of years there were a bunch of creatures that had more descendants because they were a bit less apathetic than their cousins. The robots don’t have the feels because we didn’t build anything that would create them.

Could something functionally similar emerge, even though we didn’t build it? In principle I agree that what matters is the computational role something plays, not the specific anatomy. But I do think it’s significant that we don’t see emotion implemented as some emergent property, smeared across our brains. Rather it’s localised to specific structures, and we understand the selectional pressures behind those structures. The function that emotion plays in our brain is tied to reward and our direct experience of it. We even have a second example to look at in cephalopods, which independently evolved different but recognisably analogous neuroanatomy for emotional processing. But what would it mean for something “functionally equivalent” to emotions to spontaneously arise in an LLM? There’s no emotion-shaped hole in the processing story. There’s no job we’d expect the weights to arrange themselves to do that would correspond to emotions, and there’s no mechanism by which they could do it.

There are limits to how abstract I’m willing to let the analogy be before it no longer holds moral weight for me. Let’s back all the way up and think about what we’re really doing when we pass moral judgments. To me a statement like “X is wrong” is an abridged conditional, roughly of the form “Given such and such moral assumptions, X is wrong”. So why do this at all? Well, call it a long-standing family tradition, older than our species. I think of this as the meta-ethics null hypothesis, which is roughly: “Nihilists aren’t wrong, they’re just assholes”. I could pretend to be indifferent to the welfare of others, but it would be just that — a pretense.

You could come to me reductively and say “look, you think this animal’s brain is just this pile of matter, being excited in particular ways by electrical impulses. Why should you be so pressed about which patterns of impulses happen? Why care?” And I freely admit that if you dig deep enough there will be nothing underneath: you’ll get to “yeah I just do”.

So now we have these LLMs, and they’re a pile of computations, and you come to me and you say: “Why should you be so pressed about what they compute? Why care?” And this time my answer is “…Actually I don’t”. You could hypothetically construct the computation very carefully so that I couldn’t help but see the same thing I do arbitrarily care about — the experiences of ourselves and certain non-human animals. Some sort of hypothetical whole-brain emulation would fit the bill for me. But you’d have to go far out of your way to create the resemblance. The capabilities alone aren’t enough.