honnibal.dev

LLMs don't suffer

2026-02-18 · 8 minute read

There’s a common assumption that we somehow “can’t know” what LLMs might “experience”, and we should therefore be cautiously concerned about “model welfare”. The case for this is extremely weak when you think through how current models work and what it is we recognise as suffering in ourselves and non-human animals.

First, let’s look a little at the human experience and what we recognise as morally relevant. I’ll try as much as possible to steer clear of debates about “consciousness”, which get murky quickly. I want to stick to empirical claims, which means my understanding could be wrong or unnuanced — but I believe we’ve got a pretty good picture of how this works, especially with reference to people with various pathologies, who can give us good insight into how our physical processes influence our subjective experiences.

We have an integration between what I’ll loosely call “emotional circuitry” (roughly, the limbic system) and higher-order cognition (roughly, access consciousness, explicit awareness etc). Our emotional circuitry is crucial in preferring some experiences or mental states over others. Without it, you could have the physical sensation of your hand burning without experiencing that sensation as pain.

People differ on how important they believe higher-order cognition should be in recognising sensory experiences as morally relevant. One position is that without some explicit “sense of self” there’s no observer to the pain, and therefore it’s not suffering. So we have something like the following hierarchy:

  • Sensation: Observation of what’s happening.
  • Pain: Sensation+emotion. The interpretation of what’s happening as good or bad.
  • Suffering: Sensation+emotion+cognition. Some sort of integration of the pain feedback into a self-referential conceptual model.

Now, there’s a big debate about the pain-to-suffering step. I’m in the camp that would say that demanding a lot of higher-order cognition before we recognise pain as morally relevant suffering is, well, cope. I think it’s motivated reasoning that tries to explain why current agricultural practices are morally permissible, but in order to do that you end up with a position that says there’s nothing at all you could do to most animals that would be unethical. If you’re in a cabin alone in the woods and your cat is slightly annoying you, put it in a sack and throw it on the fire if you want, up to you. When I look at how non-human animals behave, I find it difficult to explain their behaviours without imagining them as having emotional lives and some sort of conceptual model. I also question how important cognition feels to me in the value I ascribe to certain experiences. States of panic don’t feel like they involve much cognition to me, but they definitely feel morally dispreferable.

Anyway. The point of this post isn’t some sort of pitch to get you onto team tofu. There are a lot of moving parts in that argument. The point is just to ask yourself what features of an experience you’re thinking about when you think of it as suffering. There’s a lot of debate about the role of cognition, but what I don’t think there’s debate about is the need for some sort of emotional recognition.

Without the emotional component, the experiencer is indifferent to the experience. This emotional part hasn’t been centred in ethical debate before because it’s the part that’s undoubtedly shared between us and non-human animals. The limbic system is older than flowering plants. But there’s nothing analogous to this in what AI models are doing, and we would have to go out of our way to build it.

Some people are tempted to interpret “reward signals” — or more concretely, gradient updates — as analogous to pleasure or pain. This is a mistake, and we can see that clearly if we think about the mechanics of what’s actually happening, instead of viewing it only as an abstraction. First, few utilitarians would interpret plants’ tropic responses, such as growing towards light, as pleasure or pain — but suppose you’d even go that far. Is a rock being weathered by erosion “suffering” — or perhaps “happy”? Nothing about the rock “wants” to be in one state or another; there’s just this mechanical force acting on it. Similarly, the gradient updates act on the model weights. The model’s weights are a long list of floating point numbers, and the update is another list of floats of the same length. You add the two together, element by element, and the result is the new weights. Nothing about the model “wants” its weights to be some value and not another. The only thing you could see as somewhat analogous to “wanting” is the objective function, which is basically the selectional pressure the weights are subject to during training — that’s the thing you’re using to calculate the update vectors. The model also doesn’t in any sense experience the weight update. The part that you can interpret as cognition is running the model — that’s when it receives inputs and performs computations. For us, pain is a sensory input. We’re aware of it. Gradient updates aren’t.
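To make the mechanics concrete, here’s a minimal sketch of one gradient-descent step in plain Python. The numbers are toy values I’ve made up; a real model has billions of weights and the gradient comes from differentiating a loss function — but the update itself really is just this element-wise addition:

```python
weights = [0.5, -1.2, 3.0]    # current model weights (toy values)
gradient = [0.1, -0.4, 0.2]   # gradient of the loss w.r.t. each weight (toy values)
learning_rate = 0.01

# The "update" is another list of floats of the same length as the weights.
update = [-learning_rate * g for g in gradient]

# Applying it is element-wise addition. That's the whole event.
weights = [w + u for w, u in zip(weights, update)]

print(weights)
```

Nothing in this step is an input to the model’s computation — the weights are simply overwritten between runs.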

There’s also nothing in the forward pass (the part you could analogously say is “cognition”) that corresponds to “wanting” in any way. There’s no sort of preference anywhere in the model for the computation to reach one sort of result or another. Why should we view this as more ethically salient than any other type of computation we could conduct?
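The forward pass is likewise just arithmetic. Here’s a sketch of a single dense layer, again with toy sizes (real models stack many such layers, with attention and other machinery, but every piece reduces to multiply-add-nonlinearity like this):

```python
import math

def forward(x, w, b):
    # One dense layer: for each output unit j,
    #   y[j] = tanh(sum_i x[i] * w[j][i] + b[j])
    # w[j] holds the weights for output unit j.
    return [math.tanh(sum(xi * wji for xi, wji in zip(x, row)) + bj)
            for row, bj in zip(w, b)]

x = [1.0, -0.5]                  # input activations (toy values)
w = [[0.2, 0.7], [-0.3, 0.1]]    # weights, one row per output unit
b = [0.0, 0.1]                   # biases

print(forward(x, w, b))
```

Nowhere in this computation is there a term representing a preference for one output over another — the arithmetic proceeds identically whatever the result.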

Our brains consist of many distinct systems that all interact. It maybe feels a bit like a science-fiction cliché to say, “Oh but the robots don’t have emotions”, but they don’t! We happen to have the feels because over hundreds of millions of years there were a bunch of creatures that had more descendants because they were a bit less apathetic than their cousins. The robots don’t have the feels because we didn’t build anything that would create them. I have no idea how we could, but assuming it’s somehow possible, why should we?

I realise the previous paragraphs covered a lot of ground, and this is maybe the most important part I want to convince you of, so I’ll recap a slightly different way. We’ve evolved to prefer some mental states over others. If you view us abstractly from a reinforcement learning perspective, our “reward signal” is experiences like pleasure or pain. This doesn’t mean that all reward signals should be viewed as experiences like pleasure or pain. A reward signal is a way to understand some erosive force in the context of a particular sort of algorithm. Many living systems such as plants respond to inputs, but I would not describe an aversive response as “suffering”. There are people who do, but the only way you get there is by viewing life itself as somehow special, because otherwise you’d have to describe physical forces such as erosion the same way.

Could an AI model ever experience something I’d recognise as suffering? In principle yes, but I really want to stress the “in principle”. We have to go deep into thought experiment territory to get there. My brain is a physical system, as are the brains of other non-human animals, a great many of which I also view as having ethically relevant experiences. In theory a non-biological system that I would recognise as doing something functionally equivalent could be built. But nobody’s even trying to build anything like that, I have no idea how they would, and can’t see what purpose it would serve.

We should back all the way up and think about what we’re really doing when we pass moral judgments. To me a statement like “X is wrong” is an abridged conditional, roughly of the form “Given such and such moral assumptions, X is wrong”. So why do this at all? Well, call it a long-standing family tradition, older than our species. I think of this as the meta-ethics null hypothesis, which is roughly: “Nihilists aren’t wrong, they’re just assholes”. I could pretend to be indifferent to the welfare of others, but it would be just that — a pretense.

You could come to me reductively and say “look, you think this animal’s brain is just this pile of matter, being excited in particular ways by electrical impulses. Why should you be so pressed about which patterns of impulses happen? Why care?” And I freely admit that if you dig deep enough there will be nothing underneath: you’ll get to “yeah I just do”.

So now we have these LLMs, and they’re a pile of computations, and you come to me and you say: “Why should you be so pressed about what they compute? Why care?” And this time my answer is “…Actually I don’t”. You could hypothetically construct the computation very carefully so that I couldn’t help but see the same thing I do arbitrarily care about — the experiences of ourselves and certain non-human animals. Some sort of hypothetical whole-brain emulation would fit the bill for me. But you’d have to go far out of your way to create the resemblance. The capabilities alone aren’t enough.