Abstract:
Central to the inquiry of cognitive science is the idea of agents for whom there is the possibility of both success and failure. If we think that LLMs are the sort of thing that falls within the domain of cognitive science, then we should also ask whether they are agents of this kind: capable of making decisions and of evaluating the world and their own actions as producing either success or failure.
Interpretability researchers tend to focus on easier-to-answer questions such as: What computations do these models perform? Can we decode linguistic information from their internal activations? But success and failure supply the minimal normativity needed to get a naturalistic theory of meaning off the ground. So if we want to know whether LLMs mean the words they produce, or “understand” the inputs they receive, we should ask whether they care about the consequences of those inputs and outputs. In slogan form: meaning requires the possibility of things mattering. And if the internal states of LLMs are to play some kind of cognitive role, it must be because the LLMs themselves are agents with interests in their own right.