Abstract:
The normative challenge of AI alignment centres upon what goals or values to encode in AI systems to govern their behaviour. A number of answers have been proposed, including the notion that AI must be aligned with human intentions or that it should aim to be helpful, honest and harmless. Nonetheless, both accounts suffer from critical weaknesses. On the one hand, they are incomplete: neither specification provides adequate guidance to AI systems deployed across diverse domains and interacting with multiple parties. On the other hand, the justification they offer is questionable and, we argue, of the wrong kind. More specifically, neither approach takes seriously the need to justify the operation of AI systems to those whom their actions affect – or what this means for pluralistic societies where people have different underlying beliefs about value. To address these limitations, we propose an alternative account of AI value alignment that focuses on fair processes. We argue that principles that are the product of these processes are the appropriate target for alignment. This approach can meet the necessary standard of public justification and generate a complete set of principles for AI that are sensitive to variation in context. It also has explanatory power insofar as it makes sense of our intuitions about AI systems and points to a number of hitherto underappreciated ways in which an AI system may fail to be aligned.