Illusions of Understanding in Deep Learning

Date: November 15, 2024 (Friday)

Speaker: Dr Raphael Milliere, Macquarie University, Sydney

Chair: Dr Frank Hong, The University of Hong Kong

Abstract:

Recent advances in artificial intelligence have been largely driven by deep learning. However, deep neural networks (DNNs) are often characterized as inscrutable “black boxes”: while we can study their performance on various tasks, we struggle to understand the internal mechanisms that produce it. Mechanistic interpretability has emerged as a promising approach to unveiling the inner workings of DNNs by decoding the computations and representations underlying their behavior. While preliminary results on toy models show potential, scaling these techniques to large-scale DNNs remains a challenge. Here, I investigate a serious concern about the viability of this project: the possibility of illusory explanations that appear to reveal how DNNs process information but are, in fact, misleading. I present a novel typology of such interpretability illusions and explore potential strategies to mitigate their occurrence and their impact on explanations.
