Date of Award
Fall 2022
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Linguistics
First Advisor
Frank, Robert
Abstract
Attribution methods (Lipovetsky and Conklin, 2001; Štrumbelj et al., 2009; Simonyan et al., 2014; Zeiler and Fergus, 2014; Bach et al., 2015; Ribeiro et al., 2016; Shrikumar et al., 2017a; Sundararajan et al., 2017; Murdoch et al., 2018; Sundararajan and Najmi, 2020, inter alia) are a family of local interpretability techniques that measure the "contribution" of input features towards an individual model output. In natural language processing (NLP), attribution methods are often used to identify input tokens (e.g., Li et al., 2016, 2017; Arras et al., 2017b; Jumelet et al., 2019) or neural network units (e.g., Lakretz et al., 2019; Serrano and Smith, 2019) that strongly impact the overall behavior of a model. This dissertation takes steps towards developing a conceptual framework designed to guide the development and evaluation of attribution methods, with particular focus on NLP applications. We begin with an intrinsic evaluation of five attribution methods, which shows that the notion of "contribution" formalized by attribution methods does not match our intuitive understanding of it. We argue that these results are due to an incongruence between the theories of causation that underlie the design of attribution methods and the vaguely defined goals of explanation against which attribution methods are evaluated. We then explore two applications of attribution methods: one that seeks to explain the behavior of an LSTM language model, and one that uses measurements of causation in a downstream task. We conclude with a reflection on the conceptual structures and evaluation criteria imposed on attribution methods by these two applications, and propose an application-oriented program for research in attribution.
Recommended Citation
Hao, Yiding, "Theory and Applications of Attribution for Interpretable Language Technology" (2022). Yale Graduate School of Arts and Sciences Dissertations. 839.
https://elischolar.library.yale.edu/gsas_dissertations/839