Been Kim

Been is a research scientist at Brain. Her research focuses on improving interpretability in machine learning, either by building interpretability methods for already-trained models or by building inherently interpretable models. She has MS and PhD degrees from MIT. Been has given tutorials on interpretability at ICML 2017, at the Deep Learning Summer School at the University of Toronto and Vector Institute in 2018, and at CVPR 2018. Been is an executive board member of Women in Machine Learning (WiML) and helps with various ML conferences as a workshop chair, an area chair, a steering committee member, and a program chair.
Authored Publications
    DISSECT: Disentangled Simultaneous Explanations via Concept Traversals
    Chun-Liang Li
    Brian Eoff
    Rosalind Picard
    International Conference on Learning Representations (ICLR) (2022)
    Explaining deep learning model inferences is a promising avenue for scientific understanding, improving safety, uncovering hidden biases, evaluating fairness, and beyond, as argued by many scholars. One of the principal benefits of counterfactual explanations is allowing users to explore "what-if" scenarios through what does not and cannot exist in the data, a quality that many other forms of explanation, such as heatmaps and influence functions, inherently lack. However, most previous work on generative explainability cannot disentangle important concepts effectively, produces unrealistic examples, or fails to retain relevant information. We propose a novel approach, DISSECT, that jointly trains a generator, a discriminator, and a concept disentangler to overcome such challenges using little supervision. DISSECT generates Concept Traversals (CTs), defined as a sequence of generated examples with increasing degrees of concepts that influence a classifier's decision. By training a generative model from a classifier's signal, DISSECT offers a way to discover a classifier's inherent "notion" of distinct concepts automatically rather than rely on user-predefined concepts. We show that DISSECT produces CTs that (1) disentangle several concepts, (2) are influential to a classifier's decision and are coupled to its reasoning due to joint training, (3) are realistic, (4) preserve relevant information, and (5) are stable across similar inputs. We validate DISSECT on several challenging synthetic and realistic datasets where previous methods fall short of satisfying desirable criteria for interpretability and show that it performs consistently well. Finally, we present experiments showing applications of DISSECT for detecting potential biases of a classifier and identifying spurious artifacts that impact predictions.
    Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis
    Shayegan Omidshafiei
    Yannick Assogba
    Advances in Neural Information Processing Systems (NeurIPS) (2022) (to appear)
    Each year, expert-level performance is attained in increasingly complex multiagent domains, notable examples including Go, Poker, and StarCraft II. This rapid progression is accompanied by a commensurate need to better understand how such agents attain this performance, to enable their safe deployment, identify limitations, and reveal potential means of improving them. In this paper we take a step back from performance-focused multiagent learning, and instead turn our attention towards agent behavior analysis. We introduce a model-agnostic method for discovery of behavior clusters in multiagent domains, using variational inference to learn a hierarchy of behaviors at the joint and local agent levels. Our framework makes no assumption about agents' underlying learning algorithms, does not require access to their latent states or policies, and is trained using only offline observational data. We illustrate the effectiveness of our method for enabling the coupled understanding of behaviors at the joint and local agent level, detection of behavior changepoints throughout training, and discovery of core behavioral concepts. We also demonstrate the approach's scalability to a high-dimensional multiagent MuJoCo control domain and show that the approach can disentangle previously-trained policies in OpenAI's hide-and-seek domain.
    Interpretability techniques aim to provide the rationale behind a model's decision, typically by explaining either an individual prediction (local explanation, e.g. "why is this patient diagnosed with this condition") or a class of predictions (global explanation, e.g. "why is this set of patients diagnosed with this condition in general"). While there are many methods focused on either one, few frameworks can provide both local and global explanations in a consistent manner. In this work, we combine two powerful existing techniques, one local (Integrated Gradients, IG) and one global (Testing with Concept Activation Vectors, TCAV), to provide local and global concept-based explanations. We first sanity-check our idea using two synthetic datasets with a known ground truth, and further demonstrate it with a benchmark natural image dataset. We test our method with various concepts, target classes, model architectures and IG parameters (e.g. baselines). We show that our method improves global explanations over vanilla TCAV when compared to ground truth, and provides useful local insights. Finally, a user study demonstrates the usefulness of the method compared to no explanations or global explanations only. We hope our work provides a step towards building bridges between many existing local and global methods to get the best of both worlds.
    Concept-based explanations can be a key direction to understand how DNNs make decisions. In this paper, we study concept-based explainability in a systematic framework. First, we define the notion of completeness, which quantifies how sufficient a particular set of concepts is in explaining the model's behavior. Based on performance and variability motivations, we propose two definitions to quantify completeness. We show that they yield the commonly-used PCA method under certain assumptions. Next, we study two additional constraints, based on sparsity principles, to ensure the interpretability of the discovered concepts. Through systematic experiments on a specifically designed synthetic dataset and on real-world text and image datasets, we demonstrate the superiority of our framework in finding concepts that are complete (in explaining the decision) and that are interpretable.
    Concept Bottleneck Models
    Pang Wei Koh
    Thao Nguyen
    Yew Siang Tang
    Stephen Mussmann
    Emma Pierson
    Percy Liang
    ICML 2020 (2020) (to appear)
    We seek to learn models that support interventions on high-level concepts: e.g., would the model have predicted severe arthritis if it didn’t think that there was a bone spur in the x-ray? However, state-of-the-art neural networks are trained end-to-end from raw input (e.g., pixels) to output (e.g., arthritis severity), and do not admit manipulation of high-level concepts like “the existence of bone spurs”. In this paper, we revisit the classic idea of learning concept bottleneck models that first predict concepts (provided at training time) from the raw input, and then predict the final label from these concepts. By construction, we can intervene on the predicted concepts at test time and propagate these changes to the final prediction. On an x-ray dataset and a bird species recognition dataset, concept bottleneck models achieve predictive accuracy competitive with standard end-to-end models, while allowing us to explain predictions in terms of high-level clinical concepts (“bone spurs”) and bird attributes (“wing color”). Moreover, concept bottleneck models allow for richer human-model interaction: model accuracy improves significantly if we can correct model mistakes on concepts at test time.
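The two-stage structure described in the Concept Bottleneck Models abstract (input to concepts, concepts to label, with optional test-time intervention on the concepts) can be illustrated with a minimal sketch. Everything named here is hypothetical: `toy_concept_model`, `toy_label_model`, the input fields `brightness` and `gap`, and the thresholds are hand-written stand-ins for illustration, not the learned networks or data from the paper.

```python
from typing import Callable, Dict, Optional, Tuple

Concepts = Dict[str, float]
Features = Dict[str, float]


def predict_with_bottleneck(
    x: Features,
    concept_model: Callable[[Features], Concepts],
    label_model: Callable[[Concepts], str],
    interventions: Optional[Concepts] = None,
) -> Tuple[str, Concepts]:
    """Predict concepts from the input, then the label from the concepts only.

    `interventions` overrides selected predicted concepts before the label
    is computed, so the change propagates to the final prediction.
    """
    concepts = dict(concept_model(x))
    if interventions:
        concepts.update(interventions)
    return label_model(concepts), concepts


# Hypothetical hand-written stand-ins for the two learned stages.
def toy_concept_model(x: Features) -> Concepts:
    return {
        "bone_spur": 1.0 if x["brightness"] > 0.5 else 0.0,
        "joint_narrowing": 1.0 if x["gap"] < 0.3 else 0.0,
    }


def toy_label_model(concepts: Concepts) -> str:
    score = concepts["bone_spur"] + concepts["joint_narrowing"]
    return "severe" if score >= 2.0 else "mild"


if __name__ == "__main__":
    xray = {"brightness": 0.8, "gap": 0.1}
    print(predict_with_bottleneck(xray, toy_concept_model, toy_label_model))
    # Ask the "what if there were no bone spur?" question by intervening.
    print(predict_with_bottleneck(
        xray, toy_concept_model, toy_label_model, {"bone_spur": 0.0}))
```

Because the label stage sees only the concept values, overriding a concept is enough to answer the counterfactual question; no access to the raw input is needed at that point.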
    Machine learning (ML) is increasingly being used in image retrieval systems for medical decision making. One application of ML is to retrieve visually similar medical images from past patients (e.g. tissue from biopsies) to reference when making a medical decision with a new patient. However, no algorithm can perfectly capture an expert's ideal notion of similarity for every case: an image that is algorithmically determined to be similar may not be medically relevant to a doctor's specific diagnostic needs. In this paper, we identified the needs of pathologists when searching for similar images retrieved using a deep learning algorithm, and developed tools that empower users to cope with the search algorithm on-the-fly, communicating what types of similarity are most important at different moments in time. In two evaluations with pathologists, we found that these refinement tools increased the diagnostic utility of images found and increased user trust in the algorithm. The tools were preferred over a traditional interface, without a loss in diagnostic accuracy. We also observed that users adopted new strategies when using refinement tools, re-purposing them to test and understand the underlying algorithm and to disambiguate ML errors from their own errors. Taken together, these findings inform future human-ML collaborative systems for expert decision-making.
    Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make important decisions. Most of the current explanation methods provide explanations through feature importance scores, which identify features that are salient for each individual input. However, systematically summarizing and interpreting such per-sample feature importance scores is itself challenging. In this work, we propose principles and desiderata for concept-based explanation, which goes beyond per-sample features to identify higher-level human-understandable concepts that apply across the entire dataset. We develop a new algorithm, ACE, to automatically extract visual concepts. Our systematic experiments demonstrate that ACE discovers concepts that are human-meaningful, coherent and salient for the neural network's predictions.
    DeConvNet, Guided BackProp, and LRP were invented to better understand deep neural networks. We show that these methods do not produce the theoretically correct explanation for a linear model. Yet they are used on multi-layer networks with millions of parameters. This is a cause for concern since linear models are simple neural networks. We argue that explanation methods for neural nets should work reliably in the limit of simplicity, that is, for linear models. Based on our analysis of linear models we propose a generalization that yields two explanation techniques (PatternNet and PatternAttribution) that are theoretically sound for linear models and produce improved explanations for deep networks.
    Explaining the output of a complicated machine learning model like a deep neural network (DNN) is a central challenge in machine learning. Increasingly, explanations are required for debugging models, building trust prior to model deployment, and potentially identifying unwanted effects like model bias. Several methods have been proposed to address this issue. Local explanation methods provide explanations of the output of a model on a single input. Given the importance of these explanations to the use and deployment of these models, we ask: can we trust local explanations for DNNs created using current methods? In particular, we seek to assess how specific local explanations are to the parameter values of DNNs. We compare explanations generated using fully trained DNNs to explanations of DNNs with some or all parameters replaced by random values. Somewhat surprisingly, we find that, for several local explanation methods, explanations derived from networks with randomized weights and trained weights are both visually and quantitatively similar; in some cases, virtually indistinguishable. By randomizing different portions of the network, we find that local explanations are significantly reliant on lower-level features of the DNN.
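The comparison described in the abstract above (explain a trained model, replace its parameters with random values, explain again, and measure how similar the two explanations are) can be sketched on a toy linear model. This is an illustration under loud assumptions: the linear model, the two explainers `gradient_times_input` and `edge_detector`, and the choice of Pearson correlation as the similarity measure are all hypothetical stand-ins, not the networks, methods, or metrics used in the paper.

```python
import math
import random
from typing import Callable, List, Sequence

Explainer = Callable[[Sequence[float], Sequence[float]], List[float]]


def gradient_times_input(weights: Sequence[float], x: Sequence[float]) -> List[float]:
    """For a linear model f(x) = sum(w_i * x_i) the gradient is w itself."""
    return [w * v for w, v in zip(weights, x)]


def edge_detector(weights: Sequence[float], x: Sequence[float]) -> List[float]:
    """A model-independent 'explanation': differences between neighbours."""
    return [0.0] + [abs(x[i] - x[i - 1]) for i in range(1, len(x))]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def randomization_similarity(
    explain: Explainer, weights: Sequence[float], x: Sequence[float], seed: int = 0
) -> float:
    """Similarity between explanations of the trained and a randomized model."""
    rng = random.Random(seed)
    randomized = [rng.gauss(0.0, 1.0) for _ in weights]
    return pearson(explain(weights, x), explain(randomized, x))


if __name__ == "__main__":
    trained_weights = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0]  # assumed "trained" values
    x = [0.5, 1.0, 0.2, 0.9, 0.4, 0.7]
    for name, method in [("gradient*input", gradient_times_input),
                         ("edge detector", edge_detector)]:
        print(name, randomization_similarity(method, trained_weights, x))
```

An explainer that never reads the weights necessarily produces the same output before and after randomization, which is the kind of insensitivity to parameter values the abstract is concerned with.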
    Estimating the influence of a given feature on a model prediction is challenging. We introduce ROAR, RemOve And Retrain, a benchmark to evaluate the accuracy of interpretability methods that estimate input feature importance in deep neural networks. We remove a fraction of input features deemed to be most important according to each estimator and measure the change in model accuracy upon retraining. The most accurate estimator will identify inputs as important whose removal causes the most damage to model performance relative to all other estimators. This evaluation produces thought-provoking results: we find that several estimators are less accurate than a random assignment of feature importance. However, averaging a set of squared noisy estimators (a variant of a technique proposed by Smilkov et al. (2017)) leads to significant gains in accuracy for each method considered and far outperforms such a random guess.
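The remove-and-retrain loop described in the abstract above can be sketched on a toy problem: for each importance estimator, replace the top-ranked fraction of features in every example, retrain on the degraded data, and compare the resulting accuracy. The sketch makes several assumptions that are not from the paper: a nearest-centroid classifier stands in for the deep network, removed features are filled with a constant `0.0`, and the dataset and the two estimators (`informed`, `uninformed`) are hand-made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[List[float], int]
Estimator = Callable[[Sequence[float]], Sequence[float]]


def remove_top_features(x: Sequence[float], importance: Sequence[float],
                        fraction: float, fill: float = 0.0) -> List[float]:
    """Replace the `fraction` of features ranked most important with `fill`."""
    k = int(round(fraction * len(x)))
    ranked = sorted(range(len(x)), key=lambda i: importance[i], reverse=True)
    removed = set(ranked[:k])
    return [fill if i in removed else v for i, v in enumerate(x)]


def train_centroids(data: Sequence[Example]) -> Dict[int, List[float]]:
    """Stand-in for retraining: one mean vector per class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids: Dict[int, List[float]], x: Sequence[float]) -> int:
    def dist(y: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(centroids[y], x))
    return min(sorted(centroids), key=dist)  # ties go to the lowest label


def roar_accuracy(train: Sequence[Example], test: Sequence[Example],
                  estimator: Estimator, fraction: float) -> float:
    """Degrade train and test sets, retrain, and report test accuracy."""
    def degrade(data: Sequence[Example]) -> List[Example]:
        return [(remove_top_features(x, estimator(x), fraction), y)
                for x, y in data]
    centroids = train_centroids(degrade(train))
    degraded_test = degrade(test)
    correct = sum(predict(centroids, x) == y for x, y in degraded_test)
    return correct / len(degraded_test)


# Hand-made data: only feature 0 carries the class signal.
TRAIN = [([0.0, 0.2, 0.5, 0.1], 0), ([0.0, 0.8, 0.3, 0.9], 0),
         ([1.0, 0.2, 0.5, 0.1], 1), ([1.0, 0.8, 0.3, 0.9], 1)]
TEST = [([0.0, 0.4, 0.4, 0.6], 0), ([1.0, 0.4, 0.4, 0.6], 1),
        ([0.0, 0.7, 0.1, 0.2], 0), ([1.0, 0.7, 0.1, 0.2], 1)]


def informed(x: Sequence[float]) -> Sequence[float]:
    return [1.0, 0.0, 0.0, 0.0]  # ranks the truly predictive feature first


def uninformed(x: Sequence[float]) -> Sequence[float]:
    return [0.0, 0.0, 0.0, 1.0]  # ranks an irrelevant feature first


if __name__ == "__main__":
    for name, est in [("informed", informed), ("uninformed", uninformed)]:
        print(name, roar_accuracy(TRAIN, TEST, est, fraction=0.25))
```

Under this reading, the estimator whose removals cause the larger accuracy drop after retraining is the more accurate one: here removing the feature chosen by `informed` leaves nothing to learn from, while removing the feature chosen by `uninformed` leaves the classifier unaffected.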