The Building Blocks of Interpretability

Christopher Olah
Arvind Satyanarayan
Ian Johnson
Shan Carter
Ludwig Schubert
Katherine Ye
Distill (2018)

Abstract

Interpretability techniques are normally studied in isolation. We explore the powerful interfaces that arise when you combine them, and the rich structure of this combinatorial space.