Post Hoc Explanations of Language Models Can Improve Language Models

Satyapriya Krishna
Jiaqi Ma
Dylan Z Slack
Sameer Singh
Himabindu Lakkaraju
NeurIPS 2023

Abstract

Large Language Models (LLMs) have shown remarkable capabilities in performing complex tasks, excelling at in-context learning, and providing step-by-step reasoning. However, incorporating human-annotated rationales, such as Chain-of-Thought prompts, to enhance model performance faces challenges in scalability and can sometimes adversely affect performance. In this work, we present a novel approach, AMPLIFY (Advanced Model Performance Leveraging In-Context Learning with Post Hoc Explanations), which addresses these challenges by replacing human-annotated rationales with rationales generated automatically by post hoc explanation methods. Post hoc explanation techniques have gained popularity for computing attribution scores that quantify the influence of input features on model predictions, deepening our understanding of model behavior and helping pinpoint errors in complex models. We leverage these explanations to provide corrective signals to large language models, reducing prediction errors and augmenting in-context learning with automatically generated rationales. Our findings demonstrate that AMPLIFY yields performance improvements of 10-25% across a wide range of tasks, including those where prompting techniques that rely on human-annotated explanations, such as Chain-of-Thought, fall short. This highlights the potential of post hoc explanation methods as a valuable tool for enhancing the efficiency and effectiveness of large language models across tasks. Furthermore, we conduct an extensive empirical analysis examining the impact of each step of AMPLIFY, offering critical insights for refining in-context learning while addressing the limitations of methods dependent on human-annotated rationales.
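The pipeline described above can be sketched in miniature: attribution scores from a post hoc explanation method rank the input tokens, the top-ranked tokens are formatted into a rationale, and the rationale is embedded in a few-shot example. The attribution scores below are illustrative placeholders (in practice they would come from a method such as gradient-based attribution), and the rationale template and helper names are assumptions for this sketch, not the paper's exact implementation.

```python
# Sketch of AMPLIFY-style rationale construction from post hoc attributions.
# Scores and the rationale template are illustrative assumptions.

def rationale_from_attributions(tokens, scores, label, k=3):
    """Select the top-k tokens by attribution score and format a rationale."""
    top = sorted(zip(tokens, scores), key=lambda ts: ts[1], reverse=True)[:k]
    key_words = ", ".join(tok for tok, _ in top)
    return f"The key words: {key_words} are important clues to predict {label}."

def build_few_shot_example(text, tokens, scores, label, k=3):
    """Compose one few-shot prompt example: input, rationale, then label."""
    rationale = rationale_from_attributions(tokens, scores, label, k)
    return f"Input: {text}\nRationale: {rationale}\nLabel: {label}"

# Hypothetical attribution scores for a sentiment-classification input.
tokens = ["the", "movie", "was", "utterly", "delightful"]
scores = [0.01, 0.20, 0.02, 0.35, 0.80]
example = build_few_shot_example("the movie was utterly delightful",
                                 tokens, scores, "positive")
print(example)
```

Examples built this way would be concatenated into the few-shot prompt in place of human-written rationales, which is what removes the manual-annotation bottleneck the abstract describes.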