Debiasing Embeddings for Fairer Text Classification
Abstract
Bolukbasi et al. (2016) demonstrated that pre-trained word embeddings can inherit gender bias from the data they were trained on. We investigate how this bias affects downstream classification tasks, using the case study of occupation classification (De-Arteaga et al., 2019). We show that traditional techniques for debiasing embeddings can actually worsen the bias of the downstream classifier by providing a less noisy channel for communicating gender information. With a relatively minor adjustment, however, we show how these same techniques can be used to simultaneously reduce bias and obtain high classification accuracy.