Spectral distortion model for training phase-sensitive deep-neural networks for far-field speech recognition

Chanwoo Kim
Rajeev Nongpiur
ICASSP 2018 (2018)

Abstract

In this paper, we present an algorithm which introduces phase perturbation
to the training database when training phase-sensitive
deep neural-network models. Traditional features such as log-mel or
cepstral features do not have any phase-relevant information.
However, more recent features such as raw-waveform or complex
spectra features contain phase-relevant information. Phase-sensitive
features have the advantage of being able to detect differences in
time of arrival across different microphone channels or frequency
bands. However, compared to magnitude-based features, phase
information is more sensitive to various kinds of distortions such
as variations in microphone characteristics, reverberation, and so
on. For traditional magnitude-based features, it is widely known
that adding noise or reverberation, often called Multistyle-TRaining
(MTR), improves robustness. In a similar spirit, we propose an algorithm
which introduces spectral distortion to make the deep-learning
model more robust against phase-distortion. We call this approach
Spectral-Distortion TRaining (SDTR) and Phase-Distortion TRaining
(PDTR). In our experiments using a training set consisting of
22-million utterances, this approach has proved to be quite successful
in reducing Word Error Rates in test sets obtained with real
microphones on Google Home.
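
To make the idea of phase perturbation concrete, the sketch below rotates each complex spectral bin by a small random phase offset while leaving its magnitude unchanged. The abstract does not specify how the distortion is drawn, so the uniform per-bin offset, the bound max_phase_rad, and the function name perturb_phase are all assumptions made for illustration, not the authors' method.

```python
import cmath
import math
import random


def perturb_phase(spectrum, max_phase_rad, rng):
    """Rotate each complex bin by a random phase offset.

    Assumption (not from the paper): offsets are drawn independently and
    uniformly from [-max_phase_rad, max_phase_rad]. Magnitudes are unchanged.
    """
    distorted = []
    for coeff in spectrum:
        offset = rng.uniform(-max_phase_rad, max_phase_rad)
        # Multiplying by exp(j * offset) changes the phase, not the magnitude.
        distorted.append(coeff * cmath.exp(1j * offset))
    return distorted


# Toy "spectrum" of four complex bins (illustrative values only).
spectrum = [1 + 0j, 0.5 + 0.5j, -0.25 + 1j, 0 - 2j]
rng = random.Random(0)  # fixed seed so the example is repeatable
distorted = perturb_phase(spectrum, max_phase_rad=math.pi / 8, rng=rng)
```

A magnitude-only feature such as log-mel would be identical before and after this perturbation, whereas a complex-spectrum or raw-waveform feature would differ, which is why such distortion only matters for phase-sensitive models.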