Conversational Music Retrieval with Synthetic Data

Megan Eileen Leszczynski
Ravi Ganti
Shu Zhang
Arun Tejasvi Chaganty
Second Workshop on Interactive Learning for Natural Language Processing at NeurIPS 2022

Abstract

Users looking for recommendations often wish to improve suggestions through
broad natural language feedback (e.g., “How about something more upbeat?”).
However, building such conversational retrieval systems requires dialog data
that pairs rich user utterances with slates of items covering a diverse range
of preferences. Such data is challenging to collect at scale using conventional
methods like crowd-sourcing. We address this problem with a new technique to
synthesize high-quality dialog data by transforming the domain expertise encoded
in curated item collections into corresponding item-seeking conversations. The
method first generates a sequence of hypothetical slates returned by a system,
and then uses a language model to introduce corresponding user utterances. We
apply the approach to a dataset of curated music playlists to generate 10k diverse
music-seeking conversations. A qualitative human evaluation shows that a majority
of these conversations exhibit believable sequences of slates and include user
utterances that faithfully express preferences for them. When used to train a
conversational retrieval model, the synthetic data yields up to a 23% relative gain
on standard retrieval metrics compared to baselines trained on non-conversational
and conversational datasets.
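
To make the two-step synthesis procedure above concrete, the Python sketch
below turns one curated playlist into one item-seeking conversation. It is a
minimal illustration under stated assumptions, not the authors' implementation:
slates are sampled uniformly from the playlist for simplicity, and the
lm.generate_utterance call stands in for an unspecified language-model prompt;
all names here are hypothetical.

    import random
    from dataclasses import dataclass

    @dataclass
    class Turn:
        utterance: str  # synthetic user request or feedback
        slate: list     # hypothetical slate of items returned by the system

    def synthesize_conversation(playlist_tracks, lm, num_turns=3, slate_size=5):
        """Transform one curated playlist into one item-seeking conversation."""
        turns = []
        prev_slate = None
        for _ in range(num_turns):
            # Step 1: sample a hypothetical slate the system could have
            # returned; drawing items from the same expert-curated playlist
            # keeps each slate coherent. (Uniform sampling is a placeholder.)
            slate = random.sample(playlist_tracks, slate_size)
            # Step 2: prompt a language model to write the user utterance that
            # would plausibly lead from the previous slate to this one
            # (e.g., "How about something more upbeat?").
            # lm.generate_utterance is a hypothetical wrapper, not a real API.
            utterance = lm.generate_utterance(prev_slate=prev_slate,
                                              next_slate=slate)
            turns.append(Turn(utterance=utterance, slate=slate))
            prev_slate = slate
        return turns

Running this over a collection of playlists would yield a corpus of
(utterance, slate) sequences that can train a conversational retrieval model,
in the spirit of the 10k conversations reported above.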