Abstract
Most NLP tasks, such as summarization and semantic parsing, can now be performed by LLMs without any external pipeline. Retrieval, however, still requires a separate retriever model, which makes it pipelined and cumbersome.
In this work, we explore posing retrieval as a generation task that can be folded entirely into LLMs.
We draw motivation from two attributes of LLMs:
a) LLMs are knowledge warehouses. They memorize large corpora during pre-training, which gives them access to vast amounts of information stored in their parameters.
b) LLM decoding is inherently a search mechanism: it searches for meaningful sequences through the universe of all possible output sequences.
Where LLMs fall short is in attributing their generations to a trusted knowledge corpus.
To address this, we constrain decoding so that the LLM generates only verbatim sequences from the corpus. Moreover, we can stitch together constrained and natural unconstrained generation, allowing us to blend reasoning with retrieval. This is achieved within a single decoding pass of the LLM, with no pipelined system needed.
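The idea of stitching constrained and unconstrained generation in one decoding loop can be illustrated with a minimal sketch. It is not the method described here, and it rests on several assumptions. A toy bigram scoring table stands in for the LLM. Tokens are whitespace-split words. The markers `<q>` and `</q>` are hypothetical mode-switch tokens that the abstract does not specify. Inside a quote, the only allowed tokens are those that keep the quoted span a verbatim substring of some corpus document. Outside a quote, the full vocabulary is available.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical control tokens (an assumption of this sketch, not from the source).
OPEN, CLOSE, EOS = "<q>", "</q>", "<eos>"


def allowed_continuations(span: Sequence[str], corpus: Sequence[Sequence[str]]) -> Set[str]:
    """Tokens t such that span + [t] appears contiguously in some corpus document."""
    n = len(span)
    allowed: Set[str] = set()
    for doc in corpus:
        for i in range(len(doc) - n):
            if list(doc[i:i + n]) == list(span):
                allowed.add(doc[i + n])
    return allowed


def decode(score: Callable[[List[str], str], float], vocab: Sequence[str],
           corpus: Sequence[Sequence[str]], prefix: Sequence[str], max_steps: int) -> List[str]:
    """Greedy decoding that switches between free and corpus-constrained modes."""
    out: List[str] = list(prefix)
    span: Optional[List[str]] = None  # None = unconstrained; list = tokens quoted so far
    for _ in range(max_steps):
        if span is None:
            # Unconstrained mode: whole vocabulary plus the marker that opens a quote.
            candidates = set(vocab) | {OPEN}
        else:
            # Constrained mode: only tokens that keep the quote verbatim.
            candidates = allowed_continuations(span, corpus)
            if span:
                candidates.add(CLOSE)  # a non-empty quote may be closed at any point
        if not candidates:
            break
        # Greedy choice; sorting first makes tie-breaking deterministic.
        tok = max(sorted(candidates), key=lambda t: score(out, t))
        if tok == EOS:
            break
        out.append(tok)
        if tok == OPEN:
            span = []
        elif tok == CLOSE:
            span = None
        elif span is not None:
            span.append(tok)
    return out


# Toy stand-in for an LLM: scores a candidate token using only the previous token.
TABLE: Dict[Tuple[str, str], float] = {
    ("?", "evidence"): 1.0, ("evidence", OPEN): 1.0, (OPEN, "the"): 1.0,
    ("the", "eiffel"): 1.0, ("eiffel", "tower"): 1.0, ("tower", "is"): 1.0,
    ("is", "in"): 1.0, ("in", "paris"): 1.0,
    (CLOSE, "so"): 1.0, ("so", "paris"): 1.0, ("paris", EOS): 1.0,
}


def toy_score(context: List[str], tok: str) -> float:
    return TABLE.get((context[-1] if context else "", tok), 0.0)


CORPUS = ["the eiffel tower is in paris".split(), "paris is the capital of france".split()]
VOCAB = ["evidence", "so", "paris", EOS]
PREFIX = "where is the eiffel tower ?".split()

result = decode(toy_score, VOCAB, CORPUS, PREFIX, max_steps=20)
print(" ".join(result[len(PREFIX):]))
# prints: evidence <q> the eiffel tower is in paris </q> so paris
```

The quoted span is guaranteed to be a verbatim substring of a corpus document. The surrounding text ("evidence", "so paris") is free generation produced in the same loop, so no separate retriever is invoked. The linear scan in `allowed_continuations` is used only for clarity. A practical system would likely need an index over the corpus, such as a prefix trie or suffix array, to look up valid continuations efficiently.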