A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
Abstract
A primary challenge in large language model (LLM) development is their onerous pre-training cost. Typically, such pre-training involves optimizing a self-supervised objective (such as next-token prediction) over a large corpus. This paper explores a promising paradigm
to improve LLM pre-training efficiency and quality by suitably leveraging a small language model (LM). In particular, this paradigm relies on a small LM to both (1) provide soft labels as additional training supervision, and (2) select a small subset of valuable (``informative'' and ``hard'') training examples. Put together, this enables an effective transfer of the small LM's predictive distribution to the LLM, while prioritizing specific regions of the training data distribution. Empirically, this leads to reduced LLM training time compared with standard training, while improving overall model quality. Theoretically, we develop a statistical framework to systematically study the utility of small LMs in enabling efficient training of high-quality LLMs.
In particular, our framework characterizes how the small LM's seemingly low-quality supervision
can enhance the training of a much more capable LLM. Furthermore, it also highlights the need for an adaptive utilization of such supervision, by striking a balance between the bias and variance introduced by the small LM-provided soft labels. We corroborate our theoretical framework by improving the pre-training of an LLM with 2.8B parameters by utilizing a smaller LM with 1.5B parameters on the Pile dataset.
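The two mechanisms described above can be illustrated with a minimal sketch. Note this is not the paper's implementation: the blending weight `alpha`, the selection criterion, and all function names are illustrative assumptions; the abstract's adaptive bias-variance balancing is approximated here by a fixed weight.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(student_logits, teacher_probs, labels, alpha=0.5):
    """Blend hard-label cross-entropy with KL divergence to the small LM's
    soft labels. `alpha` weights the soft-label term; a fixed value is used
    here for illustration, whereas the paper argues for adapting it."""
    p = softmax(student_logits)
    n = len(labels)
    ce = -np.mean(np.log(p[np.arange(n), labels] + 1e-12))
    kl = np.mean(np.sum(
        teacher_probs * (np.log(teacher_probs + 1e-12) - np.log(p + 1e-12)),
        axis=-1))
    return (1 - alpha) * ce + alpha * kl

def select_hard_examples(teacher_probs, labels, keep_frac=0.5):
    """Keep the fraction of examples the small LM finds hardest, measured
    by its own per-example cross-entropy (one possible notion of 'hard')."""
    n = len(labels)
    losses = -np.log(teacher_probs[np.arange(n), labels] + 1e-12)
    k = max(1, int(keep_frac * n))
    return np.argsort(-losses)[:k]
```

For example, with two training tokens where the small LM assigns probability 0.9 and 0.4 to the correct label respectively, `select_hard_examples(..., keep_frac=0.5)` retains only the second (harder) example.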