Saravanan Ganesh

Authored Publications
Parameter Efficient Reinforcement Learning from Human Feedback
Hakim Sidahmed, Alex Hutcheson, Zhuonan Lin, Zhang Chen, Zac Yu, Jarvis Jin, Simral Chaudhary, Roman Komarytsia, Christiane Ahlheim, Yonghao Zhu, Bowen Li, Jessica Hoffmann, Hassan Mansoor, Wei Li, Abhinav Rastogi
2024
Abstract: While Reinforcement Learning from Human Feedback (RLHF) effectively aligns pretrained Large Language Models (LLMs) with human preferences, its computational cost and complexity hinder wider adoption. This work introduces Parameter-Efficient Reinforcement Learning (PERL): by leveraging Low-Rank Adaptation (LoRA; Hu et al., 2021) for reward model training and reinforcement learning, we are able to run RL loops while updating only a fraction of the parameters required by traditional RLHF. We demonstrate that the effectiveness of this method is not confined to a specific task. We compare PERL to conventional fine-tuning (full-tuning) across X highly diverse tasks, spanning from summarization to X and X, for a total of X different benchmarks, including two novel preference datasets released with this paper. Our findings show that PERL achieves performance comparable to RLHF while significantly reducing training time (up to 2x faster for reward models and 15% faster for RL loops) and memory footprint (up to a 50% reduction for reward models and 25% for RL loops). Finally, we provide a single set of parameters that achieves results on par with RLHF on every task, demonstrating the accessibility of the method. By mitigating the computational cost and the burden of hyperparameter search, PERL facilitates broader adoption of RLHF as an LLM alignment technique.
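
The core idea of the abstract, training a reward model with LoRA adapters so that only a small fraction of parameters is updated, can be illustrated with a minimal sketch. The snippet below is not the PERL authors' implementation; it assumes the Hugging Face transformers and peft libraries, an arbitrary small backbone (distilbert-base-uncased), and illustrative LoRA settings, and it shows only the reward-model side of the pipeline with a pairwise preference loss over toy chosen/rejected responses.

# Minimal sketch of parameter-efficient reward model training with LoRA.
# Not the PERL authors' code; backbone, rank, and target modules are illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

backbone = "distilbert-base-uncased"  # assumption: any encoder with a scalar head works
tokenizer = AutoTokenizer.from_pretrained(backbone)
model = AutoModelForSequenceClassification.from_pretrained(backbone, num_labels=1)

# Attach low-rank adapters; only these (plus the scalar head) receive gradients.
lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_lin", "v_lin"],  # DistilBERT attention projections
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # a small fraction of the full model

def reward(texts):
    # Score each (prompt, response) string with a scalar reward.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    return model(**batch).logits.squeeze(-1)

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

# One pairwise-preference step (Bradley-Terry style loss) on a toy example.
chosen = ["Prompt: summarize the article. Response: a concise, faithful summary."]
rejected = ["Prompt: summarize the article. Response: an off-topic rambling reply."]
loss = -torch.nn.functional.logsigmoid(reward(chosen) - reward(rejected)).mean()
loss.backward()
optimizer.step()

Because only the adapter weights and the scalar head are trainable, the optimizer state and gradients are far smaller than in full fine-tuning, which is the source of the memory and training-time savings the abstract reports.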
Abstract: We present a data-driven, end-to-end approach to transaction-based dialog systems that performs at near-human levels in terms of verbal response quality and factual grounding accuracy. We show that two essential components of the system produce these results: a sufficiently large and diverse, in-domain labeled dataset, and a neural network-based, pre-trained model that generates both verbal responses and API call predictions. In terms of data, we introduce TicketTalk, a movie ticketing dialog dataset with 23,789 annotated conversations. The movie ticketing conversations range from completely open-ended and unrestricted to more structured, in terms of their knowledge base, discourse features, and number of turns. In qualitative human evaluations, model-generated responses trained on just 10,000 TicketTalk dialogs were rated to “make sense” 86.5% of the time, almost the same as human responses in the same contexts. Our simple, API-focused annotation schema results in a much easier labeling task, making it faster and more cost-effective. It is also the key component for being able to predict API calls accurately. We handle factual grounding by incorporating API calls in the training data, allowing our model to learn which actions to take and when. Trained on the same 10,000-dialog set, the model’s API call predictions were rated to be correct 93.9% of the time in our evaluations, surpassing the ratings for the corresponding human labels. We show how API prediction and response generation scores improve as the dataset size incrementally increases from 5,000 to 21,000 dialogs. Our analysis also clearly illustrates the benefits of pre-training. To facilitate future work on transaction-based dialogs, we have publicly released the TicketTalk dataset at https://git.io/JL8an.
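
The grounding mechanism described above, interleaving API calls with verbal turns so a single model learns which action to take and when, can be sketched as a text-to-text training example. The snippet below is only an illustration of that idea; the field names, the serialization format, and the find_showtimes call are assumptions for this sketch, not the released TicketTalk schema.

# Illustrative sketch of serializing an API-grounded dialog turn into a
# text-to-text training example, in the spirit of the approach described above.
# Field names and the serialization format are assumptions, not the released schema.

def serialize_context(turns, api_results):
    """Flatten prior turns and any returned API results into a single model input."""
    parts = [f"{speaker}: {text}" for speaker, text in turns]
    for name, args in api_results:
        arg_str = ", ".join(f"{k}={v}" for k, v in args.items())
        parts.append(f"api_result: {name}({arg_str})")
    return " | ".join(parts)

# A toy movie-ticketing exchange: the target is either an API call or a verbal
# response, so one seq2seq model learns both actions and wording.
turns = [
    ("user", "Two tickets for Dune tonight, please."),
    ("assistant", "Which theater would you like?"),
    ("user", "The one on 5th Street."),
]

example_api = {
    "input": serialize_context(turns, []),
    "target": "api_call: find_showtimes(movie='Dune', theater='5th Street', date='tonight')",
}
example_response = {
    "input": serialize_context(turns, [("find_showtimes", {"times": "7pm, 9:30pm"})]),
    "target": "assistant: Dune is playing at 7pm and 9:30pm tonight. Which works for you?",
}

for ex in (example_api, example_response):
    print(ex["input"], "->", ex["target"])

Because the API call and its returned result appear directly in the input/target text, the model's responses stay grounded in the ticketing backend rather than in free-form generation, which is what the evaluation of API-call correctness measures.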