Samrat Phatale
Aspiring Researcher. I am interested in Language Models and Reinforcement Learning. My mission is to study in depth the nature of "intelligence", both human and artificial.
Authored Publications
Parameter Efficient Reinforcement Learning from Human Feedback
Hakim Sidahmed
Alex Hutcheson
Zhuonan Lin
Zhang Chen
Zac Yu
Jarvis Jin
Simral Chaudhary
Roman Komarytsia
Christiane Ahlheim
Yonghao Zhu
Bowen Li
Jessica Hoffmann
Hassan Mansoor
Wei Li
Abhinav Rastogi
2024
Abstract
While Reinforcement Learning from Human Feedback (RLHF) effectively aligns pretrained Large Language Models (LLMs) with human preferences, its computational cost and complexity hinder wider adoption.
This work introduces Parameter-Efficient Reinforcement Learning (PERL): by leveraging Low-Rank Adaptation (LoRA) (Hu et al., 2021) for reward model training and reinforcement learning, we are able to perform RL loops while updating only a fraction of the parameters required by traditional RLHF.
We demonstrate that the effectiveness of this method is not confined to a specific task. We compare PERL to conventional fine-tuning (full-tuning) across X highly diverse tasks, spanning from summarization to X and X, for a total of X different benchmarks, including two novel preference datasets released with this paper. Our findings show that PERL achieves performance comparable to RLHF while significantly reducing training time (up to 2x faster for reward models and 15% faster for RL loops) and memory footprint (up to 50% reduction for reward models and 25% for RL loops). Finally, we provide a single set of parameters that achieves results on par with RLHF on every task, which underscores the accessibility of the method.
By mitigating the computational cost and the burden of hyperparameter search, PERL facilitates broader adoption of RLHF as an LLM alignment technique.
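To make the LoRA-based setup concrete, here is a minimal sketch, not the paper's code: the backbone, rank, and target modules are illustrative assumptions. It attaches LoRA adapters to a reward model with the Hugging Face peft library so that only a small subset of parameters is trained.

```python
# A minimal sketch of the core idea behind PERL (not the paper's code):
# attach LoRA adapters to a reward model so that only a small fraction of
# parameters is trained. Backbone, rank, and target modules are assumptions.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

backbone = "gpt2"  # placeholder; the paper works with much larger LLMs
tokenizer = AutoTokenizer.from_pretrained(backbone)
# A reward model is a sequence classifier with a single scalar output.
reward_model = AutoModelForSequenceClassification.from_pretrained(backbone, num_labels=1)

lora_config = LoraConfig(
    r=8,                        # low-rank dimension (assumed value)
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2 attention projection layers
    task_type=TaskType.SEQ_CLS,
)
reward_model = get_peft_model(reward_model, lora_config)
# The LoRA adapters (plus the scalar reward head) are the only trainable parameters.
reward_model.print_trainable_parameters()
```

Because gradients and optimizer state are kept only for the adapter weights, the trainable-parameter count typically drops to a small fraction of the full model, which is the mechanism behind the training-time and memory savings reported above; PERL applies the same idea to the policy trained in the RL loop.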
Conversational Recommendation as Retrieval: A Simple, Strong Baseline
Raghav Gupta
Renat Aksitov
Simral Chaudhary
Abhinav Rastogi
5th Workshop on NLP for Conversational AI (2023)
Abstract
Conversational recommendation systems (CRS) aim to recommend suitable items to users through natural language conversation. However, most CRS approaches do not effectively utilize the signal provided by these conversations. They rely heavily on explicit external knowledge, e.g., knowledge graphs, to augment the models' understanding of the items and attributes, which is hard to scale. To alleviate this, we propose an alternative information retrieval (IR)-style approach to the CRS item recommendation task, in which we represent conversations as queries and items as documents to be retrieved. We expand the document representation used for retrieval with conversations from the training set. With a simple BM25-based retriever, we show that our task formulation compares favorably on a popular CRS benchmark with much more complex baselines that rely on external knowledge. We demonstrate further improvements using user-centric modeling and data augmentation to counter the cold-start problem for CRSs.
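As a rough illustration of this formulation (not the authors' code; the items, conversations, and the rank_bm25 library are assumptions made for the sketch), the dialogue history can be tokenized as a BM25 query while each item's document is expanded with training conversations that mention it:

```python
# A rough illustration of conversational recommendation as retrieval:
# the dialogue is the query, items are documents, and each item's document
# is expanded with training conversations about it. All data is made up.
from rank_bm25 import BM25Okapi

item_descriptions = {
    "The Matrix (1999)": "sci-fi action simulated reality hacker",
    "Titanic (1997)": "romance drama ocean liner disaster",
    "Inception (2010)": "sci-fi heist dreams subconscious",
}
# Document expansion: append training conversations in which the item was recommended.
training_conversations = {
    "The Matrix (1999)": ["i want something mind-bending with great action"],
    "Titanic (1997)": ["looking for a classic tear-jerker love story"],
    "Inception (2010)": ["something clever about dreams and layered reality"],
}

item_names, corpus = [], []
for name, description in item_descriptions.items():
    expanded = description + " " + " ".join(training_conversations.get(name, []))
    item_names.append(name)
    corpus.append(expanded.lower().split())

bm25 = BM25Okapi(corpus)

# The conversation so far acts as the retrieval query.
dialogue = "i want a cool action movie set in a simulated reality"
scores = bm25.get_scores(dialogue.lower().split())
ranked = sorted(zip(item_names, scores), key=lambda pair: pair[1], reverse=True)
print(ranked[0][0])  # expected top recommendation: "The Matrix (1999)"
```

The document-expansion step is what lets a purely lexical retriever pick up conversational signal without any external knowledge graph.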