Samrat Phatale

Aspiring Researcher. I am interested in Language Models and Reinforcement Learning. My mission is to study in depth the nature of "intelligence", both human and artificial.
Authored Publications
    Parameter Efficient Reinforcement Learning from Human Feedback
    Hakim Sidahmed
    Alex Hutcheson
    Zhuonan Lin
    Zhang Chen
    Zac Yu
    Jarvis Jin
    Simral Chaudhary
    Roman Komarytsia
    Christiane Ahlheim
    Yonghao Zhu
    Bowen Li
    Jessica Hoffmann
    Hassan Mansoor
    Wei Li
    Abhinav Rastogi
    2024
    While Reinforcement Learning from Human Feedback (RLHF) effectively aligns pretrained Large Language Models (LLMs) with human preferences, its computational cost and complexity hinder wider adoption. This work introduces Parameter-Efficient Reinforcement Learning (PERL): by leveraging Low-Rank Adaptation (LoRA) (Hu et al., 2021) for reward model training and reinforcement learning, we are able to perform RL loops while updating only a fraction of the parameters required by traditional RLHF. We demonstrate that the effectiveness of this method is not confined to a specific task. We compare PERL to conventional fine-tuning (full-tuning) across X highly diverse tasks, spanning from summarization to X and X, for a total of X different benchmarks, including two novel preference datasets released with this paper. Our findings show that PERL achieves comparable performance to RLHF while significantly reducing training time (up to 2x faster for reward models and 15% faster for RL loops) and memory footprint (up to 50% reduction for reward models and 25% for RL loops). Finally, we provide a single set of parameters that achieves results on par with RLHF on every task, which shows the accessibility of the method. By mitigating the computational cost and the burden of hyperparameter search, PERL facilitates broader adoption of RLHF as an LLM alignment technique.
    Conversational Recommendation as Retrieval: A Simple, Strong Baseline
    Raghav Gupta
    Renat Aksitov
    Simral Chaudhary
    Abhinav Rastogi
    5th Workshop on NLP for Conversational AI (2023)
    Conversational recommendation systems (CRS) aim to recommend suitable items to users through natural language conversation. However, most CRS approaches do not effectively utilize the signal provided by these conversations. They rely heavily on explicit external knowledge, e.g., knowledge graphs, to augment the models' understanding of the items and attributes, which is quite hard to scale. To alleviate this, we propose an alternative information retrieval (IR)-styled approach to the CRS item recommendation task, where we represent conversations as queries and items as documents to be retrieved. We expand the document representation used for retrieval with conversations from the training set. With a simple BM25-based retriever, we show that our task formulation compares favorably with much more complex baselines that use external knowledge on a popular CRS benchmark. We demonstrate further improvements using user-centric modeling and data augmentation to counter the cold start problem for CRSs.
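
The retrieval formulation in the abstract above (conversations as queries, items as documents, item documents expanded with training conversations) can be illustrated with a small sketch. This is not the paper's implementation: the item catalogue, the training conversations, the whitespace tokenizer, the `BM25` class, the `recommend` helper, and the parameter values `k1=1.5`, `b=0.75` are all assumptions made up for illustration, and the scoring uses one common BM25 variant.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic whitespace tokenizer (an assumption; real systems do more).
    return text.lower().split()


class BM25:
    """Minimal BM25 scorer over a fixed list of documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.term_freqs = [Counter(t) for t in self.doc_tokens]
        self.avg_len = sum(len(t) for t in self.doc_tokens) / len(docs)
        n = len(docs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for t in self.doc_tokens for term in set(t))
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query, i):
        tf = self.term_freqs[i]
        # Length normalisation term for document i.
        norm = self.k1 * (1 - self.b + self.b * len(self.doc_tokens[i]) / self.avg_len)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in tokenize(query)
            if t in tf
        )


# Hypothetical item catalogue: item name -> short description.
items = {
    "Inception": "sci-fi thriller about dreams",
    "Toy Story": "animated family comedy about toys",
    "Alien": "sci-fi horror set on a spaceship",
}

# Hypothetical training conversations paired with the item that was recommended.
train_conversations = [
    ("i want a mind-bending movie with dreams within dreams", "Inception"),
    ("something fun for kids with talking toys", "Toy Story"),
]

# Document expansion: append each training conversation to its item's document.
expanded = dict(items)
for conversation, item in train_conversations:
    expanded[item] += " " + conversation

names = list(expanded)
index = BM25([expanded[name] for name in names])


def recommend(conversation):
    # Treat the conversation as the query and rank items by BM25 score.
    scores = [index.score(conversation, i) for i in range(len(names))]
    order = sorted(range(len(names)), key=lambda i: scores[i], reverse=True)
    return [names[i] for i in order]


print(recommend("recommend a mind-bending movie about dreams"))
```

In this toy setup the expansion step is what lets conversational phrasing match an item whose original description does not contain those words.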