Prasanna Venkatesh Rengasamy

Hi! Welcome to my page.

My current work focuses on machine learning performance profiling, optimization, and hardware architecture.

Previously, my research centered on optimizing hardware-software architectures. I approached this by analyzing workload behavior through detailed simulation and tracing, focusing on CPU and system caches, execution pipelines, and GPGPU memory optimizations. Prior to my current role, I also contributed to the development of Apple Silicon chips for various Apple products.


Education
  • Ph.D. in Computer Science and Engineering — Penn State University
  • M.S. in Computer Science and Engineering — Indian Institute of Technology (IIT) Madras, India
  • B.Tech. in Computer Science and Engineering — SASTRA University, India
Authored Publications
    XProf: An Open, Scalable and Extensible Profiling System for the Modern ML Stack
    Naveen Kumar
    Jose Baiocchi Paredes
    Scott Goodson
    Kelvin Le
    Yin Zhang
    Kan Cai
    Jiten Thakkar
    Sai Ganesh Bandiatmakuri
    Yogesh SY
    Ani Udipi
    Vikas Aggarwal
    2026
    Optimizing Large Language Models across thousands of hardware accelerators requires deep system expertise. To address modern machine learning optimization needs, we present XProf, the de facto machine learning profiler for the OpenXLA ecosystem. XProf delivers actionable optimization suggestions and in-depth performance analysis, empowering machine learning researchers and framework users to improve efficiency without specialized systems knowledge. XProf provides a unified, full-stack view of both host (CPU) and device (accelerator: TPU/GPU) performance, leveraging tools like the Roofline Model for comprehensive analysis. Engineered with a distributed architecture, XProf is battle-tested at Google to profile across thousands of chips with minimal overhead (<1%) for the workload. Using the open-source C API extension to PJRT, this pluggable architecture has already been adopted by other third-party accelerator vendors. Originally developed at Google and now open-sourced within the OpenXLA Project, XProf has proven indispensable in production, driving significant efficiency gains and enabling critical results, including winning MLPerf submissions. This paper presents the design and architecture of XProf, showcases its differentiating tools and capabilities, and highlights its impact within Google and across the industry as a state-of-the-art ML profiler. The codebase is at https://github.com/openxla/xprof.