Prasanna Venkatesh Rengasamy
Hi! Welcome to my page.
My current work focuses on machine learning performance profiling, optimization, and hardware architecture.
Previously, my research centered on optimizing hardware-software architectures. I approached this by analyzing workload behaviors through advanced simulation and tracing, specifically targeting CPU and system caching, execution pipelines, and GPGPU memory optimizations. Prior to my current role, I also contributed to the development of Apple Silicon chips for various Apple products.
Education
- Ph.D. in Computer Science and Engineering — Penn State University
- M.S. in Computer Science and Engineering — Indian Institute of Technology (IIT) Madras, India
- B.Tech. in Computer Science and Engineering — SASTRA University, India
Authored Publications
XProf: An Open, Scalable and Extensible Profiling System for the Modern ML Stack
Naveen Kumar, Jose Baiocchi Paredes, Scott Goodson, Kelvin Le, Yin Zhang, Kan Cai, Jiten Thakkar, Sai Ganesh Bandiatmakuri, Yogesh SY, Ani Udipi, Vikas Aggarwal
2026
Optimizing Large Language Models across thousands of hardware accelerators requires deep system expertise. To address modern machine learning optimization needs, we present XProf, the de facto machine learning profiler for the OpenXLA ecosystem. XProf delivers actionable optimization suggestions and in-depth performance analysis, empowering machine learning researchers and framework users to improve efficiency without specialized systems knowledge. XProf provides a unified, full-stack view of both host (CPU) and device (accelerator, i.e. TPU or GPU) performance, leveraging tools like the Roofline Model for comprehensive analysis. Engineered with a distributed architecture, XProf is battle-tested at Google to profile across thousands of chips with minimal overhead (<1%) for the workload. Its pluggable architecture, built on the open-source C API extension to PJRT, has already been adopted by third-party accelerator vendors. Originally developed at Google and now open-sourced within the OpenXLA Project, XProf has proven indispensable in production, driving significant efficiency gains and enabling critical results, including winning MLPerf submissions. This paper presents the design and architecture of XProf, showcases its differentiating tools and capabilities, and highlights its impact within Google and across the industry as a state-of-the-art ML profiler. The codebase is at https://github.com/openxla/xprof.