Jennifer Petoff

Jennifer Petoff is Director of Google Cloud Platform (GCP) & Technical Infrastructure (TI) Education and is based in Lisbon, Portugal. She leads training programs for Google's GCP and TI Engineering Teams. Jennifer is one of the co-editors of the best-selling book Site Reliability Engineering: How Google Runs Production Systems and is a regular speaker at DevOps and SRE conferences around the world. Jennifer joined Google after spending eight years in the chemical industry. She holds a PhD in Chemistry from Stanford University, as well as a BS in Chemistry and a BA in Psychology from the University of Rochester in the United States.
Authored Publications
    Site Reliability Engineering (SRE) is a discipline founded at Google that is now widely practiced across the tech industry. SRE represents a set of principles and practices that applies aspects of software engineering to IT infrastructure and operations. In this talk, we will discuss the key principles and practices of SRE, and how they can be used to build high-performance software and teams. We’ll explore insights from the State of DevOps Report and how SRE and platform engineering can drive organizational performance.
    Site Reliability Engineering (SRE) is a discipline founded at Google that is now widely practiced across the tech industry. SRE represents a set of principles and practices that applies aspects of software engineering to IT infrastructure and operations. In this talk, we will discuss the key principles and practices of SRE, and how they can be used to build high-performance software and teams. We’ll explore insights from the State of DevOps Report and how SRE can help foster the type of generative organizational culture that is a hallmark of high-performing organizations.
    New Grads Becoming New SREs: Catalyzing a Circle of Life in Ireland
    Daniel Crawford
    Catalina Rete
    USENIX SREcon EMEA 2023 (2023)
    Site Reliability Engineering principles, best practices, and culture do not feature systematically in the undergraduate curriculum around the world. Nor do principles of non-abstract large system design. Despite this, students can be taught (and learn through experience) to be great SREs upon graduation. This talk will equip SRE hiring managers with creative ways to build a pipeline of talent. We’ll share techniques that we’ve found to be effective in super-charging our SRE hiring pipeline from universities in Ireland.
    Site Reliability Engineers (SREs) are Google's specialists for designing, building, and running complex services that are reliable, scalable, efficient, and maintainable. The SRE Engagement Model describes how the collaboration between developers and SREs works, how SRE is funded, what kind of work SRE is best suited for, and how reliability engineering can be applied early in the service lifecycle.
    In this panel on engineering onboarding, we will ask a few industry experts for their thoughts on three questions: What are the big questions and challenges in this field? What have been the significant changes in the past few years? And, finally, what comes next?
    This contribution explores why training matters to a successful and inclusive SRE practice. On the flip side, I’ll share what learning and development practitioners can learn from SRE principles, practices, and culture to deliver a consistent and reliable program.
    Real-world experience and things that go wrong are two of life’s best teachers. This talk will explore key elements of scalable large-system design and Site Reliability Engineering (SRE) principles* through anti-patterns encountered in real life. Find out what lessons can be gleaned from watching the dynamics in a crowded cafe or dealing with a security issue during a hotel stay. Learn about fundamental site reliability engineering principles and practices, including:
    - Avoiding cascading failures
    - Not feeding the machines with human toil
    - Writing blameless postmortems
    - Engineering solutions to eliminate classes of errors rather than implementing point fixes
    These principles will be framed through a lens of the suboptimal while demonstrating the impact of SRE anti-patterns on user trust.
    * SRE is often thought of as a specific implementation of the DevOps interface.