Jennifer Petoff
Jennifer Petoff is Director of Google Cloud Platform (GCP) & Technical Infrastructure (TI) Education and is based in Lisbon, Portugal. She leads training programs for Google's GCP and TI Engineering Teams. Jennifer is one of the co-editors of the best-selling book Site Reliability Engineering: How Google Runs Production Systems, and is a regular speaker at DevOps and SRE conferences around the world. Jennifer joined Google after spending eight years in the chemical industry. She holds a PhD in Chemistry from Stanford University and a BS in Chemistry and a BA in Psychology from the University of Rochester in the United States.
Authored Publications
Site Reliability Engineering for High Performing Software and Teams [Platform Engineering Edition]
SRE SkilUp Day [People Cert | DevOps Institute] (2024)
Site Reliability Engineering (SRE) is a discipline founded at Google that is now widely practiced across the tech industry. SRE represents a set of principles and practices that applies aspects of software engineering to IT infrastructure and operations. In this talk, we will discuss the key principles and practices of SRE and how they can be used to build high-performance software and teams. We’ll explore insights from the State of DevOps Report and how SRE and platform engineering can drive organizational performance.
Site Reliability Engineering to build high performance software and teams
Google Cloud Innovators Hive - Nordics (2023)
Site Reliability Engineering (SRE) is a discipline founded at Google that is now widely practiced across the tech industry. SRE represents a set of principles and practices that applies aspects of software engineering to IT infrastructure and operations. In this talk, we will discuss the key principles and practices of SRE and how they can be used to build high-performance software and teams. We’ll explore insights from the State of DevOps Report and how SRE can help foster the type of generative organizational culture that is a hallmark of high-performing organizations.
Preview abstract
Site Reliability Engineering principles, best practices, and culture do not feature systematically in the undergraduate curriculum around the world. Nor do principles of non-abstract large system design. Despite this, students can be taught (and learn through experience) to be great SREs upon graduation.
This talk will equip SRE hiring managers with creative ways to build a pipeline of talent. We’ll share techniques that we’ve found to be effective in supercharging our SRE hiring pipeline from universities in Ireland.
The Origins of SRE and Why It’s Important
DevOps Institute (2021)
Site Reliability Engineers (SREs) are Google's specialists for designing, building, and running complex services that are reliable, scalable, efficient, and maintainable. The SRE Engagement Model describes how the collaboration between developers and SREs works, how SRE is funded, what kind of work SRE is best suited for, and how reliability engineering can be applied early in the service lifecycle.
SREcon21 Panel Discussion: Engineering Onboarding
USENIX SREcon EMEA 2021 (2021)
In this panel on Engineering Onboarding, we will ask a few industry experts for their thoughts: What are the big questions and challenges in this field? What have been the significant changes in the past few years? And, finally, what comes next?
Why Training Matters to an SRE Practice and Why SRE Matters To Your Training Program
97 Things Every SRE Should Know, O'Reilly (2021), pp. 162-163
This contribution explores why training matters to a successful and inclusive SRE practice. On the flip side, I’ll share what learning and development practitioners can learn from SRE principles, practices, and culture to deliver a consistent and reliable program.
Preview abstract
Real-world experience and things that go wrong are two of life’s best teachers. This talk will explore key elements of scalable large-system design and Site Reliability Engineering (SRE) principles* through anti-patterns encountered in real life. Find out what lessons can be gleaned from watching the dynamics in a crowded cafe or dealing with a security issue during a hotel stay. Learn about fundamental SRE principles and practices, including:
-Avoiding cascading failures
-Not feeding the machines with human toil
-Writing blameless postmortems
-Engineering solutions to eliminate classes of errors rather than implementing point fixes
These principles will be framed through a lens of the suboptimal while demonstrating the impact of SRE anti-patterns on user trust.
* SRE is often thought of as a specific implementation of the DevOps interface.