John Hernandez, PhD, MPP


Director, Head of Clinical Research Center of Excellence and Health Impact team

Research Areas

Health & Bioscience

Authored Publications


Smartphone use in a large US adult population: Temporal associations between objective measures of usage and mental well-being

John Hernandez

Ari Winbush

Benjamin Nelson

Nicholas Allen

Andrew Barakat

Daniel McDuff

Conor Heneghan

Allen Jiang

PNAS (2025)

Smartphones are a vital tool for most people. They facilitate many everyday tasks and, as a result, have become ubiquitous and indispensable. There are concerns about how the use of these devices may impact mental health and wellbeing. Yet few studies have reported objective data about phone usage from large and diverse cohorts, and studies have found low correlations between subjective and objective smartphone use. To better elucidate these complex interactions, it is important to understand and characterize what resembles “normative” smartphone use behavior. In this paper, we present normative patterns of objectively measured phone usage from a large prospective observational study. We analyze a quarter of a million days of phone usage data from 10,099 adult subjects, which provide objective longitudinal data over a four-week period in the US general population. Contrary to popular belief, our model shows little support for the conclusion that smartphone use predicts mood the following week or that mood predicts smartphone use the following week, with some results differing depending on whether the effects are within-person or between-person. Lastly, while some findings are statistically significant, the effect sizes of these results are minimal, suggesting little to no impact in real-world settings and therefore a lack of clinical significance.
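The within-person versus between-person distinction in this abstract is commonly operationalized by person-mean centering. The following is a minimal sketch of that decomposition, not the study's actual model; the function name `decompose` and the toy screen-time records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def decompose(records):
    """Split each observation into between- and within-person parts.

    records: list of (person_id, value) pairs, e.g. daily screen time.
    Returns a list of (person_id, between, within) where
      between = that person's own mean across their observations
      within  = the observation minus that person's mean
    """
    by_person = defaultdict(list)
    for person_id, value in records:
        by_person[person_id].append(value)

    person_mean = {pid: mean(vals) for pid, vals in by_person.items()}

    return [
        (pid, person_mean[pid], value - person_mean[pid])
        for pid, value in records
    ]


# Hypothetical data: hours of screen time for two people.
toy = [("a", 2.0), ("a", 4.0), ("b", 10.0)]
for row in decompose(toy):
    print(row)
```

In a longitudinal model, the between-person term captures whether heavier users differ on average, while the within-person term captures whether a given person's outcome shifts when they deviate from their own typical usage.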

Capturing Real-World Habitual Sleep Patterns with a Novel User-centric Algorithm to Pre-Process Fitbit Data in the All of Us Research Program: Retrospective observational longitudinal study

Hiral Master

Jeffrey Annis

Jack Ching

Karla Gleichauf

Lide Han

Peyton Coleman

Kelsie Full

Neil Zheng

Doug Ruderfer

John Hernandez

Logan Schneider

Evan Brittain

Journal of Medical Internet Research (2025)

Background: Commercial wearables such as Fitbit quantify sleep metrics using fixed calendar times as default measurement periods, which may not adequately account for individual variations in sleep patterns. To address this limitation, experts in sleep medicine and wearable technology developed a user-centric algorithm designed to more accurately reflect actual sleep behaviors and improve the validity of wearable-derived sleep metrics. Objective: This study aims to describe the development of a new user-centric algorithm, compare its performance with the default calendar-relative algorithm, and provide a practical guide for analyzing All of Us Fitbit sleep data on a cloud-based platform. Methods: The default and user-centric algorithms were implemented to preprocess and compute sleep metrics related to schedule, duration, and disturbances using high-resolution Fitbit sleep data from 8563 participants (median age 58.1 years, 6002/8341, 71.96%, female) in the All of Us Research Program (version 7 Controlled Tier). Variations in typical sleep patterns were calculated by examining the differences in the mean number of primary sleep logs classified by each algorithm. Linear mixed-effects models were used to compare differences in sleep metrics across quartiles of variation in typical sleep patterns. Results: Out of 8,452,630 total sleep logs collected over a median of 4.2 years of Fitbit monitoring, 401,777 (4.75%) nonprimary sleep logs identified by the default algorithm were reclassified as primary sleep by the user-centric algorithm. Variation in typical sleep patterns ranged from –0.08 to 1. Among participants with the greatest variation in typical sleep patterns, the user-centric algorithm identified significantly more total sleep time (by 17.6 minutes; P<.001), more wake after sleep onset (by 13.9 minutes; P<.001), and lower sleep efficiency (by 2.0%; P<.001), on average. Differences in sleep stage metrics between the 2 algorithms were modest.
Conclusions: The user-centric algorithm captures the natural variability in sleep schedules, providing an alternative approach to preprocess and evaluate sleep metrics related to schedule, duration, and disturbances. A publicly available R package facilitates the implementation of this algorithm for clinical and translational research.

The Anatomy of a Personal Health Agent

Hamid Palangi

John Hernandez

Ali Heydari

Ahmed Metwally

Ken Gu

Jiening Zhan

Kumar Ayush

Hong Yu

Akshay Paruchuri

Amy Lee

Qian He

Yun Liu

Zhihan Zhang

Isaac Galatzer-Levy

Xavi Prieto

Andrew Barakat

Ben Graef

Yuzhe Yang

Daniel McDuff

Brent Winslow

Shwetak Patel

Girish Narayanswamy

Conor Heneghan

Max Xu

Jacqueline Shreibati

Jake Garrison

Mark Malhotra

Xin Liu

Orson Xu

Tim Althoff

Tony Faranesh

Nova Hammerquist

Vidya Srinivas

arXiv (2025)

Health is a fundamental pillar of human wellness, and the rapid advancements in large language models (LLMs) have driven the development of a new generation of health agents. However, how to fulfill the diverse needs of individuals in daily non-clinical settings is underexplored. In this work, we aim to build a comprehensive personal health assistant that is able to reason about multimodal data from everyday consumer devices and personal health records. To understand end users’ needs when interacting with such an assistant, we conducted an in-depth analysis of query data from users, alongside qualitative insights from users and experts gathered through a user-centered design process. Based on these findings, we identified three major categories of consumer health needs, each of which is supported by a specialist subagent: (1) a data science agent that analyzes both personal and population-level time-series wearable and health record data to provide numerical health insights, (2) a health domain expert agent that integrates users’ health and contextual data to generate accurate, personalized insights based on medical and contextual user knowledge, and (3) a health coach agent that synthesizes data insights, drives multi-turn user interactions and interactive goal setting, guiding users using a specified psychological strategy and tracking users’ progress. Furthermore, we propose and develop a multi-agent framework, Personal Health Insight Agent Team (PHIAT), that enables dynamic, personalized interactions to address individual health needs. To evaluate these individual agents and the multi-agent system, we develop a set of N benchmark tasks and conduct both automated and human evaluations, involving hundreds of hours of evaluation from health experts and hundreds of hours of evaluation from end users.
Our work establishes a strong foundation towards the vision of a personal health assistant accessible to everyone in the future and represents the most comprehensive evaluation of a consumer AI health agent to date.

A personal health large language model for sleep and fitness coaching

Justin Khasentino

Anastasiya Belyaeva

Xin Liu

Zhun Yang

Nick Furlotte

Chace Lee

Erik Schenck

Yojan Patel

Jian Cui

Logan Schneider

Robby Bryant

Ryan Gomes

Allen Jiang

Roy Lee

Yun Liu

Javier Perez

Jamie Rogers

Cathy Speed

Shyam Tailor

Megan Walker

Jeffrey Yu

Tim Althoff

Conor Heneghan

John Hernandez

Mark Malhotra

Leor Stern

Yossi Matias

Greg Corrado

Shwetak Patel

Shravya Shetty

Jiening Zhan

Shruthi Prabhakara

Daniel McDuff

Cory McLean

Nature Medicine (2025)

Although large language models (LLMs) show promise for clinical healthcare applications, their utility for personalized health monitoring using wearable device data remains underexplored. Here we introduce the Personal Health Large Language Model (PH-LLM), designed for applications in sleep and fitness. PH-LLM is a version of the Gemini LLM that was finetuned for text understanding and reasoning when applied to aggregated daily-resolution numerical sensor data. We created three benchmark datasets to assess multiple complementary aspects of sleep and fitness: expert domain knowledge, generation of personalized insights and recommendations and prediction of self-reported sleep quality from longitudinal data. PH-LLM achieved scores that exceeded a sample of human experts on multiple-choice examinations in sleep medicine (79% versus 76%) and fitness (88% versus 71%). In a comprehensive evaluation involving 857 real-world case studies, PH-LLM performed similarly to human experts for fitness-related tasks and improved over the base Gemini model in providing personalized sleep insights. Finally, PH-LLM effectively predicted self-reported sleep quality using a multimodal encoding of wearable sensor data, further demonstrating its ability to effectively contextualize wearable modalities. This work highlights the potential of LLMs to revolutionize personal health monitoring via tailored insights and predictions from wearable data and provides datasets, rubrics and benchmark performance to further accelerate personal health-related LLM research.

Towards a Personal Health Large Language Model

Justin Cosentino

Anastasiya Belyaeva

Xin Liu

Nick Furlotte

Zhun Yang

Chace Lee

Erik Schenck

Yojan Patel

Jian Cui

Logan Schneider

Robby Bryant

Ryan Gomes

Allen Jiang

Roy Lee

Yun Liu

Javier Perez

Jamie Rogers

Cathy Speed

Shyam Tailor

Megan Walker

Jeffrey Yu

Tim Althoff

Conor Heneghan

John Hernandez

Mark Malhotra

Leor Stern

Yossi Matias

Greg Corrado

Shwetak Patel

Shravya Shetty

Jiening Zhan

Yeswanth Subramanian

Shruthi Prabhakara

Daniel McDuff

Cory McLean

arXiv (2024)

Large language models (LLMs) can retrieve, reason over, and make inferences about a wide range of information. In health, most LLM efforts to date have focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into clinical tasks, provide a rich, continuous, and longitudinal source of data relevant for personal health monitoring. Here we present a new model, Personal Health Large Language Model (PH-LLM), a version of Gemini fine-tuned for text understanding and reasoning over numerical time-series personal health data for applications in sleep and fitness. To systematically evaluate PH-LLM, we created and curated three novel benchmark datasets that test 1) production of personalized insights and recommendations from measured sleep patterns, physical activity, and physiological responses, 2) expert domain knowledge, and 3) prediction of self-reported sleep quality outcomes. For the insights and recommendations tasks we created 857 case studies in sleep and fitness. These case studies, designed in collaboration with domain experts, represent real-world scenarios and highlight the model’s capabilities in understanding and coaching. Through comprehensive human and automatic evaluation of domain-specific rubrics, we observed that both Gemini Ultra 1.0 and PH-LLM are not statistically different from expert performance in fitness and, while experts remain superior for sleep, fine-tuning PH-LLM provided significant improvements in using relevant domain knowledge and personalizing information for sleep insights. To further assess expert domain knowledge, we evaluated PH-LLM performance on multiple choice question examinations in sleep medicine and fitness. PH-LLM achieved 79% on sleep (N=629 questions) and 88% on fitness (N=99 questions), both of which exceed average scores from a sample of human experts as well as benchmarks for receiving continuing credit in those domains.
To enable PH-LLM to predict self-reported assessments of sleep quality, we trained the model to predict self-reported sleep disruption and sleep impairment outcomes from textual and multimodal encoding representations of wearable sensor data. We demonstrate that multimodal encoding is both necessary and sufficient to match performance of a suite of discriminative models to predict these outcomes. Although further development and evaluation are necessary in the safety-critical personal health domain, these results demonstrate both the broad knowledge base and capabilities of Gemini models and the benefit of contextualizing physiological data for personal health applications as done with PH-LLM.

Digital devices can help clinicians prescribe physical activity

Laurie Whitsell

John Hernandez

Candice Taguibao

STAT (2024)

This op-ed is by leaders from the American Heart Association, the Digital Medicine Society, and Google involved in a Digital Medicine Society-sponsored project on digital measures for physical activity. The op-ed summarizes evidence that the technology exists today to digitally measure physical activity in the broad population and that, by measuring it the right way, we can embrace it as the ‘6th vital sign’ and enter a new era of healthcare centered on proactive patient care.

Predicting subjective sleep impairment and disturbance from wearable sleep data

Conor Heneghan

Ben Yetton

Daniel McDuff

Nicholas Allen

Andrew Barakat

John Hernandez

Allen Jiang

Logan Schneider

Benjamin Nelson

Ari Winbush

2024

Introduction: Wearables offer a scalable, passive, and objective measure of sleep health. However, previously reported Spearman correlations between subjective and wearable-derived sleep measures have been modest (rS=0.3-0.46). We set out to determine whether wearables adequately capture subjective feelings of sleep disturbance and impairment in a large, diverse, ecologically valid sleep study. Methods: Subject data (n=2922, mean age=45.4 (12.6), 74% female) came from the Digital Wellbeing Study, a joint study between the University of Oregon and Google to investigate how smartphone usage impacts well-being. Wearable (Fitbit) derived sleep metrics were summarized across the week prior to the administration of the PROMIS Sleep Disturbance (SD) and Sleep Related Impairment (SI) Short Form surveys. A series of stepwise OLS regressions was used to test the predictive power of each sleep metric over a baseline model of age and sex. Results: Sleep variables of total sleep time, resting heart rate, and the variability in total sleep time and restlessness (accelerometer-based metric) improved both SI and SD model fit above a baseline model (SI baseline adjR2=0.087, SD baseline adjR2=0.024). Deep (e.g., N3) minutes uniquely improved SI model fit, while longest wake length and total wake minutes improved SD fit. REM percent and normalized nightly heart rate did not improve model fit. The final model explained 12.9% of the variance of SI and 8.4% of the variance of SD. The most predictive single sleep metric was the variability in total sleep time (adjR2=0.104) for SI, and total sleep time for SD (age and sex included). Fitbit’s composite “Sleep Score” was the single best predictor of SD when included in analysis (age and sex excluded). Conclusion: As demonstrated in previous studies, wearable-derived sleep metrics are modest predictors of perceived sleep disturbance or sleep-related impairment. Composite metrics that include measures of sleep variability are recommended.
Support: This research was funded by Google Inc.
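Comparing each candidate sleep metric against the age-and-sex baseline, as this abstract describes, amounts to asking whether adjusted R² still rises once the extra predictors are penalized. A minimal sketch of that comparison follows. It assumes adjusted R² is the selection criterion (the abstract reports adjusted R² but does not spell out the stepwise rule), and the function names and the example R² inputs are hypothetical.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a linear model with an intercept.

    r2: ordinary coefficient of determination
    n:  number of observations
    p:  number of predictors, excluding the intercept
    """
    if n - p - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def improves_fit(r2_base: float, p_base: int,
                 r2_new: float, p_new: int, n: int) -> bool:
    """True if the larger model has a higher adjusted R^2 than the baseline."""
    return adjusted_r2(r2_new, n, p_new) > adjusted_r2(r2_base, n, p_base)


# Hypothetical R^2 values; n matches the sample size quoted in the abstract.
n_subjects = 2922
print(improves_fit(r2_base=0.09, p_base=2, r2_new=0.13, p_new=6, n=n_subjects))
```

With thousands of subjects the penalty for a handful of extra predictors is small, so adjusted and unadjusted R² stay close.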

Economic evaluation of a wearable-based intervention to increase physical activity among insufficiently active middle-aged adults

Jack Ching

Steve Duff

John Hernandez

medRxiv (2024)

Background: Physical activity levels worldwide have declined over recent decades, with the average number of daily steps decreasing steadily since 1995. Given that physical inactivity is a major modifiable risk factor for chronic disease and mortality, increasing the level of physical activity is a clear opportunity to improve population health on a broad scale. The current study aims to assess the cost-effectiveness and budget impact of a Fitbit-based intervention among healthy, but insufficiently active, adults to quantify the potential clinical and economic value for a commercially insured population in the U.S. Methods: An economic model was developed to compare physical activity, health outcomes, costs, and quality-adjusted life-years (QALYs) associated with usual care and a Fitbit-based intervention that consists of a consumer wearable device alongside goal setting and feedback features provided in a companion software application. Improvement in physical activity was measured in terms of mean daily step count. The effects of increased daily step count were characterized as reduced short-term healthcare costs and decreased incidence of chronic diseases with corresponding improvement in health utility and reduced disease costs. Published literature, standardized costing resources, and data from a National Institutes of Health-funded research program were utilized. Cost-effectiveness and budget impact analyses were performed for a hypothetical cohort of middle-aged adults. Results: The base case cost-effectiveness results found the Fitbit intervention to be dominant (less costly and more effective) compared to usual care. Discounted 15-year incremental costs and QALYs were -$1,257 and 0.011, respectively. In probabilistic analyses, the Fitbit intervention was dominant in 93% of simulations and either dominant or cost-effective (defined as less than $150,000/QALY gained) in 99.4% of simulations.
For budget impact analyses conducted from the perspective of a U.S. commercial payer, the Fitbit intervention was estimated to save approximately $6.5 million over 2 years and $8.5 million over 5 years for a cohort of 8,000 participants. Although the economic analysis results were very robust, the short-term healthcare cost savings were the most uncertain in this population and warrant further research. Conclusions: There is abundant evidence documenting the benefits of wearable activity trackers when used to increase physical activity as measured by daily step counts. Our research provides additional health economic evidence supporting implementation of wearable-based interventions to improve population health and offers compelling support for payers to consider including wearable-based physical activity interventions as part of a comprehensive portfolio of preventive health offerings for their insured populations.
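The terms "dominant" and "cost-effective" in this abstract follow the usual incremental cost-effectiveness logic, which can be sketched as below. This is an illustrative classifier under standard health-economics conventions, not the study's economic model; the function name `classify` is hypothetical, and only the incremental cost, incremental QALYs, and the $150,000/QALY threshold are taken from the abstract.

```python
def classify(delta_cost: float, delta_qaly: float,
             threshold: float = 150_000.0) -> str:
    """Classify an intervention against its comparator.

    delta_cost: intervention cost minus comparator cost (USD)
    delta_qaly: intervention QALYs minus comparator QALYs
    threshold:  willingness to pay per QALY gained (USD)
    """
    if delta_cost == 0 and delta_qaly == 0:
        return "equivalent"
    if delta_cost <= 0 and delta_qaly >= 0:
        return "dominant"        # no costlier and no less effective
    if delta_cost >= 0 and delta_qaly <= 0:
        return "dominated"       # no cheaper and no more effective

    # Remaining cases trade cost against effect, so the ratio is defined.
    icer = delta_cost / delta_qaly
    if delta_qaly > 0:
        # Costlier but more effective: pay at most `threshold` per QALY gained.
        return "cost-effective" if icer <= threshold else "not cost-effective"
    # Cheaper but less effective: savings per QALY lost must reach `threshold`.
    return "cost-effective" if icer >= threshold else "not cost-effective"


# Base-case increments reported in the abstract.
print(classify(delta_cost=-1257.0, delta_qaly=0.011))
```

Because the base case saves money while adding QALYs, no ratio needs to be computed; the threshold only matters in simulations where the intervention costs more than usual care.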

Research Protocol for the Google Health Digital Wellbeing Study

John Hernandez

Ari Winbush

Nicholas Allen

Andrew Barakat

Felicia Cordeiro

Daniel McDuff

Allen Jiang

Ryann Crowley

JMIR Research Protocols (2024)

The impact of digital device use on health and wellbeing is a pressing question to which individuals, families, schools, policy makers, legislators, and digital designers are all demanding answers. However, the scientific literature on this topic to date is marred by small and/or unrepresentative samples, poor measurement of core constructs (e.g., device use, smartphone addiction), and a limited ability to address the psychological and behavioral mechanisms that may underlie the relationships between device use and wellbeing. A number of recent authoritative reviews have made urgent calls for future research projects to address these limitations. The critical role of research is to identify which patterns of use are associated with benefits versus risks, and who is more vulnerable to harmful versus beneficial outcomes, so that we can pursue evidence-based product design, education, and regulation aimed at maximizing benefits and minimizing risks of smartphones and other digital devices. We describe a protocol for a Digital Wellbeing Study (DWB) to help answer these questions.

Analysis of objective and subjective sleep metrics and smartphone usage patterns

Conor Heneghan

Daniel McDuff

Ari Winbush

Nicholas Allen

John Hernandez

Allen Jiang

Andrew Barakat

Logan Schneider

Benjamin Nelson

Ben Yetton

2024

Affiliations: Consumer Health Research Team, Google Inc.; Department of Psychology, University of Oregon; Verily Life Sciences; Department of Psychiatry, Harvard Medical School and Beth Israel Deaconess Medical Center. Introduction: The Digital Wellbeing Study is an IRB-approved joint study between the University of Oregon and Google to investigate how smartphone usage interacts with objective and subjective parameters of well-being such as sleep, exercise, and stress. The study recruited a demographically diverse population who each wore a smartwatch and installed a smartphone app linked to the study. Participants completed demographic and health questionnaires including the PROMIS Sleep Disturbance (SD) Short Form. Aims of the study included determining (a) whether objective sleep duration was correlated with smartphone use, and (b) whether smartphone usage could predict the subjective self-reported sleep instrument. Methods: There was sufficient data from 7,499 users to conduct a population modeling analysis. An Ordinary Least Squares linear model was used as a predictor of each subject’s average total sleep time (TST) and their SD t-score. The inputs to the model included demographics and population z-scored activity measures (steps, sedentary time, time driving, time at work, home and other locations, phone screen time, frequency of phone unlocks) over seven days prior to the survey. Results: The activity measures and baseline demographics could only explain a small amount of the overall variance in TST and SD (R^2=0.04 for TST and R^2=0.05 for SD). Phone screen time was a statistically significant predictor of both TST (-8.19 mins, p<0.001) and self-reported sleep disruption (0.611 t-score units, p<0.001).
The number of phone unlocks was a predictor of variability in TST (-3.33 mins, p<0.001), suggesting that longer session times are correlated with greater TST variability. The effects are minimal (e.g., a subject who has one standard deviation greater phone screen time than average would be predicted to only see a 2% reduction in TST, and a 0.6% increase in perceived sleep disturbance). Time driving and step count were also minor predictors of SD and TST. Conclusion: At a population level, average activity measures from wearables and smartphones such as steps, smartphone usage time, sedentary activity, etc. are limited predictors of objective sleep metrics such as Total Sleep Time, and subjective sleep metrics such as the PROMIS Sleep Disturbance t-score. Support: This research was funded by Google Inc.
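Because the predictors in this abstract are z-scored across the population, each coefficient reads as the change in the outcome per one standard deviation of the predictor. The sketch below shows the z-scoring step and how a per-SD coefficient converts to a relative effect. The function names are hypothetical, the use of the population standard deviation is an assumption, and the 409.5-minute mean TST is not reported in the abstract: it is back-calculated from the quoted -8.19 minutes and 2% figures.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and unit population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot z-score a constant series")
    return [(v - mu) / sigma for v in values]


def relative_effect(coef_per_sd: float, outcome_mean: float) -> float:
    """Change in the outcome per +1 SD of a predictor, as a fraction of its mean."""
    return coef_per_sd / outcome_mean


# Hypothetical daily screen-time values (hours) for a few subjects.
print(z_scores([2.0, 3.5, 5.0, 6.5]))

# Coefficient from the abstract; the mean TST is an assumed, implied value.
print(relative_effect(coef_per_sd=-8.19, outcome_mean=409.5))
```

Expressing the coefficient as a fraction of the outcome mean is what lets a statistically significant estimate be judged as practically small.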
