Milind Tambe
Milind Tambe is Principal Scientist and Director of "AI for Social Good" at Google DeepMind; concurrently, he is Gordon McKay Professor of Computer Science and Director of the Center for Research on Computation and Society at Harvard University. He is a recipient of the AAAI (Association for the Advancement of AI) Award for Artificial Intelligence for the Benefit of Humanity, the IJCAI (International Joint Conference on AI) John McCarthy Award, the AAAI Feigenbaum Prize, the ACM/SIGAI Autonomous Agents Research Award from AAMAS (Autonomous Agents and Multiagent Systems Conference), the AAAI Robert S. Engelmore Memorial Lecture Award, the INFORMS Wagner Prize, and the MORS (Military Operations Research Society) Rist Prize. He is a fellow of AAAI and ACM. For his work in AI and public safety, he has received the Columbus Foundation Homeland Security Award, Meritorious Team Commendations from the US Coast Guard and LA Airport Police, and a Certificate of Appreciation from the US Federal Air Marshals Service for pioneering real-world deployments of security games. Prof. Tambe's papers have received best paper awards or best paper finalist recognition 30 times at conferences such as AAAI, AAMAS, and IJCAI. Prof. Tambe and his team have developed pioneering AI systems that deliver real-world impact in public health (e.g., maternal and child health), public safety, and wildlife conservation.
Authored Publications
Indexability is not Enough for Whittle: Improved Near-Optimal Algorithms for Restless Bandits
Abheek Ghosh
Manish Jain
Autonomous Agents and Multi Agent Systems (AAMAS) (2023)
We study the problem of planning restless multi-armed bandits (RMABs) with multiple actions. This is a popular model for multi-agent systems with applications like multi-channel communication, monitoring and machine maintenance tasks, and healthcare.
Whittle index policies, which are based on Lagrangian relaxations, are widely used in these settings due to their simplicity and near-optimality under certain conditions. In this work, we first show that Whittle index policies can fail in simple and practically relevant RMAB settings, even when the RMABs are indexable. We further discuss why Whittle index policies can provably fail in these settings despite indexability, and how even asymptotic optimality does not translate well to practically relevant planning horizons.
We then propose an alternate planning algorithm based on the mean-field method, which borrows ideas from existing research with some improvements. This algorithm can provably and efficiently obtain near-optimal policies when the number of arms, $N$, is large, without the stringent structural assumptions required by Whittle index policies. Our approach is hyper-parameter free, and we provide an improved non-asymptotic analysis which has (a) a better dependence on problem-dependent parameters, (b) high-probability upper bounds which show that the reward of the policy is reliable, and (c) matching lower bounds for this algorithm, thus demonstrating the tightness of our bounds. Our extensive experimental analysis shows that the mean-field approach matches or outperforms other baselines.
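As background for the abstract above, here is a minimal sketch of a classical Whittle index policy for the simpler two-action case. It is not the mean-field algorithm the paper proposes. It assumes a discounted-reward arm that is indexable, finds the index of a state by bisection on the passive subsidy, and then pulls the highest-index arms. The function names, the discount factor, the iteration count, and the search bounds are all illustrative assumptions.

```python
import numpy as np

def q_values(P, r, subsidy, gamma=0.9, iters=500):
    """Value iteration for one arm when the passive action earns an extra subsidy.

    P has shape (2, n, n): P[0] is the passive transition matrix, P[1] the active one.
    r has shape (n,): reward for being in each state.
    Returns (q_passive, q_active), each of shape (n,).
    """
    r = np.asarray(r, dtype=float)
    V = np.zeros(len(r))
    for _ in range(iters):
        q_passive = r + subsidy + gamma * P[0] @ V
        q_active = r + gamma * P[1] @ V
        V = np.maximum(q_passive, q_active)
    return r + subsidy + gamma * P[0] @ V, r + gamma * P[1] @ V

def whittle_index(P, r, state, gamma=0.9, lo=-10.0, hi=10.0, tol=1e-4):
    """Bisection for the subsidy at which both actions are equally good in `state`.

    Assumes the arm is indexable and that the index lies in [lo, hi].
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        q_passive, q_active = q_values(P, r, mid, gamma)
        if q_active[state] > q_passive[state]:
            lo = mid  # acting is still preferred, so the index is larger
        else:
            hi = mid
    return (lo + hi) / 2.0

def top_k_arms(indices, k):
    """Return the positions of the k arms with the largest index values."""
    order = np.argsort(np.asarray(indices, dtype=float))
    return sorted(int(i) for i in order[-k:])
```

In this sketch, an arm whose active and passive transitions are identical gets an index near zero, since acting on it brings no benefit; the abstract's point is that ranking arms this way can still fail even when indexability holds.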
Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Care
Kai Wang
Shresth Verma
Aditya S. Mate
Sanket Shah
Neha Madhiwalla
Aparna Hegde
AAAI 2023 (to appear)
This paper studies restless multi-armed bandit (RMAB) problems with unknown arm transition dynamics but with known correlated arm features. The goal is to learn a model to predict transition dynamics given features, where the Whittle index policy solves the RMAB problems using predicted transitions. However, prior works often learn the model by maximizing the predictive accuracy instead of final RMAB solution quality, causing a mismatch between training and evaluation objectives. To address this shortcoming, we propose a novel approach for decision-focused learning in RMAB that directly trains the predictive model to maximize the Whittle index solution quality. We present three key contributions: (i) we establish differentiability of the Whittle index policy to support decision-focused learning; (ii) we significantly improve the scalability of decision-focused learning approaches in sequential problems, specifically RMAB problems; (iii) we apply our algorithm to a previously collected dataset of maternal and child health to demonstrate its performance. Indeed, our algorithm is the first for decision-focused learning in RMAB that scales to real-world problem sizes.
Robust Planning over Restless Groups: Engagement Interventions for a Large-Scale Maternal Telehealth Program
Jackson Killian
Lily Xu
Arpita Biswas
Shresth Verma
Vineet Nair
Aparna Hegde
Neha Madhiwalla
Paula Rodriguez Diaz
Sonja Johnson-Yu
AAAI 2023 (to appear)
In 2020, maternal mortality in India was estimated to be as high as 130 deaths per 100K live births, nearly twice the UN’s target. To improve health outcomes, the non-profit ARMMAN sends automated voice messages to expecting and new mothers across India. However, 38% of mothers stop listening to these calls, missing critical preventative care information. To improve engagement, ARMMAN employs health workers to intervene by making service calls, but workers can only call a fraction of the 100K enrolled mothers. Partnering with ARMMAN, we model the problem of allocating limited interventions across mothers as a restless multi-armed bandit (RMAB), where the realities of large scale and model uncertainty present key new technical challenges. We address these with GROUPS, a double oracle–based algorithm for robust planning in RMABs with scalable grouped arms. Robustness over grouped arms requires several methodological advances. First, to adversarially select stochastic group dynamics, we develop a new method to optimize Whittle indices over transition probability intervals. Second, to learn group level RMAB policy best responses to these adversarial environments, we introduce a weighted index heuristic. Third, we prove a key theoretical result that planning over grouped arms achieves the same minimax regret–optimal strategy as planning over individual arms, under a technical condition. Finally, using real world data from ARMMAN, we show that GROUPS produces robust policies that reduce minimax regret by up to 50%, halving the number of preventable missed voice messages to connect more mothers with life saving maternal health information.
Preview abstract
Restless multi-armed bandits (RMABs) are an extension of multi-armed bandits (MABs) with state information associated with arms, where the states evolve restlessly with different transition probabilities depending on whether the arms are pulled. The additional state information in RMABs captures broader applications with state dependency, including digital marketing and healthcare recommendation. However, solving RMABs requires information on transition dynamics, which is often not available upfront. This paper considers learning the transition probabilities in an RMAB setting while maintaining small regret. We use the confidence bounds of transition probabilities to define an optimistic Whittle index policy to solve the RMAB problem while maintaining sub-linear regret compared to the benchmark. Our algorithm, UCWhittle, leverages the structure of RMABs and the Whittle index policy solution to achieve better performance than other online learning baselines without structural information. We evaluate UCWhittle on real-world healthcare data to help reduce maternal mortality.
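The confidence bounds mentioned in the abstract above can be illustrated with a small sketch. It assumes a two-state arm where each observed transition is a Bernoulli trial (for example, "moved to the good state" or not) and uses a standard Hoeffding bound; the exact bounds used by UCWhittle may differ, and the function name and the default `delta` are hypothetical.

```python
import math

def transition_confidence_interval(successes, trials, delta=0.05):
    """Hoeffding interval for an unknown transition probability.

    successes: number of observed transitions into the state of interest.
    trials: number of times the (state, action) pair was observed.
    delta: allowed failure probability of the interval (assumed value).
    """
    if trials == 0:
        # Nothing observed yet, so every probability is still plausible.
        return 0.0, 1.0
    p_hat = successes / trials
    radius = math.sqrt(math.log(2.0 / delta) / (2.0 * trials))
    return max(0.0, p_hat - radius), min(1.0, p_hat + radius)
```

An optimistic planner would then pick, within these intervals, the transition probabilities that make an arm look most valuable before computing its index; that optimization step is not shown here.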
Flexible Budgets in Restless Bandits: A Primal-Dual Algorithm for Efficient Budget Allocation
Jackson Killian
Lily Xu
Paula Rodriguez Diaz
AAAI 2023 (to appear)
Restless Multi-Armed Bandits (RMABs) are an important model that enables optimizing allocation of limited resources in sequential decision-making settings. Typical RMABs assume the budget (the number of arms pulled per round) to be fixed for each step in the planning horizon. However, when planning in real-world settings, resources are not necessarily limited at each planning step; we may be able to distribute surplus resources in one round to an earlier or later round. Often this flexibility in budget is constrained to within a subset of consecutive planning steps. In this paper we define a general class of RMABs with flexible budget, which we term F-RMABs, and provide an algorithm to optimally solve for them. Additionally, we provide heuristics that trade off solution quality for efficiency and present experimental comparisons of different F-RMAB solution approaches.
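The difference between a fixed per-round budget and a budget that can be moved within a window of rounds can be seen in a toy comparison. The sketch below is not an F-RMAB solver: it assumes the value of pulling each arm in each round is known and static, ignoring the state dynamics that make the real problem hard. The function names and the value matrix are hypothetical.

```python
import numpy as np

def fixed_budget_value(values, k):
    """Total value when exactly k arms are pulled in every round.

    values has shape (rounds, arms): assumed known value of each pull.
    """
    values = np.asarray(values, dtype=float)
    # Sort each round's values and keep the k largest per round.
    return float(np.sort(values, axis=1)[:, -k:].sum())

def pooled_budget_value(values, k):
    """Total value when the k-per-round budget is pooled across the whole window."""
    values = np.asarray(values, dtype=float)
    total_pulls = k * values.shape[0]
    # Sort all (round, arm) values together and keep the best total_pulls.
    return float(np.sort(values, axis=None)[-total_pulls:].sum())
```

With `values = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.3]]` and `k = 1`, the fixed budget must spend one pull in the low-value second round, while the pooled budget can spend both pulls in the first round.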
Deployed SAHELI: Field Optimization of Intelligent RMAB for Maternal and Child Care
Shresth Verma
Aditya S. Mate
Paritosh Verma
Sruthi Gorantala
Neha Madhiwalla
Aparna Hegde
Manish Jain
Innovative Applications of Artificial Intelligence (IAAI) (2023) (to appear)
Underserved communities face critical health challenges due to lack of access to timely and reliable information. Non-governmental organizations are leveraging the widespread use of cellphones to combat these healthcare challenges and spread preventative awareness. The health workers at these organizations reach out individually to beneficiaries; however, such programs still suffer from declining engagement. We have deployed SAHELI, a system to efficiently utilize the limited availability of health workers for improving maternal and child health in India. SAHELI uses the Restless Multi-armed Bandit (RMAB) framework to identify beneficiaries for outreach. It is the first deployed application for RMABs in public health, and is already in continuous use by our partner NGO, ARMMAN. We have already reached ∼100K beneficiaries with SAHELI, and are on track to serve 1 million beneficiaries by the end of 2023. This scale and impact have been achieved through multiple innovations in the RMAB model and its development, in preparation of real-world data, and in deployment practices; and through careful consideration of responsible AI practices. Specifically, in this paper, we describe our approach to learn from past data to improve the performance of SAHELI’s RMAB model, the real-world challenges faced during deployment and adoption of SAHELI, and the end-to-end pipeline.
Adherence Bandits
Jackson A. Killian*
Arshika Lalan*
Aditya Mate*
Manish Jain
The Workshop on Artificial Intelligence for Social Good at AAAI 2023 (2023)
We define a new subclass of the restless multi-armed bandit framework, which we name Adherence Bandits, designed to capture the dynamics prevalent in many public health intervention problems. We discuss key properties of Adherence Bandits, their real-world motivations, how their structure leads to both technical and computational advantages, and natural extensions that have been or can be made to the subclass. We summarise key research works that have contributed to the growing sub-area and finish by highlighting future directions of research.
Analyzing and Predicting Low-Listenership Trends in a Large-Scale Mobile Health Program: A Preliminary Investigation
Arshika Lalan
Shresth Verma
Kumar Madhu Sudan
Amrita Mahale
Aparna Hegde
The Workshop in Data Science for Social Good, KDD 2023 (2023)
Mobile health programs are becoming an increasingly popular medium for dissemination of health information among beneficiaries in less privileged communities. Kilkari is one of the world’s largest mobile health programs, which delivers time-sensitive audio messages to pregnant women and new mothers. We have been collaborating with ARMMAN, a non-profit in India which operates the Kilkari program, to identify bottlenecks to improve the efficiency of the program. In particular, we provide an initial analysis of the trajectories of beneficiaries’ interaction with the mHealth program and examine elements of the program that can be potentially enhanced to boost its success. We cluster the cohort into different buckets based on listenership so as to analyze listenership patterns for each group that could help boost program success. We also demonstrate preliminary results on using historical data in a time-series prediction to identify beneficiary dropouts and enable NGOs to devise timely interventions to strengthen beneficiary retention.
Preview abstract
We consider the task of effect estimation of resource allocation algorithms through clinical trials. Such algorithms are tasked with optimally utilizing severely limited intervention resources, with the goal of maximizing the overall benefit derived. Evaluation of such algorithms through clinical trials proves difficult, notwithstanding the scale of the trial, because the agents’ outcomes are inextricably linked through the budget constraint controlling the intervention decisions. Towards building more powerful estimators with improved statistical significance estimates, we propose a novel concept involving retrospective reshuffling of participants across experimental arms at the end of a clinical trial. We identify conditions under which such reassignments are permissible and can be leveraged to construct counterfactual clinical trials, whose outcomes can be accurately ‘observed’ without uncertainty, for free. We prove theoretically that such an estimator is more accurate than common estimators based on sample means: we show that it returns an unbiased estimate and simultaneously reduces variance. We demonstrate the value of our approach through empirical experiments on real case studies as well as synthetic and realistic data sets, and show improved estimation accuracy across the board.
ADVISER: AI-Driven Vaccination Intervention Optimiser for Increasing Vaccine Uptake in Nigeria
Vineet Nair
Kritika Prakash
Michael Wilbur
Corinne Namblard
Oyindamola Adeyemo
Abhishek Dubey
Abiodun Adereni
Ayan Mukhopadhyay
IJCAI '22 Social Good Track (2022)
More than 5 million children under the age of five die from largely preventable or treatable medical conditions every year, with an overwhelmingly large proportion of deaths occurring in under-developed countries with low vaccination uptake. One of the United Nations’ sustainable development goals (SDG 3) aims to end preventable deaths of newborns and children under five years of age. We focus on Nigeria, where the rate of infant mortality is appalling. We collaborate with HelpMum, a large non-profit organization in Nigeria, to design and optimize the allocation of heterogeneous health interventions under uncertainty to increase vaccination uptake, the first such collaboration in Nigeria. Our framework, ADVISER: AI-Driven Vaccination Intervention Optimiser, is based on an integer linear program that seeks to maximize the cumulative probability of successful vaccination. Our optimization formulation is intractable in practice. We present a heuristic approach that enables us to solve the problem for real-world use cases. We also present theoretical bounds for the heuristic method. Finally, we show that the proposed approach outperforms baseline methods in terms of vaccination uptake through experimental evaluation. HelpMum is currently planning a pilot program based on our approach to be deployed in the largest city of Nigeria, which would be the first deployment of an AI-driven vaccination uptake program in the country and will hopefully pave the way for other data-driven programs to improve health outcomes in Nigeria.