Piotr Mirowski
I am a Staff Research Scientist at Google DeepMind. As a member of Dr. Raia Hadsell's and Dr. Shakir Mohamed's teams, I have focused on navigation-related research, on scaling up autonomous agents to real-world environments, on weather and climate forecasting and data compression, and on socio-technical studies of computational creativity and participatory AI with artists. Some of my work has been published in Nature and at ICLR and NeurIPS, and has been covered by The Guardian, the BBC, the Financial Times and many other press outlets.
I studied computer science in France (ENSEEIHT, Toulouse) and obtained my PhD in computer science in 2011 at New York University, with a thesis on "Time Series Modeling with Hidden Variables and Gradient-based Algorithms" supervised by Prof. Yann LeCun (Outstanding Dissertation Award, 2011).
During my theatre and improv performances (with and without robots on the stage), I investigate the use of AI for artistic human and machine-based co-creation.
Authored Publications
What are the dimensions of human intent, and how do writing tools shape and augment these expressions? From papyrus to auto-complete, a major turning point was when Alan Turing famously asked, “Can Machines Think?” If so, should we offload aspects of our thinking to machines, and what impact do they have in enabling our intentions? This paper adapts the Authorial Leverage framework, from the Intelligent Narrative Technologies literature, to evaluate recent generative model advancements. With increasingly widespread access to Large Language Models (LLMs), our evaluative frameworks must evolve in turn. To do this, we discuss previous expert studies of deep generative models for fiction writers and playwrights, and propose two future directions, (1) author-focused and (2) audience-focused, for furthering our understanding of the Authorial Leverage of LLMs, particularly in the domain of comedy writing.
The Touchdown dataset (Chen et al., 2019) provides instructions by human annotators for navigation through New York City streets and for resolving spatial descriptions at a given location. To enable the wider research community to work effectively with the Touchdown tasks, we are publicly releasing the 29k raw Street View panoramas needed for Touchdown. We follow the process used for the StreetLearn data release (Mirowski et al., 2019) to check panoramas for personally identifiable information and blur them as necessary. These have been added to the StreetLearn dataset and can be obtained via the same process as used previously for StreetLearn. We also provide a reference implementation for both of the Touchdown tasks: vision and language navigation (VLN) and spatial description resolution (SDR). We compare our model results to those given in Chen et al. (2019) and show that the panoramas we have added to StreetLearn fully support both Touchdown tasks and can be used effectively for further research and comparison.
Learning to Navigate in Cities Without a Map
Matthew Grimes, Mateusz Malinowski, Karl Moritz Hermann, Keith Anderson, Denis Teplyashin, Karen Simonyan, Koray Kavukcuoglu, Andrew Zisserman, Raia Hadsell
2018
Vector-based Navigation using Grid-like Representations in Artificial Agents
Alexander Pritzel, Andrea Banino, Benigno Uria, Brian C Zhang, Caswell Barry, Charles Blundell, Charlie Beattie, Demis Hassabis, Dharshan Kumaran, Greg Wayne, Helen King, Hubert Soyer, Joseph Modayil, Koray Kavukcuoglu, Martin J. Chadwick, Neil Rabinowitz, Raia Hadsell, Razvan Pascanu, Stephen Gaffney, Stig Vilholm Petersen, Thomas Degris, Timothy Lillicrap
Nature (2018)
Model-free reinforcement learning has recently been shown to be effective at learning navigation policies from complex image input. However, these algorithms tend to require large amounts of interaction with the environment, which can be prohibitively costly to obtain on robots in the real world. We present an approach for efficiently learning goal-directed navigation policies on a mobile robot, from only a single coverage traversal of recorded data. The navigation agent learns an effective policy over a diverse action space in a large heterogeneous environment consisting of more than 2 km of travel, through buildings and outdoor regions that collectively exhibit large variations in visual appearance, self-similarity, and connectivity. We compare pretrained visual encoders that enable precomputation of visual embeddings to achieve a throughput of tens of thousands of transitions per second at training time on a commodity desktop computer, allowing agents to learn from millions of trajectories of experience in a matter of hours. We propose multiple forms of computationally efficient stochastic augmentation to enable the learned policy to generalise beyond these precomputed embeddings, and demonstrate successful deployment of the learned policy on the real robot without fine-tuning, despite considerable visual differences at test time. The dataset and code required to reproduce these results and apply the technique to other datasets and robots are made publicly available at https://github.com/jakebruce/deployable-rl-navigation.
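The throughput claim above rests on a simple pattern: run the visual encoder once over the recorded traversal, cache the embeddings, and then train the policy purely on cheaply augmented copies of those cached vectors, so no encoder forward pass is needed per training step. Below is a minimal sketch of that pattern, assuming a frozen encoder and Gaussian noise as the augmentation; the function names, noise scale, and toy encoder are hypothetical stand-ins for illustration, not the released implementation linked in the abstract.

```python
# Illustrative sketch only: precompute embeddings once with a frozen
# encoder, then train a policy on cheaply augmented copies of them.
import numpy as np

rng = np.random.default_rng(0)

def precompute_embeddings(frames, encoder):
    # Run the frozen visual encoder once, offline, over all recorded frames.
    return np.stack([encoder(f) for f in frames])

def augment(embedding, noise_scale=0.1):
    # Cheap stochastic augmentation applied in embedding space, so the
    # policy generalises beyond the exact precomputed vectors.
    return embedding + noise_scale * rng.standard_normal(embedding.shape)

# Toy stand-ins for a recorded traversal and a pretrained encoder.
frames = [rng.standard_normal((64, 64, 3)) for _ in range(1000)]
encoder = lambda frame: frame.mean(axis=(0, 1))  # placeholder for a frozen CNN

embeddings = precompute_embeddings(frames, encoder)  # one-time cost
for step in range(10_000):
    obs = augment(embeddings[rng.integers(len(embeddings))])
    # ... feed `obs` to the policy update here; skipping the per-step CNN
    # forward pass is what enables tens of thousands of transitions per
    # second on a commodity desktop computer.
```

The actual dataset and training code are available in the repository linked in the abstract above.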