Perception

We design systems that enable computers to "understand" the world, via a range of modalities including audio, image, and video understanding.

About the team

The Perception team builds systems that interpret sensory data such as images, sound, and video. Our research helps power many products across Google: image and video understanding in Search and Google Photos; computational photography for Pixel phones and Google Maps; machine learning APIs for Google Cloud and YouTube; accessibility technologies like Live Transcribe; applications in the Nest Hub Max; mobile augmented reality experiences in Duo video calls; and more.

We actively contribute to the open-source and research communities, providing media processing technologies (e.g., MediaPipe) that enable developers to build computer vision applications with TensorFlow. We have also released several large-scale datasets for machine learning, including AudioSet, AVA, Open Images, and YouTube-8M.

In all of this work, we adhere to AI principles to ensure that these technologies work well for everyone. We value innovation, collaboration, respect, and building an inclusive and diverse team and research community, and we work closely with the PAIR team to build ML fairness frameworks.

Featured publications

(Almost) Zero-Shot Cross-Lingual Spoken Language Understanding
Manaal Faruqui, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck
Proceedings of the IEEE ICASSP (2018)

Aperture Supervision for Monocular Depth Estimation
Pratul Srinivasan, Rahul Garg, Neal Wadhwa, Ren Ng
CVPR (2018) (to appear)

BLADE: Filter Learning for General Purpose Image Processing
John Isidoro, Sungjoon Choi, Frank Ong
International Conference on Computational Photography (2018)

Burst Denoising with Kernel Prediction Networks
Ben Mildenhall, Jiawen Chen, Dillon Sharlet, Ren Ng, Rob Carroll
CVPR (2018) (to appear)

COCO-Stuff: Thing and Stuff Classes in Context
Holger Caesar, Vittorio Ferrari
CVPR (2018) (to appear)

Decoding the auditory brain with canonical component analysis
Alain de Cheveigné, Daniel D. E. Wong, Giovanni M. Di Liberto, Jens Hjortkjaer, Malcolm Slaney, Edmund Lalor
NeuroImage (2018)

Highlighted projects