Rob Springer

Authored Publications
    A General Purpose Transpiler for Fully Homomorphic Encryption
    Shruthi Gorantala
    Sean Purser-Haskell
    Asra Ali
    Eric P. Astor
    Itai Zukerman
    Sam Ruth
    Phillipp Schoppmann
    Sasha Kulankhina
    Alain Forget
    David Marn
    Cameron Tew
    Rafael Misoczki
    Bernat Guillen
    Xinyu Ye
    Damien Desfontaines
    Aishe Krishnamurthy
    Miguel Guevara
    Yurii Sushko
    Google LLC (2021)
    Fully homomorphic encryption (FHE) is an encryption scheme which enables computation on encrypted data without revealing the underlying data. While there have been many advances in the field of FHE, developing programs using FHE still requires expertise in cryptography. In this white paper, we present a fully homomorphic encryption transpiler that allows developers to convert high-level code (e.g., C++) that works on unencrypted data into high-level code that operates on encrypted data. Thus, our transpiler makes transformations possible on encrypted data. Our transpiler builds on Google's open-source XLS SDK (https://github.com/google/xls) and uses an off-the-shelf FHE library, TFHE (https://tfhe.github.io/tfhe/), to perform low-level FHE operations. The transpiler design is modular, which means the underlying FHE library as well as the high-level input and output languages can vary. This modularity will help accelerate FHE research by providing an easy way to compare arbitrary programs in different FHE schemes side-by-side. We hope this lays the groundwork for eventual easy adoption of FHE by software developers. As a proof-of-concept, we are releasing an experimental transpiler (https://github.com/google/fully-homomorphic-encryption/tree/main/transpiler) as open-source software.
    Warehouse-Scale Video Acceleration: Co-design and Deployment in the Wild
    Danner Stodolsky
    Jeff Calow
    Jeremy Dorfman
    Clint Smullen
    Aki Kuusela
    Aaron James Laursen
    Alex Ramirez
    Alvin Adrian Wijaya
    Amir Salek
    Anna Cheung
    Ben Gelb
    Brian Fosco
    Cho Mon Kyaw
    Dake He
    David Alexander Munday
    David Wickeraad
    Devin Persaud
    Don Stark
    Drew Walton
    Elisha Indupalli
    Fong Lou
    Hon Kwan Wu
    In Suk Chong
    Indira Jayaram
    Jia Feng
    JP Maaninen
    Kyle Alan Lucke
    Maire Mahony
    Mark Steven Wachsler
    Mercedes Tan
    Narayana Penukonda
    Niranjani Dasharathi
    Poonacha Kongetira
    Prakash Chauhan
    Raghuraman Balasubramanian
    Ramon Macias
    Richard Ho
    Roy W Huffman
    Sandeep Bhatia
    Sarah J. Gwin
    Sathish K Sekar
    Srikanth Muroor
    Ville-Mikko Rautio
    Yolanda Ripley
    Yoshiaki Hase
    Yuan Li
    Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Association for Computing Machinery, New York, NY, USA (2021), pp. 600-615
    Video sharing (e.g., YouTube, Vimeo, Facebook, TikTok) accounts for the majority of internet traffic, and video processing is also foundational to several other key workloads (video conferencing, virtual/augmented reality, cloud gaming, video in Internet-of-Things devices, etc.). The importance of these workloads motivates larger video processing infrastructures and – with the slowing of Moore’s law – specialized hardware accelerators to deliver more computing at higher efficiencies. This paper describes the design and deployment, at scale, of a new accelerator targeted at warehouse-scale video transcoding. We present our hardware design including a new accelerator building block – the video coding unit (VCU) – and discuss key design trade-offs for balanced systems at data center scale and co-designing accelerators with large-scale distributed software systems. We evaluate these accelerators “in the wild” serving live data center jobs, demonstrating 20-33x improved efficiency over our prior well-tuned non-accelerated baseline. Our design also enables effective adaptation to changing bottlenecks and improved failure management, and new workload capabilities not otherwise possible with prior systems. To the best of our knowledge, this is the first work to discuss video acceleration at scale in large warehouse-scale environments.
    GPUCC - An Open-Source GPGPU Compiler
    Jingyue Wu
    Mark Heffernan
    Chris Leary
    Bjarke Roune
    Xuetian Weng
    Proceedings of the 2016 International Symposium on Code Generation and Optimization, ACM, New York, NY, pp. 105-116
    Graphics Processing Units have emerged as powerful accelerators for massively parallel, numerically intensive workloads. The two dominant software models for these devices are NVIDIA’s CUDA and the cross-platform OpenCL standard. Until now, there has not been a fully open-source compiler targeting the CUDA environment, hampering general compiler and architecture research and making deployment difficult in datacenter or supercomputer environments. In this paper, we present gpucc, an LLVM-based, fully open-source, CUDA compatible compiler for high performance computing. It performs various general and CUDA-specific optimizations to generate high performance code. The Clang-based frontend supports modern language features such as those in C++11 and C++14. Compile time is 8% faster than NVIDIA’s toolchain (nvcc) and it reduces compile time by up to 2.4x for pathological compilations (>100 secs), which tend to dominate build times in parallel build environments. Compared to nvcc, gpucc’s runtime performance is on par for several open-source benchmarks, such as Rodinia (0.8% faster), SHOC (0.5% slower), or Tensor (3.7% faster). It outperforms nvcc on internal large-scale end-to-end benchmarks by up to 51.0%, with a geometric mean of 22.9%.