Eric Tune

Eric Tune

Eric Tune received his PhD in Computer Science from University of California at San Diego in 2004. His research focuses on Computer Architecture and Datacenter Computing.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Large-scale cluster management at {Google} with {Borg}
    Luis Pedrosa
    Madhukar R. Korupolu
    David Oppenheimer
    Proceedings of the European Conference on Computer Systems (EuroSys), ACM, Bordeaux, France (2015)
    Preview abstract Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines. It achieves high utilization by combining admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation. It supports high-availability applications with runtime features that minimize fault-recovery time, and scheduling policies that reduce the probability of correlated failures. Borg simplifies life for its users by offering a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior. We present a summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it. View details
    {CPI^2}: {CPU} performance isolation for shared compute clusters
    Robert Hagmann
    Rohit Jnagal
    Vrigo Gokhale
    SIGOPS European Conference on Computer Systems (EuroSys), ACM, Prague, Czech Republic (2013), pp. 379-391
    Preview abstract Performance isolation is a key challenge in cloud computing. Unfortunately, Linux has few defenses against performance interference in shared resources such as processor caches and memory buses, so applications in a cloud can experience unpredictable performance caused by other program's behavior. Our solution, CPI2, uses cycles-per-instruction (CPI) data obtained by hardware performance counters to identify problems, select the likely perpetrators, and then optionally throttle them so that the victims can return to their expected behavior. It automatically learns normal and anomalous behaviors by aggregating data from multiple tasks in the same job. We have rolled out CPI2 to all of Google's shared compute clusters. The paper presents the analysis that lead us to that outcome, including both case studies and a large-scale evaluation of its ability to solve real production issues. View details
    Optimizing Google's Warehouse Scale Computers: The NUMA Experience
    Lingjia Tang
    Jason Mars
    Robert Hagmann
    The 19th IEEE International Symposium on High Performance Computer Architecture (2013)
    Preview
    Preview abstract Google-Wide Profiling (GWP), a continuous profiling infrastructure for data centers, provides performance insights for cloud applications. With negligible overhead, GWP provides stable, accurate profiles and a datacenter-scale tool for traditional performance analyses. Furthermore, GWP introduces novel applications of its profiles, such as application- platform affinity measurements and identification of platform-specific, microarchitectural peculiarities. View details