Nathan Coehlo

Nathan Coehlo

Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Janus: Optimal Flash Provisioning for Cloud Storage Workloads
    Murray Stokely
    Muhammad Waliji
    Francois Labelle
    Xudong Shi
    Eric Schrock
    Proceedings of the USENIX Annual Technical Conference, USENIX, Advanced Computing System Association, 2560 Ninth Street, Suite 215, Berkeley, CA 94710, USA (2013), pp. 91-102
    Preview abstract Janus is a system for partitioning the flash storage tier between workloads in a cloud-scale distributed file system with two tiers, flash storage and disk. The file system stores newly created files in the flash tier and moves them to the disk tier using either a First-In-First-Out (FIFO) policy or a Least-Recently-Used (LRU) policy, subject to per-workload allocations. Janus constructs compact metrics of the cacheability of the different workloads, using sampled distributed traces because of the large scale of the system. From these metrics, we formulate and solve an optimization problem to determine the flash allocation to workloads that maximizes the total reads sent to the flash tier, subject to operator-set priorities and bounds on flash write rates. Using measurements from production workloads in multiple data centers using these recommendations, as well as traces of other production workloads, we show that the resulting allocation improves the flash hit rate by 47–76% compared to a unified tier shared by all workloads. Based on these results and an analysis of several thousand production workloads, we conclude that flash storage is a cost-effective complement to disks in data centers. View details
    Uncertainty in Aggregate Estimates from Sampled Distributed Traces
    Murray Stokely
    2012 Workshop on Managing Systems Automatically and Dynamically, USENIX
    Preview abstract Tracing mechanisms in distributed systems give important insight into system properties and are usually sampled to control overhead. At Google, Dapper [8] is the always-on system for distributed tracing and performance analysis, and it samples fractions of all RPC traffic. Due to difficult implementation, excessive data volume, or a lack of perfect foresight, there are times when system quantities of interest have not been measured directly, and Dapper samples can be aggregated to estimate those quantities in the short or long term. Here we find unbiased variance estimates of linear statistics over RPCs, taking into account all layers of sampling that occur in Dapper, and allowing us to quantify the sampling uncertainty in the aggregate estimates. We apply this methodology to the problem of assigning jobs and data to Google datacenters, using estimates of the resulting cross-datacenter traffic as an optimization criterion, and also to the detection of change points in access patterns to certain data partitions. View details