Efficient Spatial Sampling of Large Geographical Tables

Anish Das Sarma
Hector Gonzalez
Alon Y. Halevy
SIGMOD (2012)

Abstract

Large-scale map visualization systems play an increasingly important role in presenting geographic datasets to end users. Since these datasets can be
extremely large, a map rendering system often needs to select a small
fraction of the data to visualize them in a limited space. This paper addresses the fundamental challenge of {\em thinning}:
determining appropriate samples of data to be shown on specific geographical
regions and zoom levels. Other than the sheer scale of the data, the thinning
problem is challenging because of a number of other reasons: (1) data can
consist of complex geographical shapes, (2) rendering of data needs to satisfy
certain constraints, such as data being preserved across zoom levels and
adjacent regions, and (3) after satisfying the constraints, an {\em optimal}
solution needs to be chosen based on {\em objectives} such as {\em
maximality}, {\em fairness}, and {\em importance} of data.

This paper formally defines and presents a complete solution to the thinning
problem. First, we express the problem as an integer programming formulation
that efficiently solves thinning for desired objectives. Second, we
present more efficient solutions for maximality, based on DFS traversal of a
spatial tree. Third, we consider the common special case of point datasets,
and present an even more efficient randomized algorithm. Finally, we have
implemented all techniques from this paper in Google Maps visualizations of
Fusion Tables, and we describe a set of experiments that demonstrate the
tradeoffs among the algorithms.

Research Areas