Jakub Łącki
Jakub Łącki is a research scientist working on graph-mining and large-scale optimization teams. He received his PhD from Univeristy of Warsaw in 2015, advised by Piotr Sankowski. Before joining Google he was a postdoctoral researcher at Sapienza University of Rome, working with Stefano Leonardi.
Research Areas
Authored Publications
Sort By
Preview abstract
We study the problem of finding connected components in the Adaptive Massively Parallel Computation (AMPC) model. We show that when the total available space is linear in the size of the input graph the problem can be solved in O(log* n) rounds in forests (with high probability) and 2^O(log* n) expected rounds in general graphs. This improves upon an existing O(log log_(m/n) n) round algorithm.
For the case when the desired number of rounds is constant we show that both problems can be solved with only O(m + n log^(k) n) total space, where k is an arbitrarily large constant and log^(k) is the k-th iterate of the log2 function. This improves upon existing algorithms requiring Omega(m + n log n) total space.
View details
Preview abstract
We introduceTeraHAC, a (1+epsilon)-approximate hierarchical agglomerative clustering (HAC) algorithm whichs cales to trillion-edge graphs. Our algorithm is based on a new approach to computing (1+epsilon)-approximate HAC, which is a novel combination of the nearest-neighbor chain algorithm and the notion of (1+epsilon)-approximate HAC. Our approach allows us to partition the graph among multiple machines and make significant progress in computing the clustering within each partition before any communication with other partitions is needed.We evaluate TeraHAC on a number of real-world and synthetic graphs of up to 8 trillion edges. We show that TeraHAC requires over 100x fewer rounds compared to previously known approaches for computing HAC. It is up to 8.3x faster than SCC, the state-of-the-art distributed algorithm for hierarchical clustering, while achieving 1.16x higher quality. In fact, TeraHAC essentially retains the quality of the celebrated HAC algorithm while significantly improving the running time.
View details
Preview abstract
Obtaining scalable algorithms for hierarchical agglomerative clustering (HAC) is of significant interest due to the massive size of real-world datasets. At the same time, efficiently parallelizing HAC is difficult due to the seemingly sequential nature of the algorithm. In this paper, we address this issue and present ParHAC, the first efficient parallel HAC algorithm with sublinear depth for the widely-used average-linkage function. In particular, we provide a
(1+ϵ)-approximation algorithm for this problem on m edge graphs using O(m polylog m)
work and poly-logarithmic depth. Moreover, we show that obtaining similar bounds for exact average-linkage HAC is not possible under standard complexity-theoretic assumptions. We complement our theoretical results with a comprehensive study of the ParHAC algorithm in terms of its scalability, performance, and quality, and compare with several state-of-the-art sequential and parallel baselines. On a broad set of large publicly-available real-world datasets, we find that ParHAC obtains a 50.1x speedup on average over the best sequential baseline, while achieving quality similar to the exact HAC algorithm. We also show that ParHAC can cluster one of the largest publicly available graph datasets with 124 billion edges in a little over three hours using a commodity multicore machine.
View details
Preview abstract
We study the widely used hierarchical agglomerative clustering (HAC) algorithm on edge-weighted graphs. We define an algorithmic framework for hierarchical agglomerative graph clustering that provides the first efficient Õ(m) time exact algorithms for classic linkage measures, such as complete- and WPGMA-linkage, as well as other measures. Furthermore, for average-linkage, arguably the most popular variant of HAC, we provide an algorithm that runs in Õ (n sqrt(m)) time. For this variant, this is the first exact algorithm that runs in subquadratic time, as long as m=n^(2−ϵ) for some constant ϵ>0. We complement this result with a simple ϵ-close approximation algorithm for average-linkage in our framework that runs in Õ (m) time. As an application of our algorithms, we consider clustering points in a metric space by first using k-NN to generate a graph from the point set, and then running our algorithms on the resulting weighted graph. We validate the performance of our algorithms on publicly available datasets, and show that our approach can speed up clustering of point datasets by a factor of 20.7--76.5x.
View details
Preview abstract
Graph clustering and community detection are central problems in modern data mining. The increasing need for analyzing billion-scale data calls for faster and more scalable algorithms for these problems. There are certain trade-offs between the quality and speed of such clustering algorithms. In this paper, we design scalable algorithms that achieve high quality when evaluated based on ground truth.
We develop a generalized sequential and shared-memory parallel framework based on the LambdaCC objective (introduced by Veldt et al.), which encompasses modularity and correlation clustering. Our framework consists of highly-optimized implementations that scale to large data sets of billions of edges and that obtain high-quality clusters compared to ground-truth data, on both unweighted and weighted graphs. Our empirical evaluation shows that this framework improves the state-of-the-art trade-offs between speed and quality of scalable community detection. For example, on a 30-core machine with two-way hyper-threading, our
implementations achieve orders of magnitude speedups over other correlation clustering baselines, and up to 28.44x speedups over our own sequential baselines while maintaining or improving quality
View details
Preview abstract
We introduce a set of techniques that allow for efficiently generating many independent random walks in the Massively Parallel Computation (MPC) model with space per machine strongly sublinear in the number of vertices. In this space-per-machine regime, many natural approaches to graph problems struggle to overcome the Θ(log n) MPC round complexity barrier, where n is the number of vertices. Our techniques enable achieving this for PageRank—one of the most important applications of random walks—even in more challenging directed graphs, as well as for approximate bipartiteness and expansion testing.
In the undirected case, we start our random walks from the stationary distribution, which implies that we approximately know the empirical distribution of their next steps. This allows for preparing continuations of random walks in advance and applying a doubling approach. As a result we can generate multiple random walks of length l in Θ(log l) rounds on MPC. Moreover, we show that under the popular 1-vs.-2-Cycles conjecture, this round complexity is asymptotically tight.
For directed graphs, our approach stems from our treatment of the PageRank Markov chain. We first compute the PageRank for the undirected version of the input graph and then slowly transition towards the directed case, considering convex combinations of the transition matrices in the process.
For PageRank, we achieve the following round complexities for damping factor equal to 1 − є: in O(log log n + log 1 / є) rounds for undirected graphs (with Õ(m / є^2) total space), in Õ(log^2 log n + log^2 1/є) rounds for directed graphs (with Õ((m+n^(1+o(1))) / poly(є)) total space). The round complexity of our result for computing PageRank has only logarithmic dependence on 1/є. We use this to show that our PageRank algorithm can be used to construct directed length-l random walks in O(log^2 log n + log^2 l) rounds with Õ((m+n^(1+o(1))) poly(l)) total space. More specifically, by setting є = Θ(1 / l), a length-l PageRank walk with constant probability contains no random jump, and hence is a directed random walk.
View details
Preview abstract
Classical single source shortest paths algorithms work by maintaining distance estimates d : V → ℝ and performing so-called edge relaxations. We call an edge uv of weight w(uv) relaxed if d(v) ≤ d(u) + w(uv), and tense otherwise. To relax a tense edge uv means to set d(v) to d(u)+w(uv). It is known that starting from d(s) = 0, and d(v) = ∞ for all v ≠ s, and performing edge relaxations in arbitrary order until there are no more tense edges leads to d being equal to the distances from the source s. This overall idea can be extended to a very simple incremental algorithm for maintaining shortest paths.
We consider an operation which can be seen as a dual of a relaxation and study an approximate version of both operations. We show that by repeating the respective operation until convergence one obtains very simple incremental and decremental deterministic algorithms for (1 + ∊)-approximate shortest paths in directed graphs.
Specifically, we give an algorithm maintaining all-pairs approximate shortest paths in O(n^3 log n log (nW)/∊) total update time, where the graph's edge weights come from the interval [1,W]. This is two log-factors faster than the known folklore solution obtained by combining King's decremental transitive closure algorithm [King, FOCS'99] and the h-SSSP algorithm [Bernstein, SICOMP'16] for h = 2.
In addition, we give an algorithm for approximating single source shortest paths of hop-length at most h in O(mh log(nW)/∊) total time. The obtained algorithm is simpler and more efficient than Bernstein's h-SSSP algorithm [Bernstein, SICOMP'16].
View details
Parallel Graph Algorithms in Constant Adaptive Rounds: Theory meets Practice
Soheil Behnezhad
Warren J Schudy
VLDB 2020
Preview abstract
We study fundamental graph problems such as graph connectivity, minimum spanning forest (MSF), and approximate maximum (weight) matching in a distributed setting. In particular, we focus on the Adaptive Massively Parallel Computation (AMPC) model, which is a theoretical model that captures MapReduce-like computation augmented with a distributed hash table.
We show the first AMPC algorithms for all of the studied problems that run in a constant number of rounds and use only O(n^ϵ) space per machine, where 0<ϵ<1. Our results improve both upon the previous results in the AMPC model, as well as the best-known results in the MPC model, which is the theoretical model underpinning many popular distributed computation frameworks, such as MapReduce, Hadoop, Beam, Pregel and Giraph.
Finally, we provide an empirical comparison of the algorithms in the MPC and AMPC models in a fault-tolerant distriubted computation environment. We empirically evaluate our algorithms on a set of large real-world graphs and show that our AMPC algorithms can achieve improvements in both running time and round-complexity over optimized MPC baselines.
View details
Preview abstract
Exploring large-scale networks is a time consuming and expensive task which is usually operated in a complex and uncertain environment. A crucial aspect of network exploration is the development of suitable strategies that decide which nodes and edges to probe at each stage of the process.
Adaptive strategies are dynamically updated to adjust to the current status of the exploration process. Nonadaptive oblivious strategies are also appealing since they greatly reduce the computation and communication costs required by adaptive strategies.
In order to model this process we introduce the \emph{stochastic graph exploration problem}. The input is an undirected graph $G=(V,E)$ with a source vertex $s$, stochastic edge costs for the edges drawn from a distribution $\pi_e, e\in E$, and rewards on vertices of maximum value $R$. The goal is to find a set $F$ of edges of total cost at most $B$ such that the subgraph of $G$ induced by $F$ is connected, contains $s$ and has maximum total reward. This problem generalizes the stochastic knapsack problem and other stochastic probing problems recently studied.
Our focus is on the development of efficient nonadaptive strategies with small adaptivity gap that are competitive against the optimal adaptive strategy. Unfortunately, we prove an $\Omega(n)$ adaptivity gap for the stochastic graph exploration problem even on a tree of $n$ vertices that compares against the $O(1)$ adaptivity gap of the stochastic knapsack problem. We circumvent this negative result by showing that $O(\log nR)$ resource augmentation suffices for an adaptivity gap $O(1)$ on trees and $O(\log nR)$ on general graphs. In order to achieve this result, we reduce stochastic graph exploration to a memoryless process -- the {\em minesweeper} problem, -- which assigns to every edge that is traversed a probability that the process would terminate. For this problem, interesting in its own, we present an optimal polynomial time algorithm on trees and a $O(\log nR)$ approximation for general graphs.
We also study the problem in which the maximum cost of an edge is a logarithmic fraction of the budget.
We show under this condition the existence of polynomial time oblivious strategies that use $1+\epsilon$ budget to obtain adaptivity gaps $1+\epsilon$ on trees and $8+\epsilon$ for general graphs. Finally, we provide additional results on the structure and the complexity of nonadaptive and adaptive strategies.
View details
Massively Parallel Computation via Remote Memory Access
Laxman Dhulipala
Soheil Behnezhad
Warren Schudy
SPAA 2019
Preview abstract
We introduce the Adaptive Massively Parallel Computation (AMPC) model, which is an extension of the widely popular Massively Parallel Computation (MPC) model. At a high level, the AMPC model strengthens the MPC model by storing all messages sent within a round in a distributed data store. In the following round all machines are provided with random read access to the data store, subject to the same constraints on the total amount of communication as in the MPC model. Our model is inspired by the previous empirical studies of distributed graph algorithms using MapReduce and a distributed hash table service.
This extension allows us to give new graph algorithms with much lower round complexities compared to the best known solutions in the MPC model. In particular, in the AMPC model we show how to solve maximal independent set in O(1) rounds, and connectivity/minimum spanning tree in O(log log_{m/n} n) rounds, which is an exponential improvement upon the best known algorithms in the MPC model with sublinear space per machine. Our results imply that the 2-Cycle conjecture, the most popular hardness conjecture in the MPC model, does not hold in the AMPC model.
View details