Optimal Sketching for Trace Estimation

David P. Woodruff
Hai Pham
Shuli Jiang
Neurips 2021 (2021)
Google Scholar

Abstract

Matrix trace estimation is ubiquitous in machine learning applications and has traditionally relied on a simplistic Hutchinson's method, which requires $O(\log(1/\delta)/\epsilon^2)$ matrix-vector product queries to achieve an $\epsilon$ additive error with failure probability $\delta$. Recently, the Hutch++ algorithm was proposed, which reduces the number of matrix-vector queries from $O(1/\epsilon^2)$ to the optimal $O(1/\epsilon)$ on positive-semidefinite input matrices $A$, achieving a $(1\pm \epsilon)$-multiplicative approximation to the trace of $A$ with constant probability; however, in the high probability setting, the non-adaptive method suffers an extra $O(\sqrt{\log(1/\delta)})$ multiplicative factor in its query complexity. Non-adaptive methods are important, as they correspond to sketching algorithms, which are mergeable, highly parallelizable, and provide low-memory streaming algorithms as well as low-communication distributed protocols. In this work, we close the gap between non-adaptive and adaptive algorithms, showing that even non-adaptive algorithms can achieve $O(\sqrt{\log(1/\delta)}/\epsilon + \log(1/\delta))$ matrix-vector products. In addition, we prove matching lower bounds demonstrating that, up to a $\log \log(1/\delta)$ factor, no further improvement in the dependence on $\delta$ or $\epsilon$ is possible by any non-adaptive algorithm. Finally, our experiments demonstrate the superior performance of our sketch over adaptive methods, which are not parallelizable, as well as over the standard non-adaptive Hutchinson's method.