Span Recovery for Deep Neural Networks with Applications to Input Obfuscation

David P. Woodruff
ICLR 2020

Abstract

The tremendous success of deep neural networks has motivated the need to better understand the
fundamental properties of these networks, but many of the theoretical results proposed so far apply
only to shallow networks. In this paper, we study an important primitive for understanding the
meaningful input space of a deep network: span recovery. For k < n, let A ∈ R^{k×n} be the
innermost weight matrix of an arbitrary feedforward neural network M : R^n → R, so M(x) can be
written as M(x) = σ(Ax), for some network σ : R^k → R. The goal is then to recover the row span
of A given only oracle access to the value of M(x). We show that if M is a multi-layered network
with ReLU activation functions, then partial recovery is possible: namely, we can provably recover
k/2 linearly independent vectors in the row span of A using poly(n) non-adaptive queries to M(x).
Furthermore, if M has differentiable activation functions, we demonstrate that full span recovery is
possible even when the output is first passed through a sign or 0/1 thresholding function; in this case
our algorithm is adaptive. Empirically, we confirm that full span recovery is not always possible,
but that failures occur only for unrealistically thin layers. For reasonably wide networks, we obtain full span recovery
on both random networks and networks trained on MNIST data. Furthermore, we demonstrate the
utility of span recovery as an attack by inducing neural networks to misclassify data obfuscated by
controlled random noise as though it were sensical input.
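
To make the setup concrete, the following is a minimal illustrative sketch, not the paper's algorithms: for a differentiable network M(x) = g(Ax), the gradient ∇M(x) = A^T g'(Ax) always lies in the row span of A, so gradients estimated through black-box queries at a few random points generically recover that span. The toy tanh network, the 1/sqrt(n) scaling of A, and the finite-difference oracle grad_M below are assumptions chosen purely for illustration.

import numpy as np

# Illustrative sketch only: a toy differentiable network M(x) = g(Ax), with g(z) = w . tanh(z).
# For any such network, grad M(x) = A^T (w * tanh'(Ax)) lies in the row span of A, so gradients
# collected through black-box queries at random points generically recover that span.
rng = np.random.default_rng(0)
n, k = 50, 10
A = rng.standard_normal((k, n)) / np.sqrt(n)   # unknown innermost weight matrix
w = rng.standard_normal(k)                     # weights of the toy outer network g

def M(x):
    # Scalar-valued network M : R^n -> R, treated as a black-box oracle.
    return w @ np.tanh(A @ x)

def grad_M(x, eps=1e-5):
    # Central finite differences: uses only oracle access to the value of M(x).
    g = np.empty(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        g[i] = (M(x + e) - M(x - e)) / (2 * eps)
    return g

# Stack gradients at random query points and read off their span with an SVD.
G = np.stack([grad_M(rng.standard_normal(n)) for _ in range(2 * k)])
_, s, Vt = np.linalg.svd(G)
span = Vt[:k]                                  # orthonormal basis for the recovered subspace

print("numerical rank of gradient matrix:", int((s > 1e-6).sum()))                   # expect k
print("residual of A outside recovered span:", np.abs(A - A @ span.T @ span).max())  # ~0

The ReLU and thresholded settings studied in the paper call for different query strategies, since exact gradient information of this kind is not directly available there.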