Sample Complexity of Interventional Causal Representation Learning
Abstract
Consider a data-generation process that transforms low-dimensional latent causally related variables to high-dimensional observed variables. Causal representation learning (CRL) is the process of using the observed data to recover the latent causal variables and the causal structure among them. Despite the multitude of identifiability results under various interventional CRL settings, the existing guarantees apply exclusively to the infinite-sample regime (i.e., infinite observed samples). This paper establishes the first sample-complexity analysis for the finite-sample regime, in which the interactions between the number of observed samples and probabilistic guarantees on recovering the latent variables and structure are established. The focus of the paper is on general latent causal models, stochastic soft interventions, and a linear transformation from the latent to the observation space. The identifiability results in this paper ensure graph recovery up to ancestors and latent
variable recovery up to mixing with parent variables. Specifically, O((log(1/δ))^4)
samples suffice for latent graph recovery up to ancestors with probability 1 − δ, and
O(((1/ϵ) log(1/δ))^4) samples suffice for latent causal variable recovery that is ϵ-close to
the identifiability class with probability 1 − δ.