Sequential regulatory activity prediction across chromosomes with convolutional neural networks

David Kelley
Yakir Reshef
Genome Research (2018)

Abstract

Functional genomics approaches to better model genotype-phenotype relationships have important
applications toward understanding genomic function and improving human health. In particular,
thousands of noncoding loci associated with diseases and physical traits lack mechanistic
explanation. Here, we develop the first machine-learning system to predict cell type-specific
epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone.
Using convolutional neural networks, this system identifies promoters and distal regulatory elements
and synthesizes their content to make effective gene expression predictions. We show that model
predictions for the influence of genomic variants on gene expression align well with causal variants
underlying eQTLs in human populations and can be useful for generating mechanistic hypotheses
to enable fine mapping of GWAS loci.
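
As a rough illustration of how a sequence-based model can score the influence of a variant, the sketch below compares a prediction for the reference sequence with a prediction for the same sequence carrying the alternate allele. It is a toy under stated assumptions, not the system described here: `toy_predict`, `variant_effect`, and `TATA_KERNEL` are hypothetical names, and a single hand-written convolution filter stands in for a trained convolutional neural network.

```python
# Toy illustration only: one hand-written convolution filter stands in for a
# trained network. All names below are hypothetical, not the authors' code.

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows ordered A, C, G, T."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv_max(encoded, kernel):
    """Slide `kernel` (width x 4 weights) along the sequence; return the max activation."""
    width = len(kernel)
    best = float("-inf")
    for start in range(len(encoded) - width + 1):
        act = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        best = max(best, act)
    return best


def toy_predict(seq, kernel):
    """Hypothetical stand-in for a trained model: ReLU of the max filter activation."""
    return max(0.0, conv_max(one_hot(seq), kernel))


def variant_effect(ref_seq, pos, alt_base, kernel):
    """Score a single-nucleotide variant as prediction(alt) minus prediction(ref)."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return toy_predict(alt_seq, kernel) - toy_predict(ref_seq, kernel)


# Assumed filter that rewards the motif "TATA" (+1 per matching base).
TATA_KERNEL = one_hot("TATA")

ref = "GGCTATAGGC"
# Substituting C at index 4 turns TATA into TCTA and lowers the toy prediction.
effect = variant_effect(ref, 4, "C", TATA_KERNEL)
print(effect)
```

A negative score means the alternate allele lowers the toy prediction, and a score of zero means the variant leaves it unchanged. A real model would replace `toy_predict` with a network trained on experimental profiles and would take a much longer sequence window as input.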