Predicting enhancer-gene links from single-cell multi-omics data by integrating prior Hi-C information.

Enhancers play an important role in transcriptional regulation by modulating gene expression from distal genomic locations. Although single-cell ATAC and RNA sequencing (scATAC/RNA-seq) data have been leveraged to infer enhancer-gene links, establishing regulatory links between enhancers and their target genes remains challenging in the absence of chromatin conformation information. Here, we present SCEG-HiC, a machine learning method based on weighted graphical lasso that decodes enhancer-gene links from single-cell multi-omics data by integrating bulk average Hi-C as prior knowledge. SCEG-HiC supports both paired scATAC/RNA-seq and scATAC-only inputs, improving prediction accuracy while retaining context-specific correlations and enabling the discovery of biologically relevant links. Comprehensive evaluation across 10 human and mouse single-cell multi-omics datasets shows that SCEG-HiC outperforms existing single-cell models. Application of SCEG-HiC to COVID-19 datasets illustrates its capacity to reconstruct gene regulatory networks underlying disease severity more reliably and to elucidate functional associations between noncoding variants and their putative target genes. SCEG-HiC is freely available as an open-source, user-friendly R package, facilitating broad applications in regulatory genomics research.
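
For readers unfamiliar with the term, the generic weighted graphical lasso objective can be written as below. This is a sketch of the standard formulation, not the exact objective used by SCEG-HiC, which the abstract does not state. The symbols are assumptions introduced here: S is the empirical covariance of the features (for example, peak accessibility and gene expression) across cells or cell aggregates, \Theta is the estimated precision (inverse covariance) matrix, and \rho_{ij} \ge 0 is an entry-specific penalty. One plausible reading of "Hi-C as prior knowledge" is that \rho_{ij} is smaller for enhancer-gene pairs with higher bulk Hi-C contact frequency, so that well-supported pairs are shrunk less; this mapping is an assumption.

```latex
% Generic weighted graphical lasso (assumed form, not taken from the paper)
\hat{\Theta} \;=\; \arg\max_{\Theta \succ 0}\;
  \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta)
  \;-\; \sum_{i \neq j} \rho_{ij}\,\lvert \Theta_{ij} \rvert

% Partial correlation implied by the estimate; a nonzero value for an
% enhancer i and a gene j marks a candidate enhancer-gene link
r_{ij} \;=\; -\,\frac{\hat{\Theta}_{ij}}{\sqrt{\hat{\Theta}_{ii}\,\hat{\Theta}_{jj}}}
```

Under this formulation, setting all \rho_{ij} to a single constant recovers the ordinary graphical lasso, so the prior acts only through how the penalties differ between pairs.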

Authors

Liang Liang, Miao Miao, Han Han, Li Li, Zhang Zhang, Wang Wang