Dynamic Factor Analysis for Sparse and Irregular Longitudinal Data: An Application to Metabolite Measurements in a COVID-19 Study.

Factor analysis (FA) can be used to identify key biomarkers in biological processes by assuming that latent biological pathways (statistically, "latent factors") drive the activity of measurable biomarkers ("observed variables"). However, biological pathways often interact, so the classical FA assumption of independence between factors is questionable. Motivated by sparsely and irregularly collected longitudinal measurements of metabolites in a COVID-19 study, we propose a dynamic factor analysis model that accounts for cross-correlations between pathways via a multi-output Gaussian process (MOGP) prior on the factor trajectories. To mitigate overfitting caused by the sparsity of longitudinal measurements, we introduce a roughness penalty on the MOGP hyperparameters and allow for non-zero mean functions. We also propose a scalable stochastic expectation maximization (StEM) algorithm that, in simulations, is 20 times faster than a previously proposed Monte Carlo expectation maximization algorithm and provides more accurate and stable MOGP hyperparameter estimates. In the motivating COVID-19 study, our methodology identifies a kynurenine pathway that affects the clinical severity of patients with COVID-19 and uncovers the role of the biomarker taurine. Our R package DFA4SIL implements the proposed method.
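
The model structure described above can be illustrated with a small simulation. The sketch below is not the DFA4SIL implementation and does not reproduce the authors' MOGP construction, roughness penalty, or StEM algorithm. It assumes, purely for illustration, a separable covariance (a factor correlation matrix combined with a squared-exponential kernel over time), zero mean functions, and hypothetical names and values (`rbf_kernel`, `simulate_subject`, the loadings, the length scale). It shows how cross-correlated latent factor trajectories observed at a few irregular time points generate noisy biomarker measurements.

```python
import numpy as np


def rbf_kernel(t1, t2, length_scale):
    """Squared-exponential kernel between two vectors of time points."""
    diff = t1[:, None] - t2[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def simulate_subject(times, loadings, factor_corr, length_scale, noise_sd, rng):
    """Draw correlated factor trajectories and noisy biomarkers for one subject.

    Assumption (illustrative only): the joint covariance of all factors at all
    time points is the Kronecker product of a factor correlation matrix and a
    temporal kernel, so off-diagonal entries of `factor_corr` induce the
    cross-correlation between pathways.
    """
    n_times = len(times)
    n_factors = factor_corr.shape[0]
    cov = np.kron(factor_corr, rbf_kernel(times, times, length_scale))
    cov += 1e-8 * np.eye(cov.shape[0])  # jitter for numerical stability
    draw = rng.multivariate_normal(np.zeros(n_factors * n_times), cov)
    factors = draw.reshape(n_factors, n_times)  # one row per latent factor
    noise = noise_sd * rng.standard_normal((loadings.shape[0], n_times))
    biomarkers = loadings @ factors + noise  # one row per observed biomarker
    return factors, biomarkers, cov


rng = np.random.default_rng(0)
# Sparse, irregular sampling: five visits at arbitrary days for this subject.
times = np.sort(rng.uniform(0.0, 30.0, size=5))
# Hypothetical loadings: five biomarkers driven by two latent pathways.
loadings = np.array([[1.0, 0.0],
                     [0.8, 0.0],
                     [0.0, 1.0],
                     [0.0, 0.7],
                     [0.5, 0.5]])
# Non-zero off-diagonal entry: the two pathways are allowed to interact.
factor_corr = np.array([[1.0, 0.6],
                        [0.6, 1.0]])
factors, biomarkers, cov = simulate_subject(
    times, loadings, factor_corr, length_scale=7.0, noise_sd=0.1, rng=rng
)
print(factors.shape, biomarkers.shape)
```

Setting the off-diagonal entry of `factor_corr` to zero recovers independent factors, which is the classical FA assumption that the abstract calls into question.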

Authors

Cai, Goudie, Tom