Multimodal machine learning for distinguishing pediatric multiple sclerosis from non-inflammatory conditions using optical coherence tomography.
Identifying multiple sclerosis (MS) in children early is critical, as early therapeutic intervention can improve outcomes. The anterior visual pathway is central to the diagnostic evaluation of MS and has recently been recognized as a fifth topography in the McDonald diagnostic criteria for MS. Optical coherence tomography (OCT) provides high-resolution retinal imaging and reflects the structural integrity of the retinal nerve fiber and ganglion cell inner plexiform layers. Whether multimodal deep learning models can use OCT alone to diagnose pediatric-onset MS (POMS) is unknown.
We analyzed 3D OCT scans collected prospectively through the Neuroinflammatory Registry of the Hospital for Sick Children (REB#1000005356). Raw macular and optic nerve head images, and 52 automatically segmented features were included. We evaluated three classification approaches: (1) deep learning models (e.g., ResNet, DenseNet) for representation learning followed by classical ML classifiers, (2) ML models trained on OCT-derived features, and (3) multimodal models combining both via early and late fusion.
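The early fusion strategy described above can be illustrated with a minimal sketch: deep image embeddings and tabular OCT features are concatenated into a single vector before a classical classifier is trained. The array sizes, random placeholder data, and the SVC choice here are illustrative assumptions; only the 52-feature count and the embedding-plus-classifier design come from the study description.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder inputs: in the study these would be CNN embeddings of the
# macular/optic nerve head scans and the 52 segmented retinal features.
n_scans = 263                                        # 211 POMS + 52 control scans
image_embeddings = rng.normal(size=(n_scans, 512))   # hypothetical ResNet features
oct_features = rng.normal(size=(n_scans, 52))        # segmented OCT metrics
labels = rng.integers(0, 2, size=n_scans)            # 1 = POMS, 0 = control

# Early fusion: concatenate both modalities into one joint feature vector,
# then fit a single classifier on the fused representation.
fused = np.concatenate([image_embeddings, oct_features], axis=1)
clf = make_pipeline(StandardScaler(), SVC(probability=True))
auc = cross_val_score(clf, fused, labels, cv=5, scoring="roc_auc").mean()
print(fused.shape, round(auc, 3))
```

Late fusion would instead train one model per modality and combine their predicted probabilities, which, as the results below indicate, can struggle on the minority class.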
Scans from individuals with POMS (onset 16.0 ± 3.1 years, 51.0% female; 211 scans) and from 29 children with non-inflammatory neurological conditions (13.1 ± 4.0 years, 69.0% female; 52 scans) were included. The early fusion model achieved the highest performance (AUC: 0.90, weighted F1: 0.87, macro F1: 0.77, accuracy: 87%), outperforming both unimodal and late fusion models. The best unimodal feature-based model (SVC) yielded an AUC of 0.84, weighted F1 of 0.85, macro F1 of 0.73, and accuracy of 85%, while the best image-based model (ResNet101 with SVC) achieved an AUC of 0.79, weighted F1 of 0.84, macro F1 of 0.70, and accuracy of 87%. Late fusion underperformed, reaching 82% accuracy but failing on the minority class.
Multimodal learning with early fusion substantially enhances diagnostic performance by combining spatial retinal information with clinically relevant structural features. This approach captures complementary patterns associated with MS pathology and shows promise as an AI-driven tool to support pediatric neuroinflammatory diagnosis.
Authors
Chen, Soltanieh, Rajapaksa, Khalvati, Yeh