Fast and accurate identification of emerging viral reassortment from genome sequences.
The genomes of segmented viruses, such as influenza A viruses (IAVs), are divided into multiple distinct segments. This structure enables novel strains to arise through reassortment, in which segments from different strains combine, greatly increasing the genetic diversity of segmented viruses. Reassortment can confer new biological properties on viruses, such as increased transmission rates, as evidenced by several influenza pandemics throughout history. Consequently, it is crucial to monitor reassortment events, particularly emerging ones. The wide availability of viral genome data provides an opportunity to identify emerging reassortment events. However, traditional methods often scale poorly because of their high computational cost. This study introduces VReassort, a tool for fast and accurate detection of emerging reassortant strains from genome sequences. VReassort uses deep learning models and phylogenetic-tree-derived features to detect reassortment between segment pairs. Experiments on simulated and real data demonstrate VReassort's superior performance (F1-score $> 0.8$) across diverse scenarios. Notably, VReassort analyzed ~1000 IAV strains in under 2 min, more than 100 times faster than the benchmark tool. Applying VReassort to large-scale IAV data ($> 8000$ strains) uncovered intriguing reassortment patterns, while its application to rotavirus data confirmed the feasibility of extending VReassort to other segmented viruses.