ATF-MGIAM: Medically-guided interpretable attention mapping for robust pertussis cough sound recognition.
To address the challenges of complex acoustic patterns and limited interpretability in pertussis cough sound recognition, this study proposes an interpretable deep learning framework based on adaptive time-frequency fusion and a medically guided attention mechanism. The framework first employs an Adaptive Time-Frequency (ATF) Fusion Transformer to extract multi-scale temporal and spectral features of cough sounds, followed by a Medically-Guided Interpretable Attention Mapping (MGIAM) module that aligns attention distributions with medically relevant acoustic features, making the diagnostic process explicitly interpretable. Experiments were conducted on three publicly available pertussis cough sound datasets from Kaggle, containing 68, 66, and 44 recordings, respectively. Under an 8:2 training-testing split with strict data-leakage prevention, the proposed method achieved AUC scores of 0.994, 0.984, and 0.996 on the three datasets, outperforming the best existing baselines by an average of approximately 2%. Ablation studies showed that the ATF module significantly enhances time-frequency dependency modeling, while the MGIAM module improves attention consistency in medically relevant regions. In noise-robustness experiments, performance degradation remained below 3%, confirming the model's reliability and generalization ability in clinical applications. Overall, the proposed framework unifies high accuracy and strong interpretability for pertussis cough sound recognition, providing a reusable modeling paradigm for medical acoustic analysis.
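The abstract does not specify how attention distributions are aligned with medically relevant acoustic regions. The following is a minimal NumPy sketch, under the assumption (not stated in the paper) that the alignment can be formulated as a KL-divergence penalty between a normalized medical-relevance mask over time-frequency bins and the model's attention distribution; the function name `mgiam_alignment_loss` and the binary mask are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of attention logits."""
    e = np.exp(x - x.max())
    return e / e.sum()

def mgiam_alignment_loss(attn_logits, medical_mask, eps=1e-8):
    """Hypothetical sketch of a medically guided alignment term:
    KL(prior || attention), where the prior is a medical-relevance
    mask renormalized to a probability distribution. The loss grows
    when attention mass falls outside medically relevant bins."""
    attn = softmax(attn_logits)                        # model attention distribution
    prior = medical_mask / (medical_mask.sum() + eps)  # mask -> distribution
    return float(np.sum(prior * np.log((prior + eps) / (attn + eps))))

# Toy example: 8 time-frequency bins, bins 2-4 marked medically relevant.
logits = np.array([0.1, 0.2, 2.0, 2.2, 1.9, 0.1, 0.0, 0.1])
mask   = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
loss = mgiam_alignment_loss(logits, mask)  # small: attention already focused on bins 2-4
```

Minimizing such a term alongside the classification loss would push the attention map toward the annotated regions, which is one plausible way to obtain the attention consistency the ablation study reports for the MGIAM module.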