[Non-invasive and high-precision identification of gastric precancerous lesions based on SERS and machine learning].
Objective: To construct a non-invasive detection platform based on surface-enhanced Raman spectroscopy (SERS) combined with machine learning for high-precision recognition of gastric precancerous lesions. Methods: Serum samples were collected from 213 subjects at Jiangdu People's Hospital Affiliated to Yangzhou University from July 6, 2023 to January 1, 2025, including 51 healthy controls and 162 patients with gastric lesions (48 cases of high-grade intraepithelial neoplasia [HGIN], 60 cases of early gastric cancer, and 54 cases of advanced gastric cancer). Au octahedral nanoparticle (Au OCNPs) substrates were synthesized by a seed-mediated method, and their morphology was characterized by scanning electron microscopy (SEM) and transmission electron microscopy (TEM). Serum samples were dropped onto the Au OCNPs array, and SERS spectra were acquired using a confocal micro-Raman spectrometer (excitation wavelength 785 nm, laser power 5 mW, exposure time 10 s). All raw spectral data were preprocessed in Origin 2019, including spectral band selection, Savitzky-Golay smoothing, airPLS baseline correction, and min-max normalization. A principal component analysis-diagonal quadratic discriminant analysis (PCA-DQDA) model was constructed in MATLAB R2023a to evaluate classification performance for healthy subjects and patients with gastric lesions at different stages, and the model's accuracy and area under the curve (AUC) were validated by five-fold cross-validation. Results: The Au OCNPs arrays showed uniform morphology, sharp edges, a lattice spacing of 0.226 nm, and a characteristic absorption peak at 534 nm, with significant SERS enhancement and good reproducibility.
The characteristic peak differences in SERS spectra between healthy subjects and patients with gastric lesions were mainly concentrated at 625, 728, 1006, 1326, 1446, and 1584 cm⁻¹, indicating significant differences in the vibrational modes of serum biomolecules such as proteins and nucleic acids during the progression of gastric precancerous lesions. For the binary classification of healthy subjects versus all patients with gastric lesions, the PCA-DQDA model achieved an overall accuracy of 97.2% (207/213). For the multi-class classification of healthy subjects and patients with gastric lesions at different stages, the model achieved an overall accuracy of 93.4% (199/213) and an AUC of 0.872. Misclassifications occurred between adjacent subgroups with similar biological characteristics: among 51 healthy subjects, 4 were misclassified as HGIN, giving a classification accuracy of 92.2% (47/51); among 48 HGIN samples, 1 was misclassified as healthy and 2 as early gastric cancer, giving a classification accuracy of 93.8% (45/48). Conclusion: The serum SERS detection platform based on Au OCNPs arrays and the PCA-DQDA model is non-invasive, highly sensitive, and molecularly specific in identifying gastric precancerous lesions, providing a new strategy for their recognition.
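The PCA-DQDA workflow with five-fold cross-validation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, which used Origin 2019 and MATLAB R2023a. The sketch makes several assumptions: the spectra are synthetic, Savitzky-Golay smoothing and airPLS baseline correction are omitted, DQDA is taken to mean a Gaussian classifier with a separate diagonal covariance per class, and all function names, the number of principal components (5), and the random seeds are hypothetical.

```python
import numpy as np

def min_max_normalize(spectra):
    """Scale each spectrum (row) to the [0, 1] range."""
    lo = spectra.min(axis=1, keepdims=True)
    hi = spectra.max(axis=1, keepdims=True)
    return (spectra - lo) / np.where(hi > lo, hi - lo, 1.0)

def pca_fit(X, n_components):
    """Return the training mean and the top principal axes (via SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, axes):
    """Project spectra onto the principal axes."""
    return (X - mean) @ axes.T

def dqda_fit(Z, y):
    """Per-class mean, diagonal variance, and log prior (assumed DQDA form)."""
    params = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        params[c] = (Zc.mean(axis=0), Zc.var(axis=0) + 1e-9,
                     np.log(len(Zc) / len(Z)))
    return params

def dqda_predict(Z, params):
    """Assign each sample to the class with the highest log posterior."""
    classes = sorted(params)
    scores = []
    for c in classes:
        mu, var, log_prior = params[c]
        # Gaussian log-likelihood with a diagonal covariance matrix
        ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (Z - mu) ** 2 / var, axis=1)
        scores.append(ll + log_prior)
    return np.array(classes)[np.argmax(scores, axis=0)]

def cross_validate(X, y, n_components=5, n_folds=5, seed=0):
    """Five-fold CV; PCA and DQDA are refit on each training split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        mean, axes = pca_fit(X[train], n_components)
        params = dqda_fit(pca_transform(X[train], mean, axes), y[train])
        pred = dqda_predict(pca_transform(X[test], mean, axes), params)
        correct += int(np.sum(pred == y[test]))
    return correct / len(y)

def synthetic_spectra(n_per_class=40, n_points=200, seed=1):
    """Hypothetical stand-in data: four classes, one Gaussian peak each."""
    rng = np.random.default_rng(seed)
    axis = np.arange(n_points)
    X, y = [], []
    for label, centre in enumerate([40, 80, 120, 160]):
        peak = np.exp(-0.5 * ((axis - centre) / 6.0) ** 2)
        X.append(peak + 0.05 * rng.standard_normal((n_per_class, n_points)))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

X, y = synthetic_spectra()
X_norm = min_max_normalize(X)
accuracy = cross_validate(X_norm, y)
print(f"five-fold CV accuracy on synthetic data: {accuracy:.3f}")
```

Refitting the PCA projection inside each fold keeps the held-out spectra from influencing the principal axes, which is one common way to avoid optimistic accuracy estimates; the abstract does not state whether the study did this.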