Journal of International Oncology ›› 2025, Vol. 52 ›› Issue (1): 31-37. doi: 10.3760/cma.j.cn371439-20240806-00004

• Original Article •

A predictive model for radiation esophagitis in esophageal cancer patients based on machine learning

Gao Wei, Zhang Ling, Wu Tianlei, Hu Lili, Rong Feng

  1. Department of Radiotherapy, Tumor Center, Lu'an Hospital of Anhui Medical University, Lu'an 237000, China
  • Received: 2024-08-06 Revised: 2024-12-13 Online: 2025-01-08 Published: 2025-01-21
  • Contact: Rong Feng E-mail: wazhl1996@163.com
  • Supported by:
    Lu'an Science and Technology Plan (2022lakj042)

Abstract:

Objective To construct a machine learning (ML) model for predicting ≥ grade 2 radiation esophagitis (RE) in patients with esophageal cancer during concurrent chemoradiotherapy (CRT).

Methods The clinical data of 276 patients with esophageal cancer who received CRT at Lu'an Hospital of Anhui Medical University from January 2018 to January 2023 were analyzed retrospectively. RE was graded according to the criteria of the American Radiation Therapy Oncology Group, with ≥ grade 2 RE as the outcome event. Variables were screened by least absolute shrinkage and selection operator (LASSO) regression, and the dataset was re-established after screening. The dataset was then divided into a training set (n=193) and a testing set (n=83) at a 7∶3 ratio and used to build four ML models: random forest (RF), decision tree (DT), extreme gradient boosting (XGBoost), and support vector machine (SVM). Training and model optimization were performed on the training set, and model performance was evaluated on the testing set using the receiver operating characteristic (ROC) curve. The area under the curve (AUC), accuracy, precision, sensitivity, and F1 score were calculated to assess each model. Shapley additive explanations (SHAP) analysis was used to explain the optimal model.

Results By the end of follow-up, 91 of the 276 patients (32.97%) had experienced ≥ grade 2 RE during CRT. There were statistically significant differences between the ≥ grade 2 RE group (n=91) and the group without ≥ grade 2 RE (n=185) in tumor lesion length (Z=-5.53, P<0.001), Karnofsky performance status (KPS) score (χ²=5.92, P=0.015), Eastern Cooperative Oncology Group (ECOG) score (χ²=4.01, P=0.045), hypertension (χ²=15.35, P<0.001), diabetes (χ²=13.06, P<0.001), white blood cell count (Z=-6.59, P<0.001), neutrophil count (Z=-6.72, P<0.001), and radiotherapy dose (χ²=9.81, P=0.002). LASSO regression ultimately selected 7 characteristic variables: tumor lesion length, ECOG score, KPS score, neutrophil count, hypertension, diabetes, and radiotherapy dose. ROC curve analysis showed that the XGBoost model had the best overall predictive performance, with an AUC of 0.90, accuracy of 0.82, precision of 0.80, sensitivity of 0.73, and F1 score of 0.76. The AUC, accuracy, precision, sensitivity, and F1 score were 0.89, 0.78, 0.76, 0.48, and 0.59 for the RF model; 0.72, 0.72, 0.44, 0.60, and 0.52 for the DT model; and 0.74, 0.82, 0.52, 0.88, and 0.65 for the SVM model, respectively. SHAP analysis of the XGBoost model indicated that tumor lesion length, neutrophil count, hypertension, diabetes, and radiotherapy dose were strong predictors of ≥ grade 2 RE during CRT in esophageal cancer patients.

Conclusions The XGBoost-based model shows good predictive performance for ≥ grade 2 RE in esophageal cancer patients during CRT. Combined with SHAP analysis, it gives an intuitive view of how the important features in the model affect the outcome.
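
As a rough cross-check of the metrics reported above, the following minimal sketch recomputes each model's F1 score from its reported precision and sensitivity, using the standard definition F1 = 2 × precision × sensitivity / (precision + sensitivity). The names `reported`, `f1_from`, `recomputed`, and `gaps` are illustrative and do not come from the article, and because the inputs are already rounded to two decimals, small differences in the second decimal are possible.

```python
# Illustrative sketch only: the names below are hypothetical and not from the article.
# Precision, sensitivity, and F1 values are copied from the abstract's Results.
reported = {
    "XGBoost": {"precision": 0.80, "sensitivity": 0.73, "f1": 0.76},
    "RF":      {"precision": 0.76, "sensitivity": 0.48, "f1": 0.59},
    "DT":      {"precision": 0.44, "sensitivity": 0.60, "f1": 0.52},
    "SVM":     {"precision": 0.52, "sensitivity": 0.88, "f1": 0.65},
}


def f1_from(precision: float, sensitivity: float) -> float:
    """Return the harmonic mean of precision and sensitivity (recall)."""
    total = precision + sensitivity
    if total == 0:
        return 0.0  # F1 is conventionally 0 when both inputs are 0
    return 2 * precision * sensitivity / total


# F1 recomputed from the rounded precision and sensitivity of each model.
recomputed = {
    name: f1_from(m["precision"], m["sensitivity"])
    for name, m in reported.items()
}

# Absolute gap between the recomputed and the reported F1 score.
gaps = {name: abs(recomputed[name] - reported[name]["f1"]) for name in reported}

for name in reported:
    print(
        f"{name}: reported F1 = {reported[name]['f1']:.2f}, "
        f"recomputed = {recomputed[name]:.3f}, gap = {gaps[name]:.3f}"
    )
```

Under these inputs the recomputed values for XGBoost, RF, and SVM agree with the reported F1 scores to two decimals, while the DT value comes out near 0.51 against the reported 0.52; a gap of this size may simply reflect that the published precision and sensitivity are rounded.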

Key words: Esophageal neoplasms, Chemoradiotherapy, Machine learning, Radiation esophagitis, SHAP analysis