Journal of International Oncology, 2026, Vol. 53, Issue (7): 412-419. doi: 10.3760/cma.j.cn371439-20251019-00057

• Original Article •

Predictive value of XGBoost model for pathological complete response after neoadjuvant chemotherapy in breast cancer patients

Liu Yonghong, Zhang Bo, Xue Lingbo, Hu Pengfei, Zhang Zhenyu, Li Jie

  1. Department of Thyroid and Breast Surgery, Cangzhou Central Hospital of Hebei Province, Cangzhou 061000, China
  • Received: 2025-10-19 Online: 2026-07-08 Published: 2026-06-25
  • Contact: Li Jie, E-mail: lj13513279709@hotmail.com
  • Supported by:
    Scientific and Technological Project of Cangzhou of China (222106087)

Abstract:

Objective To investigate the predictive value of an extreme gradient boosting (XGBoost) model for pathological complete response (pCR) after neoadjuvant chemotherapy in breast cancer patients.

Methods The clinical data of 172 breast cancer patients admitted to the Main Campus of Cangzhou Central Hospital of Hebei Province from January 2010 to December 2024 (internal dataset) and 41 patients admitted to the Branch Campus (external validation dataset) were retrospectively analyzed. The 172 patients in the internal dataset were divided into an internal training dataset and an internal validation dataset at a ratio of 7∶3. The XGBoost model was built on the internal training dataset, validated internally on the internal validation dataset, and validated externally on the 41 patients of the external validation dataset. Factors influencing pCR in breast cancer patients receiving neoadjuvant chemotherapy were screened by logistic regression analysis, and the area under the curve (AUC) of the XGBoost model for predicting pCR was assessed by receiver operating characteristic (ROC) curve analysis. A nomogram model was constructed from the influencing factors identified by logistic regression analysis, and the AUCs of the XGBoost model and the nomogram model were compared by the DeLong test. Shapley additive explanation (SHAP) scatter plots were used to interpret the XGBoost model.

Results Among the 172 breast cancer patients in the internal dataset, 30 (17.4%) achieved pCR after neoadjuvant chemotherapy. There were statistically significant differences between the pCR group and the non-pCR group in maximum tumor diameter (χ²=5.07, P=0.024), axillary lymph node status (χ²=10.85, P<0.001), human epidermal growth factor receptor 2 (χ²=3.97, P=0.046), Ki-67 expression (χ²=5.50, P=0.019), neoadjuvant chemotherapy regimen (P=0.047), and targeted therapy (χ²=4.22, P=0.040). Multivariate analysis showed that maximum tumor diameter (OR=3.32, 95%CI: 1.12-9.91, P=0.031), axillary lymph node status (OR=7.86, 95%CI: 1.83-33.63, P=0.005), Ki-67 expression (OR=4.84, 95%CI: 1.16-20.25, P=0.031), and targeted therapy (OR=0.11, 95%CI: 0.02-0.60, P=0.011) were independent influencing factors for pCR in breast cancer patients undergoing neoadjuvant chemotherapy. SHAP analysis showed that the variables of the XGBoost model, ranked by importance, were axillary lymph node status, Ki-67 expression, maximum tumor diameter, and targeted therapy; axillary lymph node positivity was the most important risk factor for pCR in breast cancer patients undergoing neoadjuvant chemotherapy. ROC curve analysis showed that, in the internal training dataset, the AUC for predicting pCR was 0.84 for the XGBoost model and 0.79 for the nomogram model (Z=0.68, P=0.496). In the internal validation dataset, the AUCs were 0.75 and 0.70, respectively (Z=0.37, P=0.714). In the external validation dataset, the AUCs were 0.81 and 0.79, respectively (Z=0.15, P=0.884). None of the differences in AUC between the two models was statistically significant.

Conclusions The XGBoost model based on axillary lymph node status, Ki-67 expression, maximum tumor diameter, and targeted therapy can effectively predict pCR after neoadjuvant chemotherapy in breast cancer patients.

Key words: Breast neoplasms, Neoadjuvant chemotherapy, Pathological condition, Neoplasm regression, Machine learning
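
The test statistics reported in the abstract can be cross-checked against their P values with a few lines of arithmetic. The sketch below is an illustration only, not the authors' analysis code. It assumes that each DeLong Z statistic is referred to the standard normal distribution with a two-sided test, and that each 95%CI in the multivariate analysis is a Wald interval on the log-odds scale, exp(ln OR ± 1.96 × SE). The helper names `two_sided_p` and `wald_from_ci` are hypothetical and chosen only for this example.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal statistic z: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def wald_from_ci(odds_ratio: float, lower: float, upper: float,
                 z_crit: float = 1.959964) -> tuple:
    """Recover the Wald z statistic and P value from an OR and its 95% CI.

    Assumption: the interval was built as exp(ln(OR) +/- z_crit * SE),
    so SE is the log-scale interval width divided by 2 * z_crit.
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    return z, two_sided_p(z)


# DeLong Z statistics (XGBoost vs. nomogram) as reported in the abstract.
delong_z = {
    "internal training": 0.68,
    "internal validation": 0.37,
    "external validation": 0.15,
}
delong_p = {name: two_sided_p(z) for name, z in delong_z.items()}

# Multivariate odds ratios with 95% CIs as reported in the abstract:
# (OR, lower bound, upper bound).
reported_or = {
    "maximum tumor diameter": (3.32, 1.12, 9.91),
    "axillary lymph node status": (7.86, 1.83, 33.63),
    "Ki-67 expression": (4.84, 1.16, 20.25),
    "targeted therapy": (0.11, 0.02, 0.60),
}
wald = {name: wald_from_ci(*vals) for name, vals in reported_or.items()}

for name, p in delong_p.items():
    print(f"DeLong, {name}: Z={delong_z[name]:.2f}, recomputed P={p:.3f}")
for name, (z, p) in wald.items():
    print(f"Wald, {name}: z={z:.2f}, recomputed P={p:.3f}")
```

Under these assumptions the recomputed values agree with the reported P values to within rounding. The small gaps for the two validation-set comparisons are what one would expect if Z was rounded to two decimals before publication, although the abstract does not confirm this.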