Journal of International Oncology ›› 2026, Vol. 53 ›› Issue (3): 144-149. doi: 10.3760/cma.j.cn371439-20250415-00023

• Original Article •

Evaluation of the risk of low-blood-flow BI-RADS category 4 breast lesions with an ultrasound-based XGBoost model

He Yuqing, Wu Zizheng, Qi Zhengqin

  1. Department of Ultrasound, First Hospital of Qinhuangdao, Qinhuangdao 066000, China
  • Received: 2025-04-15 Online: 2026-03-08 Published: 2026-02-09
  • Contact: He Yuqing E-mail: 347263253@qq.com
  • Supported by:
    Hebei Provincial Medical Science Research Project (20231893); Qinhuangdao Science and Technology Research and Development Program (202301A199)

Abstract:

Objective To develop an extreme gradient boosting (XGBoost) model based on clinical and ultrasound features, and to evaluate its performance in predicting the malignancy risk of low-blood-flow (Adler grade 0-Ⅰ) Breast Imaging Reporting and Data System (BI-RADS) category 4 breast lesions. Methods Clinical and ultrasound data were retrospectively collected from 317 female patients diagnosed with BI-RADS category 4 breast lesions at the First Hospital of Qinhuangdao from June 2023 to December 2024 (full sample: 174 benign, 143 malignant). Patients were divided by stratified random sampling at a 7:3 ratio into a training set (n=222; 122 benign, 100 malignant) and a testing set (n=95; 52 benign, 43 malignant). After excluding patients with high blood flow grades (Adler grade Ⅱ-Ⅲ), the remaining 166 patients with low blood flow grades were likewise divided 7:3 into a training set (n=116; 71 benign, 45 malignant) and a testing set (n=50; 30 benign, 20 malignant). A full-sample XGBoost model for classifying BI-RADS category 4 breast lesions as benign or malignant was built from 13 features: five established epidemiological risk factors for breast cancer (age, family history of breast cancer, obesity, history of alcohol consumption, and smoking history) and eight core lesion descriptors recommended by the 2013 ACR BI-RADS classification standard (blood flow grade, maximum lesion diameter, microcalcification, shape, margin, internal echo, posterior echo, and parallel position). The low-blood-flow XGBoost model was built from the remaining 12 features after the blood flow grade variable was removed. Predictive performance was evaluated with receiver operating characteristic (ROC) curves; SHapley Additive exPlanations (SHAP) analysis was used to quantify feature contributions; and decision curve analysis (DCA) was used to assess clinical net benefit and practicability.
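The 7:3 stratified split described in Methods can be illustrated with a minimal sketch. This is not the authors' code: the function name `stratified_split`, the random seed, and the per-class rounding rule are assumptions made for illustration, and only the class sizes (174 benign, 143 malignant) come from the abstract.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split indices so that each class is divided at roughly train_fraction.

    Hypothetical helper: the abstract does not state how the split was implemented.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Assumption: round the per-class training count to the nearest integer.
        n_train = round(train_fraction * len(members))
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(test_idx)


def class_counts(labels, indices):
    """Return (benign, malignant) counts for the selected indices."""
    benign = sum(1 for i in indices if labels[i] == 0)
    malignant = sum(1 for i in indices if labels[i] == 1)
    return benign, malignant


# Class sizes from the abstract's full sample: 0 = benign, 1 = malignant.
labels = [0] * 174 + [1] * 143
train_idx, test_idx = stratified_split(labels)
train_counts = class_counts(labels, train_idx)
test_counts = class_counts(labels, test_idx)
print("train (benign, malignant):", train_counts)
print("test  (benign, malignant):", test_counts)
```

Stratifying by outcome keeps the benign-to-malignant ratio nearly identical in the training and testing sets, which matters when the two classes are not equally frequent. Other tools may allocate the rounding remainder differently, so class counts can differ by one from this sketch.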
Results In the full sample, benign and malignant breast lesions differed significantly in blood flow grade (χ²=4.99, P=0.026), maximum lesion diameter (χ²=4.47, P=0.034), microcalcification (χ²=7.10, P=0.009), internal echo (χ²=4.24, P=0.041), and posterior echo (χ²=22.32, P<0.001). ROC curve analysis showed that, in the full-sample training set, the area under the curve (AUC) of the XGBoost model for distinguishing benign from malignant BI-RADS category 4 breast lesions was 0.936 (95%CI: 0.902-0.965), with an accuracy of 86.0%, a sensitivity of 88.5%, and a specificity of 83.2%; in the testing set, the AUC was 0.852 (95%CI: 0.787-0.906), with an accuracy of 76.8%, a sensitivity of 78.6%, and a specificity of 75.0%. SHAP analysis showed that a high blood flow grade (Adler grade Ⅱ-Ⅲ) contributed most to the full-sample model's prediction of malignancy risk, followed by an irregular margin and a non-parallel position. In the low-blood-flow training set, the AUC of the XGBoost model for distinguishing benign from malignant BI-RADS category 4 breast lesions was 0.951 (95%CI: 0.917-0.975), with an accuracy of 86.5%, a sensitivity of 87.9%, and a specificity of 84.8%; in the testing set, the AUC was 0.843 (95%CI: 0.766-0.904), with an accuracy of 79.6%, a sensitivity of 81.5%, and a specificity of 77.8%. Internal validation showed that the C-index of the XGBoost model for predicting benign and malignant breast lesions was 0.82. SHAP analysis showed that posterior echo attenuation made the greatest positive contribution to the low-blood-flow model's prediction of malignancy risk, followed by the presence of microcalcification, maximum lesion diameter >2 cm, and inhomogeneous internal echo. DCA showed that the prediction model provided a high clinical net benefit and had some clinical practicability.
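For readers less familiar with the metrics reported in Results, the sketch below shows how AUC, sensitivity, specificity, accuracy, and the DCA net benefit are commonly computed from true labels and predicted probabilities. The function names and the eight-case toy dataset are invented for illustration and are not study data; the 0.5 decision threshold is also an assumption, since the abstract does not state the cut-off used.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when predicting malignant if probability >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity_accuracy(y_true, y_prob, threshold=0.5):
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    sensitivity = tp / (tp + fn)  # malignant lesions correctly flagged
    specificity = tn / (tn + fp)  # benign lesions correctly cleared
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


def auc(y_true, y_prob):
    """AUC as the probability that a malignant case outscores a benign one (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Toy data, invented for illustration only: 0 = benign, 1 = malignant.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.10, 0.30, 0.40, 0.60, 0.35, 0.70, 0.80, 0.90]

toy_auc = auc(y_true, y_prob)
toy_sens, toy_spec, toy_acc = sensitivity_specificity_accuracy(y_true, y_prob)
toy_nb = net_benefit(y_true, y_prob, threshold=0.5)
print(f"AUC={toy_auc:.3f} sens={toy_sens:.2f} spec={toy_spec:.2f} "
      f"acc={toy_acc:.2f} net benefit={toy_nb:.2f}")
```

A decision curve plots `net_benefit` across a range of threshold probabilities and compares the model against the "treat all" and "treat none" strategies; a model is considered clinically useful over the thresholds where its curve lies above both.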
Conclusions The XGBoost model based on clinical and ultrasound features can effectively distinguish benign from malignant low-blood-flow BI-RADS category 4 breast lesions. Posterior echo attenuation, microcalcification, maximum lesion diameter >2 cm, and inhomogeneous internal echo are key features for predicting malignancy risk in these lesions.

Key words: Breast diseases, Ultrasonography, Artificial intelligence, XGBoost, BI-RADS