Journal of International Oncology, 2023, Vol. 50, Issue (4): 220-226. doi: 10.3760/cma.j.cn371439-20221214-00043

• Original Article •

Construction of machine learning models for predicting the risk of postoperative distant metastasis recurrence in serous ovarian cancer

Yang Lirong, Wang Yufeng

  1. Department of Geriatric Oncology, Yunnan Cancer Hospital, Third Affiliated Hospital of Kunming Medical University, Kunming 650100, China
  • Received: 2022-12-14 Revised: 2023-03-13 Online: 2023-04-08 Published: 2023-06-12
  • Contact: Wang Yufeng, Email: 13577037585@163.com

Abstract:

Objective To develop machine learning models that predict the risk of distant metastasis at postoperative recurrence of serous ovarian cancer (SOC) from routine clinical data. Methods A total of 687 patients with SOC who relapsed after surgery at Yunnan Cancer Hospital between January 2010 and December 2020 were enrolled. According to recurrence status, the patients were divided into a distant metastasis group (n=105) and a non-distant metastasis group (n=582). Logistic regression was used to screen variables associated with distant metastasis of SOC. Based on the selected variables, five machine learning algorithms, K-nearest neighbor (KNN), logistic regression (LR), random forest (RF), support vector machine (SVM) and extreme gradient boosting (XGBoost), were used to develop models predicting the risk of distant metastasis at postoperative recurrence of SOC. Internal validation used 10-fold cross-validation, and model performance was evaluated with the receiver operating characteristic curve. Results The distant metastasis group and the non-distant metastasis group differed significantly in International Federation of Gynecology and Obstetrics (FIGO) stage (Z=-3.81, P<0.001), number of perioperative chemotherapy cycles (t=-5.11, P<0.001), lymph node metastasis (χ²=5.98, P=0.014), peritoneal effusion cytology (Z=-2.22, P=0.026), and neoadjuvant chemotherapy (χ²=5.29, P=0.021). Multivariate analysis showed that FIGO stage (OR=1.54, 95% CI: 1.07-2.22, P=0.019) and number of perioperative chemotherapy cycles (OR=1.22, 95% CI: 0.09-0.36, P<0.001) were independent influencing factors for distant metastasis at postoperative recurrence of SOC. Peritoneal effusion cytology (OR=1.20, 95% CI: 0.71-1.89, P=0.180) was not an independent influencing factor for distant metastasis of SOC; however, in view of the literature and because its inclusion improved the area under the curve (AUC), it was ultimately included in model construction. Among the five machine learning models built on these three variables, the KNN model performed best in identifying distant metastasis of SOC, with an AUC of 0.750, sensitivity of 0.591, specificity of 0.786, and accuracy of 85.0%. The LR model had an AUC of 0.679, sensitivity of 0.545, specificity of 0.765, and accuracy of 84.3%. The SVM model had an AUC of 0.634, sensitivity of 0.240, specificity of 0.968, and accuracy of 84.7%. The RF model had an AUC of 0.575, sensitivity of 0.905, specificity of 0.245, and accuracy of 84.7%. The XGBoost model had an AUC of 0.704, sensitivity of 0.567, specificity of 0.745, and accuracy of 84.9%. Conclusion FIGO stage and number of perioperative chemotherapy cycles are independent influencing factors for distant metastasis at postoperative recurrence of SOC. The KNN model built on FIGO stage, number of perioperative chemotherapy cycles and peritoneal effusion cytology shows relatively high discrimination and accuracy in predicting distant metastasis at postoperative recurrence of SOC.
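The abstract reports sensitivity, specificity and accuracy for a cohort in which the two groups are imbalanced (105 vs 582), so it can help to see how these quantities relate. The following is a minimal sketch, not the authors' evaluation code: the helper names `confusion_metrics` and `accuracy_from_rates` and the "label everyone negative" reference rule are assumptions introduced here for illustration, and only the group sizes and the KNN sensitivity and specificity are taken from the abstract.

```python
# Group sizes are taken from the abstract; everything else is illustrative.
N_DISTANT = 105
N_NON_DISTANT = 582


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = distant metastasis)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy implied by sensitivity and specificity at one decision threshold."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


y_true = [1] * N_DISTANT + [0] * N_NON_DISTANT
# Hypothetical reference rule: label every patient "no distant metastasis".
y_all_negative = [0] * len(y_true)

baseline = confusion_metrics(y_true, y_all_negative)
prevalence = N_DISTANT / (N_DISTANT + N_NON_DISTANT)

# KNN sensitivity and specificity as reported in the abstract; the abstract
# does not state which decision threshold they refer to.
knn_implied_accuracy = accuracy_from_rates(0.591, 0.786, prevalence)

print(f"prevalence of distant metastasis: {prevalence:.3f}")
print(f"majority-class baseline accuracy: {baseline['accuracy']:.3f}")
print(f"accuracy implied by KNN rates at one threshold: {knn_implied_accuracy:.3f}")
```

Reading each model's accuracy against the majority-class baseline, alongside its AUC and sensitivity, gives a fuller picture than accuracy alone. If the reported metrics were obtained at different thresholds or averaged over folds, the implied value from `accuracy_from_rates` need not match the reported accuracy.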

Key words: Ovarian neoplasms, Recurrence, Neoplasm metastasis, Machine learning, Risk factors
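The odds ratios in the abstract come from logistic regression, where an odds ratio is the exponential of a fitted coefficient and a confidence interval on the OR scale normally contains the OR itself. The sketch below only illustrates that relationship for the chemotherapy-cycle estimate (OR=1.22, interval 0.09-0.36). Reading the interval as being on the coefficient (log-odds) scale is an assumption made here, not something the abstract states, and the variable names are hypothetical.

```python
import math

# Values as printed in the abstract for the number of perioperative chemotherapy cycles.
reported_or = 1.22
reported_interval = (0.09, 0.36)

# In logistic regression, OR = exp(beta), so beta = ln(OR).
beta = math.log(reported_or)

# Assumption: the printed interval is for beta rather than for the OR.
lower, upper = reported_interval
interval_contains_beta = lower < beta < upper

# Under that assumption, the interval on the OR scale is the exponential of each bound.
or_interval = (math.exp(lower), math.exp(upper))

print(f"beta = ln(OR) = {beta:.3f}")
print(f"interval contains beta: {interval_contains_beta}")
print(f"interval on the OR scale: {or_interval[0]:.2f}-{or_interval[1]:.2f}")
```

The same conversion applied to the other two estimates would not be appropriate, since their printed intervals already contain the corresponding odds ratios.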