Journal of International Oncology ›› 2020, Vol. 47 ›› Issue (4): 211-216. doi: 10.3760/cma.j.cn371439-20190923-00004

• Original Article •

Preliminary screening of genes predicting gastric cancer survival based on the TCGA database

Zou Wenjing1, He Shuixiang2, Liu Dan1, Li Xu3

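  1 Department of Gerontology, Xi'an No.5 Hospital 710082
  2 Department of Gastroenterology, First Affiliated Hospital of Xi'an Jiaotong University 710061
  3 Department of Oncology, Shaanxi Provincial Cancer Hospital, Xi'an 710061
  • Received: 2019-09-23 Revised: 2019-12-30 Online: 2020-04-08 Published: 2020-05-26
  • Corresponding author: Li Xu E-mail:765203999@qq.com
  • Supported by:
    Natural Science Foundation of Shaanxi Province (2015JM8394)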

Screening differential genes and prognostic analysis of gastric cancer based on TCGA database

Zou Wenjing1, He Shuixiang2, Liu Dan1, Li Xu3

  1 Department of Gerontology, Xi'an No.5 Hospital, Xi'an 710082, China
  2 Department of Gastroenterology, First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
  3 Department of Oncology, Shaanxi Provincial Cancer Hospital, Xi'an 710061, China
  • Received:2019-09-23 Revised:2019-12-30 Online:2020-04-08 Published:2020-05-26
  • Contact: Li Xu E-mail:765203999@qq.com
  • Supported by:
    Natural Science Foundation of Shaanxi Province of China (2015JM8394)

Abstract:

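Objective To identify prognosis-related genes among those differentially expressed in gastric cancer tissue, using the large body of gastric cancer genomic data in The Cancer Genome Atlas (TCGA). Methods Gene chip data for gastric adenocarcinoma were downloaded from the TCGA database. After preprocessing in R, differential expression analysis of the gene expression data was performed with edgeR, and Gene Ontology (GO) enrichment and KEGG pathway analyses of the differential genes were carried out in R. Multivariate stepwise Cox regression was used to predict genes affecting survival, and the resulting genes were submitted for online survival analysis on the Kaplan-Meier Plotter website (http://Kaplan-Meier Plotter.com). Results A total of 305 gastric cancer specimens and 30 adjacent non-tumor tissues were selected from the TCGA database. In all, 3 231 differential genes were obtained, of which 2 005 were up-regulated and 1 226 were down-regulated. GO enrichment was concentrated in molecular functions such as antigen binding, serine hydrolase activity, receptor ligand activity, serine-type peptidase activity, serine-type endopeptidase activity, glycosaminoglycan binding, cytokine activity, hormone activity, peptidase inhibitor activity and metallopeptidase activity. KEGG pathway analysis mainly involved chemical carcinogenesis, neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction, metabolism of xenobiotics by cytochrome P450, protein digestion and absorption, Staphylococcus aureus infection, retinol metabolism, drug metabolism by cytochrome P450, steroid hormone metabolism and pancreatic secretion. Cox analysis showed that GPX3 and SERPINE1 significantly affected the survival of gastric cancer patients. Receiver operating characteristic curve analysis showed that the expression levels of GPX3 and SERPINE1 had some predictive value for survival: at cutoff values of 0.46 and 0.68, respectively, the sensitivity was 60.35%, the specificity was 82.06%, and the area under the curve was 0.763 (95%CI: 0.828-0.936). Kaplan-Meier analysis found that high expression of GPX3 (P<0.001) and SERPINE1 (P=0.001) was significantly associated with poor prognosis in gastric adenocarcinoma. Conclusion The higher the expression of SERPINE1 and GPX3, the shorter the survival of gastric cancer patients; both genes may serve as targets for predicting prognosis in gastric cancer.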

Key words: Stomach neoplasms, Proto-oncogenes, Prognosis, Gene ontology

Abstract:

Objective To identify prognosis-associated genes among the differentially expressed genes in gastric cancer tissues, using the large volume of gastric cancer genomic data in The Cancer Genome Atlas (TCGA) database. Methods Gene expression data of gastric adenocarcinoma were downloaded from the TCGA database. After preprocessing in R, edgeR was used for differential expression analysis, and R was used to identify significantly enriched Gene Ontology (GO) terms and KEGG pathways among the differentially expressed genes. Multivariate stepwise Cox regression analysis was used to predict genes affecting survival, and the resulting genes were submitted for online survival analysis on the Kaplan-Meier Plotter website (http://Kaplan-Meier Plotter.com). Results A total of 305 gastric cancer samples and 30 normal gastric tissue samples were retrieved from the TCGA database, and 3 231 differentially expressed genes were identified, including 2 005 up-regulated and 1 226 down-regulated genes. These genes were enriched in GO terms including antigen binding, serine hydrolase activity, receptor ligand activity, serine-type peptidase activity, serine-type endopeptidase activity, glycosaminoglycan binding, cytokine activity, hormone activity, peptidase inhibitor activity and metallopeptidase activity. In the KEGG pathway analysis, the genes were enriched in chemical carcinogenesis, neuroactive ligand-receptor interaction, cytokine-cytokine receptor interaction, metabolism of xenobiotics by cytochrome P450, protein digestion and absorption, Staphylococcus aureus infection, retinol metabolism, drug metabolism by cytochrome P450, steroid hormone metabolism and pancreatic secretion. Cox analysis showed that GPX3 and SERPINE1 had a significant effect on the survival of gastric cancer patients. Receiver operating characteristic curve analysis showed that the expression levels of GPX3 and SERPINE1 had some predictive value for the survival time of gastric cancer patients: at cutoff values of 0.46 and 0.68 for GPX3 and SERPINE1, respectively, the sensitivity was 60.35%, the specificity was 82.06%, and the area under the curve was 0.763 (95%CI: 0.828-0.936). Kaplan-Meier analysis showed that high expression of GPX3 (P<0.001) and SERPINE1 (P=0.001) was significantly associated with poor prognosis in gastric adenocarcinoma. Conclusion The higher the expression of SERPINE1 and GPX3, the shorter the survival time of gastric cancer patients. Both genes may serve as targets for predicting the prognosis of gastric cancer.
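
The sensitivity, specificity and area under the curve reported above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (the abstract states that the analysis was done in R and does not say how the two markers were combined). The function names and the toy scores and labels are assumptions made up for illustration; only the cutoff value 0.46 is taken from the abstract.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Call a sample positive when score >= cutoff.

    labels: 1 = event of interest (e.g. short survival), 0 = no event.
    Returns (sensitivity, specificity).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= cutoff and y == 1)
    fn = sum(1 for s, y in pairs if s < cutoff and y == 1)
    tn = sum(1 for s, y in pairs if s < cutoff and y == 0)
    fp = sum(1 for s, y in pairs if s >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the fraction of positive/negative pairs in which the positive sample
    has the higher score, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, NOT from the study: normalized expression of one marker
# and a binary outcome for eight hypothetical patients.
scores = [0.10, 0.35, 0.40, 0.50, 0.55, 0.70, 0.80, 0.90]
labels = [0, 0, 1, 0, 1, 1, 0, 1]

sens, spec = sensitivity_specificity(scores, labels, cutoff=0.46)
area = auc(scores, labels)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.2f}")
```

Sweeping `cutoff` over the observed scores and plotting sensitivity against 1 - specificity traces the ROC curve; a reported cutoff is typically the point on that curve chosen by some criterion, which the abstract does not specify.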

Key words: Stomach neoplasms, Proto-oncogenes, Prognosis, Gene ontology
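
The Kaplan-Meier analysis mentioned in the abstract rests on the product-limit estimator, which can be sketched in a few lines of Python. This is an illustrative sketch only, not the code behind the Kaplan-Meier Plotter website; the function name `kaplan_meier` and the follow-up times below are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time of each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Both deaths and censored patients leave the risk set.
        n_at_risk -= removed
    return curve


# Toy follow-up data in months, NOT from the study.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

To compare high and low expression of a gene, one would split patients at an expression cutoff, estimate one curve per group, and test the difference with a log-rank test, which is omitted here.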