Journal of Southern Medical University ›› 2026, Vol. 46 ›› Issue (2): 353-361. doi: 10.12122/j.issn.1673-4254.2026.02.13
Junyao CHEN1, Zeyu CHEN2, Zhaojie LIN1, Menghao FANG1, Chaoying SHEN3, Qi XU4, Xiaoyi ZHANG5, Lu LU1
Received: 2025-06-26
Online: 2026-02-20
Published: 2026-03-10
Contact: Lu LU
E-mail: cjy2300@tju.edu.cn; 380893842@qq.com; Lulu_998543@tju.edu.cn
About the author: Junyao CHEN, master's student, E-mail: cjy2300@tju.edu.cn
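Abstract:
Objective To investigate how lifestyle habits such as tea drinking and alcohol consumption influence and correlate with gastrointestinal diseases, and on that basis to build an early risk prediction model and a large language model (LLM) for assisted consultation, enabling early identification of patients at high risk of gastrointestinal disease and intelligent diagnosis and treatment recommendations. Methods We surveyed patients who underwent both gastroscopy and the 13C urea breath test at the Digestive Endoscopy Center of Anxi County Hospital of Traditional Chinese Medicine. A scale analysis was first performed on the data to determine whether the selected features were appropriate. The data were randomly divided into a training set and a test set at a ratio of 7∶3, and six artificial intelligence and machine learning methods were applied: support vector machine (SVM), K-nearest neighbors (KNN), logistic regression (LR), random forest (RF), extreme gradient boosting (XGB), and deep neural network (DNN), to find the best classifier for predicting high-risk gastrointestinal conditions. Bayesian optimization was used to obtain the optimal hyperparameter combination for each of the 6 models before fitting, and the Shapley additive explanations method was applied for interpretability analysis of the best model. DeepSeek-R1 was chosen as the base language model and fine-tuned on gastrointestinal disease and Chinese online medical consultation datasets to build a gastrointestinal consultation LLM better suited to clinical needs. Results A total of 502 participants were included. All selected features showed some association with gastrointestinal disease, but only age was linearly correlated with gastrointestinal disease (β=0.023, SE=0.008, t=2.942, P=0.003). The best model was the DNN, with an accuracy of 0.68, precision of 0.68, recall of 0.85, F1 score of 0.75, and AUC of 0.74. In the DNN-based feature importance ranking, the top 3 features were age, DOB value, and duration of smoking. The recommendations generated by the LLM were highly consistent with those given by practising specialist physicians on the basis of gastroscopy findings. Conclusion The gastrointestinal disease risk prediction model built with the DNN method performed best; it can predict the risk of gastrointestinal disease in clinical practice and provide a reliable basis for deciding whether gastrointestinal endoscopy is needed. The results also indicate that preventing gastrointestinal disease calls for abstaining from smoking, limiting alcohol, and drinking tea sensibly. The gastrointestinal consultation LLM can give patients more professional medical guidance and has considerable clinical practical value.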
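The Shapley additive explanations step attributes a model's prediction to its input features. As a minimal sketch of the underlying idea (not the authors' DNN, and not the approximations used by the SHAP library), the Python below computes exact Shapley values by enumerating feature orderings for a toy linear risk score. The function names, weights, patient values, and baseline are all hypothetical.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x`, relative to `baseline`.

    A feature that is absent from a coalition keeps its baseline value.
    The cost grows as n!, so this is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous_score = predict(current)
        for i in order:
            current[i] = x[i]  # feature i joins the coalition
            score = predict(current)
            phi[i] += score - previous_score  # marginal contribution of i
            previous_score = score
    return [total / factorial(n) for total in phi]


def toy_risk(features):
    """Hypothetical linear score over (age, DOB value, smoking-duration code)."""
    age, dob, smoking = features
    return 0.02 * age + 0.05 * dob + 0.03 * smoking


patient = [60, 10, 7]   # hypothetical patient
baseline = [40, 2, 1]   # hypothetical reference patient
phi = shapley_values(toy_risk, patient, baseline)
```

For a linear score each attribution reduces to weight × (value − baseline), and the attributions sum to the difference between the patient's score and the baseline score. That additivity is what allows features to be ranked by their contribution.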
Junyao CHEN, Zeyu CHEN, Zhaojie LIN, Menghao FANG, Chaoying SHEN, Qi XU, Xiaoyi ZHANG, Lu LU. Dual role of tea consumption in gastrointestinal disease risks: analysis using a risk prediction model integrating interpretable machine learning and large language model[J]. Journal of Southern Medical University, 2026, 46(2): 353-361.
| Features | Assignment |
|---|---|
| Duration of smoking | <3 years=1, 3-5 years=3, 5-7 years=5, 7-10 years=7, >10 years=10 |
| Duration of tea drinking | <3 years=1, 3-5 years=3, 5-7 years=5, 7-10 years=7, >10 years=10 |
| Frequency of tea drinking | Once a week=1, 2-3 times a week=2, 4-6 times a week=6, Once a day=7, Several times a day=10 |
Tab.1 Value assignments for selected features during data preprocessing
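The assignments in Tab.1 amount to an ordinal encoding of questionnaire answers. A minimal Python sketch of such a lookup is given below; the dictionary and function names are hypothetical, and the table does not say how other answers (for example, non-smokers) were coded, so the sketch covers only the listed categories.

```python
# Codes copied from Tab.1; all names here are illustrative, not the authors'.
DURATION_CODES = {
    "<3 years": 1,
    "3-5 years": 3,
    "5-7 years": 5,
    "7-10 years": 7,
    ">10 years": 10,
}

TEA_FREQUENCY_CODES = {
    "Once a week": 1,
    "2-3 times a week": 2,
    "4-6 times a week": 6,
    "Once a day": 7,
    "Several times a day": 10,
}

FEATURE_CODES = {
    "Duration of smoking": DURATION_CODES,
    "Duration of tea drinking": DURATION_CODES,
    "Frequency of tea drinking": TEA_FREQUENCY_CODES,
}


def encode(record):
    """Map each questionnaire answer to its numeric code.

    Raises KeyError for a feature or answer that Tab.1 does not list.
    """
    return {
        feature: FEATURE_CODES[feature][answer]
        for feature, answer in record.items()
    }


encoded = encode({
    "Duration of smoking": ">10 years",
    "Duration of tea drinking": "3-5 years",
    "Frequency of tea drinking": "Once a day",
})
```

Raising an error on an unlisted answer, instead of silently substituting a default, keeps unexpected questionnaire values visible during preprocessing.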
| Disease name | Gastroscopy findings | Common symptoms | Medical recommendations |
|---|---|---|---|
| Gastric antral ulcer (Stage A2) | The gastric antral ulcer shows a thinner coating, with reduced hyperemia and edema in the surrounding mucosa, and clearer margins compared to stage A1. | Postprandial abdominal pain has somewhat subsided, but symptoms of indigestion persist. | Diet and schedule remain the same as in Stage A1; continue medication therapy, with dosage adjustments as needed; surgery is generally not required, but the condition should be closely monitored. |
| Gastric antral ulcer (Stage H1) | The gastric antral ulcer has shrunk and become shallower, with a thin coating. Inflammation of the surrounding mucosa has subsided, and regenerative epithelium is present. | Postprandial abdominal pain is significantly reduced, and symptoms of indigestion are improved. | Dietary intake may be appropriately diversified; maintain a regular schedule; continue medication as prescribed, with adjustments based on condition; surgery is generally not required. |
| Gastric antral ulcer (Stage S1) | The gastric antral ulcer has healed, leaving a red scar. | No obvious symptoms, with occasional dull pain in the upper abdomen. | Maintain a regular diet and eating schedule; adhere to a consistent sleep routine; medication may be discontinued for observation, with periodic endoscopic follow-ups; surgery is generally not required. |
Tab.2 Self-constructed gastric disease dataset
| Query | Response |
|---|---|
| My husband has always had a weak stomach and is prone to diarrhea. Lately, he experiences stomach pain within half an hour of eating dinner every night, followed by loose stools. I'd like to ask whether he should get checked at the hospital or if it's okay to just buy some medicine for him to take. | Hello, based on the symptoms you described, your husband may be experiencing gastrointestinal issues. We recommend first observing the duration and severity of the symptoms. If they persist for an extended period or become severe, it is advisable to seek medical attention promptly for relevant examinations to rule out serious conditions. If symptoms are mild, you may try self-care measures such as avoiding spicy or irritating foods, drinking plenty of water, and increasing intake of fruits and vegetables. You may also purchase over-the-counter antacids or anti-inflammatory medications from a pharmacy, but always follow the instructions carefully. If symptoms persist or worsen, please seek medical attention promptly. |
| When I wake up in the morning, I have to cough for a while. Lately, there's been blood in the phlegm I cough up! | Hello, this situation requires prompt medical attention. Coughing up blood-tinged sputum may indicate conditions affecting the respiratory or digestive tracts. It is advisable to seek medical evaluation as soon as possible to determine the cause and receive timely treatment. While awaiting medical care, ensure adequate rest, avoid smoking and exposure to pollutants, maintain clean indoor air, drink plenty of water, and adhere to a healthy diet. |
Tab.3 Public health guidance dataset
Fig.2 Correlation network of the features. D-S: Duration of smoking; D-T: Duration of tea drinking; F-T: Frequency of tea drinking. Blue edge: negative correlation; red edge: positive correlation.
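The colour convention in Fig.2 follows the sign of the correlation between two features. The page does not state which correlation coefficient was used, so the Python sketch below assumes Pearson's r; the function names and the sample data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


def edge_colour(r):
    """Red for a positive correlation, blue for a negative one, None for zero."""
    if r > 0:
        return "red"
    if r < 0:
        return "blue"
    return None


# Hypothetical encoded values for two features across four participants.
rising = pearson([1, 2, 3, 4], [2, 4, 6, 8])
falling = pearson([1, 2, 3, 4], [8, 6, 4, 2])
colours = (edge_colour(rising), edge_colour(falling))
```

In practice an edge would usually be drawn only when the correlation passes a significance or magnitude threshold; that threshold is not given here and is left out of the sketch.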
| Model | Accuracy | Precision | Recall | F1 Score | Brier Score | AUC |
|---|---|---|---|---|---|---|
| SVM | 0.67±0.046 | 0.68±0.058 | 0.80±0.081 | 0.74±0.047 | 0.23±0.011 | 0.68±0.048 |
| KNN | 0.60±0.046 | 0.64±0.030 | 0.69±0.083 | 0.66±0.053 | 0.25±0.020 | 0.66±0.041 |
| LR | 0.69±0.039 | 0.68±0.036 | 0.85±0.061 | 0.76±0.037 | 0.21±0.017 | 0.74±0.063 |
| RF | 0.58±0.033 | 0.61±0.036 | 0.73±0.037 | 0.66±0.027 | 0.25±0.012 | 0.63±0.030 |
| XGB | 0.58±0.030 | 0.63±0.016 | 0.68±0.077 | 0.65±0.044 | 0.24±0.016 | 0.63±0.039 |
| DNN | 0.68±0.047 | 0.68±0.038 | 0.85±0.087 | 0.75±0.052 | 0.21±0.016 | 0.74±0.053 |
Tab.4 Comparison of prediction performance across machine learning models
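For readers unfamiliar with the columns of Tab.4, the Python sketch below shows how each metric is defined for a binary classifier that outputs probabilities. It is an illustration on made-up labels and scores, not a reproduction of the authors' evaluation; the function names, the 0.5 decision threshold, and the sample data are assumptions.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, F1, and Brier score for binary labels."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Mean squared gap between predicted probability and outcome.
        "brier": sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true),
    }


def auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes both classes are present.
    """
    positives = [p for t, p in zip(y_true, y_prob) if t == 1]
    negatives = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = high-risk) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7, 0.3, 0.55]
metrics = classification_metrics(labels, scores)
area = auc(labels, scores)
```

Lower Brier scores indicate better-calibrated probabilities, whereas for the other columns higher is better.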