Journal of Southern Medical University, 2026, Vol. 46, Issue (2): 353-361. doi: 10.12122/j.issn.1673-4254.2026.02.13


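Dual role of tea drinking in gastrointestinal disease risk: a combined prediction and consultation-support model based on interpretable machine learning and a large language model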

Junyao CHEN1, Zeyu CHEN2, Zhaojie LIN1, Menghao FANG1, Chaoying SHEN3, Qi XU4, Xiaoyi ZHANG5, Lu LU1

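  1. School of Disaster and Emergency Medicine, Tianjin University, Tianjin 300072
    2. Department of Gastroenterology, Anxi Hospital of Traditional Chinese Medicine, Quanzhou, Fujian 362400
    3. Center of Gastrointestinal Endoscopy, Anxi Hospital of Traditional Chinese Medicine, Quanzhou, Fujian 362400
    4. Institute of Open Education, Jinzhou Open University, Jinzhou, Liaoning 121000
    5. School of Architecture and Art Design, Henan Polytechnic University, Jiaozuo, Henan 454000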
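  • Received: 2025-06-26 Online: 2026-02-20 Published: 2026-03-10
  • Corresponding author: Lu LU E-mail: cjy2300@tju.edu.cn; 380893842@qq.com; Lulu_998543@tju.edu.cn
  • About the authors: Junyao CHEN, master's student, E-mail: cjy2300@tju.edu.cn
    Zeyu CHEN, attending physician, E-mail: 380893842@qq.com
    First contact: co-first authors
  • Funding:
    Sanming Project of Medicine in Shenzhen (SZSM202411032)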

Dual role of tea consumption in gastrointestinal disease risks: analysis using a risk prediction model integrating interpretable machine learning and large language model

Junyao CHEN1, Zeyu CHEN2, Zhaojie LIN1, Menghao FANG1, Chaoying SHEN3, Qi XU4, Xiaoyi ZHANG5, Lu LU1

  1. School of Disaster and Emergency Medicine, Tianjin University, Tianjin 300072, China
    2. Department of Gastroenterology, Anxi Hospital of Traditional Chinese Medicine, Quanzhou 362400, China
    3. Center of Gastrointestinal Endoscopy, Anxi Hospital of Traditional Chinese Medicine, Quanzhou 362400, China
    4. Institute of Open Education, Jinzhou Open University, Jinzhou 121000, China
    5. School of Architecture and Art Design, Henan Polytechnic University, Jiaozuo 454000, China
  • Received: 2025-06-26 Online: 2026-02-20 Published: 2026-03-10
  • Contact: Lu LU E-mail: cjy2300@tju.edu.cn; 380893842@qq.com; Lulu_998543@tju.edu.cn

Abstract:

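Objective To investigate how lifestyle habits such as tea and alcohol consumption influence and are associated with gastrointestinal diseases, and on that basis to build an early risk prediction model and an auxiliary consultation large language model for gastrointestinal diseases, enabling early identification of high-risk patients and intelligent diagnosis and treatment recommendations. Methods We surveyed patients who underwent both gastroscopy and the 13C-urea breath test at the Gastrointestinal Endoscopy Center of Anxi Hospital of Traditional Chinese Medicine. Univariate analysis was first performed to determine whether the selected features were appropriate. The data were randomly split into training and test sets at a 7:3 ratio, and six methods were applied to find the best classifier for predicting high-risk gastrointestinal conditions: support vector machine, K-nearest neighbors, logistic regression, random forest, extreme gradient boosting, and deep neural network (DNN). Bayesian optimization was used to obtain the optimal hyperparameter combination for each of the 6 models before fitting, and the Shapley additive explanations method was applied to interpret the best model. DeepSeek-R1 was chosen as the base language model and fine-tuned on gastrointestinal disease and Chinese online medical consultation datasets to build a gastrointestinal consultation large language model better suited to clinical needs. Results A total of 502 participants were included. All selected features showed some association with gastrointestinal diseases, but only age was linearly correlated with gastrointestinal diseases (β=0.023, SE=0.008, t=2.942, P=0.003). The best model was the DNN, with an accuracy of 0.68, precision of 0.68, recall of 0.85, F1 score of 0.75, and AUC of 0.74. In the DNN-based feature importance ranking, the top 3 features were age, DOB value, and years of smoking. The recommendations of the constructed large language model were highly consistent with those given by practicing specialist physicians based on gastroscopy results. Conclusion The DNN-based model performed best among the gastrointestinal disease risk prediction models; it can predict the risk of gastrointestinal diseases in clinical practice and provide a reliable basis for deciding whether to perform gastrointestinal endoscopy. The findings also indicate that preventing gastrointestinal diseases calls for no smoking, limited alcohol, and reasonable tea consumption. The constructed gastrointestinal consultation large language model can offer patients more professional medical guidance and has considerable clinical practical value.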
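The 7:3 random split mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not say how the split was implemented, so the function name `split_train_test`, the seed, the rounding rule, and the placeholder participant IDs are all assumptions made for illustration only.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of `records` and cut it into train and test parts.

    Hypothetical helper: the seed and the round-down rule for the
    training-set size are assumptions, not details from the study.
    """
    shuffled = list(records)               # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder IDs standing in for 502 participants; these are not study data.
participant_ids = list(range(502))
train_ids, test_ids = split_train_test(participant_ids)
print(len(train_ids), len(test_ids))
```

Under these assumptions, 502 records divide into 351 training and 151 test records. The study's actual set sizes are not reported in the abstract.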

Keywords: gastrointestinal diseases, Helicobacter pylori, machine learning, risk prediction, large language model

Abstract:

Objective To explore the correlation of tea consumption with risks of gastrointestinal diseases using a risk prediction model integrating interpretable machine learning and a large language model. Methods A survey was conducted among patients undergoing both gastroscopy and 13C-urea breath testing at the Gastrointestinal Endoscopy Center of Anxi Hospital of Traditional Chinese Medicine. Univariate analysis was performed to determine the suitability of feature selection. The collected data were randomly divided into training and testing sets in a 7:3 ratio. Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGB), and Deep Neural Network (DNN) were applied to identify the best classifier for predicting high-risk gastrointestinal conditions. A Bayesian optimization algorithm was used to obtain the optimal hyperparameter combinations for the 6 models. After model fitting, the interpretability of the best model was analyzed using SHapley Additive exPlanations (SHAP). The DeepSeek-R1 base language model was fine-tuned with a gastrointestinal disease dataset and Chinese medical online consultation data to obtain the final model. Results The study included 503 participants. All the selected features showed an association with gastrointestinal diseases, but only age exhibited a significant linear correlation (β=0.023, SE=0.008, t=2.942, P=0.003). The DNN model performed best, with an accuracy of 0.68, precision of 0.68, recall of 0.85, F1 score of 0.75, and AUC of 0.74. The top 3 important features were age, DOB value, and smoking history. The large language model provided recommendations consistent with those of professional physicians based on gastroscopy results. Conclusion The DNN model is effective for predicting gastrointestinal disease risk and offers reliable support for clinical risk assessment and decision-making regarding endoscopy. Smoking cessation, moderate alcohol consumption, and reasonable tea intake may help prevent gastrointestinal diseases.
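For readers less familiar with the metrics reported in the Results, the sketch below shows how accuracy, precision, recall, and F1 score follow from a binary confusion matrix. The function name `classification_metrics` and the counts in the usage example are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Ratios with an empty denominator are returned as 0.0.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / pr_sum if pr_sum else 0.0,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

As a quick check on the reported figures, the harmonic mean of the rounded precision (0.68) and recall (0.85) is about 0.756, close to the reported F1 score of 0.75. The small gap may reflect rounding of the inputs or the averaging convention used, neither of which the abstract specifies.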

Key words: gastrointestinal diseases, Helicobacter pylori, machine learning, risk prediction, large language model