Journal of Southern Medical University ›› 2025, Vol. 45 ›› Issue (1): 170-178. doi: 10.12122/j.issn.1673-4254.2025.01.20


针对缺失实验室指标多约束表征学习的卵巢癌鉴别方法

卢梓涵1, 黄方俊1,3, 蔡光瑶2, 刘继红2, 甄鑫1

    1.南方医科大学生物医学工程学院,广东 广州 510515
    2.中山大学肿瘤防治中心妇科//华南肿瘤学国家重点实验室//肿瘤医学省部共建协同创新中心,广东 广州 510145
    3.广东省人民医院,广东 广州 510080
  • 收稿日期:2024-09-05 出版日期:2025-01-20 发布日期:2025-01-20
  • 通讯作者: 甄鑫 E-mail:luzihan203@126.com;xinzhen@smu.edu.cn
  • 作者简介:卢梓涵,在读硕士研究生,E-mail: luzihan203@126.com
  • 基金资助:
    国家自然科学基金(82371908);国家自然科学基金青年基金(62106058);广东省自然科学基金(2022A1515011410)

A multi-constraint representation learning model for identification of ovarian cancer with missing laboratory indicators

Zihan LU1, Fangjun HUANG1,3, Guangyao CAI2, Jihong LIU2, Xin ZHEN1

    1.School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
    2.Department of Gynecology, Sun Yat-sen University Cancer Center, South China State Key Laboratory of Oncology, Provincial-Ministry Collaborative Innovation Center for Medical Oncology, Guangzhou 510145, China
    3.Guangdong Provincial People's Hospital, Guangzhou 510080, China
  • Received:2024-09-05 Online:2025-01-20 Published:2025-01-20
  • Contact: Xin ZHEN E-mail:luzihan203@126.com;xinzhen@smu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (82371908); Natural Science Foundation for the Youth of China (62106058); Natural Science Foundation of Guangdong Province (2022A1515011410)

摘要:

目的 探索基于多约束表征学习分类模型在面对缺失实验室指标的情况下鉴别卵巢癌的鉴别能力和应用价值。 方法 收集了2344例患者(393例卵巢癌和1951例对照)的缺失实验室指标表格型数据,使用本研究提出的基于判别学习和互信息以及特征投影重要性得分一致性及缺失位置估算的表征学习分类模型对缺失的卵巢癌实验室指标特征进行投影到潜在空间得到分类模型。对提出的约束项进行消融实验,通过准确率、ROC曲线下面积(AUC)、敏感度、特异性说明约束项的可行性和有效项。采用交叉验证方法和准确率、AUC、敏感度、特异性评价该分类模型的鉴别性能。将本研究与其他用于缺失数据的插补方法进行对缺失数据处理后鉴别分类能力的对比。 结果 消融实验结果显示约束项之间有很好的相容性,每项约束项都有较好的鲁棒性。交叉验证结果显示,本研究提出的基于多约束表征学习分类模型在面对缺失实验室指标的情况下对卵巢癌的鉴别中的AUC、准确率、敏感度、特异性分别为0.915、0.888、0.774、0.910,其中AUC和敏感度优于其它缺失数据插补方法。 结论 基于多约束表征学习模型在缺失实验室指标鉴别卵巢癌的应用中具有优秀的鉴别能力和较高的应用价值。与其他缺失插补方法相比,本研究提出的多约束表征学习模型在针对卵巢癌缺失实验室指标的鉴别分类任务中具有较大的优势。

关键词: 缺失数据, 多约束表征学习模型, 判别分析, 特征投影重要性得分一致性, 缺失位置估算, 互信息, 卵巢癌

Abstract:

Objective To evaluate the performance and practical value of a multi-constraint representation learning classification model for identifying ovarian cancer when laboratory indicators are missing. Methods Tabular data with missing laboratory indicators were collected from 2344 patients (393 with ovarian cancer and 1951 controls). The proposed representation learning classification model, which combines discriminative learning, mutual information, feature projection importance score consistency, and missing location estimation, projected the incomplete laboratory indicator features into a latent space for classification. Ablation experiments were performed on the proposed constraint terms, and accuracy, area under the ROC curve (AUC), sensitivity, and specificity were used to assess the feasibility and validity of each term. The discriminative performance of the model was evaluated by cross-validation using the same four metrics and compared with that of classification after other missing-data imputation methods. Results The ablation experiments showed good compatibility among the constraint terms, and each term was robust. In cross-validation, the AUC, accuracy, sensitivity, and specificity of the proposed model for identifying ovarian cancer with missing laboratory indicators were 0.915, 0.888, 0.774, and 0.910, respectively; its AUC and sensitivity were superior to those of the other imputation methods. Conclusion The proposed model shows excellent discriminatory ability and performs better than other missing-data imputation methods in identifying ovarian cancer with missing laboratory indicators.

Key words: missing data, multi-constraint representation learning, discriminant analysis, feature projection importance score consistency, missing position estimation, mutual information, ovarian cancer
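
The four evaluation metrics named in the abstract (accuracy, AUC, sensitivity, specificity) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = cancer and 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def auc_pairwise(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true, y_score, threshold=0.5):
    """Compute the four metrics; the threshold is a hypothetical choice, not from the paper."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "auc": auc_pairwise(y_true, y_score),
    }


# Toy data (made up): 3 cases and 5 controls with predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
metrics = binary_metrics(y_true, y_score)
print(metrics)
```

Reporting sensitivity and AUC alongside accuracy matters for a cohort like the one described here: with 393 cases among 2344 patients, a classifier that labels every patient as a control would already reach an accuracy of about 0.83 while detecting no cancers.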