Journal of Southern Medical University ›› 2025, Vol. 45 ›› Issue (1): 170-178. doi: 10.12122/j.issn.1673-4254.2025.01.20

A multi-constraint representation learning model for identification of ovarian cancer with missing laboratory indicators

Zihan LU1, Fangjun HUANG1,3, Guangyao CAI2, Jihong LIU2, Xin ZHEN1

  1. School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
  2. Department of Gynecology, Sun Yat-sen University Cancer Center, South China State Key Laboratory of Oncology, Provincial-Ministry Collaborative Innovation Center for Medical Oncology, Guangzhou 510145, China
  3. Guangdong Provincial People's Hospital, Guangzhou 510080, China
  • Received: 2024-09-05 Online: 2025-01-20 Published: 2025-01-20
  • Contact: Xin ZHEN E-mail: luzihan203@126.com; xinzhen@smu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (82371908); Natural Science Foundation for the Youth of China (62106058)

Abstract:

Objective To evaluate the performance of a multi-constraint representation learning classification model for identifying ovarian cancer from laboratory indicators with missing values. Methods Tabular data with missing laboratory indicators were collected from 393 patients with ovarian cancer and 1951 control patients. The incomplete laboratory indicator features were projected into a latent space, and a classifier was built with a representation learning classification model based on discriminative learning and mutual information, coupled with feature projection significance score consistency and missing location estimation. Ablation experiments on the proposed constraint terms were performed to assess their feasibility and validity in terms of accuracy, area under the ROC curve (AUC), sensitivity, and specificity. Cross-validation with the same four metrics was used to compare the discriminative performance of the model with that of other imputation methods for handling missing data. Results The ablation experiments showed good compatibility among the constraints, and each constraint showed good robustness. In cross-validation, the proposed multi-constraint representation learning classification model achieved an AUC of 0.915, an accuracy of 0.888, a sensitivity of 0.774, and a specificity of 0.910 for identifying ovarian cancer with missing laboratory indicators, and its AUC and sensitivity were superior to those of the other imputation methods. Conclusion The proposed model has excellent discriminatory ability, with better performance than other missing data imputation methods for identifying ovarian cancer with missing laboratory indicators.

Key words: missing data, shared representation learning, discriminant analysis, feature importance score consistency, missing position estimation, mutual information, ovarian cancer
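
The four metrics reported in the abstract (accuracy, AUC, sensitivity, and specificity) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the abstract does not state which threshold was used.

```python
from typing import Dict, Sequence


def auc_score(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity at a threshold, plus threshold-free AUC.

    The 0.5 default threshold is an assumption, not a value taken from the study.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "auc": auc_score(y_true, y_score),
    }


# Toy example (made-up data): 1 = ovarian cancer, 0 = control.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
toy_metrics = binary_metrics(toy_labels, toy_scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC does not. With an imbalanced cohort such as 393 cases against 1951 controls, accuracy is dominated by the controls, so a model can combine a high accuracy and specificity with a noticeably lower sensitivity.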