Journal of Southern Medical University, 2025, Vol. 45, Issue (1): 170-178. doi: 10.12122/j.issn.1673-4254.2025.01.20
Zihan LU1, Fangjun HUANG1,3, Guangyao CAI2, Jihong LIU2, Xin ZHEN1
Received: 2024-09-05
Online: 2025-01-20
Published: 2025-01-20
Contact: Xin ZHEN
E-mail: luzihan203@126.com; xinzhen@smu.edu.cn
Cite this article: Zihan LU, Fangjun HUANG, Guangyao CAI, Jihong LIU, Xin ZHEN. A multi-constraint representation learning model for identification of ovarian cancer with missing laboratory indicators[J]. Journal of Southern Medical University, 2025, 45(1): 170-178.
URL: https://www.j-smu.com/EN/10.12122/j.issn.1673-4254.2025.01.20
| Algorithm 1 Pseudocode of the proposed method | 
|---|
| Training stage Input: data feature matrix  Output: projection matrix for data representation learning  | 
| Begin Initialize For t to number of iterations For Calculate Calculate Updated End End End | 
| Testing stage Input: new test dataset with missing data Output: the projection of the new dataset onto the latent space | 
Tab.1 Algorithmic pseudo-code for a representation learning model applied to the missing data
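The training and testing stages in Tab.1 can be illustrated with a small, generic stand-in. The sketch below is not the multi-constraint model itself, whose update rules are not recoverable from the pseudocode here. It assumes a plain masked matrix factorization fitted by stochastic gradient descent: training learns a projection matrix `P` from the observed entries only, and testing projects a new incomplete sample onto the latent space. The names `train`, `project`, `lr`, and `lam`, and the toy data, are all hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def observed_loss(X, Z, P):
    """Mean squared reconstruction error over observed (non-None) entries."""
    errs = [(x - dot(Z[i], P[j])) ** 2
            for i, row in enumerate(X)
            for j, x in enumerate(row) if x is not None]
    return sum(errs) / len(errs)

def train(X, k=1, n_iter=500, lr=0.05, lam=0.01, seed=0):
    """Training stage (assumed stand-in): learn P (features x k) from incomplete X."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    Z = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(d)]
    history = []
    for _ in range(n_iter):
        for i in range(n):
            for j in range(d):
                if X[i][j] is None:      # missing entries contribute no gradient
                    continue
                err = X[i][j] - dot(Z[i], P[j])
                for c in range(k):
                    z, p = Z[i][c], P[j][c]
                    Z[i][c] += lr * (err * p - lam * z)
                    P[j][c] += lr * (err * z - lam * p)
        history.append(observed_loss(X, Z, P))
    return P, history

def project(x, P, n_iter=500, lr=0.05, lam=0.01):
    """Testing stage: latent code of a new sample, using its observed entries only."""
    z = [0.0] * len(P[0])
    for _ in range(n_iter):
        for j, xj in enumerate(x):
            if xj is None:
                continue
            err = xj - dot(z, P[j])
            for c in range(len(z)):
                z[c] += lr * (err * P[j][c] - lam * z[c])
    return z

# Toy data: each row is a multiple of (1, 0.5, -0.5); None marks a missing indicator.
X = [[1.0, 0.5, None],
     [2.0, 1.0, -1.0],
     [None, -0.5, 0.5],
     [-2.0, None, 1.0]]
P, history = train(X, k=1)
z_new = project([2.0, None, -1.0], P)
recon_missing = dot(z_new, P[1])  # latent-space estimate of the unobserved second feature
```

In this stand-in the latent code `z_new`, rather than an imputed feature vector, would be passed to a downstream classifier, which mirrors the role of the projection in the testing stage.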
| Model | AUC | Accuracy | Sensitivity | Specificity | 
|---|---|---|---|---|
| BF | 0.862 | 0.856 | 0.627 | 0.903 | 
| BF+CRF | 0.893 | 0.864 | 0.641 | 0.911 | 
| BF+DA | 0.897 | 0.880 | 0.761 | 0.905 | 
| BF+MPE | 0.877 | 0.862 | 0.644 | 0.905 | 
| BF+MIF | 0.871 | 0.870 | 0.655 | 0.915 | 
| PROPOSED | 0.919 | 0.899 | 0.729 | 0.940 | 
Tab.2 Results of ablation experiments of the representation learning model on ovarian cancer laboratory index data
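For readers who want to reproduce the four metrics reported in these tables, the following minimal sketch computes AUC, accuracy, sensitivity, and specificity for a binary classifier. It assumes labels coded as 1 (malignant) and 0 (benign) and a decision threshold of 0.5; the threshold, the function names, and the toy scores are assumptions, not values taken from the study.

```python
def auc_rank(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def threshold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a fixed score threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Toy example: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
auc = auc_rank(y_true, scores)                 # 12 of 15 pairs ranked correctly -> 0.8
metrics = threshold_metrics(y_true, scores)    # accuracy 0.75, sensitivity 2/3, specificity 0.8
```

AUC is threshold-free, whereas the other three metrics depend on the chosen cut-off, which is one reason a model can lead on AUC without leading on accuracy.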
| Dataset (AUC/Accuracy) | GAIN | MICE | KNN | MEAN | SINKHORN | ROUND-ROBIN | SOFT | MISSFOREST | REMASKER | PROPOSED |
|---|---|---|---|---|---|---|---|---|---|---|
| Covid_1 | 0.884/0.811 | 0.863/0.805 | 0.877/0.820 | 0.856/0.794 | 0.877/0.823 | 0.882/0.829 | 0.889/0.826 | 0.890/0.829 | 0.885/0.829 | 0.903/0.832 | 
| Thyroid_1 | 0.970/0.933 | 0.977/0.942 | 0.978/0.953 | 0.969/0.948 | 0.978/0.948 | 0.968/0.935 | 0.970/0.932 | 0.979/0.958 | 0.975/0.950 | 0.981/0.953 | 
| Cirrhosis_1 | 0.861/0.761 | 0.831/0.773 | 0.848/0.785 | 0.808/0.678 | 0.823/0.761 | 0.870/0.773 | 0.870/0.738 | 0.848/0.738 | 0.813/0.761 | 0.872/0.785 | 
| Covid_2 | 0.827/0.589 | 0.816/0.732 | 0.787/0.660 | 0.760/0.696 | 0.819/0.732 | 0.830/0.714 | 0.805/0.642 | 0.831/0.714 | 0.750/0.625 | 0.885/0.732 | 
| Thyroid_2 | 0.959/0.982 | 0.957/0.983 | 0.929/0.978 | 0.931/0.976 | 0.964/0.971 | 0.956/0.983 | 0.942/0.964 | 0.964/0.975 | 0.938/0.973 | 0.971/0.962 | 
| HCC | 0.882/0.757 | 0.838/0.757 | 0.829/0.727 | 0.779/0.727 | 0.888/0.787 | 0.833/0.818 | 0.840/0.787 | 0.865/0.757 | 0.796/0.696 | 0.917/0.848 | 
| Hepatitis | 0.928/0.838 | 0.910/0.838 | 0.807/0.838 | 0.779/0.838 | 0.840/0.870 | 0.823/0.870 | 0.773/0.741 | 0.833/0.80 | 0.757/0.838 | 0.964/0.903 | 
| DRD | 0.800/0.718 | 0.800/0.731 | 0.800/0.709 | 0.758/0.683 | 0.787/0.701 | 0.800/0.709 | 0.798/0.718 | 0.795/0.709 | 0.797/0.718 | 0.824/0.731 | 
| MI | 0.763/0.761 | 0.766/0.747 | 0.764/0.723 | 0.732/0.708 | 0.766/0.700 | 0.768/0.702 | 0.712/0.676 | 0.751/0.732 | 0.749/0.726 | 0.770/0.694 | 
| Cirrhosis_2 | 0.840/0.761 | 0.822/0.738 | 0.889/0.761 | 0.814/0.678 | 0.872/0.773 | 0.878/0.797 | 0.838/0.773 | 0.849/0.773 | 0.813/0.761 | 0.896/0.809 | 
| PBC | 0.851/0.773 | 0.935/0.797 | 0.890/0.738 | 0.826/0.761 | 0.888/0.773 | 0.879/0.761 | 0.816/0.738 | 0.849/0.797 | 0.844/0.797 | 0.902/0.833 | 
| Support | 0.886/0.805 | 0.887/0.795 | 0.815/0.750 | 0.810/0.725 | 0.859/0.838 | 0.833/0.820 | 0.865/0.836 | 0.870/0.839 | 0.863/0.845 | 0.919/0.845 | 
| Thyroid_3 | 0.916/0.917 | 0.915/0.944 | 0.917/0.942 | 0.883/0.919 | 0.916/0.933 | 0.909/0.941 | 0.901/0.935 | 0.919/0.926 | 0.928/0.923 | 0.945/0.892 | 
| Kidney | 0.994/0.937 | 0.997/0.950 | 0.993/0.950 | 0.995/0.962 | 0.991/0.959 | 0.996/0.952 | 0.998/0.962 | 0.995/0.962 | 0.994/0.937 | 0.999/0.962 | 
| Thyroid_4 | 0.974/0.978 | 0.992/0.971 | 0.990/0.978 | 0.976/0.978 | 0.992/0.973 | 0.987/0.980 | 0.989/0.976 | 0.990/0.971 | 0.991/0.976 | 0.995/0.991 | 
Tab.3 Comparison of the classification performance of our model and other imputation methods on 15 datasets with missing values
| Strategy | AUC | Accuracy | Sensitivity | Specificity | 
|---|---|---|---|---|
| MEAN | 0.891 | 0.901 | 0.620 | 0.956 | 
| KNN | 0.880 | 0.892 | 0.611 | 0.946 | 
| MICE | 0.888 | 0.892 | 0.505 | 0.965 | 
| MISSFOREST | 0.901 | 0.896 | 0.559 | 0.961 | 
| AE | 0.871 | 0.904 | 0.598 | 0.964 | 
| REMASKER | 0.883 | 0.891 | 0.588 | 0.949 | 
| SINKHORN | 0.893 | 0.906 | 0.520 | 0.978 | 
| ROUND-ROBIN | 0.896 | 0.906 | 0.548 | 0.973 | 
| GAIN | 0.900 | 0.898 | 0.568 | 0.961 | 
| SOFT | 0.898 | 0.909 | 0.622 | 0.963 | 
| PROPOSED | 0.919 | 0.899 | 0.729 | 0.940 | 
Tab.4 Comparison of the classification performance of the proposed multi-constraint representation learning model and other data imputation methods on ovarian cancer data with missing laboratory indicators
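As a point of reference for the simplest baseline in Tab.4, mean imputation can be sketched as follows. Column means are estimated on the training set from observed values only and then reused unchanged on the test set, which avoids leaking test information into the fill values. The function names and the toy values are hypothetical, and the fallback of 0.0 for a column with no observed values is an assumption.

```python
def fit_column_means(rows):
    """Per-column mean over observed (non-None) training values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Assumed fallback: a fully missing column is filled with 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means

def impute_with_means(rows, means):
    """Replace each missing value with the training mean of its column."""
    return [[means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]

# Toy laboratory matrix: two indicators, None marks a missing measurement.
train_rows = [[1.0, None],
              [3.0, 4.0],
              [None, 6.0]]
test_rows = [[None, 1.0]]

means = fit_column_means(train_rows)             # [2.0, 5.0]
test_filled = impute_with_means(test_rows, means)  # [[2.0, 1.0]]
```

Mean imputation ignores relationships between indicators, which is the gap that the model-based strategies in the table (KNN, MICE, MISSFOREST, GAIN, and others) try to close.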