Journal of Southern Medical University ›› 2025, Vol. 45 ›› Issue (6): 1327-1335. doi: 10.12122/j.issn.1673-4254.2025.06.22

Incomplete multimodal bone tumor image classification based on feature decoupling and fusion

Qinghai ZENG1,2, Chuanpu LI1,2, Wei YANG1,2, Liwen SONG3, Yinghua ZHAO3, Yi YANG1,2

  1. School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
    2. Guangdong Provincial Key Laboratory of Medical Image Processing, Guangzhou 510515, China
    3. Department of Radiology, Third Affiliated Hospital of Southern Medical University, Guangzhou 510630, China
  • Received: 2024-09-20 Online: 2025-06-20 Published: 2025-06-27
  • Contact: Yi YANG E-mail: qinghaizeng982@163.com; yiyang20110130@163.com
  • Supported by:
    National Natural Science Foundation of China (82172020)

Abstract:

Objective To construct a bone tumor classification model based on feature decoupling and fusion that handles missing modalities and fuses multimodal information to improve classification accuracy.

Methods A decoupling completion module was designed to extract local and global bone tumor image features from the available modalities. These features were then decomposed into shared and modality-specific features, which were used to complete the features of the missing modalities, thereby reducing the completion bias caused by modality differences. To address the modality differences that hinder multimodal information fusion, a cross-attention-based fusion module was introduced to strengthen the model's ability to learn cross-modal information and fully integrate the modality-specific features, thereby improving the accuracy of bone tumor classification.

Results The model was trained and tested on a bone tumor dataset collected from the Third Affiliated Hospital of Southern Medical University. Across the 7 available modality combinations, the proposed method achieved an average AUC, accuracy, and specificity of 0.766, 0.621, and 0.793, respectively, improvements of 2.6%, 3.5%, and 1.7% over existing methods for handling missing modalities. Performance was best when all modalities were available, with an AUC of 0.837, and the AUC still reached 0.826 with MRI alone.

Conclusion The proposed method effectively handles missing modalities, successfully integrates multimodal information, and shows robust performance in bone tumor classification under various complex missing-modality scenarios.

Key words: bone tumor classification, multimodal imaging, missing modality, feature decoupling, attention fusion
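
For readers who want a concrete picture of the two modules the abstract describes, below is a minimal PyTorch sketch of the underlying ideas: decoupling a modality's features into shared and modality-specific parts, using the shared part of an available modality to stand in for a missing one, and fusing modalities with bidirectional cross-attention. The two-modality setup (e.g., X-ray and MRI), the linear projection heads, and all module names and dimensions are illustrative assumptions, not the authors' implementation.

```python
# Sketch of feature decoupling, shared-feature completion, and cross-attention
# fusion for incomplete multimodal classification. Hypothetical architecture,
# not the paper's actual model.

import torch
import torch.nn as nn


class Decoupler(nn.Module):
    """Split a modality's features into shared and modality-specific parts."""

    def __init__(self, dim: int):
        super().__init__()
        self.shared_head = nn.Linear(dim, dim)    # projects into the shared subspace
        self.specific_head = nn.Linear(dim, dim)  # projects into the modality-specific subspace

    def forward(self, feat: torch.Tensor):
        return self.shared_head(feat), self.specific_head(feat)


class CrossAttentionFusion(nn.Module):
    """Fuse two modality feature sequences with bidirectional cross-attention."""

    def __init__(self, dim: int, heads: int = 4, num_classes: int = 2):
        super().__init__()
        self.attn_ab = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, a: torch.Tensor, b: torch.Tensor):
        # Each modality attends to the other; pooled outputs are concatenated.
        a2, _ = self.attn_ab(a, b, b)  # a queries b
        b2, _ = self.attn_ba(b, a, a)  # b queries a
        fused = torch.cat([a2.mean(dim=1), b2.mean(dim=1)], dim=-1)
        return self.classifier(fused)


if __name__ == "__main__":
    dim, batch, tokens = 64, 2, 16
    decouple = Decoupler(dim)
    fuse = CrossAttentionFusion(dim)

    xray = torch.randn(batch, tokens, dim)   # features of the available modality
    shared, specific = decouple(xray)
    mri = shared                             # MRI missing: shared features stand in
    logits = fuse(xray, mri)
    print(logits.shape)                      # torch.Size([2, 2])
```

In the paper's full method, completion would also account for the missing modality's specific features rather than reusing the shared part verbatim; the stand-in above is purely to illustrate why decoupling reduces completion bias, since only the modality-agnostic component is transferred.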