Journal of Southern Medical University (南方医科大学学报) ›› 2025, Vol. 45 ›› Issue (6): 1327-1335. doi: 10.12122/j.issn.1673-4254.2025.06.22


Incomplete multimodal bone tumor image classification based on feature decoupling and fusion

Qinghai ZENG1,2(), Chuanpu LI1,2, Wei YANG1,2, Liwen SONG3, Yinghua ZHAO3, Yi YANG1,2()   

  1. School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
    2. Guangdong Provincial Key Laboratory of Medical Image Processing, Guangzhou 510515, China
    3. Department of Radiology, Third Affiliated Hospital of Southern Medical University, Guangzhou 510630, China
  • Received: 2024-09-20 Online: 2025-06-20 Published: 2025-06-27
  • Contact: Yi YANG E-mail: qinghaizeng982@163.com; yiyang20110130@163.com
  • About the author: Qinghai ZENG, master's degree candidate, E-mail: qinghaizeng982@163.com
  • Supported by:
    National Natural Science Foundation of China (82172020)


Abstract:

Objective To propose a bone tumor classification model based on feature decoupling and fusion that handles missing modalities appropriately and fuses multimodal information to improve classification accuracy. Methods A decoupling-and-completion module was designed to first extract bone tumor image features carrying both local and global information from the available modalities, and then decompose these features into shared features and modality-specific features. The shared features serve as completion representations for the features of missing modalities, reducing the completion bias caused by inter-modality differences. Because such differences can also hinder the fusion of multimodal information, a fusion module based on a cross-attention mechanism was adopted to strengthen the model's ability to learn cross-modal information and to fully fuse the modality-specific features, thereby improving the accuracy of bone tumor classification. Results The model was trained and tested on a bone tumor dataset collected at the Third Affiliated Hospital of Southern Medical University. Across the 7 available modality combinations, the proposed method achieved an average AUC, accuracy, and specificity of 0.766, 0.621, and 0.793 for bone tumor classification, improvements of 2.6%, 3.5%, and 1.7%, respectively, over existing methods for handling missing modalities. Classification was best with all modalities available (AUC of 0.837) and remained strong with MRI alone (AUC of 0.826). Conclusion The proposed method handles missing modalities appropriately, fuses multimodal information effectively, and shows good bone tumor classification performance under a variety of complex missing-modality scenarios.

Key words: bone tumor classification, multimodal imaging, missing modality, feature decoupling, attention fusion

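The decoupling-and-completion step described in the Methods can be illustrated with a small sketch. This is not the paper's implementation: the actual model learns the shared/specific split with neural projections, whereas here the split is a fixed slice and the completion is a simple mean of the shared parts, purely to show the data flow; all names (`decouple`, `complete_missing`, `shared_dim`) are hypothetical.

```python
def decouple(feature, shared_dim):
    """Toy decoupling: split a feature vector into a shared part and a
    modality-specific part. The paper learns this decomposition; a fixed
    slice is used here only for illustration."""
    return feature[:shared_dim], feature[shared_dim:]

def complete_missing(available, shared_dim):
    """Build a completion representation for a missing modality as the
    element-wise mean of the shared parts of the available modalities,
    mirroring the idea that shared features stand in for missing ones."""
    shared_parts = [decouple(f, shared_dim)[0] for f in available.values()]
    n = len(shared_parts)
    return [sum(col) / n for col in zip(*shared_parts)]

# Example: CT and MRI features are available, a third modality is missing.
available = {"ct": [1.0, 2.0, 9.0], "mri": [3.0, 4.0, 7.0]}
completion = complete_missing(available, shared_dim=2)  # -> [2.0, 3.0]
```

Substituting a shared representation rather than zeros or random noise is what reduces the completion bias the abstract mentions, since the shared part is, by construction, the component common to all modalities.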
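The cross-attention fusion module can likewise be sketched. The fragment below is a minimal, single-head, plain-Python illustration of scaled dot-product cross-attention (no learned projections or multi-head structure; the function names are hypothetical and the paper's actual module is not specified here), in which query vectors from one modality attend over key/value vectors from another:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: each query vector (e.g. CT
    features) attends over key/value vectors from another modality
    (e.g. MRI features) and returns the attention-weighted values."""
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)  # attention weights over the other modality
        fused.append([sum(wj * v[i] for wj, v in zip(w, values))
                      for i in range(len(values[0]))])
    return fused
```

In the paper's setting the fused modality-specific features would then feed the classifier; this fragment only demonstrates the attention arithmetic that lets one modality selectively draw information from another.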