Swin-ResViT network for real-time generation of high-quality localization MR images from low-quality dynamic cine-MR

doi:10.12122/j.issn.1673-4254.2026.04.21

Abstract

Objective To synthesize high-quality MR images (sMR) that resemble pre-treatment localization MR from dynamic cine-MR using the Swin-ResViT network, improving the signal-to-noise ratio and contrast of real-time images for target tracking in magnetic resonance-guided radiotherapy (MRgRT). Methods We propose a ResViT model that incorporates Swin Transformer modules (Swin-ResViT), with an optimized bottleneck layer structure to improve feature extraction efficiency. Data from 17 liver cancer patients treated at Sun Yat-sen University Cancer Center from February to July 2024 were retrospectively collected; the intra-treatment cine-MR and pre-treatment localization MR of 12 patients formed the training set, and the remaining 5 patients formed the test set. Image generation quality and model performance were evaluated by quantifying the normalized root mean square error (NRMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) between sMR and reference localization MR, together with the motion marker point error and the model inference speed. Results In terms of image quality, sMR generated by Swin-ResViT reduced NRMSE and learned perceptual image patch similarity (LPIPS) by approximately 90% and 82%, respectively, relative to the original cine-MR (P<0.001), and improved PSNR, SSIM, and contrast-to-noise ratio (CNR) by approximately 157%, 79%, and 181%, respectively (P<0.001). In terms of structural accuracy, the mean localization error of motion markers at the hepatophrenic junction of the right liver lobe in the dynamic sMR sequences was 0.7695±0.7294 mm (P<0.05). In terms of inference speed, for a single 224×224-pixel frame on an NVIDIA GeForce RTX 2080 Ti GPU, the average processing time was 15.5 ms for Swin-ResViT versus 41.4 ms for the standard ResViT, a reduction of approximately 62%. Conclusion The Swin-ResViT model can synthesize high-quality sMR from cine-MR. The method combines efficient computation with substantial image enhancement and is of clinical significance for real-time MRgRT.

Key words: magnetic resonance-guided radiotherapy (MRgRT), Transformer, cine-MR, synthetic MR, deep learning
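The abstract names NRMSE and PSNR as evaluation metrics but does not give their exact definitions. As a rough illustration only, the sketch below computes both with NumPy, assuming images scaled to [0, 1] and an RMSE normalized by the intensity range of the reference image. The function names `nrmse` and `psnr` and the `data_range` parameter are illustrative and not taken from the paper, which may use a different normalization.

```python
import numpy as np

def nrmse(reference, generated):
    """RMSE divided by the reference intensity range (assumed normalization)."""
    reference = np.asarray(reference, dtype=np.float64)
    generated = np.asarray(generated, dtype=np.float64)
    rmse = np.sqrt(np.mean((reference - generated) ** 2))
    return rmse / (reference.max() - reference.min())

def psnr(reference, generated, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images with the given data range."""
    reference = np.asarray(reference, dtype=np.float64)
    generated = np.asarray(generated, dtype=np.float64)
    mse = np.mean((reference - generated) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy 224x224 example: a synthetic "reference" and a noisy copy of it.
rng = np.random.default_rng(0)
reference = rng.random((224, 224))
generated = np.clip(reference + rng.normal(0.0, 0.02, reference.shape), 0.0, 1.0)
print(f"NRMSE = {nrmse(reference, generated):.4f}, PSNR = {psnr(reference, generated):.2f} dB")
```

Lower NRMSE and higher PSNR both indicate that the generated image is closer to the reference, which is the direction of improvement reported for sMR over cine-MR.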

Boyong CHEN, Xinyi WANG, Xinxin ZHAO, Ting SONG, Yongbao LI. Swin-ResViT network for real-time generation of high-quality localization MR images from low-quality cine-MR[J]. Journal of Southern Medical University, 2026, 46(4): 929-938.

Figures/Tables (11)

Fig.1 Schematic diagram of the Swin-ResViT network. A: Overall structure of the network. B: Structure of the Swin Transformer block.

Fig.2 Cine-MR, synthesized MR (sMR), and reference localization MR slice (reference MR) images from two hepatocellular carcinoma patients. A: Comparison of tumor boundary contrast. The white boxes indicate the magnified regions, and the red contours outline the gross tumor volume (GTV). B: Comparison of image artifacts. The red arrows mark regions where the baseline methods show anatomical blurring, residual artifacts, or texture distortion, whereas Swin-ResViT preserves these details more accurately.

Tab.1 Quantitative comparison of the image quality of cine-MR and sMR against reference MR, and inference latency of the different models (Mean±SD)

Method	NRMSE	PSNR (dB)	SSIM	CNR	LPIPS	Latency (ms)
cine-MR	0.1990±0.0305	14.2808±1.3035	0.5434±0.0436	0.3128±0.1939	0.3602±0.0076	-
CycleGAN	0.1335±0.0444	18.2640±2.9407	0.6918±0.1046	0.3322±0.1217	0.3471±0.0122	15.6
Pix2Pix	0.1139±0.0239	19.6097±2.0893	0.7497±0.0372	0.3652±0.1428	0.2877±0.0109	15.9
cDDPM	0.0350±0.0172	29.9420±3.789	0.8885±0.0652	0.5826±0.1667	0.0359±0.0162	20026.9
ResViT	0.0238±0.0172	34.4067±3.8455	0.9623±0.0308	0.8472±0.0513	0.0581±0.0154	41.4
Swin-ResViT	0.0199±0.0181	36.7590±5.4363	0.9737±0.0356	0.8795±0.0489	0.0656±0.0181	15.5
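The percentage changes quoted in the abstract can be cross-checked against the mean values in Tab.1. The short sketch below does that arithmetic; the helper name `relative_change_percent` is illustrative, and only the table means are used (the reported P values are not reproduced here).

```python
# Mean values copied from Tab.1 (cine-MR and Swin-ResViT rows).
cine = {"NRMSE": 0.1990, "PSNR": 14.2808, "SSIM": 0.5434, "CNR": 0.3128, "LPIPS": 0.3602}
swin = {"NRMSE": 0.0199, "PSNR": 36.7590, "SSIM": 0.9737, "CNR": 0.8795, "LPIPS": 0.0656}

def relative_change_percent(before, after):
    """Signed change of `after` relative to `before`, in percent."""
    return (after - before) / before * 100.0

changes = {name: relative_change_percent(cine[name], swin[name]) for name in cine}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")

# Latency reduction of Swin-ResViT (15.5 ms) relative to ResViT (41.4 ms).
latency_reduction = (41.4 - 15.5) / 41.4 * 100.0
print(f"Latency reduction: {latency_reduction:.1f}%")
```

The image-quality changes round to the percentages quoted in the abstract, and the latency reduction evaluates to roughly 62.6%.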

Tab.2 Results of 4-fold cross-validation of Swin-ResViT

Fold	NRMSE	PSNR (dB)	SSIM
Fold 1	0.0155	37.4514	0.9865
Fold 2	0.0168	37.7475	0.9840
Fold 3	0.0042	42.3498	0.9985
Fold 4	0.0439	29.7120	0.9274
Average	0.0201±0.0168	36.8151±5.2396	0.9741±0.0317
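The Average row of Tab.2 can be recomputed from the four fold values. The source does not state which standard deviation estimator was used; the sketch below assumes the sample standard deviation (n-1 denominator), which matches the reported row to within rounding of the tabulated fold values.

```python
from statistics import mean, stdev

# Per-fold values copied from Tab.2.
folds = {
    "NRMSE": [0.0155, 0.0168, 0.0042, 0.0439],
    "PSNR": [37.4514, 37.7475, 42.3498, 29.7120],
    "SSIM": [0.9865, 0.9840, 0.9985, 0.9274],
}

# statistics.stdev uses the n-1 (sample) denominator.
summary = {name: (mean(values), stdev(values)) for name, values in folds.items()}
for name, (m, s) in summary.items():
    print(f"{name}: {m:.4f} ± {s:.4f}")
```

Fold 4 differs most from the other folds on all three metrics, which accounts for most of the spread in the summary row.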

Fig.3 Cine-MR and sMR image sequences. The red dashed line is the reference line.

Fig.4 Motion trajectories and position errors of the landmarks. A: Motion trajectories of the landmarks. B: Position errors of the reference MR and sMR relative to the cine-MR.
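A position error of the kind summarized in Fig.4 can be obtained by comparing landmark coordinates frame by frame. The sketch below is only one plausible way to do this: the function `landmark_errors_mm`, the coordinates, and the 1.5 mm pixel spacing are all hypothetical and are not taken from the paper, which does not describe its tracking procedure here.

```python
import numpy as np

def landmark_errors_mm(tracked_px, reference_px, pixel_spacing_mm):
    """Per-frame Euclidean distance (mm) between two landmark trajectories.

    tracked_px, reference_px: arrays of shape (n_frames, 2) holding (row, col)
    pixel coordinates; pixel_spacing_mm: (row_spacing, col_spacing) in mm.
    """
    diff = np.asarray(tracked_px, dtype=np.float64) - np.asarray(reference_px, dtype=np.float64)
    diff_mm = diff * np.asarray(pixel_spacing_mm, dtype=np.float64)
    return np.linalg.norm(diff_mm, axis=1)

# Hypothetical landmark positions (pixels) over four frames; not data from the paper.
cine_track = [[100.0, 80.0], [102.0, 80.0], [105.0, 81.0], [103.0, 80.0]]
smr_track = [[100.0, 80.0], [102.5, 80.0], [105.0, 80.5], [103.5, 80.0]]

errors = landmark_errors_mm(smr_track, cine_track, pixel_spacing_mm=(1.5, 1.5))
print(f"Localization error: {errors.mean():.4f} ± {errors.std(ddof=1):.4f} mm")
```

Reporting the error in millimetres rather than pixels makes it comparable across acquisitions with different in-plane resolution.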

Tab.3 Comparison of model performance with different loss function combinations (Mean±SD)

Loss	NRMSE	PSNR (dB)	SSIM
MAE	0.0270±0.0126	32.6261±3.8100	0.9695±0.0252
MAE+grad loss	0.0209±0.0135	35.5751±5.9679	0.9780±0.0244
Proposed loss	0.0199±0.0181	36.7590±5.4363	0.9737±0.0356

Tab.4 Comparison of model performance with different numbers of skip connections (Mean±SD)

Skip connections	NRMSE	PSNR (dB)	SSIM
0	0.0238±0.0128	35.8764±5.8769	0.9684±0.0242
1	0.0232±0.0126	35.9615±5.7804	0.9709±0.0241
1, 2	0.0228±0.0127	35.9133±5.8754	0.9718±0.0236
1, 2, 3	0.0199±0.0181	36.7590±5.4363	0.9737±0.0356

Tab.5 Comparison of model performance with different network architectures (Mean±SD)

Model variant	NRMSE	PSNR (dB)	SSIM	Latency (ms)
Baseline 1	0.0238±0.0128	35.8764±5.8769	0.9684±0.0242	10.7
Baseline 2	0.0221±0.0163	34.7482±3.8342	0.9654±0.0238	43.2
Swin-ResViT	0.0199±0.0181	36.7590±5.4363	0.9737±0.0356	15.5

Fig.5 Qualitative ablation results of the different network variants. The red dashed boxes indicate the magnified regions.

Fig.6 Comparison of model attention heatmaps visualized with Grad-CAM.
