Journal of Southern Medical University, 2026, Vol. 46, Issue (4): 929-938. doi: 10.12122/j.issn.1673-4254.2026.04.21


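Swin-ResViT network for real-time generation of high-quality localization MR images from low-quality cine-MR

Boyong CHEN1, Xinyi WANG1, Xinxin ZHAO1, Ting SONG1, Yongbao LI2

  1. School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
  2. Sun Yat-sen University Cancer Center; State Key Laboratory of Oncology in South China; Collaborative Innovation Center for Cancer Medicine, Guangzhou 510060, China
  • Received: 2025-10-22 Online: 2026-04-20 Published: 2026-04-24
  • Corresponding authors: Ting SONG, Yongbao LI; E-mail: q13729335350@smu.edu.cn; tingsong2015@smu.edu.cn; liyb1@sysucc.org.cn
  • Author profile: Boyong CHEN, master's student, E-mail: q13729335350@smu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (82472117); Guangdong Basic and Applied Basic Research Foundation (2024A1515010820, 2024A1515011831)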

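Abstract:

Objective To explore the use of the Swin-ResViT network for generating synthetic MR (sMR) images with the quality of pre-treatment localization MR from dynamic cine-MR, thereby improving the signal-to-noise ratio and contrast of real-time images for target tracking in MR-guided radiotherapy (MRgRT). Methods We propose a ResViT model that incorporates a Swin Transformer module (Swin-ResViT) and optimizes the bottleneck layer structure to improve feature extraction efficiency. Data from 17 patients with liver cancer treated at Sun Yat-sen University Cancer Center between February and July 2024 were collected retrospectively. The intra-treatment cine-MR and pre-treatment localization MR of 12 patients formed the training set, and the remaining 5 patients formed the test set. Image generation quality and model performance were evaluated by quantifying the normalized root mean square error (NRMSE), peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and motion marker localization error between sMR and the reference localization MR, together with model inference speed. Results In terms of image quality, compared with the original cine-MR, the sMR generated by Swin-ResViT reduced NRMSE and learned perceptual image patch similarity (LPIPS) by approximately 90% and 82%, respectively (P<0.001), and increased PSNR, SSIM, and contrast-to-noise ratio (CNR) by approximately 157%, 79%, and 181%, respectively (P<0.001). In terms of structural accuracy, the mean localization error of motion markers at the hepatophrenic junction of the right liver lobe in the dynamic sMR sequences was 0.7695±0.7294 mm (P<0.05). In terms of inference speed, for a single 224×224-pixel frame on an NVIDIA GeForce RTX 2080 Ti GPU, the mean processing time was 15.5 ms for Swin-ResViT versus 41.4 ms for the standard ResViT, a reduction of about 62%. Conclusion The Swin-ResViT model can synthesize high-quality sMR from cine-MR. The method combines computational efficiency with substantial image enhancement and is therefore of considerable clinical significance for real-time MRgRT.

Key words: MR-guided radiotherapy, Transformer, cine-MR, synthetic MR, deep learning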
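The abstract names its evaluation metrics but not their exact formulas. The Python sketch below shows one common definition of each, assuming intensities scaled to [0, 1], NRMSE normalized by the Euclidean (RMS) magnitude of the reference image, CNR computed as the absolute difference between two ROI means divided by the standard deviation of a background ROI, and marker error as the Euclidean distance between matched marker coordinates scaled by pixel spacing. These conventions, the function names, and the ROI masks are illustrative assumptions, not the authors' implementation. SSIM and LPIPS are omitted; SSIM is available, for example, as `structural_similarity` in scikit-image's `skimage.metrics` module.

```python
import numpy as np


def nrmse(reference: np.ndarray, test: np.ndarray) -> float:
    """NRMSE normalized by the RMS (Euclidean) magnitude of the reference.

    Assumption: the paper does not state its normalization; this follows the
    'euclidean' convention, one of several in common use.
    """
    ref = reference.astype(np.float64)
    err = ref - test.astype(np.float64)
    return float(np.sqrt(np.mean(err ** 2)) / np.sqrt(np.mean(ref ** 2)))


def psnr(reference: np.ndarray, test: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB, assuming intensities lie in [0, data_range]."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


def cnr(image: np.ndarray, roi_a: np.ndarray, roi_b: np.ndarray,
        background: np.ndarray) -> float:
    """CNR = |mean(A) - mean(B)| / std(background).

    roi_a, roi_b and background are hypothetical boolean masks (e.g. liver,
    adjacent tissue, and a noise region); other CNR definitions exist.
    """
    contrast = abs(image[roi_a].mean() - image[roi_b].mean())
    return float(contrast / image[background].std())


def marker_error_mm(pred_pts: np.ndarray, ref_pts: np.ndarray,
                    pixel_spacing_mm=(1.0, 1.0)) -> tuple[float, float]:
    """Mean and sample SD (N >= 2) of Euclidean marker distances in mm.

    pred_pts and ref_pts are (N, 2) arrays of matched (row, col) marker
    coordinates, one row per frame; pixel_spacing_mm is an assumed spacing.
    """
    spacing = np.asarray(pixel_spacing_mm, dtype=np.float64)
    diff_mm = (np.asarray(pred_pts, dtype=np.float64)
               - np.asarray(ref_pts, dtype=np.float64)) * spacing
    dist = np.linalg.norm(diff_mm, axis=1)  # per-frame error in mm
    return float(dist.mean()), float(dist.std(ddof=1))


def relative_change(baseline: float, value: float) -> float:
    """Percent change of a metric relative to its baseline (e.g. cine-MR)."""
    return (value - baseline) / abs(baseline) * 100.0
```

The percentages reported in the Results are relative changes of this kind against the cine-MR baseline; likewise, `relative_change(41.4, 15.5)` expresses the per-frame inference time of Swin-ResViT relative to ResViT as a percent change.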
