Journal of Southern Medical University (南方医科大学学报) ›› 2025, Vol. 45 ›› Issue (2): 409-421. DOI: 10.12122/j.issn.1673-4254.2025.02.22
An efficient and lightweight skin pathology detection method based on multi-scale feature fusion using an improved RT-DETR model

Yuying REN, Lingxiao HUANG, Fang DU, Xinbo YAO

Received: 2024-10-30
Online: 2025-02-20
Published: 2025-03-03
Contact: Lingxiao HUANG
E-mail: ran96822@stu.nxu.edu.cn; huanglx@nxu.edu.cn
About the authors: Yuying REN, Master's degree candidate, E-mail: ran96822@stu.nxu.edu.cn
Supported by:

Abstract:
Objective: To address the problems of multi-scale lesion regions, image noise interference, and the limited computing resources of auxiliary diagnostic devices that compromise detection accuracy in skin disease detection, we propose an efficient and lightweight skin disease detection model based on an improved RT-DETR. Methods: The lightweight FasterNet was introduced as the backbone, and the FasterNetBlock module was improved with structural re-parameterization. In the neck network, a convolution and attention fusion module was introduced to replace the multi-head self-attention mechanism, forming the AIFI-CAFM module, which enhances the model's ability to capture global dependencies and local details of the image. A DRB-HSFPN feature pyramid network was designed to replace the cross-scale feature fusion module (CCFM), fusing contextual information across scales and strengthening the semantic feature representation of the neck network. Combining the advantages of Inner-IoU and EIoU, an Inner-EIoU loss was proposed to replace the original GIoU loss, further improving inference accuracy and convergence speed. Results: Compared with the original model, the improved RT-DETR increased mAP@50 and mAP@50:95 on the HAM10000 dataset by 4.5% and 2.8%, respectively, and reached a detection speed of 59.1 frames/s. The improved model has 10.9M parameters and 19.3 GFLOPs, 46.0% and 67.2% lower than the original model, respectively, which verifies its effectiveness. Conclusion: The proposed SD-DETR model effectively extracts and fuses multi-scale features while reducing the number of parameters and the computational cost, significantly improving the performance of skin disease detection.
Yuying REN, Lingxiao HUANG, Fang DU, Xinbo YAO. An efficient and lightweight skin pathology detection method based on multi-scale feature fusion using an improved RT-DETR model[J]. Journal of Southern Medical University, 2025, 45(2): 409-421.
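The Inner-EIoU loss is assembled from the cited Inner-IoU [18] and EIoU [19] formulations. The following minimal PyTorch sketch follows those published definitions; the auxiliary-box `ratio` value and other implementation details are assumptions rather than the authors' released code.

```python
# Hedged sketch: Inner-EIoU assembled from the published Inner-IoU [18] and
# EIoU [19] definitions; details such as the ratio value are assumptions.
import torch

def inner_eiou_loss(pred, target, ratio=1.0, eps=1e-7):
    """pred, target: (N, 4) boxes in (x1, y1, x2, y2) format."""
    # Widths, heights and centers of both boxes
    w1, h1 = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w2, h2 = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    cx1, cy1 = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx2, cy2 = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2

    # Standard IoU
    inter_w = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
    inter_h = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = inter_w * inter_h
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union

    # Inner-IoU: IoU of auxiliary boxes scaled around the centers by `ratio`
    iw1, ih1, iw2, ih2 = w1 * ratio, h1 * ratio, w2 * ratio, h2 * ratio
    inner_w = (torch.min(cx1 + iw1 / 2, cx2 + iw2 / 2) - torch.max(cx1 - iw1 / 2, cx2 - iw2 / 2)).clamp(0)
    inner_h = (torch.min(cy1 + ih1 / 2, cy2 + ih2 / 2) - torch.max(cy1 - ih1 / 2, cy2 - ih2 / 2)).clamp(0)
    inner_inter = inner_w * inner_h
    inner_union = iw1 * ih1 + iw2 * ih2 - inner_inter + eps
    inner_iou = inner_inter / inner_union

    # EIoU penalties: center distance plus separate width/height terms,
    # all normalized by the smallest enclosing box
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    rho2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
    eiou = 1 - iou + rho2 / (cw ** 2 + ch ** 2 + eps) \
           + (w1 - w2) ** 2 / (cw ** 2 + eps) + (h1 - h2) ** 2 / (ch ** 2 + eps)

    # Inner-EIoU: swap the plain IoU term for its inner-box counterpart
    return (eiou + iou - inner_iou).mean()
```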
Tab.1 Dataset statistical information

| Label name | Number of annotations | Number of images |
|---|---|---|
| MEL | 1123 | 1113 |
| NV | 6761 | 6705 |
| BCC | 517 | 514 |
| AKIEC | 334 | 327 |
| BKL | 1124 | 1099 |
| DF | 116 | 115 |
| VASC | 142 | 142 |
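The "Number of images" column matches the class frequencies of the public HAM10000 release [20]; the annotation counts come from the authors' detection labels and are not part of the public metadata. A minimal pandas sketch for reproducing the per-class image counts, assuming the standard HAM10000_metadata.csv file and its dx label column:

```python
# Hedged sketch: per-class image counts from the public HAM10000 metadata file.
# The local file path is an assumption for illustration.
import pandas as pd

meta = pd.read_csv("HAM10000_metadata.csv")       # one row per image
counts = meta["dx"].str.upper().value_counts()     # dx holds the 7 class labels
print(counts.reindex(["MEL", "NV", "BCC", "AKIEC", "BKL", "DF", "VASC"]))
```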
Tab.2 Main structure of the FasterNet backbone network

| Layer number and name | Output size | Layer structure |
|---|---|---|
| 0-Embedding | 160×160 | Conv 4×4, 40 |
| 1-Stage 1 | 160×160 | FasterNetRepBlock, 40 |
| 2-Merging | 80×80 | Conv 2×2, 80 |
| 3-Stage 2 | 80×80 | FasterNetRepBlock (repeated 2 times), 80 |
| 4-Merging | 40×40 | Conv 2×2, 160 |
| 5-Stage 3 | 40×40 | FasterNetRepBlock (repeated 8 times), 160 |
| 6-Merging | 20×20 | Conv 2×2, 320 |
| 7-Stage 4 | 20×20 | FasterNetRepBlock (repeated 2 times), 320 |
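Tab.2 lists FasterNetRepBlock as the repeating unit of each stage. Below is a minimal sketch of what such a block can look like, assuming the published FasterNet partial-convolution design [14] combined with a RepVGG-style [21] parallel branch that can be folded into the 3×3 kernel at inference; the article's actual FasterNetRepBlock may differ in its details.

```python
# Hedged sketch of a FasterNet-style block with a re-parameterizable branch.
# This is an illustration under stated assumptions, not the authors' code.
import torch
import torch.nn as nn

class PartialRepConv(nn.Module):
    """3x3 conv on the first 1/n_div of channels, with a parallel 1x1 branch
    that can be fused into the 3x3 kernel after training (RepVGG-style)."""
    def __init__(self, dim, n_div=4):
        super().__init__()
        self.dim_conv = dim // n_div
        self.dim_id = dim - self.dim_conv
        self.conv3 = nn.Conv2d(self.dim_conv, self.dim_conv, 3, padding=1, bias=False)
        self.conv1 = nn.Conv2d(self.dim_conv, self.dim_conv, 1, bias=False)

    def forward(self, x):
        xc, xi = torch.split(x, [self.dim_conv, self.dim_id], dim=1)
        xc = self.conv3(xc) + self.conv1(xc)   # parallel branches (merged at inference)
        return torch.cat([xc, xi], dim=1)       # untouched channels pass through

class FasterNetRepBlock(nn.Module):
    """Partial re-parameterizable conv followed by a two-layer pointwise MLP."""
    def __init__(self, dim, expansion=2):
        super().__init__()
        hidden = dim * expansion
        self.pconv = PartialRepConv(dim)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, dim, 1, bias=False),
        )

    def forward(self, x):
        return x + self.mlp(self.pconv(x))      # residual connection

# Example: the Stage-1 setting from Tab.2 (40 channels, 160x160 feature map)
block = FasterNetRepBlock(40)
print(block(torch.randn(1, 40, 160, 160)).shape)   # -> torch.Size([1, 40, 160, 160])
```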
Tab.3 Experimental environment configuration

| Configuration item | Version / specification |
|---|---|
| Operating system | Ubuntu 20.04.6 |
| CPU | Intel(R) Xeon(R) Gold 5418Y |
| GPU | NVIDIA GeForce RTX 4090 (24 GB) |
| CUDA | CUDA 11.3 |
| RAM | 32 GB |
| Deep learning framework | PyTorch 1.13.0 |
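A short sanity-check snippet for confirming that a local runtime matches this configuration; the expected values in the comments come from Tab.3:

```python
# Verify the local PyTorch / CUDA / GPU setup against Tab.3.
import torch

print(torch.__version__)                   # expected: 1.13.0 (+cu113 build)
print(torch.version.cuda)                  # expected: 11.3
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # expected: NVIDIA GeForce RTX 4090
```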
Tab.4 Ablation experiment results

| Experiment number | Reparameterized FasterNet | AIFI-CAFM | DRB-HSFPN | mAP@50 (%) | Param (M) | FLOPs (G) |
|---|---|---|---|---|---|---|
| 1 | | | | 49.3 | 20.2 | 58.8 |
| 2 | √ | | | 50.7 | 11.1 | 29.9 |
| 3 | | √ | | 51.6 | 20.9 | 59.7 |
| 4 | | | √ | 49.6 | 17.3 | 45.6 |
| 5 | √ | √ | | 52.7 | 13.7 | 32.3 |
| 6 | √ | √ | √ | 53.8 | 10.9 | 19.3 |
Tab.5 Comparison of experimental results by introducing different loss functions (%)

| Loss | P | R | mAP@50 | mAP@50:95 |
|---|---|---|---|---|
| GIoU | 69.7 | 53.2 | 52.8 | 42.5 |
| CIoU | 62.1 | 51.9 | 52.4 | 42.2 |
| DIoU | 68.7 | 54.5 | 52.7 | 42.7 |
| SIoU | 69.5 | 54.2 | 53.1 | 43.0 |
| EIoU | 66.7 | 57.3 | 53.3 | 43.1 |
| Inner-EIoU | 71.9 | 55.2 | 53.8 | 43.3 |
Tab.6 Comparison of model performance across various categories on HAM10000 before and after improvements

| Category | P (before) | P (after) | R (before) | R (after) | mAP@50 (before) | mAP@50 (after) |
|---|---|---|---|---|---|---|
| All | 64.6 | 71.9 | 53.4 | 55.2 | 49.3 | 53.8 |
| MEL | 70.0 | 70.9 | 35.3 | 41.8 | 34.6 | 47.3 |
| NV | 79.2 | 79.6 | 92.6 | 85.6 | 86.8 | 83.2 |
| BCC | 52.6 | 68.1 | 58.5 | 58.8 | 42.3 | 51.8 |
| AKIEC | 61.4 | 65.9 | 39.9 | 50.0 | 40.5 | 35.6 |
| BKL | 41.5 | 67.3 | 45.9 | 50.2 | 31.6 | 47.4 |
| DF | 59.9 | 75.5 | 59.6 | 57.1 | 54.7 | 64.1 |
| VASC | 67.3 | 75.9 | 40.7 | 42.9 | 43.6 | 47.0 |
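For reference, the per-class figures in Tab.6 follow the standard object-detection metric definitions (these are standard practice and are not restated in this excerpt):

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
AP = \int_{0}^{1} P(R)\,\mathrm{d}R, \qquad
mAP@50 = \frac{1}{N}\sum_{i=1}^{N} AP_i^{\,IoU=0.5}
```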
Tab.7 Performance comparison of different models

| Model | Backbone | Param (M) | FLOPs (G) | mAP@50 (%) | mAP@50:95 (%) | FPS |
|---|---|---|---|---|---|---|
| Faster-RCNN[24] | R50 | 137.1 | 370.2 | 39.3 | 25.5 | 26.6 |
| YOLOv7[25] | - | 36.5 | 104.7 | 44.3 | 33.9 | 53.7 |
| YOLOv7-X | - | 70.8 | 188.1 | 48.6 | 37.5 | 22.3 |
| YOLOv8-S[26] | - | 11.2 | 28.6 | 47.3 | 38.3 | 61.3 |
| YOLOv8-M | - | 26.9 | 79.1 | 48.2 | 39.2 | 46.2 |
| YOLOv8-L | - | 43.7 | 165.1 | 49.6 | 41.1 | 31.4 |
| YOLOv9-S[27] | - | 7.1 | 26.2 | 46.3 | 37.4 | 28.0 |
| YOLOv9-M | - | 20.1 | 76.9 | 50.1 | 37.3 | 38.1 |
| GOLD-YOLO-S[28] | - | 21.3 | 46.1 | 44.1 | 34.4 | 55.3 |
| GOLD-YOLO-M | - | 41.2 | 87.3 | 46.2 | 35.9 | 37.4 |
| Deformable-DETR[29] | R50 | 39.8 | 172.9 | 45.0 | 34.6 | - |
| DINO[30] | R50 | 47.2 | 279.0 | 44.1 | 34.4 | 6.4 |
| DAB-DETR[31] | R50 | 35.2 | 210.0 | 46.9 | 37.9 | - |
| Conditional-DETR[32] | R50 | 44.0 | 86.3 | 45.3 | 33.9 | - |
| RT-DETR | R18 | 20.2 | 58.8 | 49.3 | 40.5 | 40.2 |
| RT-DETR | R34 | 31.4 | 88.6 | 50.1 | 41.9 | 33.1 |
| RT-DETR | R50 | 40.3 | 134.8 | 50.8 | 42.8 | 29.8 |
| SD-DETR | FasterNet | 10.9 | 19.3 | 53.8 | 43.3 | 59.1 |
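The FPS column reports single-image inference throughput. Below is a hedged sketch of how such a figure can be measured in PyTorch, timing GPU inference after a warm-up phase; the model constructor and the 640×640 input size are assumptions for illustration, not taken from the article.

```python
# Hedged sketch: measure single-image inference FPS on the GPU after warm-up.
import time
import torch

@torch.no_grad()
def measure_fps(model, img_size=640, warmup=50, iters=200, device="cuda"):
    model = model.to(device).eval()
    x = torch.randn(1, 3, img_size, img_size, device=device)
    for _ in range(warmup):                 # warm up kernels / cuDNN autotuning
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()                # wait for all queued GPU work
    return iters / (time.perf_counter() - start)

# Usage (hypothetical model object): fps = measure_fps(sd_detr); print(f"{fps:.1f} FPS")
```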
1. Sung H, Ferlay J, Siegel RL, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries[J]. CA Cancer J Clin, 2021, 71(3): 209-49.
2. 付学锋, 王美燕, 陈筱筱. 皮肤镜在颜面部皮肤肿瘤筛检中的应用效果观察[J]. 中国现代医生, 2018, 56(21): 86-8, 92.
3. Li XY, Wang LX, Zhang L, et al. Application of multimodal and molecular imaging techniques in the detection of choroidal melanomas[J]. Front Oncol, 2021, 10: 617868.
4. Argenziano G, Catricalà C, Ardigo M, et al. Seven-point checklist of dermoscopy revisited[J]. Br J Dermatol, 2011, 164(4): 785-90.
5. Ganster H, Pinz A, Röhrer R, et al. Automated melanoma recognition[J]. IEEE Trans Med Imaging, 2001, 20(3): 233-9.
6. Rana M, Bhushan M. Machine learning and deep learning approach for medical image analysis: diagnosis to detection[J]. Multimed Tools Appl, 2022: 1-39.
7. 邵虹, 张鸣坤, 崔文成. 基于分层卷积神经网络的皮肤镜图像分类方法[J]. 智能科学与技术学报, 2021, 3(4): 474-81.
8. 郑顺源, 胡良校, 吕晓倩, 等. 基于边缘引导的自校正皮肤检测[J]. 计算机科学, 2022, 49(11): 141-7.
9. Huang HY, Hsiao YP, Mukundan A, et al. Classification of skin cancer using novel hyperspectral imaging engineering via YOLOv5[J]. J Clin Med, 2023, 12(3): 1134.
10. 沈鑫, 魏利胜. 基于注意力残差U-Net的皮肤镜图像分割方法[J]. 智能系统学报, 2023, 18(4): 699-707.
11. 王玉峰, 成昊沅, 万承北, 等. 一种基于双分支注意力神经网络的皮肤癌检测框架[J]. 中国生物医学工程学报, 2024, 43(2): 153-61.
12. 高埂, 肖风丽, 杨飞. 基于改进MobileNetV3-Small的色素减退性皮肤病诊断[J]. 计算机与现代化, 2024(5): 120-6.
13. Zhao YA, Lv WY, Xu SL, et al. DETRs beat YOLOs on real-time object detection[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 16-22, 2024, Seattle, WA, USA. IEEE, 2024: 16965-74.
14. Li D, Han T, Zhou HT, et al. Lightweight Siamese network for visual tracking via FasterNet and feature adaptive fusion[C]//2024 5th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT). March 29-31, 2024, Nanjing, China. IEEE, 2024: 1-5.
15. Hu S, Gao F, Zhou XW, et al. Hybrid convolutional and attention network for hyperspectral image denoising[J]. IEEE Geosci Remote Sens Lett, 2024, 21: 5504005.
16. Ding XH, Zhang YY, Ge YX, et al. UniRepLKNet: a universal perception large-kernel ConvNet for audio, video, point cloud, time-series and image recognition[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 16-22, 2024, Seattle, WA, USA. IEEE, 2024: 5513-24.
17. Chen YF, Zhang CY, Chen B, et al. Accurate leukocyte detection based on deformable-DETR and multi-level feature fusion for aiding diagnosis of blood diseases[J]. Comput Biol Med, 2024, 170: 107917.
18. Zhang H, Xu C, Zhang SJ. Inner-IoU: more effective intersection over union loss with auxiliary bounding box[EB/OL]. 2023: 2311.02877.
19. Zhang YF, Ren WQ, Zhang Z, et al. Focal and efficient IOU loss for accurate bounding box regression[J]. Neurocomputing, 2022, 506: 146-57.
20. Tschandl P, Rosendahl C, Kittler H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions[J]. Sci Data, 2018, 5: 180161.
21. Ding XH, Zhang XY, Ma NN, et al. RepVGG: making VGG-style ConvNets great again[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 20-25, 2021, Nashville, TN, USA. IEEE, 2021: 13728-37.
22. Zheng ZH, Wang P, Liu W, et al. Distance-IoU loss: faster and better learning for bounding box regression[J]. Proc AAAI Conf Artif Intell, 2020, 34(7): 12993-3000.
23. Gevorgyan Z. SIoU loss: more powerful learning for bounding box regression[EB/OL]. 2022: 2205.12740.
24. Ren SQ, He KM, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Trans Pattern Anal Mach Intell, 2017, 39(6): 1137-49.
25. Wang CY, Bochkovskiy A, Liao HM. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 17-24, 2023, Vancouver, BC, Canada. IEEE, 2023: 7464-75.
26. Reis D, Kupec J, Hong J, et al. Real-time flying object detection with YOLOv8[EB/OL]. 2023: 2305.09972.
27. Wang CY, Yeh IH, Mark Liao HY. YOLOv9: learning what you want to learn using programmable gradient information[M]//Computer Vision – ECCV 2024. Cham: Springer Nature Switzerland, 2024: 1-21.
28. Wang CC, He W, Nie Y, et al. Gold-YOLO: efficient object detector via gather-and-distribute mechanism[EB/OL]. 2023: 2309.11331.
29. Zhu XZ, Su WJ, Lu LW, et al. Deformable DETR: deformable transformers for end-to-end object detection[EB/OL]. 2020: 2010.04159.
30. Zhang H, Li F, Liu SL, et al. DINO: DETR with improved DeNoising anchor boxes for end-to-end object detection[EB/OL]. 2022: 2203.03605.
31. Liu SL, Li F, Zhang H, et al. DAB-DETR: dynamic anchor boxes are better queries for DETR[EB/OL]. 2022: 2201.12329.
32. Meng DP, Chen XK, Fan ZJ, et al. Conditional DETR for fast training convergence[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). October 10-17, 2021, Montreal, QC, Canada. IEEE, 2021: 3631-40.
33. Selvaraju RR, Cogswell M, Das A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[J]. Int J Comput Vis, 2020, 128(2): 336-59.
34. He KM, Zhang XY, Ren SQ, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 27-30, 2016, Las Vegas, NV, USA. IEEE, 2016: 770-8.