Journal of Southern Medical University ›› 2025, Vol. 45 ›› Issue (2): 409-421. doi: 10.12122/j.issn.1673-4254.2025.02.22
Yuying REN, Lingxiao HUANG, Fang DU, Xinbo YAO
Received: 2024-10-30
Online: 2025-02-20
Published: 2025-03-03
Contact: Lingxiao HUANG
E-mail: ran96822@stu.nxu.edu.cn; huanglx@nxu.edu.cn
Yuying REN, Lingxiao HUANG, Fang DU, Xinbo YAO. An efficient and lightweight skin pathology detection method based on multi-scale feature fusion using an improved RT-DETR model[J]. Journal of Southern Medical University, 2025, 45(2): 409-421.
URL: https://www.j-smu.com/EN/10.12122/j.issn.1673-4254.2025.02.22
Label name | Number of annotations | Number of images |
---|---|---|
MEL | 1123 | 1113 |
NV | 6761 | 6705 |
BCC | 517 | 514 |
AKIEC | 334 | 327 |
BKL | 1124 | 1099 |
DF | 116 | 115 |
VASC | 142 | 142 |
Tab.1 Dataset statistical information
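The annotation count in Tab.1 exceeds the image count for most classes because a single dermoscopic image can contain more than one labeled lesion. Summing both columns is a quick consistency check; the image total matches the 10 015 images of the HAM10000 dataset (ref. 20). A minimal sketch in Python:

```python
# Per-class (annotations, images) pairs taken from Tab.1.
ham10000 = {
    "MEL": (1123, 1113), "NV": (6761, 6705), "BCC": (517, 514),
    "AKIEC": (334, 327), "BKL": (1124, 1099), "DF": (116, 115),
    "VASC": (142, 142),
}
total_annotations = sum(a for a, _ in ham10000.values())
total_images = sum(i for _, i in ham10000.values())
print(total_annotations, total_images)  # 10117 10015
```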
Layer number and name | Output size | Layer structure |
---|---|---|
0-Embedding | 160×160 | Conv 4×4, 40 |
1-Stage 1 | 160×160 | FasterNetRepBlock, 40 |
2-Merging | 80×80 | Conv 2×2, 80 |
3-Stage 2 | 80×80 | FasterNetRepBlock(Repeat 2 times), 80 |
4-Merging | 40×40 | Conv 2×2, 160 |
5-Stage 3 | 40×40 | FasterNetRepBlock(Repeat 8 times), 160 |
6-Merging | 20×20 | Conv 2×2, 320 |
7-Stage 4 | 20×20 | FasterNetRepBlock(Repeat 2 times), 320 |
Tab.2 Main structure of the FasterNet backbone network
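The stage layout in Tab.2 can be sanity-checked by tracing shapes: the 4×4 stride-4 embedding yields 160×160 feature maps, and each 2×2 stride-2 merging layer halves the resolution while doubling the channels (40→80→160→320). A small sketch (the 640×640 input size is an assumption inferred from the table's output sizes):

```python
def fasternet_shapes(input_size=640, embed_dim=40, n_stages=4):
    """Trace (spatial size, channels) through the Tab.2 stage layout:
    a 4x4 stride-4 embedding, then a stride-2 merging before stages 2-4."""
    size, ch = input_size // 4, embed_dim   # 0-Embedding: Conv 4x4, stride 4
    shapes = [(size, ch)]                   # 1-Stage 1 keeps this shape
    for _ in range(n_stages - 1):           # Merging: Conv 2x2, stride 2
        size, ch = size // 2, ch * 2
        shapes.append((size, ch))
    return shapes

print(fasternet_shapes())  # [(160, 40), (80, 80), (40, 160), (20, 320)]
```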
Configuration item | Version / specification |
---|---|
Operating system | Ubuntu 20.04.6 |
CPU | Intel(R) Xeon(R) Gold 5418Y |
GPU | NVIDIA GeForce RTX 4090 24GB |
CUDA | CUDA 11.3 |
RAM | 32 GB |
Deep learning framework | PyTorch 1.13.0 |
Tab.3 Experimental environment configuration
Experiment number | Reparameterized FasterNet | AIFI-CAFM | DRB-HSFPN | mAP50 (%) | Params (M) | FLOPs (G) |
---|---|---|---|---|---|---|
1 | | | | 49.3 | 20.2 | 58.8 |
2 | √ | | | 50.7 | 11.1 | 29.9 |
3 | | √ | | 51.6 | 20.9 | 59.7 |
4 | | | √ | 49.6 | 17.3 | 45.6 |
5 | √ | √ | | 52.7 | 13.7 | 32.3 |
6 | √ | √ | √ | 53.8 | 10.9 | 19.3 |
Tab.4 Ablation experiment results
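Reading Tab.4, the full model (experiment 6) improves mAP50 by 4.5 points over the RT-DETR baseline (experiment 1) while using roughly 46% fewer parameters and 67% fewer FLOPs. The deltas can be reproduced directly from the table:

```python
base = {"mAP50": 49.3, "params_M": 20.2, "flops_G": 58.8}  # Tab.4, exp. 1
full = {"mAP50": 53.8, "params_M": 10.9, "flops_G": 19.3}  # Tab.4, exp. 6

map_gain = round(full["mAP50"] - base["mAP50"], 1)
param_cut = round(100 * (1 - full["params_M"] / base["params_M"]), 1)
flops_cut = round(100 * (1 - full["flops_G"] / base["flops_G"]), 1)
print(map_gain, param_cut, flops_cut)  # 4.5 46.0 67.2
```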
Loss | P | R | mAP50 | mAP50:95 |
---|---|---|---|---|
GIoU | 69.7 | 53.2 | 52.8 | 42.5 |
CIoU | 62.1 | 51.9 | 52.4 | 42.2 |
DIoU | 68.7 | 54.5 | 52.7 | 42.7 |
SIoU | 69.5 | 54.2 | 53.1 | 43.0 |
EIoU | 66.7 | 57.3 | 53.3 | 43.1 |
Inner-EIoU | 71.9 | 55.2 | 53.8 | 43.3 |
Tab.5 Comparison of experimental results by introducing different loss functions (%)
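The losses in Tab.5 all extend the basic IoU overlap term. As one illustration, EIoU (ref. 19) adds a normalized center-distance penalty plus separate width and height penalties; Inner-EIoU (ref. 18) further computes the overlap on auxiliary boxes scaled by a ratio factor. Below is a minimal sketch of EIoU for axis-aligned (x1, y1, x2, y2) boxes — an illustrative re-derivation, not the authors' exact implementation:

```python
def eiou_loss(box_a, box_b):
    """EIoU loss for (x1, y1, x2, y2) boxes: 1 - IoU plus normalized
    center-distance and width/height-difference penalties (ref. 19)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area and union area
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Width/height of the smallest enclosing box
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    # Squared center distance (as in DIoU)
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + \
         ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Width/height penalties specific to EIoU
    dw2 = ((ax2 - ax1) - (bx2 - bx1)) ** 2
    dh2 = ((ay2 - ay1) - (by2 - by1)) ** 2
    return 1 - iou + d2 / (cw ** 2 + ch ** 2) + dw2 / cw ** 2 + dh2 / ch ** 2

print(eiou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0 for identical boxes
```

For identical boxes every penalty term vanishes, so the loss is exactly 0; for partially overlapping boxes the IoU term and the center-distance term both contribute.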
Category | P (before) | P (after) | R (before) | R (after) | mAP50 (before) | mAP50 (after) |
---|---|---|---|---|---|---|
All | 64.6 | 71.9 | 53.4 | 55.2 | 49.3 | 53.8 |
MEL | 70.0 | 70.9 | 35.3 | 41.8 | 34.6 | 47.3 |
NV | 79.2 | 79.6 | 92.6 | 85.6 | 86.8 | 83.2 |
BCC | 52.6 | 68.1 | 58.5 | 58.8 | 42.3 | 51.8 |
AKIEC | 61.4 | 65.9 | 39.9 | 50.0 | 40.5 | 35.6 |
BKL | 41.5 | 67.3 | 45.9 | 50.2 | 31.6 | 47.4 |
DF | 59.9 | 75.5 | 59.6 | 57.1 | 54.7 | 64.1 |
VASC | 67.3 | 75.9 | 40.7 | 42.9 | 43.6 | 47.0 |
Tab.6 Comparison of model performance across various categories on HAM10000 before and after improvements
Model | Backbone | Params (M) | FLOPs (G) | mAP50 | mAP50:95 | FPS |
---|---|---|---|---|---|---|
Faster-RCNN[24] | R50 | 137.1 | 370.2 | 39.3 | 25.5 | 26.6 |
YOLOv7[25] | - | 36.5 | 104.7 | 44.3 | 33.9 | 53.7 |
YOLOv7-X | - | 70.8 | 188.1 | 48.6 | 37.5 | 22.3 |
YOLOv8-S[26] | - | 11.2 | 28.6 | 47.3 | 38.3 | 61.3 |
YOLOv8-M | - | 26.9 | 79.1 | 48.2 | 39.2 | 46.2 |
YOLOv8-L | - | 43.7 | 165.1 | 49.6 | 41.1 | 31.4 |
YOLOv9-S[27] | - | 7.1 | 26.2 | 46.3 | 37.4 | 28.0 |
YOLOv9-M | - | 20.1 | 76.9 | 50.1 | 37.3 | 38.1 |
GOLD-YOLO-S[28] | - | 21.3 | 46.1 | 44.1 | 34.4 | 55.3 |
GOLD-YOLO-M | - | 41.2 | 87.3 | 46.2 | 35.9 | 37.4 |
Deformable-DETR[29] | R50 | 39.8 | 172.9 | 45.0 | 34.6 | - |
DINO[30] | R50 | 47.2 | 279.0 | 44.1 | 34.4 | 6.4 |
DAB-DETR[31] | R50 | 35.2 | 210.0 | 46.9 | 37.9 | - |
Conditional-DETR[32] | R50 | 44.0 | 86.3 | 45.3 | 33.9 | - |
RT-DETR | R18 | 20.2 | 58.8 | 49.3 | 40.5 | 40.2 |
RT-DETR | R34 | 31.4 | 88.6 | 50.1 | 41.9 | 33.1 |
RT-DETR | R50 | 40.3 | 134.8 | 50.8 | 42.8 | 29.8 |
SD-DETR | FasterNet | 10.9 | 19.3 | 53.8 | 43.3 | 59.1 |
Tab.7 Performance comparison of different models
1 | Sung H, Ferlay J, Siegel RL, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries[J]. CA Cancer J Clin, 2021, 71(3): 209-49. |
2 | Fu XF, Wang MY, Chen XX. Application of dermoscopy in screening for facial skin tumors[J]. China Modern Doctor, 2018, 56(21): 86-88, 92. |
3 | Li XY, Wang LX, Zhang L, et al. Application of multimodal and molecular imaging techniques in the detection of choroidal melanomas[J]. Front Oncol, 2021, 10: 617868. |
4 | Argenziano G, Catricalà C, Ardigo M, et al. Seven-point checklist of dermoscopy revisited[J]. Br J Dermatol, 2011, 164(4): 785-90. |
5 | Ganster H, Pinz A, Röhrer R, et al. Automated melanoma recognition[J]. IEEE Trans Med Imaging, 2001, 20(3): 233-9. |
6 | Rana M, Bhushan M. Machine learning and deep learning approach for medical image analysis: diagnosis to detection[J]. Multimed Tools Appl, 2022: 1-39. |
7 | Shao H, Zhang MK, Cui WC. Dermoscopic image classification method based on hierarchical convolutional neural networks[J]. Chinese Journal of Intelligent Science and Technology, 2021, 3(4): 474-481. |
8 | Zheng SY, Hu LX, Lv XQ, et al. Edge-guided self-correcting skin detection[J]. Computer Science, 2022, 49(11): 141-147. |
9 | Huang HY, Hsiao YP, Mukundan A, et al. Classification of skin cancer using novel hyperspectral imaging engineering via YOLOv5[J]. J Clin Med, 2023, 12(3): 1134. |
10 | Shen X, Wei LS. Dermoscopic image segmentation method based on attention residual U-Net[J]. CAAI Transactions on Intelligent Systems, 2023, 18(4): 699-707. |
11 | Wang YF, Cheng HY, Wan CB, et al. A skin cancer detection framework based on a dual-branch attention neural network[J]. Chinese Journal of Biomedical Engineering, 2024, 43(2): 153-161. |
12 | Gao G, Xiao FL, Yang F. Diagnosis of hypopigmented dermatoses based on improved MobileNetV3-Small[J]. Computer and Modernization, 2024(5): 120-126. |
13 | Zhao YA, Lv WY, Xu SL, et al. DETRs beat YOLOs on real-time object detection[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 16-22, 2024, Seattle, WA, USA. IEEE, 2024: 16965-74. |
14 | Li D, Han T, Zhou HT, et al. Lightweight Siamese network for visual tracking via FasterNet and feature adaptive fusion[C]//2024 5th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT). March 29-31, 2024, Nanjing, China. IEEE, 2024: 1-5. |
15 | Hu S, Gao F, Zhou XW, et al. Hybrid convolutional and attention network for hyperspectral image denoising[J]. IEEE Geosci Remote Sens Lett, 2024, 21: 5504005. |
16 | Ding XH, Zhang YY, Ge YX, et al. UniRepLKNet: a universal perception large-kernel ConvNet for audio, video, point cloud, time-series and image recognition[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 16-22, 2024, Seattle, WA, USA. IEEE, 2024: 5513-24. |
17 | Chen YF, Zhang CY, Chen B, et al. Accurate leukocyte detection based on deformable-DETR and multi-level feature fusion for aiding diagnosis of blood diseases[J]. Comput Biol Med, 2024, 170: 107917. |
18 | Zhang H, Xu C, Zhang SJ. Inner-IoU: more effective intersection over union loss with auxiliary bounding box[EB/OL]. arXiv preprint arXiv:2311.02877, 2023. |
19 | Zhang YF, Ren WQ, Zhang Z, et al. Focal and efficient IOU loss for accurate bounding box regression[J]. Neurocomputing, 2022, 506: 146-57. |
20 | Tschandl P, Rosendahl C, Kittler H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions[J]. Sci Data, 2018, 5: 180161. |
21 | Ding XH, Zhang XY, Ma NN, et al. RepVGG: making VGG-style ConvNets great again[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 20-25, 2021, Nashville, TN, USA. IEEE, 2021: 13728-37. |
22 | Zheng ZH, Wang P, Liu W, et al. Distance-IoU loss: faster and better learning for bounding box regression[J]. Proc AAAI Conf Artif Intell, 2020, 34(7): 12993-3000. |
23 | Gevorgyan Z. SIoU loss: more powerful learning for bounding box regression[EB/OL]. arXiv preprint arXiv:2205.12740, 2022. |
24 | Ren SQ, He KM, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Trans Pattern Anal Mach Intell, 2017, 39(6): 1137-49. |
25 | Wang CY, Bochkovskiy A, Liao HM. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 17-24, 2023, Vancouver, BC, Canada. IEEE, 2023: 7464-75. |
26 | Reis D, Kupec J, Hong J, et al. Real-time flying object detection with YOLOv8[EB/OL]. arXiv preprint arXiv:2305.09972, 2023. |
27 | Wang CY, Yeh IH, Mark Liao HY. YOLOv9: learning what you want to learn using programmable gradient information[M]//Computer Vision – ECCV 2024. Cham: Springer Nature Switzerland, 2024: 1-21. |
28 | Wang CC, He W, Nie Y, et al. Gold-YOLO: efficient object detector via gather-and-distribute mechanism[EB/OL]. arXiv preprint arXiv:2309.11331, 2023. |
29 | Zhu XZ, Su WJ, Lu LW, et al. Deformable DETR: deformable transformers for end-to-end object detection[EB/OL]. arXiv preprint arXiv:2010.04159, 2020. |
30 | Zhang H, Li F, Liu SL, et al. DINO: DETR with improved DeNoising anchor boxes for end-to-end object detection[EB/OL]. arXiv preprint arXiv:2203.03605, 2022. |
31 | Liu SL, Li F, Zhang H, et al. DAB-DETR: dynamic anchor boxes are better queries for DETR[EB/OL]. arXiv preprint arXiv:2201.12329, 2022. |
32 | Meng DP, Chen XK, Fan ZJ, et al. Conditional DETR for fast training convergence[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). October 10-17, 2021, Montreal, QC, Canada. IEEE, 2021: 3631-40. |
33 | Selvaraju RR, Cogswell M, Das A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[J]. Int J Comput Vis, 2020, 128(2): 336-59. |
34 | He KM, Zhang XY, Ren SQ, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 27-30, 2016, Las Vegas, NV, USA. IEEE, 2016: 770-8. |