Journal of Southern Medical University ›› 2025, Vol. 45 ›› Issue (6): 1343-1352. doi: 10.12122/j.issn.1673-4254.2025.06.24


CRAKUT: integrating contrastive regional attention and clinical prior knowledge in a U-Transformer for radiology report generation

Yedong LIANG1, Xiongfeng ZHU2, Meiyan HUANG1, Wencong ZHANG1, Hanyu GUO1, Qianjin FENG1

  1. School of Biomedical Engineering, Southern Medical University // Guangdong Provincial Key Laboratory of Medical Image Processing, Guangzhou 510515, China
  2. School of Biomedical Engineering, Guangdong Medical University, Dongguan 523808, China
  • Received: 2025-01-07  Online: 2025-06-20  Published: 2025-06-27
  • Contact: Qianjin FENG  E-mail: liangyedongsmu@163.com; fengqj99@smu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (12126603)

Abstract:

Objective We propose a Contrastive Regional Attention and Prior Knowledge-Infused U-Transformer model (CRAKUT) to address the challenges of imbalanced text distribution, lack of contextual clinical knowledge, and cross-modal information transformation, thereby enhancing the quality of generated radiology reports.

Methods The CRAKUT model comprises three key components: an image encoder that uses common normal images from the dataset to extract enhanced visual features, an external knowledge infuser that incorporates clinical prior knowledge, and a U-Transformer that performs cross-modal information conversion from vision to language. Contrastive regional attention in the image encoder enhances the features of abnormal regions by emphasizing the difference between normal and abnormal semantic features. The clinical prior knowledge infuser within the text encoder integrates clinical history and knowledge graphs generated by ChatGPT. Finally, the U-Transformer connects the multi-modal encoder and the report decoder in a U-connection schema, and multiple types of information are fused to produce the final report.

Results We evaluated the proposed CRAKUT model on two publicly available chest X-ray datasets (IU-Xray and MIMIC-CXR). The experimental results show that CRAKUT achieved state-of-the-art report generation performance, with a BLEU-4 score of 0.159, a ROUGE-L score of 0.353, and a CIDEr score of 0.500 on the MIMIC-CXR dataset, and a METEOR score of 0.258 on the IU-Xray dataset, outperforming all comparison models.

Conclusion The proposed method has great potential for application in clinical disease diagnosis and report generation.

Key words: Chest X-ray, contrastive region attention, clinical prior knowledge, cross-modal, U-Transformer model
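
The core mechanism named in the abstract, contrastive regional attention, can be illustrated with a minimal sketch. The snippet below is an assumption: the module name, the use of simple linear projections, and the residual-based enhancement are illustrative choices under the abstract's description (comparing input-image features against features from common normal images and emphasizing the difference), not the authors' released implementation.

# Minimal, hypothetical sketch of contrastive regional attention.
# Input-image patch features are cross-attended against features pooled from
# common normal images; the residual that the normal references cannot explain
# is treated as abnormal evidence and added back to emphasize abnormal regions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveRegionalAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)   # projects input-image patch features
        self.key = nn.Linear(dim, dim)     # projects normal-image reference features
        self.value = nn.Linear(dim, dim)

    def forward(self, img_feats: torch.Tensor, normal_feats: torch.Tensor) -> torch.Tensor:
        # img_feats:    (B, N, D) patch features of the input chest X-ray
        # normal_feats: (B, M, D) features extracted from common normal images
        q = self.query(img_feats)
        k = self.key(normal_feats)
        v = self.value(normal_feats)

        # Cross-attention: reconstruct each input patch from normal-image features.
        attn = F.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        normal_like = attn @ v              # the "normal" explanation of each patch

        # Contrast: the unexplained residual highlights abnormal regions and is
        # used to enhance the original visual features.
        residual = img_feats - normal_like
        return img_feats + residual         # abnormality-emphasized features


if __name__ == "__main__":
    cra = ContrastiveRegionalAttention(dim=512)
    x = torch.randn(2, 49, 512)   # input-image patch features
    n = torch.randn(2, 49, 512)   # normal reference features
    print(cra(x, n).shape)        # torch.Size([2, 49, 512])

In this reading, the enhanced features would then be passed to the multi-modal encoder and, via the U-connection schema, to the report decoder; those components are not sketched here.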