Multi-Task Learning Model for Medical Image Captioning and Threshold-Based Label Classification

Research output: Contribution to journal › Article › peer-review

Abstract

Manually creating medical reports is time-consuming and increases the risk of diagnostic error due to fatigue and the high workload of radiologists. Developing a medical image captioning model that assists radiologists in automatically generating medical reports is therefore crucial for improving the accuracy and efficiency of report generation. This research proposes a multi-task learning framework that simultaneously performs multi-label classification and medical report generation. The model uses a pre-trained ResNet-152 as the visual encoder, a co-attention mechanism to integrate visual and semantic features, and a hierarchical long short-term memory (LSTM) network as the language decoder. The multi-label classification module also implements a dynamic threshold-based approach to determine the relevant disease labels. A comprehensive cross-validation experiment on the complete IU X-ray dataset and a domain-specific pulmonary subset demonstrates that the proposed model outperforms several prior methods across all evaluation metrics, including a ROUGE score of 0.521 and a METEOR score of 0.449.
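The dynamic threshold idea in the abstract can be illustrated with a minimal sketch. The paper's exact rule is not specified here, so the scheme below is an assumption: the threshold is set per image as a fixed fraction of the highest per-label sigmoid probability, so the cutoff adapts to how confident the classifier is overall. The function name, the `ratio` parameter, and the fraction-of-max rule are all illustrative, not the authors' implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def dynamic_threshold_labels(logits: list[float], ratio: float = 0.5) -> list[int]:
    """Select label indices whose probability exceeds a dynamic threshold.

    Illustrative sketch only: here the threshold is `ratio` times the
    maximum per-label probability, so it adapts per image rather than
    using a fixed global cutoff such as 0.5.
    """
    probs = [sigmoid(z) for z in logits]
    threshold = ratio * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


# Example: strongly positive logits survive; weak ones are filtered out.
selected = dynamic_threshold_labels([2.0, -1.0, 1.5, -3.0])
print(selected)  # → [0, 2]
```

With logits [2.0, -1.0, 1.5, -3.0] the probabilities are roughly [0.88, 0.27, 0.82, 0.05]; the dynamic threshold is 0.44, so only labels 0 and 2 are kept, whereas a fixed 0.5 cutoff would behave identically here but would fail on images where every label score is low.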

Original language: English
Pages (from-to): 271-281
Number of pages: 11
Journal: International Journal of Intelligent Engineering and Systems
Volume: 18
Issue number: 7
DOIs
Publication status: Published - 2025

Keywords

  • Co-attention
  • Hierarchical LSTM
  • Medical image captioning
  • Multi-task learning
