Abstract
Manually creating medical reports is time-consuming and increases the risk of diagnostic error due to fatigue and the high workload of radiologists. Developing a medical image captioning model that assists radiologists in automatically generating medical reports is therefore crucial for improving the accuracy and efficiency of report generation. This research proposes a multi-task learning framework that simultaneously performs multi-label classification and medical report generation. The model uses a pre-trained ResNet-152 as the visual encoder, a co-attention mechanism to integrate visual and semantic features, and a hierarchical long short-term memory (LSTM) network as the language decoder. The multi-label classification module also implements a dynamic threshold-based approach to determine the relevant disease labels. A comprehensive cross-validation experiment on the complete IU X-ray dataset and a domain-specific pulmonary subset demonstrates that the proposed model outperforms several prior methods across all evaluation metrics, achieving a ROUGE score of 0.521 and a METEOR score of 0.449.
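The abstract mentions a dynamic threshold for selecting disease labels but does not spell out the rule. A minimal sketch of one common variant, assuming the cutoff is set per image relative to that image's top label score rather than a fixed global value such as 0.5 (the `ratio` parameter and function name are illustrative, not from the paper):

```python
def dynamic_threshold_labels(scores, ratio=0.5):
    """Return indices of labels whose score is at least `ratio`
    times the maximum score predicted for this image.

    scores: list of per-label probabilities for one image.
    ratio:  fraction of the top score used as the cutoff (assumed,
            not taken from the paper).
    """
    if not scores:
        return []
    cutoff = ratio * max(scores)
    return [i for i, s in enumerate(scores) if s >= cutoff]

# Labels 0 and 2 clear the image-specific cutoff (0.45), even though
# label 2 would be missed by a fixed 0.5 threshold.
print(dynamic_threshold_labels([0.9, 0.1, 0.48, 0.2]))  # [0, 2]
```

The point of such a rule is that images with uniformly low confidence can still yield labels, while confident images keep a strict cutoff; the exact formulation used in the paper may differ.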
| Original language | English |
|---|---|
| Pages (from-to) | 271-281 |
| Number of pages | 11 |
| Journal | International Journal of Intelligent Engineering and Systems |
| Volume | 18 |
| Issue number | 7 |
| DOIs | |
| Publication status | Published - 2025 |
Keywords
- Co-attention
- Hierarchical LSTM
- Medical image captioning
- Multi-task learning
Fingerprint
Dive into the research topics of 'Multi-Task Learning Model for Medical Image Captioning and Threshold-Based Label Classification'. Together they form a unique fingerprint.