Abstract

Image captioning is a challenging task that spans the computer vision and Natural Language Processing (NLP) domains. Its goal is to describe images in natural-language text, as a human would. Image captioning can help people understand visual content. The main challenge is to generate a coherent caption that a human can understand. As Transformers have proven successful in computer vision, achieving new state-of-the-art results, interest in applying them to image captioning has also grown. This paper presents a literature review of image captioning using Transformer methods. The reviewed literature is drawn from reputable journals and conferences. Our review focuses on Transformer approaches that improve model performance in image captioning. We also explore the existing public datasets used in image captioning. Finally, we discuss the limitations of current work and directions for future research, along with additional potential subsidiary research.

Original language: English
Title of host publication: ICITEE 2022 - Proceedings of the 14th International Conference on Information Technology and Electrical Engineering
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 280-285
Number of pages: 6
ISBN (Electronic): 9781665460774
Publication status: Published - 2022
Event: 14th International Conference on Information Technology and Electrical Engineering, ICITEE 2022 - Yogyakarta, Indonesia
Duration: 18 Oct 2022 to 19 Oct 2022

Publication series

Name: ICITEE 2022 - Proceedings of the 14th International Conference on Information Technology and Electrical Engineering

Conference

Conference: 14th International Conference on Information Technology and Electrical Engineering, ICITEE 2022
Country/Territory: Indonesia
City: Yogyakarta
Period: 18/10/22 to 19/10/22

Keywords

  • Attention Mechanism
  • Automatic Captioning
  • Image Captioning
  • Literature Review
  • Transformer

Fingerprint

Dive into the research topics of 'Transformer Approaches in Image Captioning: A Literature Review'. Together they form a unique fingerprint.
