TY - GEN
T1 - Automatic Question Generation from Indonesian Texts Using Text-to-Text Transformers
AU - Fuadi, Mukhlish
AU - Wibawa, Adhi Dharma
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Answering questions is one way to increase or measure understanding. However, creating relevant and answerable questions from a given context is not easy. Automatic Question Generation (AQG) is a Natural Language Processing (NLP) task that generates questions automatically from text input. Many studies on AQG have been carried out, but work on Indonesian texts remains very limited, especially work that uses recent Transformer variants. This study proposes an AQG system built on a recent Transformer model, the multilingual Text-to-Text Transfer Transformer (mT5). We fine-tune the mT5 model to extract answers from a context and generate questions based on those answers. We use the Indonesian data extracted from the TyDiQA dataset and evaluate the model on the TyDiQA validation set using the BLEU (BiLingual Evaluation Understudy) and ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics. The model achieved BLEU-1, BLEU-2, BLEU-3, BLEU-4, and ROUGE-L scores of 36.54, 28.24, 22.61, 18.44, and 39.57, respectively. Based on manual validation, the model generates questions in understandable Indonesian with good word choice and grammar.
KW - AQG
KW - Transformer
KW - mT5
KW - question generation
UR - http://www.scopus.com/inward/record.url?scp=85144623316&partnerID=8YFLogxK
U2 - 10.1109/IEIT56384.2022.9967858
DO - 10.1109/IEIT56384.2022.9967858
M3 - Conference contribution
AN - SCOPUS:85144623316
T3 - Proceedings - IEIT 2022: 2022 International Conference on Electrical and Information Technology
SP - 84
EP - 89
BT - Proceedings - IEIT 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 International Conference on Electrical and Information Technology, IEIT 2022
Y2 - 15 September 2022 through 16 September 2022
ER -