TY - GEN
T1 - Retrieval Augmented Generation with Synergizing Reasoning and Acting Prompt Engineering for Indonesian Open-Domain Question Answering
AU - Tampubolon, Andrew Lomaksan Manuel
AU - Anggraini, Ratih Nur Esti
AU - Hidayati, Shintami Chusnul
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Large Language Models (LLMs) offer promising solutions for various downstream tasks. However, hallucination, limited reasoning capabilities, and high resource consumption remain significant challenges. Retrieval-Augmented Generation (RAG) addresses these issues by providing factual context from external sources. A notable application of RAG is in open-domain question answering, which can be implemented in multiple languages, including Bahasa Indonesia. This study investigates RAG in terms of both its retrieval and generation components. A grid search was conducted to compare sparse and dense retrieval methods. SahabatAI-Gemma-9B and its base model, Gemma-2-9B, are utilized as generative backbones. Additionally, the Synergizing Reasoning and Acting (ReAct) method is explored as a prompt engineering technique to enhance the reasoning capabilities of LLMs. The evaluation compares standard prompting with the ReAct approach. The experimental results show that BM25 is the most effective retriever among the tested methods. SahabatAI-Gemma-9B does not significantly outperform its base model. However, incorporating ReAct improves the recall of METEOR and BERTScore by approximately 2% and enables the generation of more relevant answers through structured reasoning. ReAct facilitates a step-by-step reasoning process, allowing the model to locate specific entities more effectively and to determine when to answer or express uncertainty, thus enhancing control over its responses.
KW - LLM
KW - ReAct
KW - bahasa
KW - question answering
KW - retrieval-augmented generation
UR - https://www.scopus.com/pages/publications/105018084821
U2 - 10.1109/ICoDSA67155.2025.11157319
DO - 10.1109/ICoDSA67155.2025.11157319
M3 - Conference contribution
AN - SCOPUS:105018084821
T3 - 2025 International Conference on Data Science and Its Applications, ICoDSA 2025
SP - 333
EP - 338
BT - 2025 International Conference on Data Science and Its Applications, ICoDSA 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th International Conference on Data Science and Its Applications, ICoDSA 2025
Y2 - 3 July 2025 through 5 July 2025
ER -