TY - GEN
T1 - Non-linear Reward Deep Q Networks for Smooth Action in a Car Game
AU - Iqbal, Mohammad
AU - Afandy, Achmad
AU - Hidayat, Nurul
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
PY - 2024
Y1 - 2024
AB - We formulate non-linear reward functions for deep Q networks in a car racing game by observing the environment (simulator). We aim to control the car's movement (action) in the game simulator more smoothly than in the original game. Existing studies on deep reinforcement learning maintained either discrete or non-linear reward functions without considering the environment domain, which may lead to illogical car movements: for instance, when the car is blocked by three other cars, the game still continues by jumping onto one of them. To overcome these issues, we define a non-linear reward function that computes the penalty game score based on the distance between the car and the car in front of it. In the game simulator, the proposed reward function makes the car drive more accurately and smoothly than the SOTA models, even from the starting point of the game, producing the fewest crashes and no zigzag agent movement when obstacles are far from the car.
KW - Q networks
KW - Reinforcement learning
KW - Reward function
KW - Self-driving cars
UR - http://www.scopus.com/inward/record.url?scp=85200683894&partnerID=8YFLogxK
U2 - 10.1007/978-981-97-2136-8_19
DO - 10.1007/978-981-97-2136-8_19
M3 - Conference contribution
AN - SCOPUS:85200683894
SN - 9789819721351
T3 - Springer Proceedings in Mathematics and Statistics
SP - 259
EP - 270
BT - Applied and Computational Mathematics - ICoMPAC 2023
A2 - Adzkiya, Dieky
A2 - Fahim, Kistosil
PB - Springer
T2 - 8th International Conference on Mathematics: Pure, Applied and Computation, ICoMPAC 2023
Y2 - 30 September 2023 through 30 September 2023
ER -