TY - JOUR
T1 - Two-Phase Stratified Random Forest for Paddy Growth Phase Classification
T2 - A Case of Imbalanced Data
AU - Suryono, Hady
AU - Kuswanto, Heri
AU - Iriawan, Nur
N1 - Publisher Copyright: © 2022 by the authors.
PY - 2022/11
Y1 - 2022/11
AB - The United Nations Sustainable Development Goals (SDGs) have had a considerable impact on Indonesia’s national development policies for the period 2015 to 2030. The agricultural industry is one of the world’s most important industries, and it is critical to the achievement of the SDGs. The second major aspect of the SDGs, i.e., zero hunger, addresses food security (SDG 2). To measure the status of food security, accurate statistics on paddy production must be accessible. Paddy phenological classification is a way to determine a food plant’s growth phase. Imbalanced data are a common occurrence in agricultural data, and machine learning is frequently utilized as a technique for classification issues. The current trend in agriculture is to use remote sensing data to classify crops. This paper proposes a new approach—one that uses two phases in the bootstrap stage of the random forest method—called a two-phase stratified random forest (TPSRF). The simulation scenario shows that the proposed TPSRF outperforms CART, SVM, and RF. Furthermore, in its application to paddy growth phase data for 2019 in Lamongan Regency, East Java, Indonesia, the proposed TPSRF showed higher overall accuracy (OA) than the compared methods.
KW - classification
KW - data imbalance
KW - paddy phenology
KW - sustainable development goals
KW - two-phase stratified random forest
UR - http://www.scopus.com/inward/record.url?scp=85142713743&partnerID=8YFLogxK
U2 - 10.3390/su142215252
DO - 10.3390/su142215252
M3 - Article
AN - SCOPUS:85142713743
SN - 2071-1050
VL - 14
JO - Sustainability (Switzerland)
JF - Sustainability (Switzerland)
IS - 22
M1 - 15252
ER -