Two-Phase Stratified Random Forest for Paddy Growth Phase Classification: A Case of Imbalanced Data

Hady Suryono, Heri Kuswanto*, Nur Iriawan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

The United Nations Sustainable Development Goals (SDGs) have had a considerable impact on Indonesia’s national development policies for the period 2015 to 2030. Agriculture is one of the world’s most important industries and is critical to achieving the SDGs. The second goal, zero hunger (SDG 2), addresses food security. Measuring the status of food security requires access to accurate statistics on paddy production. Paddy phenological classification is a way to determine the growth phase of the crop. Imbalanced data are common in agriculture, and machine learning is frequently used for classification problems. The current trend in agriculture is to use remote sensing data to classify crops. This paper proposes a new approach, called the two-phase stratified random forest (TPSRF), that uses two phases in the bootstrap stage of the random forest method. In the simulation scenario, the proposed TPSRF outperforms classification and regression trees (CART), support vector machines (SVM), and random forest (RF). Furthermore, when applied to 2019 paddy growth phase data from Lamongan Regency, East Java, Indonesia, TPSRF achieved higher overall accuracy (OA) than the compared methods.

Original language: English
Article number: 15252
Journal: Sustainability (Switzerland)
Volume: 14
Issue number: 22
DOIs
Publication status: Published - Nov 2022

Keywords

  • classification
  • data imbalance
  • paddy phenology
  • sustainable development goals
  • two-phase stratified random forest
