Feature Selection and Sensitivity Analysis of Oversampling in Big and Highly Imbalanced Bank's Credit Data

Aznovri Kurniawan, Ahmad Rifa'i, Moch Abdillah Nafis, Nimas Sefrida Andriaswuri, Harry Patria, Diana Purwitasari

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Citation (Scopus)

Abstract

Machine learning has evolved into a multidisciplinary field in the last few years and is gaining popularity in big data analytics, including in the banking industry. Numerous methods can be used in predictive analytics through supervised machine learning, for either regression or classification problems. In the banking industry, credit quality is a core focus, since it is reviewed regularly by regulators and affects banks' profitability. This research gives recommendations on how to select an appropriate machine learning technique and how to perform feature selection and sensitivity analysis on a bank's credit data that has more than one million records and is highly imbalanced, i.e., 97.5% of the data falls in one category. Several supervised machine learning classification methods, including the application of SMOTE (synthetic minority oversampling technique), are applied, and their computational results are compared and summarized. The resulting recommendation for big and extremely imbalanced datasets is the Tree Ensemble method with SMOTE, with the computational issue solved through data sampling without significantly reducing accuracy. It is also concluded that an optimum number of features increases model accuracy, but that a significant reduction in the number of features does not necessarily increase it. The research is expected to be useful for the banking industry, especially in credit portfolio analytics, and for other industries with big and imbalanced datasets that use predictive analytics to support business objectives. Further research could cover more in-depth analytics for the decision-making process in banking.
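
The abstract names SMOTE without describing its mechanics. As a rough illustration only, and not the authors' implementation, the sketch below shows the core idea: each synthetic minority record is an interpolation between a minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy two-feature data are assumptions made for this example; only the 97.5% / 2.5% class split is taken from the abstract.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j]))
        )
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): 975 majority vs 25 minority records, i.e. 97.5% / 2.5%.
data_rng = random.Random(42)
majority = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(975)]
minority = [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(25)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints: 975 975
```

This brute-force neighbour search is quadratic in the number of minority records, which is one reason a dataset with more than one million records may need to be sampled first, as the abstract notes. In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn package, would normally be used instead of a hand-written version.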

Original language: English
Title of host publication: 2022 10th International Conference on Information and Communication Technology, ICoICT 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 35-40
Number of pages: 6
ISBN (Electronic): 9781665481656
Publication status: Published - 2022
Event: 10th International Conference on Information and Communication Technology, ICoICT 2022 - Virtual, Online, Indonesia
Duration: 2 Aug 2022 to 3 Aug 2022

Publication series

Name: 2022 10th International Conference on Information and Communication Technology, ICoICT 2022

Conference

Conference: 10th International Conference on Information and Communication Technology, ICoICT 2022
Country/Territory: Indonesia
City: Virtual, Online
Period: 2/08/22 to 3/08/22

Keywords

  • credit data
  • feature selection
  • imbalanced dataset
  • machine learning
  • sensitivity analysis
