Augmented Data Low Confidence (ADLC): A Confidence-Driven Data Augmentation Framework with Ensemble Optimization for Enhanced Machine Learning Performance

Research output: Contribution to journal › Article › peer-review

Abstract

The application of machine learning in data-driven solutions has matured, yet efforts continue to improve predictive accuracy. This study presents a comprehensive approach that begins with data preprocessing, including the removal of invalid values and duplicate entries, feature selection, and dimensionality reduction, followed by model optimization through hyperparameter tuning. A novel method, Augmented Data Low Confidence (ADLC), is introduced to enhance model performance by augmenting samples with low prediction confidence. The k-nearest neighbors method is used to estimate prediction probabilities. Samples whose predicted probability falls below a defined confidence threshold are selected for augmentation. New data points are then generated by randomly sampling each feature value between the lower and upper bounds observed in the low-confidence instances. The augmented dataset is then optimized using the Gray Wolf Optimization algorithm, which adjusts model parameters based on an accuracy-driven fitness function. Experiments on ten public datasets and two proprietary datasets show that this feedback-based augmentation consistently improves the accuracy of various machine learning models. The results demonstrate the effectiveness of incorporating uncertain predictions into the learning process, leading to improved generalization and classification performance.
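The augmentation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `knn_confidence` and `augment_low_confidence`, the leave-one-out neighbour vote, the 0.7 threshold, the choice to compute bounds per class and to give new points that class label, and the toy data are all assumptions made for illustration. The abstract does not say how labels are assigned to generated points, and the Gray Wolf Optimization stage is not shown.

```python
import math
import random
from collections import Counter


def knn_confidence(X, y, k=3):
    """Leave-one-out k-NN vote share of the majority class for each sample.

    Assumption: "prediction confidence" is read here as the largest class
    share among the k nearest neighbours, excluding the sample itself.
    """
    confidences = []
    for i, xi in enumerate(X):
        ranked = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        votes = Counter(y[j] for _, j in ranked[:k])
        confidences.append(max(votes.values()) / sum(votes.values()))
    return confidences


def augment_low_confidence(X, y, threshold=0.7, k=3, n_new=5, seed=0):
    """Add synthetic points sampled inside the bounds of low-confidence samples.

    Assumption: bounds are taken per class over the low-confidence samples of
    that class, and each synthetic point inherits that class label.
    """
    rng = random.Random(seed)
    conf = knn_confidence(X, y, k)
    low = [i for i, c in enumerate(conf) if c < threshold]
    X_new, y_new = [], []
    for label in sorted({y[i] for i in low}):
        rows = [X[i] for i in low if y[i] == label]
        lower = [min(col) for col in zip(*rows)]  # per-feature lower bound
        upper = [max(col) for col in zip(*rows)]  # per-feature upper bound
        for _ in range(n_new):
            X_new.append([rng.uniform(lo, hi) for lo, hi in zip(lower, upper)])
            y_new.append(label)
    return X + X_new, y + y_new, low


# Toy data (hypothetical): two well-separated clusters plus four ambiguous
# points in the middle whose neighbours disagree about the class.
X = [[0, 0], [0, 1], [1, 0], [1, 1],
     [5, 5], [5, 6], [6, 5], [6, 6],
     [3.0, 3.0], [3.2, 3.1], [2.8, 3.1], [3.1, 2.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1]

X_aug, y_aug, low = augment_low_confidence(X, y)
print("low-confidence indices:", low)
print("samples before/after:", len(X), len(X_aug))
```

Computing the bounds per class keeps each synthetic point inside a region already occupied by uncertain samples of the same label. Pooling all low-confidence samples into a single bounding box is another plausible reading of the abstract, but it would require a separate rule for labelling the new points.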

Original language: English
Pages (from-to): 201439-201459
Number of pages: 21
Journal: IEEE Access
Volume: 13
Publication status: Published - 2025

Keywords

  • Augmentation
  • machine learning
  • performance improvement
