Abstract
Classical statistical methods are usually based on parametric models, whose performance depends heavily on their assumptions and is not robust to outliers in the data. The COVID-19 pandemic has changed daily life significantly, including a slowdown in economic growth. Such extreme changes can appear as outliers in time series and adversely affect the results of data analysis, and many classical methods used in official statistics are sensitive to outliers. In this work, we evaluate two machine learning methods, Support Vector Regression (SVR) and Random Forest (RF), and compare them with ARIMA through simulation studies to assess their robustness. Robustness is measured by the sensitivity of the SVR and Random Forest hyperparameters and by each model’s error in the presence of outliers. The simulations show that more outliers lead to higher RMSE values and, conversely, that larger samples lead to lower RMSE values. The type of outlier significantly affects the RMSE of the ARIMA model: additive outliers (AO) have a worse impact than temporary changes (TC). Consecutive outliers produce a smaller mean RMSE than non-consecutive outliers. Judged by hyperparameter sensitivity, the SVR and Random Forest models are relatively robust to outliers in the data. Based on simulation results over 100 iterations, we find that SVR is more robust than ARIMA and Random Forest in modeling time series data with outliers.
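To make the two outlier types concrete, the following is a minimal sketch, not the authors' simulation code. It assumes the standard definitions: an additive outlier shifts a single observation by a magnitude ω, while a temporary change shifts an observation by ω and lets the shift decay geometrically at rate δ. The AR(1) generator, the function names, and the values `phi=0.6`, `omega=5.0`, and `delta=0.7` are illustrative assumptions and do not come from the paper.

```python
import math
import random


def simulate_ar1(n, phi=0.6, sigma=1.0, seed=0):
    """Simulate a zero-mean AR(1) series x_t = phi * x_{t-1} + e_t (assumed setup)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sigma)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def add_additive_outlier(series, t0, omega):
    """AO: shift only the observation at index t0 by omega."""
    out = list(series)
    out[t0] += omega
    return out


def add_temporary_change(series, t0, omega, delta=0.7):
    """TC: shift index t0 by omega; the shift then decays as omega * delta**k."""
    out = list(series)
    for k, t in enumerate(range(t0, len(out))):
        out[t] += omega * delta ** k
    return out


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


clean = simulate_ar1(n=100)
ao = add_additive_outlier(clean, t0=50, omega=5.0)
tc = add_temporary_change(clean, t0=50, omega=5.0)

# Effect of each contamination on the three observations starting at t0.
ao_effect = [ao[t] - clean[t] for t in range(50, 53)]
tc_effect = [tc[t] - clean[t] for t in range(50, 53)]
```

In a robustness study of the kind described, each model would be fitted to contaminated series such as `ao` and `tc`, and `rmse` would be averaged over repeated simulations. The sketch only shows how the contamination is constructed and how the error is scored.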
| Original language | English |
|---|---|
| Title of host publication | Lecture Notes on Data Engineering and Communications Technologies |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 465-479 |
| Number of pages | 15 |
| Publication status | Published - 2023 |
Publication series
| Name | Lecture Notes on Data Engineering and Communications Technologies |
|---|---|
| Volume | 165 |
| ISSN (Print) | 2367-4512 |
| ISSN (Electronic) | 2367-4520 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 8: Decent Work and Economic Growth
Keywords
- Outlier
- Random forest
- Robustness
- Support vector regression
Fingerprint
Dive into the research topics of 'Robustness of Support Vector Regression and Random Forest Models: A Simulation Study'. Together they form a unique fingerprint.