The Impact of Clustering-Based Sequential Multivariate Outliers Detection in Handling Missing Values

Mety Agustini, Kartika Fithriasari*, Dedy Dwi Prastyo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Chapter > peer-review

Abstract

Missing values are a common problem that leaves data incomplete across a wide range of research. They reduce the amount of data available for analysis and weaken its statistical power. Numerous studies have therefore focused on methods for missing value imputation. When the dataset also contains outliers, however, imputed values can be incorrect or deviate substantially from the actual values. Handling missing values and outliers simultaneously is thus one of the challenges that affects data quality. Several studies removed outliers before imputing missing values, or deleted observations with missing values before detecting outliers. Either removal approach discards information contained in the data. Other researchers integrate clustering methods into missing value imputation to mitigate the impact of outliers and data variation, thereby improving the accuracy of the imputation model. This paper proposes a new clustering-based sequential multivariate outlier detection (SMOD) method for handling incomplete data that contain outliers. The method is applied to an official economic statistics dataset that contains outliers, under a scenario with a missing value rate of about 50 percent. Compared with model-based clustering (MBC), a well-known and widely used clustering technique, the proposed method works well in missing value imputation.
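The general idea of cluster-aware imputation mentioned in the abstract can be illustrated with a minimal sketch. This is not the SMOD method, whose steps the abstract does not specify; it only contrasts a global column-mean fill with a fill computed within each cluster. The function names, the toy data, and the assumption that cluster labels are already available are all hypothetical.

```python
import numpy as np


def global_mean_impute(X):
    """Fill each NaN with the mean of the observed values in its column."""
    filled = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(filled, axis=0)
    missing = np.isnan(filled)
    # np.where returns indices in row-major order, matching boolean indexing
    filled[missing] = np.take(col_means, np.where(missing)[1])
    return filled


def clusterwise_mean_impute(X, labels):
    """Fill each NaN with the column mean of its own cluster (labels assumed given)."""
    filled = np.asarray(X, dtype=float).copy()
    labels = np.asarray(labels)
    for k in np.unique(labels):
        rows = labels == k
        # Impute this cluster's rows using only this cluster's observed values
        filled[rows] = global_mean_impute(filled[rows])
    return filled


# Toy data (hypothetical): three typical rows and two extreme rows
X = np.array([
    [10.0, 1.0],
    [12.0, 1.2],
    [np.nan, 1.1],
    [1000.0, 50.0],
    [1100.0, np.nan],
])
labels = [0, 0, 0, 1, 1]  # assumed cluster assignment, not produced by SMOD

by_global = global_mean_impute(X)
by_cluster = clusterwise_mean_impute(X, labels)

print(by_global[2, 0])   # pulled toward the extreme rows
print(by_cluster[2, 0])  # stays near the typical rows
```

In this toy case the global fill for the third row is dominated by the two extreme rows, while the cluster-wise fill uses only comparable observations. A cluster in which a column has no observed values would leave the gap unfilled and would need separate handling.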

Original language: English
Title of host publication: Lecture Notes on Data Engineering and Communications Technologies
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 221-235
Number of pages: 15
Publication status: Published - 2024

Publication series

Name: Lecture Notes on Data Engineering and Communications Technologies
Volume: 191
ISSN (Print): 2367-4512
ISSN (Electronic): 2367-4520

Keywords

  • Clustering
  • Missing value
  • Multiple imputation
  • Outlier detection
