Comparison of Robust Estimators’ Performance for Detecting Outliers in Multivariate Data

Authors

  • Sharifah Sakinah Syed Abd Mutalib, Centre for Mathematical Sciences, College of Computing & Applied Sciences, Universiti Malaysia Pahang, Lebuhraya Tun Razak, 26300 Gambang, Pahang
  • Siti Zanariah Satari, Centre for Mathematical Sciences, College of Computing & Applied Sciences, Universiti Malaysia Pahang, Lebuhraya Tun Razak, 26300 Gambang, Pahang
  • Wan Nur Syahidah Wan Yusoff, Centre for Mathematical Sciences, College of Computing & Applied Sciences, Universiti Malaysia Pahang, Lebuhraya Tun Razak, 26300 Gambang, Pahang

Keywords:

Mahalanobis distance, Multivariate data, Outliers, Robust Estimators, Test on Covariance

Abstract

In multivariate data, outliers are difficult to detect, especially as the dimension of the data increases. The Mahalanobis distance (MD) is one of the classical methods for detecting outliers in multivariate data. However, the classical mean and covariance matrix used in MD suffer from masking and swamping effects when the data contain outliers. Because of this problem, many studies replace the classical estimators of the mean and covariance matrix with robust estimators. In this study, the performance of five robust estimators, namely Fast Minimum Covariance Determinant (FMCD), Minimum Vector Variance (MVV), Covariance Matrix Equality (CME), Index Set Equality (ISE), and Test on Covariance (TOC), is investigated and compared. FMCD is widely used and is regarded as one of the best robust estimators. However, FMCD still has shortcomings under certain conditions. MVV, CME, ISE and TOC are modifications of FMCD: each of these four robust estimators improves the last step of the FMCD algorithm. Hence, the objective of this study is to assess how well these five estimators detect outliers in multivariate data, with particular attention to TOC, the most recent of them. Simulation studies are conducted for two outlier scenarios under various conditions. Three performance measures, pout, pmask and pswamp, are used to evaluate the robust estimators. TOC is found to perform better in terms of pswamp under most conditions, and to give better results for pout and pmask under certain conditions.
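
As an illustration of the MD-based detection rule mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It computes the squared distance (x - m)' S^-1 (x - m) for every observation and flags those above a cutoff. The function names `mahalanobis_sq` and `flag_outliers`, the simulated data, and the choice of the 0.975 chi-square quantile as cutoff are assumptions made for this example. The mean and covariance are passed in as arguments, so a robust estimate (for example one produced by FMCD) could be substituted for the classical one used here.

```python
import numpy as np


def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distance of each row of X from `mean` under `cov`."""
    diff = X - mean
    # Solve cov @ z = diff.T instead of forming the inverse explicitly.
    z = np.linalg.solve(cov, diff.T)
    return np.sum(diff.T * z, axis=0)


def flag_outliers(X, mean, cov, cutoff):
    """Boolean mask: True where the squared distance exceeds `cutoff`."""
    return mahalanobis_sq(X, mean, cov) > cutoff


# Hypothetical data: 200 bivariate standard normal points plus one planted outlier.
rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 2))
X = np.vstack([clean, [[10.0, 10.0]]])

# Classical (non-robust) estimates of location and scatter.
mean = X.mean(axis=0)
cov = np.cov(X, rowvar=False)

# Assumed cutoff: the 0.975 quantile of the chi-square distribution with
# p = 2 degrees of freedom, which has the closed form -2 * ln(1 - q).
cutoff = -2.0 * np.log(1.0 - 0.975)

flags = flag_outliers(X, mean, cov, cutoff)
print("planted outlier flagged:", bool(flags[-1]))
print("points flagged in total:", int(flags.sum()))
```

With a single, distant planted outlier the classical estimates still flag it. Masking and swamping arise when several outliers distort the mean and covariance themselves, which is the situation robust estimators are designed for.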

Published

2021-10-15

How to Cite

Syed Abd Mutalib, S. S., Satari, S. Z., & Wan Yusoff, W. N. S. (2021). Comparison of Robust Estimators’ Performance for Detecting Outliers in Multivariate Data. Journal of Statistical Modeling & Analytics (JOSMA), 3(2). Retrieved from https://vmis.um.edu.my/index.php/JOSMA/article/view/32399
