Article

Abstract:

Real data may contain both cellwise outliers and casewise outliers. There is a vast literature on robust estimation for casewise outliers, but only a scant literature for cellwise outliers and almost none for both types of outliers. Estimation of multivariate location and scatter matrix is a cornerstone in multivariate data analysis. A two-step approach was recently proposed to perform robust estimation of multivariate location and scatter matrix in the presence of cellwise and casewise outliers. In the first step a univariate filter was applied to remove cellwise outliers. In the second step a generalized S-estimator was used to downweight casewise outliers. This proposal can be further improved in three main directions. First, through the introduction of a consistent bivariate filter to be used in combination with the univariate filter in the first step. Second, through the proposal of a new fast subsampling procedure to generate starting points for the generalized S-estimator in the second step. Third, through the use of a non-monotonic weight function for the generalized S-estimator to better handle casewise outliers in high dimension. A simulation study and a real data example show that, unlike the original two-step procedure, the modified two-step approach performs and scales well in high dimension. Moreover, they show that the modified procedure outperforms the original one and other state-of-the-art robust procedures under cellwise and casewise data contamination. © 2017 Elsevier B.V.
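
As a rough illustration of the first step described in the abstract, the sketch below flags individual cells with a simple univariate rule and replaces them with NaN. It is not the filter from the article: the function name `univariate_filter`, the median/MAD standardization, the fixed cutoff of 3, and the toy data are all assumptions made for this example. The second step (a generalized S-estimator computed on the filtered, incomplete data) is not attempted here.

```python
import math
import statistics


def univariate_filter(rows, cutoff=3.0):
    """Replace cells with a large robust z-score by NaN, column by column.

    Simplified stand-in for a univariate cellwise filter (an assumption,
    not the article's procedure): each column is centered at its median
    and scaled by its normalized MAD, then compared to a fixed cutoff.
    """
    n_cols = len(rows[0])
    filtered = [list(row) for row in rows]
    for j in range(n_cols):
        column = [row[j] for row in rows]
        med = statistics.median(column)
        # 1.4826 makes the MAD consistent with the standard deviation
        # under a normal model.
        mad = 1.4826 * statistics.median([abs(x - med) for x in column])
        if mad == 0:
            continue  # degenerate column: nothing can be flagged reliably
        for i, x in enumerate(column):
            if abs(x - med) / mad > cutoff:
                filtered[i][j] = math.nan
    return filtered


# Toy data (hypothetical): row 4 has a single contaminated cell in column 1.
data = [
    [1.0, 10.0, 100.0],
    [1.2, 10.5, 101.0],
    [0.9, 9.8, 99.0],
    [1.1, 10.2, 100.5],
    [1.0, 50.0, 100.2],
    [0.8, 9.9, 99.5],
    [1.3, 10.1, 100.8],
]

filtered = univariate_filter(data)
flagged = [(i, j) for i, row in enumerate(filtered)
           for j, x in enumerate(row) if math.isnan(x)]
print(flagged)
```

Only the contaminated cell is discarded, so the remaining two values in that row stay available to whatever estimator runs next. That is the motivation for filtering cells rather than downweighting whole cases.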

Record:

Document: Article
Title: Multivariate location and scatter matrix estimation under cellwise and casewise contamination
Author: Leung, A.; Yohai, V.; Zamar, R.
Affiliation: Department of Statistics, University of British Columbia, 3182-2207 Main Mall, Vancouver, British Columbia, V6T 1Z4, Canada
Departamento de Matemática, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pabellón 1, Buenos Aires, 1426, Argentina
Keywords: Cellwise outliers; Componentwise contamination; Multivariate location and scatter; Robust estimation; Location; Matrix algebra; Multivariant analysis; Componentwise; Multivariate data analysis; Robust procedures; Simulation studies; Two-step approach; Two-step procedure; Statistics
Year: 2017
Volume: 111
Start page: 59
End page: 76
DOI: http://dx.doi.org/10.1016/j.csda.2017.02.007
Journal title: Computational Statistics and Data Analysis
Abbreviated journal title: Comput. Stat. Data Anal.
ISSN: 01679473
CODEN: CSDAD
Record: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_01679473_v111_n_p59_Leung

References:

  • Agostinelli, C., Leung, A., Yohai, V.J., Zamar, R.H., Rejoinder on: Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination (2015) TEST, 24 (3), pp. 484-488
  • Agostinelli, C., Leung, A., Yohai, V.J., Zamar, R.H., Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination (2015) TEST, 24 (3), pp. 441-461
  • Alqallaf, F.A., Konis, K.P., Martin, R.D., Zamar, R.H., Scalable robust covariance and correlation estimates for data mining (2002) Proceedings of the eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’02, pp. 14-23
  • Alqallaf, F., Van Aelst, S., Yohai, V.J., Zamar, R.H., Propagation of outliers in multivariate data (2009) Ann. Statist., 37 (1), pp. 311-331
  • Danilov, M., Yohai, V.J., Zamar, R.H., Robust estimation of multivariate location and scatter in the presence of missing data (2012) J. Amer. Statist. Assoc., 107, pp. 1178-1186
  • Farcomeni, A., Robust constrained clustering in presence of entry-wise outliers (2014) Technometrics, 56, pp. 102-111
  • Friedman, J., Hastie, T., Tibshirani, R., Sparse inverse covariance estimation with the graphical lasso (2008) Biostatistics, 9 (3), pp. 432-441
  • Gnanadesikan, R., Kettenring, J.R., Robust estimates, residuals, and outlier detection with multiresponse data (1972) Biometrics, 28, pp. 81-124
  • Hall, P., Marron, J., Neeman, A., Geometric representation of high dimension, low sample size data (2005) J. R. Stat. Soc. Ser. B Stat. Methodol., 67, pp. 427-444
  • Leung, A., Danilov, M., Yohai, V., Zamar, R., GSE: Robust Estimation in the Presence of Cellwise and Casewise Contamination and Missing Data (2015), R package version 3.2.3
  • Maronna, R.A., Comments on: Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination (2015) TEST, 24 (3), pp. 471-472
  • Maronna, R.A., Martin, R.D., Yohai, V.J., Robust Statistics: Theory and Methods (2006), John Wiley & Sons, Chichester
  • Maronna, R.A., Yohai, V.J., Robust and efficient estimation of high dimensional scatter and location (2015)
  • Martin, R., Robust covariances: Common risk versus specific risk outliers (2013), presented at the 2013 R-Finance Conference, Chicago, IL, www.rinfinance.com/agenda/2013/talk/DougMartin.pdf (visited 2016-08-24)
  • Peña, D., Prieto, F.J., Multivariate outlier detection and robust covariance matrix estimation (2001) Technometrics, 43, pp. 286-310
  • Rocke, D.M., Robustness properties of S-estimators of multivariate location and shape in high dimension (1996) Ann. Statist., 24, pp. 1327-1345
  • Rousseeuw, P.J., Croux, C., Alternatives to the median absolute deviation (1993) J. Amer. Statist. Assoc., 88, pp. 1273-1283
  • Rousseeuw, P.J., Van den Bossche, W., Comments on: Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination (2015) TEST, 24 (3), pp. 473-477
  • Rousseeuw, P.J., Van den Bossche, W., Detecting deviating data cells (2016), [stat.ME]
  • Van Aelst, S., Vandervieren, E., Willems, G., A Stahel-Donoho estimator based on Huberized outlyingness (2012) Comput. Statist. Data Anal., 56, pp. 531-542

Citations:

---------- APA ----------
Leung, A., Yohai, V. & Zamar, R. (2017). Multivariate location and scatter matrix estimation under cellwise and casewise contamination. Computational Statistics and Data Analysis, 111, 59-76.
http://dx.doi.org/10.1016/j.csda.2017.02.007
---------- CHICAGO ----------
Leung, A., Yohai, V., Zamar, R. "Multivariate location and scatter matrix estimation under cellwise and casewise contamination." Computational Statistics and Data Analysis 111 (2017): 59-76.
http://dx.doi.org/10.1016/j.csda.2017.02.007
---------- MLA ----------
Leung, A., Yohai, V., Zamar, R. "Multivariate location and scatter matrix estimation under cellwise and casewise contamination." Computational Statistics and Data Analysis, vol. 111, 2017, pp. 59-76.
http://dx.doi.org/10.1016/j.csda.2017.02.007
---------- VANCOUVER ----------
Leung, A., Yohai, V., Zamar, R. Multivariate location and scatter matrix estimation under cellwise and casewise contamination. Comput. Stat. Data Anal. 2017;111:59-76.
http://dx.doi.org/10.1016/j.csda.2017.02.007