CMStatistics 2017
B0390
Title: Refactoring the FORTRAN code for LTS and MCD Algorithm in R
Authors: Peter Ruckdeschel - University of Oldenburg (Germany) [presenting]
Valentin Todorov - UNIDO (Austria)
Abstract: Recent progress in a complete refactoring of the (fast) LTS and MCD algorithms into pure R is reported. For iid data, robust covariance and robust regression methods are readily available in many statistical software packages to remedy the instability of the classical procedures in the presence of outliers. In R, they are provided by the robustbase and rrcov packages. In both packages, the R code interfaces to the original FORTRAN code. For weighted data, as arises, e.g., in stratified sampling, model-based clustering, Gaussian mixture models, or Hidden Markov Models, the minimum covariance determinant (MCD) estimator, one of the most important robust alternatives, has so far not been available. The same is true for the least trimmed squares (LTS) estimator for regression in a weighted data setting. We close this gap by (re-)implementing the (fast)MCD and (fast)LTS estimators from FORTRAN in pure R, with additional coverage of weighted data. Moreover, the new pure R code lends itself more readily than the current FORTRAN code to future extensions, such as more general data structures for sparse matrices. As with the reference MCD and LTS code in R/FORTRAN, we provide a reweighting step to achieve higher efficiency in the ideal model than the raw MCD / LTS estimator.