Title: Detecting outliers in compositional data using invariant coordinate selection
Authors: Christine Thomas-Agnan - CNRS (France) [presenting]
Anne Ruiz-Gazen - Toulouse School of Economics (France)
Thibault Laurent - Toulouse School of Economics (France)
Camille Mondon - École normale supérieure (France)
Abstract: Invariant Coordinate Selection (ICS) is a multivariate statistical method based on the simultaneous diagonalization of two scatter matrices. A model-based approach to ICS, called Invariant Coordinate Analysis, has already been adapted to compositional data. In a model-free context, ICS is also useful for identifying outliers. We propose a version of ICS for outlier detection in compositional data. This version is first introduced in coordinate space, for a specific choice of isometric log-ratio (ilr) coordinate system associated with a contrast matrix, and follows an existing outlier detection procedure. We then show that the procedure is independent of the choice of contrast matrix and can be defined directly in the simplex. To do so, we first establish some properties of the set of matrices satisfying the zero-sum property, and introduce simplex definitions of the Mahalanobis distance and of the class of one-step M-estimators of scatter. We also define the family of elliptical distributions in the simplex. Finally, we show how to interpret the results directly in the simplex using two artificial datasets and a real dataset of market shares in the automobile industry.
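As an illustration of the coordinate-space construction and of the invariance claim, the following is a minimal sketch and not the authors' implementation. It assumes the scatter pair (COV, COV4), a common choice for ICS outlier detection that the abstract does not specify, and a Helmert-type contrast matrix. All function names (helmert_contrast, ilr, cov4, ics) and the simulated data are hypothetical. The sketch computes ICS components in two different ilr coordinate systems and compares them.

```python
import numpy as np

def helmert_contrast(D):
    """D x (D-1) contrast matrix: orthonormal columns, each summing to zero."""
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return V

def ilr(X, V):
    """ilr coordinates; clr centering is redundant because columns of V sum to zero."""
    return np.log(X) @ V

def cov4(Y):
    """Fourth-moment scatter matrix, a one-step M-estimator built on the covariance."""
    n, p = Y.shape
    Yc = Y - Y.mean(axis=0)
    S1_inv = np.linalg.inv(np.cov(Y, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", Yc, S1_inv, Yc)  # squared Mahalanobis distances
    return (Yc * d2[:, None]).T @ Yc / (n * (p + 2))

def ics(Y):
    """Simultaneously diagonalize COV and COV4; return eigenvalues and components."""
    Yc = Y - Y.mean(axis=0)
    S1 = np.cov(Y, rowvar=False)
    S2 = cov4(Y)
    w, U = np.linalg.eigh(S1)
    S1_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T
    rho, E = np.linalg.eigh(S1_inv_sqrt @ S2 @ S1_inv_sqrt)
    order = np.argsort(rho)[::-1]  # largest generalized kurtosis first
    B = S1_inv_sqrt @ E[:, order]
    return rho[order], Yc @ B

# Simulated compositions (assumption: logistic-normal data with 5 shifted rows).
rng = np.random.default_rng(0)
D = 4
logits = rng.normal(size=(200, D))
logits[:5] += np.array([4.0, 0.0, 0.0, 0.0])
X = np.exp(logits)
X /= X.sum(axis=1, keepdims=True)

# Two contrast matrices that differ by an orthogonal rotation.
V1 = helmert_contrast(D)
Q, _ = np.linalg.qr(rng.normal(size=(D - 1, D - 1)))
V2 = V1 @ Q

rho1, Z1 = ics(ilr(X, V1))
rho2, Z2 = ics(ilr(X, V2))

# Squared ICS distance based on the first component only.
dist1 = (Z1[:, :1] ** 2).sum(axis=1)
dist2 = (Z2[:, :1] ** 2).sum(axis=1)

print("eigenvalues agree:", np.allclose(rho1, rho2))
print("components agree up to sign:", np.allclose(np.abs(Z1), np.abs(Z2), atol=1e-6))
print("distances agree:", np.allclose(dist1, dist2))
```

Changing the contrast matrix rotates the ilr coordinates, and ICS components are affine invariant up to sign when the eigenvalues are distinct, so the two runs should yield the same eigenvalues and the same outlyingness distances. Keeping only the first component is an arbitrary choice made for this sketch; in practice the number of retained components would be selected by a dedicated rule.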