CMStatistics 2020
Title: Weighting of parts in compositional data and its applications
Authors:
Karel Hron - Palacky University (Czech Republic) [presenting]
Alessandra Menafoglio - Politecnico di Milano (Italy)
Javier Palarea-Albaladejo - Biomathematics and Statistics Scotland (United Kingdom)
Peter Filzmoser - Vienna University of Technology (Austria)
Renata Talska - Palacky University Olomouc (Czech Republic)
Juan Jose Egozcue - Universitat Politecnica de Catalunya, Barcelona (Spain)
Abstract: In practice, it is often sensible to give different weights to the variables involved in multivariate data analysis. The same holds for compositional data, that is, multivariate observations carrying relative information, such as proportions or percentages. Weights can be convenient, for example, to better accommodate differences in the quality of the measurements or the occurrence of zeros and missing values, or more generally to highlight specific features of compositional variables (i.e. parts of a whole). The characterisation of compositional data as elements of a Bayes space enables the definition of a formal framework to implement weighting schemes for the parts of a composition. This is formally achieved by replacing the common uniform reference measure in the Bayes space with an alternative reference measure via the well-known chain rule. Unweighted centred log-ratio (clr) coefficients and isometric log-ratio (ilr) coordinates then allow compositions to be represented in the real space equipped with the (unweighted) Euclidean geometry, where ordinary multivariate statistical methods can be used and interpreted. We present these formal developments and use them to introduce a general approach to weighting parts in compositional data analysis. We demonstrate its practical usefulness on simulated and real-world data sets in the context of the earth sciences.
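As a rough illustration of the clr coefficients mentioned in the abstract, the following Python sketch computes the standard clr of a composition alongside one possible weighted variant. The function `weighted_clr`, the example composition and the weights are assumptions made for illustration only: the variant divides each part by its weight (a change of reference measure) and centres the logs by their weight-averaged mean. It is not necessarily the scheme the authors present.

```python
import math


def clr(x):
    """Standard clr coefficients: log of each part minus the mean log."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def weighted_clr(x, p):
    """Hypothetical weighted clr (an assumption, not the authors' definition).

    Each part is first expressed relative to its weight (x_i / p_i), then
    the logs are centred by their p-weighted mean.
    """
    if len(x) != len(p):
        raise ValueError("composition and weights must have equal length")
    logs = [math.log(xi / pi) for xi, pi in zip(x, p)]
    mean_log = sum(pi * value for pi, value in zip(p, logs)) / sum(p)
    return [value - mean_log for value in logs]


# Hypothetical three-part composition; the third part is down-weighted.
composition = [0.6, 0.3, 0.1]
weights = [1.0, 1.0, 0.25]

print(clr(composition))
print(weighted_clr(composition, weights))
```

With equal weights the sketch reduces to the standard clr, and in both cases the (weighted) coefficients sum to zero, which is the constraint that makes clr coefficients a centred representation.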