Title: Group comparison with count-based compositional data
Authors: Jan Graffelman - Universitat Politecnica de Catalunya (Spain) [presenting]
Juan Jose Egozcue - Universitat Politecnica de Catalunya (Barcelona, Spain)
Vera Pawlowsky-Glahn - University of Girona (Spain)
Abstract: Compositional data consist of vectors that carry relative information about the parts of some whole. Such data are subject to a constraint, which is particularly clear when the vectors consist of percentages summing to 100\%. Log-ratio transformations have been widely used to deal with compositional data, and the transformed data are then used as input for standard statistical procedures. Many compositional data sets are ultimately derived from counts that are expressed as fractions of a total. Log-ratio transformed compositions obtained from underlying multinomial counts are asymptotically multivariate normal, and their theoretical covariance matrix can be obtained with the delta method. This makes it possible to estimate the covariance matrix of the log-ratio coordinates in different ways: with conventional sample-based estimators, or with alternative estimators inspired by asymptotic theory. In the compositional setting, multivariate group comparisons by Hotelling's $T^2$ test or Wilks' lambda rest on the covariance matrix of the log-ratio coordinates. We use simulation studies to evaluate which estimator of this covariance matrix works best in different settings.
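The delta-method covariance and its role in Hotelling's $T^2$ can be illustrated with a small Python sketch. It assumes isometric log-ratio (ilr) coordinates $y = V^\top \log p$ built from one particular orthonormal (pivot-type) basis $V$, multinomial counts with a common known total $n$, and no zero counts. Because $V^\top \mathbf{1} = 0$, the delta method reduces to $\Sigma = V^\top \mathrm{diag}(1/p)\, V / n$. The function names are hypothetical, and plugging pooled proportions into this formula is only one possible "asymptotic" estimator, not necessarily the one studied by the authors.

```python
import numpy as np


def pivot_basis(D):
    """D x (D-1) matrix V with orthonormal columns spanning the clr plane
    (V.T @ V = I and V.T @ 1 = 0); one illustrative choice of ilr basis."""
    V = np.zeros((D, D - 1))
    for i in range(D - 1):
        k = D - i - 1  # number of parts after pivot part i
        V[i, i] = np.sqrt(k / (k + 1))
        V[i + 1:, i] = -1.0 / np.sqrt(k * (k + 1))
    return V


def ilr(counts, V):
    """ilr coordinates of count vectors (one per row); assumes no zero counts.
    Dividing by the total is unnecessary because V.T @ 1 = 0."""
    return np.log(counts) @ V


def delta_cov_ilr(p, n, V):
    """Delta-method covariance of ilr(x / n) for x ~ Multinomial(n, p).
    Jacobian J = V.T diag(1/p) and Cov(x / n) = (diag(p) - p p.T) / n;
    since J p = V.T 1 = 0, this reduces to V.T diag(1/p) V / n."""
    return V.T @ np.diag(1.0 / p) @ V / n


def hotelling_t2(Y1, Y2, S=None):
    """Two-sample Hotelling T^2 on coordinate matrices (rows = observations).
    S defaults to the pooled sample covariance; another estimate of the
    common per-observation covariance can be passed instead. The returned F
    statistic is F(q, n1 + n2 - q - 1) under H0 only for the pooled sample
    covariance and normal data; with a plug-in S the reference
    distribution differs."""
    n1, q = Y1.shape
    n2 = Y2.shape[0]
    if S is None:
        S = ((n1 - 1) * np.cov(Y1, rowvar=False)
             + (n2 - 1) * np.cov(Y2, rowvar=False)) / (n1 + n2 - 2)
    d = Y1.mean(axis=0) - Y2.mean(axis=0)
    t2 = n1 * n2 / (n1 + n2) * d @ np.linalg.solve(S, d)
    f = (n1 + n2 - q - 1) / (q * (n1 + n2 - 2)) * t2
    return t2, f


rng = np.random.default_rng(1)
p = np.array([0.2, 0.3, 0.5])
n = 1000
V = pivot_basis(len(p))

# Monte Carlo check: sample covariance of ilr coordinates vs delta method.
Y = ilr(rng.multinomial(n, p, size=20000), V)
S_mc = np.cov(Y, rowvar=False)
S_delta = delta_cov_ilr(p, n, V)

# Two groups drawn from the same p, compared with both estimators.
X1 = rng.multinomial(n, p, size=30)
X2 = rng.multinomial(n, p, size=40)
Y1, Y2 = ilr(X1, V), ilr(X2, V)
pooled = np.vstack([X1, X2]).sum(axis=0)
p_hat = pooled / pooled.sum()  # hypothetical plug-in: pooled proportions
t2_sample, _ = hotelling_t2(Y1, Y2)
t2_delta, _ = hotelling_t2(Y1, Y2, S=delta_cov_ilr(p_hat, n, V))
print(t2_sample, t2_delta)
```

With equal totals, the plug-in estimate depends on the data only through the pooled proportions, so it has far fewer free parameters than the pooled sample covariance; which of the two behaves better is the kind of question a simulation study can answer.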