CMStatistics 2022
Title: Sparse pairwise logratio variable selection for high-dimensional compositional data
Authors: Paulina Jaskova - Palacky University Olomouc (Czech Republic) [presenting]
Matthias Templ - Vienna University of Technology (Austria)
Karel Hron - Palacky University Olomouc (Czech Republic)
Javier Palarea-Albaladejo - University of Girona (Spain)
Abstract: In omics sciences, biomarker identification is of paramount importance. From a statistical perspective, however, it is a challenging task because of the high dimensionality of the data and the associated computational burden. Metabolomics data have been characterized as compositional data, that is, relative data in which the relevant information is contained in the (log)ratios between the variables (components) that make up the observed metabolomic profile. Accordingly, biomarkers can be expressed in terms of log-contrasts or any logratio coordinate representation. Alternatively, they can be considered directly in terms of the basic information carried by pairwise logratios. The main goal is to present and discuss a procedure for selecting pairwise logratios from high-dimensional compositional data within the framework of orthonormal (orthogonal) logratio coordinates. After an initial dimensionality reduction that filters out noisy variables through univariate data processing, a selection algorithm is applied to obtain non-overlapping pairwise logratios. The parts involved in these selected pairs form a (sub)composition, for which an orthonormal logratio coordinate system is then constructed; this coordinate system covers all possible pairwise logratios of that (sub)composition. Partial least squares regression is subsequently applied to identify significant logratios. The properties of the new approach will be investigated using real high-dimensional compositional data.
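A minimal pure-Python sketch can illustrate the selection and coordinate steps described above. It is not the authors' implementation and rests on several assumptions. "Non-overlapping" is read here as pairs that share no part. The univariate relevance scores are supplied by the caller, because the abstract does not state the filtering criterion. Pivot coordinates are used as one possible orthonormal logratio coordinate system. All function names are hypothetical, and the final partial least squares step is not shown. The loop at the end checks the coverage claim: every pairwise logratio of the selected subcomposition is recovered exactly as a linear combination of the coordinates.

```python
import math
from itertools import combinations


def pairwise_logratio(x, i, j):
    """ln(x_i / x_j) for one composition x (all parts strictly positive)."""
    return math.log(x[i] / x[j])


def select_non_overlapping(scores):
    """Greedily pick disjoint pairs (i, j), highest score first.

    `scores` maps (i, j) -> a univariate relevance score. The scoring rule is
    a hypothetical stand-in; the abstract does not specify the criterion.
    """
    used, chosen = set(), []
    for (i, j), _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if i not in used and j not in used:  # skip pairs sharing a part
            chosen.append((i, j))
            used.update((i, j))
    return chosen


def pivot_basis(D):
    """D x (D-1) matrix of pivot-coordinate contrasts (orthonormal, zero-sum columns)."""
    V = [[0.0] * (D - 1) for _ in range(D)]
    for k in range(D - 1):
        m = D - k  # number of parts from position k onwards
        V[k][k] = math.sqrt((m - 1) / m)
        for j in range(k + 1, D):
            V[j][k] = -1.0 / math.sqrt(m * (m - 1))
    return V


def pivot_coordinates(x):
    """Orthonormal (ilr) pivot coordinates z = V^T log(x)."""
    D = len(x)
    V = pivot_basis(D)
    logs = [math.log(v) for v in x]
    return [sum(V[i][k] * logs[i] for i in range(D)) for k in range(D - 1)]


def logratio_from_coordinates(z, i, j, D):
    """Recover ln(x_i / x_j) as a linear combination of the coordinates z."""
    V = pivot_basis(D)
    return sum((V[i][k] - V[j][k]) * z[k] for k in range(D - 1))


# Toy, made-up example: 6 parts and hypothetical relevance scores for a few pairs.
x = [3.0, 1.5, 2.0, 8.0, 0.5, 4.0]
scores = {(0, 1): 0.9, (1, 2): 0.8, (2, 5): 0.7, (0, 5): 0.5}
pairs = select_non_overlapping(scores)                # [(0, 1), (2, 5)]
parts = sorted({p for pair in pairs for p in pair})   # [0, 1, 2, 5]
sub = [x[p] for p in parts]                           # the selected subcomposition
z = pivot_coordinates(sub)

# Every pairwise logratio of the subcomposition is a linear combination of z.
for a, b in combinations(range(len(sub)), 2):
    assert math.isclose(pairwise_logratio(sub, a, b),
                        logratio_from_coordinates(z, a, b, len(sub)),
                        abs_tol=1e-12)
```

The check works because the pivot contrasts are orthonormal and each column sums to zero, so V V^T projects log(x) onto its clr vector; hence ln(x_a/x_b) = clr_a - clr_b = sum_k (V[a][k] - V[b][k]) z_k. Any other orthonormal basis of the clr plane would give the same guarantee, so the choice of pivot coordinates here is only a convenience.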