CMStatistics 2017
Title: A new method for variable selection in a two and multi-group case
Authors: Jan Walach - TU Wien (Austria) [presenting]
Peter Filzmoser - Vienna University of Technology (Austria)
Karel Hron - Palacky University (Czech Republic)
Beata Walczak - University of Silesia (Poland)
Lukas Najdekr - Palacky University (Czech Republic)
Abstract: One of the main goals in metabolomics is to identify diagnostically important variables that make it possible to distinguish between different patient groups. Because of the so-called size effect, which arises when samples differ in overall concentration, conventional variable selection methods cannot be applied directly to the measured data. Instead, the measurements for the different samples must first be made comparable, which is possible with specific data transformations. An alternative is to investigate the so-called relative information, which consists of the relations (ratios) between the different variables. These ratios are independent of the size effect and can be used directly for data processing. Relative information is analyzed with the log-ratio approach, a standard approach in the analysis of compositional data that follows a geometrical concept with Euclidean vector space structure. For the purpose of variable selection, the log-ratios between all pairs of variables are employed, and they are computed separately for the different patient groups. Potential marker variables are expected to show different variability of their pairwise log-ratios in the different groups. This variability can be estimated robustly in order to downweight the influence of data artifacts. The method turns out to have clear advantages over traditional variable selection methods in this context, in particular for problems with unequal group sizes and in the presence of outliers.
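The idea of comparing pairwise log-ratio variability between groups can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not specify in detail. It assumes strictly positive data, two groups, the squared MAD as the robust variability estimate, and a hypothetical per-variable score (the mean absolute log of the between-group variability ratio). The function names `robust_variation_matrix` and `marker_scores` and the simulated data are made up for illustration.

```python
import numpy as np

def robust_variation_matrix(X):
    """Squared MAD of log(x_j / x_k) for every pair of variables (j, k).

    X has shape (n_samples, n_variables) and must be strictly positive.
    The choice of the MAD as robust scale is an assumption of this sketch.
    """
    logX = np.log(X)
    # lr[i, j, k] = log(x_ij / x_ik); a common row factor (size effect) cancels.
    lr = logX[:, :, None] - logX[:, None, :]
    med = np.median(lr, axis=0)
    mad = 1.4826 * np.median(np.abs(lr - med), axis=0)
    return mad ** 2

def marker_scores(X1, X2, eps=1e-12):
    """Hypothetical score: how strongly each variable's pairwise log-ratio
    variability differs between group 1 and group 2 (larger = more different)."""
    V1 = robust_variation_matrix(X1)
    V2 = robust_variation_matrix(X2)
    D = np.abs(np.log((V1 + eps) / (V2 + eps)))
    np.fill_diagonal(D, 0.0)  # a variable's ratio with itself carries no information
    return D.sum(axis=1) / (D.shape[0] - 1)

# Simulated example with unequal group sizes (60 vs. 30 samples, 6 variables).
rng = np.random.default_rng(0)
p = 6
X1 = np.exp(rng.normal(0.0, 0.3, size=(60, p)))
X2 = np.exp(rng.normal(0.0, 0.3, size=(30, p)))
X2[:, 0] *= np.exp(rng.normal(0.0, 1.5, size=30))  # variable 0 varies more in group 2

# Arbitrary per-sample size factors, mimicking different concentrations.
S1 = rng.uniform(0.5, 5.0, size=(60, 1))
S2 = rng.uniform(0.5, 5.0, size=(30, 1))

scores = marker_scores(S1 * X1, S2 * X2)
print(np.round(scores, 2))
```

Because only ratios within a sample enter the computation, multiplying each sample by its own size factor leaves the scores unchanged up to floating-point error, which is the property the abstract attributes to relative information.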