Title: Random forests for high-dimensional longitudinal data
Authors: Robin Genuer - Bordeaux University INSERM Vaccine Research Institute (France) [presenting]
Louis Capitaine - Bordeaux University INSERM Vaccine Research Institute (France)
Rodolphe Thiebaut - Bordeaux University INSERM Vaccine Research Institute (France)
Abstract: Random forests are a statistical machine learning method that performs well in high-dimensional settings, such as genomic data analysis. However, in many problems longitudinal data are available, i.e. measurements are taken several times on the same individual, so observations are not independent, whereas random forests assume i.i.d. samples. Based on semi-parametric mixed models and the EM algorithm, we study existing random forest adaptations for high-dimensional longitudinal data and propose a new one. Simulation experiments are conducted, and a dataset from a real HIV vaccine trial, DALIA-1, is analyzed. In this trial, 10 measurements of 32979 gene expressions are available for 18 infected patients. Results show that when the longitudinal aspect of the data is taken into account, random forests manage to unravel complex mechanisms between a continuous outcome and a very large number of variables. Furthermore, the proposed methodology exhibits faster convergence of the EM algorithm and a smaller prediction error than existing ones.
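
To illustrate the general idea of combining a forest with a mixed model through EM-style iterations, here is a minimal sketch. It is not the authors' method: the abstract does not specify the model, so the sketch assumes the simplest case, a random intercept per individual, y_ij = f(x_ij) + b_i + e_ij. A small bagged ensemble of regression stumps stands in for a real random forest, residuals are computed in-sample, and all function names and simulated data are hypothetical.

```python
import random

def avg(values):
    return sum(values) / len(values)

def fit_stump(X, y, rng):
    """Depth-1 regression tree on one randomly chosen feature."""
    j = rng.randrange(len(X[0]))
    best = None
    for t in sorted({row[j] for row in X})[:-1]:   # thresholds below the max
        left = [yi for row, yi in zip(X, y) if row[j] <= t]
        right = [yi for row, yi in zip(X, y) if row[j] > t]
        ml, mr = avg(left), avg(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:                               # constant feature in this sample
        m = avg(y)
        return lambda row: m
    _, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr

def fit_forest(X, y, n_trees=50, seed=0):
    """Bagged stumps: an assumed, simplified stand-in for a random forest."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # bootstrap sample
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], rng))
    return lambda row: avg([tree(row) for tree in trees])

def fit_mixed_forest(X, y, groups, n_iter=10):
    """Alternate between fitting f and updating random intercepts (EM-style)."""
    ids = sorted(set(groups))
    b = {g: 0.0 for g in ids}                      # random intercepts
    var_b, var_e = 1.0, 1.0                        # variance components
    for _ in range(n_iter):
        # Fit the mean function on the outcome minus current random effects.
        f = fit_forest(X, [yi - b[g] for yi, g in zip(y, groups)])
        resid = [yi - f(row) for row, yi in zip(X, y)]
        # E-step: posterior mean and variance of each intercept.
        post_var = {}
        for g in ids:
            r = [ri for ri, gi in zip(resid, groups) if gi == g]
            shrink = var_b / (var_b + var_e / len(r))
            b[g] = shrink * avg(r)
            post_var[g] = var_b * (1.0 - shrink)
        # M-step: update the variance components.
        var_b = avg([b[g] ** 2 + post_var[g] for g in ids])
        var_e = avg([(ri - b[g]) ** 2 + post_var[g]
                     for ri, g in zip(resid, groups)])
    return f, b, var_b, var_e

# Hypothetical simulated data: 12 individuals, 5 visits each, 3 covariates.
sim = random.Random(1)
n_subjects, n_visits, n_features = 12, 5, 3
X, y, groups = [], [], []
for g in range(n_subjects):
    b_true = sim.gauss(0, 2)
    for _ in range(n_visits):
        row = [sim.uniform(-1, 1) for _ in range(n_features)]
        X.append(row)
        groups.append(g)
        y.append(3 * (row[0] > 0) + b_true + sim.gauss(0, 0.5))

f, b, var_b, var_e = fit_mixed_forest(X, y, groups)
```

A prediction for a known individual would then be f(row) + b[g]. A realistic implementation would likely use a full random forest with out-of-bag predictions for the residuals, and possibly richer random effects or a stochastic process term. Those choices are beyond what the abstract states.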