CMStatistics 2020
Title: Frechet random forests for metric space valued regression with non-Euclidean predictors
Authors: Louis Capitaine - Bordeaux University INSERM (France) [presenting]
Abstract: Random forests are a statistical learning method widely used in many areas of scientific research because of their ability to learn complex relationships between input and output variables and their capacity to handle high-dimensional data. However, current random forest approaches are not flexible enough to handle heterogeneous data such as curves, images and shapes. We introduce Frechet trees and Frechet random forests, which handle data whose input and output variables take values in general metric spaces (which can be unordered). To this end, a new way of splitting the nodes of trees is introduced, and the prediction procedures of trees and forests are generalized. The random forest out-of-bag error and variable importance score are then naturally adapted. A consistency theorem for the Frechet regressogram predictor using data-driven partitions is given and applied to Frechet purely uniformly random trees. The method is studied through several simulation scenarios on heterogeneous data combining longitudinal, image and scalar data. Finally, a dataset from an HIV vaccine trial is analyzed with the proposed method.
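The abstract does not spell out the splitting rule, so the following is only a minimal sketch of one plausible way to split a node when inputs and outputs live in metric spaces, and is not necessarily the authors' procedure. It assumes the Frechet mean is approximated by a medoid (the observed point minimizing the sum of squared distances), and that a split is defined by two representative inputs with each observation sent to the nearer one. All function names (`frechet_medoid`, `frechet_variance`, `best_split`) are hypothetical, and the toy data use absolute difference as the distance purely for illustration.

```python
from itertools import combinations


def frechet_medoid(points, dist):
    """Observed point minimizing the sum of squared distances to the others.

    Assumption: the Frechet mean is restricted to the sample points.
    """
    return min(points, key=lambda c: sum(dist(c, p) ** 2 for p in points))


def frechet_variance(points, dist):
    """Mean squared distance to the medoid (0.0 for an empty list)."""
    if not points:
        return 0.0
    m = frechet_medoid(points, dist)
    return sum(dist(m, p) ** 2 for p in points) / len(points)


def best_split(xs, ys, dist_x, dist_y):
    """Exhaustive search over pairs of representative inputs.

    Each observation goes to the nearer representative (ties go left);
    the pair minimizing the size-weighted Frechet variance of the
    outputs in the two children is returned.
    """
    n = len(xs)
    best = None
    for i, j in combinations(range(n), 2):
        left = [k for k in range(n)
                if dist_x(xs[k], xs[i]) <= dist_x(xs[k], xs[j])]
        right = [k for k in range(n) if k not in left]
        if not right:
            continue
        score = (
            len(left) * frechet_variance([ys[k] for k in left], dist_y)
            + len(right) * frechet_variance([ys[k] for k in right], dist_y)
        ) / n
        if best is None or score < best[0]:
            best = (score, i, j, left, right)
    return best


def abs_dist(a, b):
    """Toy metric on the real line, used only for this illustration."""
    return abs(a - b)


# Toy data: two well-separated groups in both input and output.
xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
ys = [1.0, 1.1, 0.9, 4.0, 4.2, 3.8]

score, i, j, left, right = best_split(xs, ys, abs_dist, abs_dist)
# A leaf prediction is the (medoid) Frechet mean of the outputs in the leaf.
left_prediction = frechet_medoid([ys[k] for k in left], abs_dist)
```

Because the sketch only ever calls the distance functions, swapping `abs_dist` for a distance between curves, images or shapes would leave the rest unchanged; the exhaustive pair search is quadratic in the node size and is chosen here for clarity, not efficiency.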