CMStatistics 2020
Title: Low-rank methods for multi-source, heterogeneous and incomplete data
Authors: Genevieve Robin - CNRS (France) [presenting]
Julie Josse - INRIA (France)
Eric Moulines - Ecole Polytechnique (France)
Robert Tibshirani - Stanford University (United States)
Olga Klopp - Essec Business School (France)
Abstract: In modern applications of statistics and machine learning, the urge to collect large data sets often leads to relaxed acquisition procedures and to the combination of diverse sources. As a result, analysts are confronted with many data imperfections. In particular, data are often heterogeneous, i.e. they combine quantitative and qualitative information; incomplete, with missing values caused by machine failures or nonresponse; and multi-source, when the data result from the aggregation of several data sets. We will present a general framework, based on heterogeneous exponential family low-rank models, for analysing heterogeneous, multi-source and incomplete data sets. The theoretical results demonstrate that the method is both statistically sound, with minimax optimal estimation properties, and computationally efficient. We will illustrate the empirical behaviour of the method with the analysis of a North African waterbird monitoring data set.
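
As a rough illustration only, and not the authors' implementation, the idea of a heterogeneous exponential family low-rank model can be sketched as follows. The sketch assumes that numeric columns are Gaussian with unit variance, that binary columns are Bernoulli, and that the matrix of natural parameters is estimated by minimising the negative log-likelihood of the observed entries plus a nuclear-norm penalty, using proximal gradient descent. The penalty, the optimiser, the function names (`svd_soft_threshold`, `fit_low_rank`) and the parameters (`lam`, `step`, `n_iter`) are all assumptions introduced here; none of them is taken from the abstract.

```python
import numpy as np


def svd_soft_threshold(M, tau):
    """Proximal operator of tau * nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt


def fit_low_rank(X, mask, is_binary, lam=1.0, step=1.0, n_iter=200):
    """Hypothetical helper: estimate a low-rank natural-parameter matrix.

    X         : (n, p) array; entries where mask is False are ignored (may be NaN)
    mask      : (n, p) boolean array, True where X is observed
    is_binary : (p,) boolean array, True for Bernoulli columns, False for Gaussian
    """
    X0 = np.where(mask, X, 0.0)  # neutralise missing entries
    theta = np.zeros(X0.shape, dtype=float)
    for _ in range(n_iter):
        # Mean of each column's family at the current natural parameter:
        # sigmoid(theta) for Bernoulli, theta itself for unit-variance Gaussian.
        mean = np.where(is_binary, 1.0 / (1.0 + np.exp(-theta)), theta)
        # Gradient of the negative log-likelihood, restricted to observed entries.
        grad = np.where(mask, mean - X0, 0.0)
        # Gradient step followed by singular value soft-thresholding.
        theta = svd_soft_threshold(theta - step * grad, step * lam)
    return theta


# Small synthetic example: two Gaussian columns, two binary columns, ~20% missing.
rng = np.random.default_rng(0)
n, p = 30, 4
is_binary = np.array([False, False, True, True])
X = rng.normal(size=(n, p))
X[:, is_binary] = (X[:, is_binary] > 0).astype(float)
mask = rng.random((n, p)) > 0.2
X[~mask] = np.nan

theta_hat = fit_low_rank(X, mask, is_binary, lam=1.0)
# Imputed values on the mean scale: probabilities for binary columns.
X_imputed = np.where(is_binary, 1.0 / (1.0 + np.exp(-theta_hat)), theta_hat)
```

The single estimated matrix `theta_hat` is shared across column types, which is what lets quantitative and qualitative columns inform one another when filling in missing entries. A step size of 1 is safe under these assumptions because the gradient of each per-entry loss is 1-Lipschitz or better.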