CMStatistics 2022
B1535
Title: On the robustness of machine learning methods for genomic prediction
Authors: Vanda Lourenco - NOVA University of Lisbon and NOVA.id.FCT (Portugal) [presenting]
Hans-Peter Piepho - University of Hohenheim (Germany)
Joseph O. Ogutu - Bioinformatics Unit - Institute of Crop Science - University of Hohenheim (Germany)
Abstract: The accurate prediction of genomic breeding values is central to genomic selection in both plant and animal breeding studies. Genomic prediction (GP) involves the use of thousands of molecular markers spanning the entire genome and therefore requires methods that can efficiently handle high-dimensional data. Machine learning (ML) methods, which encompass different groups of supervised and unsupervised learning methods, are increasingly advocated and used in GP studies. Although several studies have compared the predictive performance of individual methods, studies comparing the predictive performance of different groups of methods are rare. Studies that assess the predictive performance of methods when the data are contaminated are equally rare. However, such studies are crucial for (i) identifying groups of methods with superior predictive performance, and (ii) assessing the merits and demerits of such groups of methods relative to each other and to the established classical methods, both with and without contamination of the phenotypic data. We comparatively evaluate, in terms of predictive accuracy and prediction error, the genomic predictive performance and robustness of several groups of supervised ML methods, specifically regularized, ensemble, and instance-based methods, using one simulated dataset (an animal breeding population with three distinct traits).
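The kind of evaluation described in the abstract can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration and is not taken from the study: the toy simulation, the use of k-nearest-neighbour regression as a stand-in for one instance-based method, the shift-outlier contamination model, the reading of predictive accuracy as the Pearson correlation between predicted and true breeding values, the reading of prediction error as mean squared error, and all function names and parameter values.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation, used here as an assumed measure of predictive accuracy."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mse(x, y):
    """Mean squared error, used here as an assumed measure of prediction error."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def contaminate(values, fraction, shift, rng):
    """Hypothetical contamination: shift a random fraction of phenotypes by `shift`."""
    out = list(values)
    k = int(round(fraction * len(out)))
    for i in rng.sample(range(len(out)), k):
        out[i] += shift
    return out

def knn_predict(train_x, train_y, query, k):
    """Average the phenotypes of the k training individuals closest to `query`."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return sum(train_y[i] for i in order[:k]) / k

def simulate(n, p, rng):
    """Toy data: marker genotypes coded 0/1/2, additive effects, unit-variance noise."""
    effects = [rng.gauss(0, 1) for _ in range(p)]
    markers = [[rng.choice((0, 1, 2)) for _ in range(p)] for _ in range(n)]
    tbv = [sum(m * e for m, e in zip(row, effects)) for row in markers]
    pheno = [g + rng.gauss(0, 1) for g in tbv]
    return markers, tbv, pheno

def evaluate(train_y, markers, tbv, n_train, k=5):
    """Train on the first n_train individuals, score against true breeding values of the rest."""
    preds = [knn_predict(markers[:n_train], train_y, q, k) for q in markers[n_train:]]
    target = tbv[n_train:]
    return pearson(preds, target), mse(preds, target)

rng = random.Random(1)
markers, tbv, pheno = simulate(150, 20, rng)
n_train = 100
clean = pheno[:n_train]
dirty = contaminate(clean, 0.10, 10.0, rng)  # 10% of training phenotypes shifted

acc_clean, err_clean = evaluate(clean, markers, tbv, n_train)
acc_dirty, err_dirty = evaluate(dirty, markers, tbv, n_train)
print(f"clean: accuracy={acc_clean:.3f}, mse={err_clean:.3f}")
print(f"dirty: accuracy={acc_dirty:.3f}, mse={err_dirty:.3f}")
```

Running the same `evaluate` step for a regularized and an ensemble learner, on both the clean and the contaminated phenotypes, would give the kind of group-wise comparison the abstract refers to. The sketch makes no claim about which group performs better.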