View Submission - EcoSta2022
A0803
Title: Sparse classification with multi-type data
Authors: Reka Howard - University of Nebraska - Lincoln (United States) [presenting]
Vamsi Manthena - University of Nebraska - Lincoln (United States)
Diego Jarquin - University of Florida (United States)
Abstract: Genomic selection is a plant breeding technique that uses a model to predict phenotypes from marker information without the need to test individuals in the field, thus saving resources. Since genomic selection was first introduced, many statistical methods have been proposed, and prediction has drawn not only on marker information but also on other data types (high-throughput phenotyping, pedigree, weather, etc.). Integrating different data types becomes a complex challenge when they have very different dimensions. A key challenge is to build models that can access the unique information present in each data type in order to improve prediction. Breeders are often interested in categorical phenotypic traits such as resistance to drought or salinity, susceptibility to disease, and days to maturity or flowering. While there is extensive literature on the prediction of continuous traits, there is limited literature developing genomic prediction models for classification. We present a classification method that integrates three data types: secondary traits, weather, and genomic information. We compared our method to two standard classifiers, random forests and support vector machines (SVMs). The proposed three-stage method allows us to access the information present in each data type to improve prediction.
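Illustrative note (not part of the submitted abstract, and not the authors' three-stage method): the dimension mismatch mentioned above can be made concrete with a small sketch. When blocks of very different widths are simply concatenated, the widest block (typically the markers) accounts for almost all of the total variance. One common generic remedy, used here purely as an assumption for illustration, is block scaling: standardize each column, then divide each block by the square root of its number of columns so that every block carries the same total variance. All data below are synthetic, and the block sizes (500 markers, 6 weather covariates, 3 secondary traits) and function names are hypothetical.

```python
import math
import random


def standardize_columns(block):
    """Center each column to mean 0 and scale it to unit (population) variance."""
    n = len(block)
    out_cols = []
    for col in zip(*block):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        # A constant column carries no information; map it to zeros.
        out_cols.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*out_cols)]


def block_scale(block):
    """Standardize columns, then divide by sqrt(p) so the block's total variance is 1."""
    p = len(block[0])
    z = standardize_columns(block)
    return [[v / math.sqrt(p) for v in row] for row in z]


def total_variance(block):
    """Sum of the population variances of all columns in a block."""
    n = len(block)
    total = 0.0
    for col in zip(*block):
        mean = sum(col) / n
        total += sum((v - mean) ** 2 for v in col) / n
    return total


# Synthetic data only: 20 individuals, three blocks of very different widths.
random.seed(0)
n = 20
markers = [[random.choice([0, 1, 2]) for _ in range(500)] for _ in range(n)]
weather = [[random.gauss(25, 5) for _ in range(6)] for _ in range(n)]
secondary = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]

# Naive approach: standardize columns only; the marker block dominates.
naive = [total_variance(standardize_columns(b)) for b in (markers, weather, secondary)]
naive_marker_share = naive[0] / sum(naive)

# Block scaling: each block now contributes the same total variance.
markers_scaled = block_scale(markers)
weather_scaled = block_scale(weather)
secondary_scaled = block_scale(secondary)

# Concatenate the scaled blocks row by row into one feature matrix.
combined = [m + w + s for m, w, s in zip(markers_scaled, weather_scaled, secondary_scaled)]

print(f"marker share of variance, column-standardized only: {naive_marker_share:.3f}")
print(f"combined feature matrix: {len(combined)} rows x {len(combined[0])} columns")
```

The combined matrix could then be passed to any standard classifier. Equal weighting of blocks is only one possible choice and does nothing to exploit the unique information in each data type, which is the motivation the abstract gives for a dedicated multi-stage method.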