Title: Group iterative hard-thresholding and generalized linear models in genetics
Authors: Kevin Keys - UCSF (United States)
Kenneth Lange - UCLA (United States)
Janet Sinsheimer - UCLA (United States)
Hua Zhou - UCLA (United States)
Benjamin Chu - University of California, Los Angeles (United States) [presenting]
Abstract: SNP-by-SNP (single nucleotide polymorphism) association testing is currently the de facto statistical analysis for genome-wide association studies (GWAS). This approach ignores the joint effects of SNPs. Iterative hard-thresholding (IHT) is one of the most scalable algorithms that perform multivariate model selection without shrinking effect sizes, and it circumvents the use of p-values. We implemented IHT in Julia to analyze GWAS data as a module within the open-source statistical genetics ecosystem OpenMendel. We modified the hard-thresholding operator to enforce sparsity at both the group level and the within-group level. This accounts for linkage disequilibrium because only the top predictors in each group are selected. We then extend the IHT framework to generalized linear models (GLM) so that it can model non-continuous, non-normal response data. Our implementation enjoys built-in parallelism. We applied IHT to real and simulated datasets to demonstrate model quality, algorithm robustness, and scalability. For geneticists, our method offers multivariate model selection while maintaining speed and memory usage comparable to some of the fastest algorithms available today. For theorists, we investigate properties of the group hard-thresholding operator and derive the best step sizes for IHT in the GLM setting.
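
The group hard-thresholding idea described above can be illustrated with a minimal sketch. It is written in plain Python rather than Julia and is not the OpenMendel implementation. The function name `group_hard_threshold`, its parameters, and the choice to rank groups by the Euclidean norm of their retained entries are all assumptions made for illustration: the sketch keeps at most `max_per_group` largest-magnitude coefficients within each group, then keeps at most `max_groups` groups.

```python
import math


def group_hard_threshold(beta, groups, max_groups, max_per_group):
    """Zero out all but the strongest coefficients, group-wise.

    Hypothetical illustration: `beta` is a list of coefficients and
    `groups` gives a group label for each coefficient.
    """
    # Collect the coefficient indices that belong to each group label.
    members = {}
    for idx, label in enumerate(groups):
        members.setdefault(label, []).append(idx)

    # Within each group, keep the max_per_group largest-magnitude entries.
    kept = {}
    for label, idxs in members.items():
        ranked = sorted(idxs, key=lambda i: abs(beta[i]), reverse=True)
        kept[label] = ranked[:max_per_group]

    # Rank groups by the Euclidean norm of their retained entries
    # (an assumed ranking rule) and keep the top max_groups groups.
    def retained_norm(label):
        return math.sqrt(sum(beta[i] ** 2 for i in kept[label]))

    top_groups = sorted(kept, key=retained_norm, reverse=True)[:max_groups]

    # Copy the surviving coefficients unchanged; everything else is zero.
    out = [0.0] * len(beta)
    for label in top_groups:
        for i in kept[label]:
            out[i] = beta[i]
    return out


beta = [0.1, -2.0, 0.5, 3.0, 0.2, -0.3, 1.0, 0.9]
groups = [1, 1, 1, 2, 2, 2, 3, 3]
projected = group_hard_threshold(beta, groups, max_groups=2, max_per_group=2)
```

Surviving coefficients are copied without shrinkage, which reflects the property the abstract attributes to IHT. In an IHT iteration, a projection of this kind would be applied after each gradient step on the model's loss.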