Title: GEE-assisted variable selection for latent variable models: Making the most of zero consistency
Authors: Samuel Mueller - University of Sydney (Australia)
Alan Welsh - The Australian National University (Australia)
Francis Hui - The Australian National University (Australia) [presenting]
Abstract: In many disciplines, it is becoming common to collect and analyze multivariate or multi-response data. For example, the Southern Ocean Continuous Plankton Recorder (SO-CPR) survey is an annual survey that collects presence-absence observations on zooplankton assemblages in the Southern Ocean, with a primary goal of identifying important environmental factors driving the communities' distribution while accounting for biotic effects such as species interactions. An increasingly popular approach for analyzing multivariate data in ecology is generalized linear latent variable models (GLLVMs), which use latent variables to parsimoniously account for residual between-species correlations. However, estimation, let alone variable selection, for GLLVMs presents a major computational challenge, since the marginal likelihood does not possess a closed form. To overcome this problem, we propose utilizing marginal generalized estimating equations (GEEs) to perform inference on GLLVMs. Focusing on multivariate binary data, we show that GEEs are zero consistent for GLLVMs. This then motivates us to propose two GEE-assisted selection methods: 1) information criteria based on score and Wald statistics; 2) penalized GEEs based on exploiting the grouped structure of the marginal coefficients. Both methods are asymptotically selection consistent for GLLVMs, with simulation studies demonstrating their computational efficiency and strong finite-sample performance.