Title: A scalable partitioned approach to model massive non-stationary non-Gaussian spatial data
Authors: Jaewoo Park - Yonsei University (Korea, South)
Seiyon Lee - George Mason University (United States) [presenting]
Abstract: Nonstationary non-Gaussian spatial data are common in many disciplines, including the environmental sciences, ecology, epidemiology, and the social sciences. Examples include count data on disease incidence and binary satellite data on cloud mask (cloud/no-cloud). Modeling such data sets as global stationary spatial processes can be unrealistic because they are collected over large heterogeneous domains (i.e., the range of spatial dependence varies across subregions). Although several approaches have been developed for nonstationary spatial models, these have focused primarily on Gaussian data, and fitting nonstationary models to large non-Gaussian data sets can be computationally prohibitive. To address these challenges, we propose a scalable algorithm for modeling such data that leverages parallel computing on modern high-performance computing (HPC) systems. We partition the spatial domain into disjoint subregions and fit locally nonstationary models using a carefully curated set of spatial basis functions. We then combine the local processes using a novel adaptive neighbor-based weighting scheme. Our approach scales well to massive data sets (hundreds of thousands of observations), provides accurate predictions, and can be implemented in the nimble software environment. We demonstrate our method on simulated examples and two large real-world data sets pertaining to infectious diseases and remotely sensed images of cloud cover.
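Illustrative sketch: the partition, fit, and combine steps described in the abstract can be mimicked in a few lines of Python. This is not the authors' implementation. Everything specific here is an assumption made for illustration: the regular grid partition of the unit square, the Gaussian radial basis functions, the ridge-penalized Poisson fit by Newton's method, and the inverse-distance weights, which stand in for the adaptive neighbor-based weighting scheme whose details the abstract does not give. All function names are hypothetical.

```python
import numpy as np

def partition_ids(coords, k):
    # Assign each point in the unit square to one of k*k disjoint grid cells.
    ij = np.minimum((coords * k).astype(int), k - 1)
    return ij[:, 0] * k + ij[:, 1]

def block_centers(k):
    # Cell centers, ordered to match the ids returned by partition_ids.
    g = (np.arange(k) + 0.5) / k
    return np.array([(a, b) for a in g for b in g])

def rbf_basis(coords, knots, bandwidth):
    # Intercept column plus Gaussian radial basis functions centered at the knots.
    d2 = ((coords[:, None, :] - knots[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((coords.shape[0], 1)),
                      np.exp(-d2 / (2.0 * bandwidth ** 2))])

def fit_poisson_ridge(X, y, ridge=1.0, n_iter=50):
    # Ridge-penalized Poisson regression (log link) via Newton's method.
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 0.5)
    for _ in range(n_iter):
        mu = np.exp(np.clip(X @ beta, -20, 20))
        grad = X.T @ (y - mu) - ridge * beta
        hess = X.T @ (X * mu[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def fit_local_models(coords, y, k, bandwidth, ridge=1.0):
    # Fit one local model per cell (assumes every cell contains data).
    # The fits are independent, so this loop could be spread across workers.
    ids = partition_ids(coords, k)
    centers = block_centers(k)
    offs = (-0.25, 0.0, 0.25)
    offsets = np.array([(a, b) for a in offs for b in offs]) / k
    models = []
    for b in range(k * k):
        mask = ids == b
        knots = centers[b] + offsets  # 3x3 grid of knots inside the cell
        X = rbf_basis(coords[mask], knots, bandwidth)
        models.append((knots, fit_poisson_ridge(X, y[mask], ridge)))
    return centers, models

def predict(new_coords, centers, models, bandwidth, n_neighbors=4):
    # Combine local predicted means from the nearest cells,
    # weighting each by inverse distance to the cell center.
    d = np.sqrt(((new_coords[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
    nearest = np.argsort(d, axis=1)[:, :n_neighbors]
    out = np.zeros(new_coords.shape[0])
    wsum = np.zeros(new_coords.shape[0])
    for b, (knots, beta) in enumerate(models):
        use = (nearest == b).any(axis=1)
        if not use.any():
            continue
        w = 1.0 / (d[use, b] + 1e-6)
        eta = rbf_basis(new_coords[use], knots, bandwidth) @ beta
        out[use] += w * np.exp(np.clip(eta, -20, 20))
        wsum[use] += w
    return out / wsum

# Simulated count data whose spatial pattern varies across the domain.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(2000, 2))
log_mean = 1.0 + np.sin(4 * np.pi * coords[:, 0] ** 2) * coords[:, 1]
y = rng.poisson(np.exp(log_mean))

centers, models = fit_local_models(coords, y, k=3, bandwidth=0.15)
new_coords = rng.uniform(size=(5, 2))
pred = predict(new_coords, centers, models, bandwidth=0.15)
print(pred)
```

Because each cell is fitted using only its own data, the cost per fit depends on the cell size rather than the full sample size, which is the property that makes this kind of partitioned scheme amenable to parallel hardware.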