Title: Variable selection in hidden Markov models with missing data
Authors: Fulvia Pennoni - University of Milano-Bicocca (Italy) [presenting]
Francesco Bartolucci - University of Perugia (Italy)
Silvia Pandolfi - University of Perugia (Italy)
Abstract: A novel variable and model selection method is proposed for the analysis of multiple time series and panel data, based on a hidden Markov (HM) model for multivariate continuous responses. Inference is carried out under the missing-at-random assumption to account for missing data, with maximum likelihood estimation of the model parameters performed through a modified Expectation-Maximization (EM) algorithm. We develop a greedy forward-backward algorithm based on the Bayesian Information Criterion (BIC), seen as an approximation of the Bayes factor. In this way, the complete set of response variables is reduced to a smaller subset containing the variables most useful for clustering purposes. The BIC is also used to choose the number of latent states during the steps of the greedy search algorithm. Applying the selection method requires the estimation of multivariate linear regression models. When values are missing in the set of independent variables, we adopt a form of multiple imputation based on the posterior expected values obtained at convergence of the EM algorithm for the estimated HM model. To illustrate the proposal, we use a collection of macroeconomic indicators provided by the World Bank for 217 countries followed over a long period of time. The chosen HM model allows us to characterize dynamically the countries' transitions between hidden states representing different levels of development.
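The greedy search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `bic`, `greedy_select`, and `toy_score` are hypothetical, and the score function here is a toy stand-in. In the actual method the score would come from fitting HM models (and the associated regression models) by the EM algorithm, with the number of latent states also chosen by BIC at each step. The sketch only shows the alternation of inclusion and exclusion steps, each accepted when it lowers a BIC-type score.

```python
import math
from typing import Callable, FrozenSet, Sequence, Tuple


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """BIC = -2 * log-likelihood + (free parameters) * log(sample size)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def greedy_select(
    variables: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_iter: int = 100,
) -> Tuple[FrozenSet[str], float]:
    """Greedy forward-backward search for the subset with the lowest score."""
    selected: FrozenSet[str] = frozenset()
    best = score(selected)
    for _ in range(max_iter):
        changed = False
        # Inclusion step: try adding the single most helpful variable.
        candidates = [v for v in variables if v not in selected]
        if candidates:
            trial = min((selected | {v} for v in candidates), key=score)
            trial_score = score(trial)
            if trial_score < best:
                selected, best, changed = trial, trial_score, True
        # Exclusion step: try removing the single least helpful variable.
        if len(selected) > 1:
            trial = min((selected - {v} for v in selected), key=score)
            trial_score = score(trial)
            if trial_score < best:
                selected, best, changed = trial, trial_score, True
        if not changed:
            break  # neither step improved the score, so the search stops
    return selected, best


def toy_score(subset: FrozenSet[str]) -> float:
    """Toy stand-in for a BIC: 'a' and 'b' help, any other variable hurts."""
    useful = {"a", "b"}
    return 100.0 - 10.0 * len(subset & useful) + 3.0 * len(subset - useful)


selected, best = greedy_select(["a", "b", "c"], toy_score)
print(sorted(selected), best)
```

In this toy run the search keeps the two informative variables and leaves out the third, since adding it would raise the score. With a real model-based score, each call to `score` would involve one or more EM fits, so caching the results for subsets already visited would be a natural refinement.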