Title: Identifiability in regression methods for respondent-driven sampling
Authors: Erica Moodie - McGill University (Canada)
Mamadou Yauck - UQAM (Canada) [presenting]
Michael Hudgens - The University of North Carolina at Chapel Hill (United States)
Abstract: Respondent-driven sampling (RDS) is a form of link-tracing sampling, a technique for sampling hard-to-reach populations that leverages individuals' social relationships to reach potential participants. An RDS sample is a partially observed network with an unknown dependence structure. Further, individuals with similar traits tend to be socially connected, a phenomenon known as homophily. Current analytical approaches for RDS data focus mainly on estimating means and proportions and give little technical consideration to multivariate modeling. Progress in this area is limited by a missing data problem: the observed RDS network reveals only partial information about social connections between individuals in the sample. We show that the parameters of regression models are not, in general, identifiable, because different full data distributions may give rise to the same observed data distribution. This lack of identification violates some model assumptions: when modeling homophily effects, the conditional expectation of the error term in the linear regression, given the vector of covariates, is not zero. Thus, standard inferential methods such as maximum likelihood estimation are not, in general, valid. We introduce additional assumptions to characterize the asymptotic biases of the maximum likelihood estimators of the homophily effects and the network-induced correlation parameters, and propose bias-corrected estimators.
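
The identifiability statement in the abstract can be illustrated with a deliberately simple toy example. This is not the authors' regression model. It is a minimal sketch that assumes a single pair of sampled individuals, a tie indicator A, and a recruitment indicator R, where only the product O = A * R is observed. The function name `observed_distribution` and all parameter values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def observed_distribution(p_tie, p_seen_given_tie):
    """Distribution of O = A * R, the observed-tie indicator.

    A = 1 if a social tie exists (probability p_tie).
    R = 1 if an existing tie is revealed by recruitment
        (probability p_seen_given_tie).
    Only O is recorded, so a missing tie (A = 0) and an
    unrevealed tie (A = 1, R = 0) look the same in the data.
    """
    p_obs = p_tie * p_seen_given_tie
    return {1: p_obs, 0: 1 - p_obs}


# Two different, hypothetical full-data parameter settings.
full_a = (Fraction(1, 2), Fraction(2, 5))  # ties rarer, revealed more often
full_b = (Fraction(4, 5), Fraction(1, 4))  # ties common, revealed less often

obs_a = observed_distribution(*full_a)
obs_b = observed_distribution(*full_b)

# The full-data settings differ, yet the observed-data laws coincide.
print(full_a != full_b, obs_a == obs_b)
```

Both settings imply that a tie is observed with probability 1/5, so no amount of observed data can distinguish them. In this toy setting the tie probability is not identifiable without further assumptions, which mirrors, in a much simpler form, the role of the additional assumptions mentioned in the abstract.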