View Submission - HiTECCoDES2025
A0150
Title: Consistent estimation of linear regression models from different data sources with many variables in common
Authors: Masayuki Hirukawa - Ryukoku University (Japan) [presenting]
Abstract: When conducting regression analysis, econometricians often face situations where some regressors are unavailable in the primary dataset (e.g., an ability measure in a wage regression). Suppose that they can find an auxiliary dataset that contains the missing regressors as well as other variables common to the two datasets (overlapping variables). In this setting, it is possible to estimate the regression coefficients consistently by combining the primary and auxiliary datasets. Examples of such estimation procedures are matched-sample indirect inference (MSII) and plug-in least squares (PILS). However, these estimators can attain the parametric convergence rate only if the number of overlapping variables is three or fewer. The scope of MSII and PILS is therefore extended so that both can restore the parametric convergence rate when the primary and auxiliary datasets have many overlapping variables. The extension takes three steps: (i) dimension reduction under a structural assumption on the conditional expectations of the missing regressors given the overlapping variables, (ii) imputation of proxies for the missing regressors, and (iii) estimation of the regression model. Convergence properties of the extended MSII and PILS estimators are explored in conjunction with covariance estimation. Monte Carlo simulations confirm their good finite-sample properties, and a real data example on intergenerational income mobility is also presented.
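The three steps listed in the abstract can be illustrated with a deliberately simplified sketch. It is not the extended MSII or PILS estimator from the talk: it assumes, purely for illustration, that the conditional expectation of the missing regressor is linear in the overlapping variables, so the "dimension reduction" step collapses to fitting a single linear index on the auxiliary sample (a procedure akin to two-sample two-stage least squares). The data-generating process, the function names `simulate` and `plug_in_ols`, and all parameter values are hypothetical.

```python
import numpy as np


def simulate(n_primary=4000, n_aux=4000, q=10, seed=0):
    """Simulate a primary sample (y, z) and an auxiliary sample (x2, z).

    Assumed design (hypothetical): E[x2 | z] = z @ gamma, and the observed
    regressor x1 is the first overlapping variable.
    """
    rng = np.random.default_rng(seed)
    gamma = np.ones(q) / np.sqrt(q)      # assumed index coefficients
    beta = np.array([1.0, 0.5, -2.0])    # intercept, x1, x2

    # Primary sample: x2 is generated but NOT returned (it is "missing").
    z_p = rng.normal(size=(n_primary, q))
    x2_p = z_p @ gamma + rng.normal(size=n_primary)
    y = beta[0] + beta[1] * z_p[:, 0] + beta[2] * x2_p + rng.normal(size=n_primary)

    # Auxiliary sample: x2 and z are observed, y is not.
    z_a = rng.normal(size=(n_aux, q))
    x2_a = z_a @ gamma + rng.normal(size=n_aux)
    return y, z_p, z_a, x2_a, beta


def plug_in_ols(y, z_p, z_a, x2_a):
    """Three-step plug-in estimate under the linear-index assumption."""
    # (i) Reduce the q overlapping variables to one index by fitting
    #     x2 on (1, z) in the auxiliary sample.
    design_a = np.column_stack([np.ones(len(z_a)), z_a])
    index_coef, *_ = np.linalg.lstsq(design_a, x2_a, rcond=None)

    # (ii) Impute a proxy for the missing regressor in the primary sample.
    design_p = np.column_stack([np.ones(len(z_p)), z_p])
    x2_proxy = design_p @ index_coef

    # (iii) Regress y on (1, x1, proxy) in the primary sample.
    regressors = np.column_stack([np.ones(len(y)), z_p[:, 0], x2_proxy])
    beta_hat, *_ = np.linalg.lstsq(regressors, y, rcond=None)
    return beta_hat


if __name__ == "__main__":
    y, z_p, z_a, x2_a, beta = simulate()
    print("true beta:     ", beta)
    print("estimated beta:", plug_in_ols(y, z_p, z_a, x2_a))
```

Under the linear assumption no nonparametric matching on the overlapping variables is needed, which is why their number does not slow convergence in this toy case. The estimators described in the abstract address the harder setting in which the conditional expectation is restricted only by a structural assumption.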