CMStatistics 2022
Title: Paradoxes and resolutions for semiparametric data fusion of individual data and summary statistics
Authors: Wang Miao - Peking University (China) [presenting]
Abstract: External summary statistics have been used as constraints on the internal data distribution, which promises to improve statistical inference in the internal data. However, paradoxical results arise in such data integration: efficiency loss may occur if the uncertainty of the summary statistics is not negligible, and estimation bias can emerge if they are obtained from a population different from that of the internal study. We investigate these paradoxical results in a semiparametric framework. We establish the semiparametric efficiency bound for estimating a general functional of the internal data distribution, which is shown to be no larger than that using only internal data. We propose a data-fused efficient estimator that achieves this bound, so that the efficiency paradox is resolved. This initial data-fused estimator is further regularized with an adaptive lasso penalty so that the resultant estimator achieves the same asymptotic distribution as the oracle estimator that uses only unbiased summary statistics, which resolves the bias paradox. Simulations and applications to a Helicobacter pylori infection dataset are used to illustrate the proposed methods.
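The efficiency paradox described in the abstract can be illustrated with a deliberately simple toy calculation. This is not the semiparametric estimator proposed in the talk. It is a minimal sketch under assumptions introduced here: the target is the internal mean of Y, the external summary statistic is a sample mean of a covariate X computed from m independent observations drawn from the same population, and the internal sample has size n. All function names and numeric values are hypothetical. The fused estimator is ybar - beta * (xbar_internal - xbar_external), and the code evaluates its exact variance for two choices of beta.

```python
def fused_mean_variance(beta, n, m, var_y, var_x, cov_xy):
    """Variance of ybar - beta * (xbar_internal - xbar_external).

    Assumes independent internal (size n) and external (size m) samples
    from the same population; these are illustrative assumptions.
    """
    return var_y / n - 2 * beta * cov_xy / n + beta**2 * var_x * (1 / n + 1 / m)


def naive_beta(var_x, cov_xy):
    # Treats the external mean as a known constant (ignores its uncertainty).
    return cov_xy / var_x


def adjusted_beta(n, m, var_x, cov_xy):
    # Minimizes fused_mean_variance over beta, accounting for external noise.
    return cov_xy / (var_x * (1 + n / m))


# Hypothetical setting: the external sample is smaller than the internal one.
n, m = 1000, 200
var_y, var_x, cov_xy = 1.0, 1.0, 0.8

internal_only = var_y / n
naive = fused_mean_variance(naive_beta(var_x, cov_xy), n, m, var_y, var_x, cov_xy)
adjusted = fused_mean_variance(adjusted_beta(n, m, var_x, cov_xy), n, m, var_y, var_x, cov_xy)

print("internal only:", internal_only)
print("naive fusion: ", naive)
print("adjusted:     ", adjusted)
```

In this toy setting, the naive fusion has a larger variance than the internal-only estimator because the external mean is noisy, while the adjusted weight never does worse than using internal data alone. This mirrors, in a much simpler form, the claim that the efficiency bound is no larger than that using only internal data.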