CMStatistics 2022
B1446
Title: Semi-automated estimation of weighted rates for e-commerce catalog quality monitoring
Authors: Mauricio Sadinle - University of Washington (United States) [presenting]
Abstract: E-commerce product catalogs evolve constantly, so their quality metrics need close monitoring, which often requires identifying whether product attributes contain defects. When such identification requires human auditing, frequent catalog monitoring is extremely expensive. We investigate approaches for tracking weighted rates over time, where a weighted rate is the fraction of customer attention that goes to products with a certain defect. We assume that the gold standard for detecting such defects comes from human auditors, but to avoid collecting audited data at each point in time, we leverage automated procedures, such as classifiers. Simply replacing human auditor decisions with automated predictions, however, generally leads to large biases in the estimated weighted rates. Our approach uses automated procedures while still obtaining approximately unbiased, low-variance estimators of the rate of interest. It relies on evaluating the quality of the automated procedure using audits at a baseline time or domain, and then extrapolating the performance of the procedure to the time or domain of interest. We perform extensive simulation studies to stress-test our proposed estimation approaches under a variety of scenarios representative of our actual use cases. Our proposed estimation approach is related to the task of quantification in machine learning, and so we draw connections throughout.
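
The abstract does not spell out the estimators, so the following is only an illustrative sketch of the general idea, not the authors' method. It swaps in a weighted variant of the "adjusted classify and count" correction from the quantification literature: the classifier's attention-weighted true and false positive rates are measured on an audited baseline sample, and then used to correct the naive weighted rate at the time of interest. All function and variable names are hypothetical, and the sketch assumes that those two error rates carry over unchanged from the baseline to the target.

```python
def weighted_rate(weights, labels):
    """Fraction of total weight (customer attention) on items flagged 1."""
    total = sum(weights)
    return sum(w for w, y in zip(weights, labels) if y) / total


def weighted_tpr_fpr(weights, truth, preds):
    """Attention-weighted true/false positive rates on an audited sample."""
    pos = sum(w for w, y in zip(weights, truth) if y)
    neg = sum(w for w, y in zip(weights, truth) if not y)
    tp = sum(w for w, y, p in zip(weights, truth, preds) if y and p)
    fp = sum(w for w, y, p in zip(weights, truth, preds) if not y and p)
    return tp / pos, fp / neg


def adjusted_weighted_rate(weights, preds, tpr, fpr):
    """Correct the naive weighted rate using baseline error rates.

    Assumption (hypothetical): tpr and fpr estimated at the baseline
    also hold for the target sample. Uses the identity
    naive = tpr * r + fpr * (1 - r), solved for the true rate r.
    """
    if tpr <= fpr:
        raise ValueError("classifier must satisfy tpr > fpr")
    naive = weighted_rate(weights, preds)
    estimate = (naive - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))  # clip to a valid rate


# Toy audited baseline: weights are attention, truth comes from auditors.
base_w = [5.0, 3.0, 2.0, 4.0, 1.0, 5.0]
base_truth = [1, 0, 1, 0, 1, 0]
base_pred = [1, 0, 0, 1, 1, 0]
tpr, fpr = weighted_tpr_fpr(base_w, base_truth, base_pred)

# Target time: only classifier predictions are available (made-up data).
target_w = [4.0, 4.0, 2.0, 6.0, 4.0]
target_pred = [1, 0, 1, 0, 0]
print(weighted_rate(target_w, target_pred))                 # naive estimate
print(adjusted_weighted_rate(target_w, target_pred, tpr, fpr))
```

When the correction is applied to the audited baseline itself, it recovers the audited weighted rate exactly, which is a convenient sanity check. Its accuracy at a later time or in another domain depends entirely on how well the stability assumption holds.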