CMStatistics 2018
Title: Random forest prediction intervals
Authors: Haozhe Zhang - Iowa State University (United States)
Joshua Zimmerman - Iowa State University (United States)
Dan Nettleton - Iowa State University (United States) [presenting]
Daniel Nordman - Iowa State University (United States)
Abstract: Random forests are among the most popular machine learning techniques for prediction problems. When using random forests to predict a quantitative response, an important but often overlooked challenge is the determination of prediction intervals that will contain an unobserved response value with a specified probability. We propose new random forest prediction intervals that are based on the empirical distribution of out-of-bag prediction errors. These intervals can be obtained as a by-product of a single random forest. Under regularity conditions, we prove that the proposed intervals have asymptotically correct coverage rates. Simulation studies and analysis of 60 real datasets are used to compare the finite-sample properties of the proposed intervals with quantile regression forests and recently proposed split conformal intervals. The results indicate that intervals constructed with our proposed method tend to be narrower than those of competing methods while still maintaining marginal coverage rates approximately equal to nominal levels.
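The core idea in the abstract (shift a new forest prediction by empirical quantiles of the out-of-bag errors) can be sketched in a few lines. The sketch below is a minimal illustration, not the authors' implementation. It assumes you already have, from one fitted forest, the out-of-bag (OOB) predictions for the training points and the ordinary predictions at new inputs. The function names `empirical_quantile` and `oob_prediction_intervals` are hypothetical, and the quantile convention (inverse of the empirical CDF, i.e. the order statistic of rank ceil(n*p)) is an assumption that may differ from the one used in the paper.

```python
import math
from typing import List, Sequence, Tuple


def empirical_quantile(values: Sequence[float], p: float) -> float:
    """Inverse empirical CDF: the smallest value v with F_n(v) >= p.

    Assumed convention (hypothetical helper): order statistic of rank ceil(n * p).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    ordered = sorted(values)
    n = len(ordered)
    # The tiny offset guards against floating-point round-off pushing
    # n * p a hair above an integer (e.g. 10 * 0.9).
    k = math.ceil(n * p - 1e-9)
    k = min(max(k, 1), n)  # keep the rank inside 1..n
    return ordered[k - 1]


def oob_prediction_intervals(
    y_train: Sequence[float],
    oob_pred: Sequence[float],
    new_pred: Sequence[float],
    alpha: float = 0.1,
) -> List[Tuple[float, float]]:
    """Nominal (1 - alpha) intervals built from out-of-bag errors (sketch).

    y_train  -- observed training responses Y_i
    oob_pred -- OOB forest predictions for the same training points
    new_pred -- ordinary forest predictions at the new inputs
    """
    if len(y_train) != len(oob_pred):
        raise ValueError("y_train and oob_pred must have the same length")
    # OOB prediction errors D_i = Y_i - (OOB prediction for point i)
    errors = [y - p for y, p in zip(y_train, oob_pred)]
    lower = empirical_quantile(errors, alpha / 2)
    upper = empirical_quantile(errors, 1 - alpha / 2)
    # Each interval is the new prediction shifted by the two error quantiles.
    return [(yhat + lower, yhat + upper) for yhat in new_pred]


if __name__ == "__main__":
    # Toy numbers chosen by hand for illustration only (not real data).
    y_train = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0, 8.0, 13.0, 10.5, 12.5]
    oob_pred = [11.0, 11.5, 10.0, 12.0, 11.0, 13.5, 10.0, 12.0, 11.0, 11.0]
    print(oob_prediction_intervals(y_train, oob_pred, [12.0], alpha=0.2))
    # prints [(10.0, 13.5)]
```

Because the OOB errors come from the same forest that produces the point predictions, no second model or data split is needed, which is what the abstract means by a "by-product of a single random forest". With scikit-learn, for instance, `RandomForestRegressor(oob_score=True)` exposes the OOB training predictions as the `oob_prediction_` attribute, which could be passed as `oob_pred`, with `predict(X_new)` supplying `new_pred`.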