Title: On the hyperparameter settings of random forests
Authors: Philipp Probst - LMU Munich (Germany) [presenting]
Anne-Laure Boulesteix - LMU Munich (Germany)
Bernd Bischl - LMU Munich (Germany)
Abstract: Owing to their good predictive performance, ease of application and flexibility, random forests are increasingly popular for building prediction rules. Unfortunately, little is known about their ideal hyperparameter settings. Important hyperparameters include the number of trees, the number of randomly drawn candidate features at each split, the number of randomly drawn samples for each tree and the minimal number of samples in a node. Common modern tuning strategies are grid search, random search, iterated F-racing and Bayesian optimization. These strategies can, however, be too complex for users without expertise in random forests, computationally costly, or even infeasible for very large datasets. In an empirical study, we examine the influence of a diverse range of hyperparameter settings of random forest algorithms, as implemented in several different R packages, on more than 200 regression and classification problems from the OpenML platform. We use out-of-bag predictions and several performance measures for evaluation, and simple meta-learning to relate the performance results to dataset characteristics. Our results yield valuable insights into a) parameter sensitivity for different performance measures, b) optimal default settings that can be applied without further tuning, and c) starting points and ranges for less time-consuming tuning during model building.
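As a minimal sketch of the four hyperparameters named above, the snippet below shows their counterparts in scikit-learn's random forest, together with out-of-bag evaluation. This is only an illustration under the assumption of a Python/scikit-learn setting; the study itself is based on R package implementations, and the parameter names and defaults there differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for an OpenML classification task.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=500,      # number of trees
    max_features="sqrt",   # randomly drawn candidate features at each split
    max_samples=0.8,       # fraction of samples randomly drawn for each tree
    min_samples_leaf=1,    # minimal number of samples in a (leaf) node
    oob_score=True,        # evaluate via out-of-bag predictions
    random_state=0,
)
rf.fit(X, y)

# OOB accuracy: each observation is predicted only by trees
# whose bootstrap sample did not contain it.
print(round(rf.oob_score_, 3))
```

Because the out-of-bag estimate comes for free with bagging, it avoids a separate cross-validation loop, which matters when evaluating many hyperparameter settings on hundreds of datasets.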