Title: Sequential permutation testing of random forest variable importance measures
Authors: Alexander Hapfelmeier - Technical University of Munich (Germany) [presenting]
Roman Hornung - University of Munich (Germany)
Bernhard Haller - Institute of AI and Informatics in Medicine - Technical University of Munich (Germany)
Abstract: Hypothesis testing of random forest (RF) variable importance measures (VIMP) remains the subject of ongoing research. Recent developments include heuristic parametric tests whose distributional assumptions rest on empirical evidence, as well as formal tests derived analytically under regularity conditions. However, these approaches can be computationally expensive or even infeasible. The same problem affects non-parametric permutation tests; these, however, are distribution-free and can be applied generically to any type of RF and VIMP. To retain this advantage, it is proposed here to use sequential permutation tests and sequential p-value estimation to reduce the high computational cost of conventional permutation tests. The popular and widely used permutation accuracy VIMP serves as a practical and relevant application example. Simulation studies confirm that the theoretical properties of the sequential tests hold: the type-I error probability is controlled at the nominal level, and high power is maintained while considerably fewer permutations are needed. The numerical stability of the methods is investigated in two additional application studies, and recommendations for application are given. An implementation is provided in the accompanying R package rfvimptest. The approach can easily be applied to any kind of prediction model.
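The saving from sequential p-value estimation comes from stopping early once enough permutation statistics exceed the observed one. The sketch below is illustrative only. It assumes the well-known Besag and Clifford (1991) stopping rule (stop after h exceedances or after n_max draws), which is not necessarily the exact procedure used in rfvimptest, and it does not use that package's API. The names sequential_pvalue, abs_corr and permutation_test, and the parameters h and n_max, are hypothetical. The absolute correlation is a cheap stand-in for a VIMP. In a real RF setting, each null draw would typically require refitting the model, and that cost is what makes early stopping valuable.

```python
import random
from typing import Callable, List, Tuple


def sequential_pvalue(observed: float,
                      draw_null_stat: Callable[[], float],
                      h: int = 10,
                      n_max: int = 999) -> Tuple[float, int]:
    """Sequential Monte Carlo p-value in the style of Besag & Clifford (1991).

    Null statistics are drawn one at a time. Sampling stops as soon as h of
    them are >= the observed statistic (p = h / n), or after n_max draws
    with g < h exceedances (p = (g + 1) / (n_max + 1)).
    Returns (p_value, number_of_draws_used).
    """
    exceed = 0
    for n in range(1, n_max + 1):
        if draw_null_stat() >= observed:
            exceed += 1
            if exceed == h:
                # Enough exceedances: p is clearly not small, stop early.
                return h / n, n
    # Cap reached: fall back to the conventional Monte Carlo estimate.
    return (exceed + 1) / (n_max + 1), n_max


def abs_corr(x: List[float], y: List[float]) -> float:
    """Absolute Pearson correlation: a hypothetical, cheap stand-in for a VIMP.

    Assumes neither x nor y is constant (otherwise division by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_test(x: List[float], y: List[float],
                     h: int = 10, n_max: int = 999,
                     seed: int = 1) -> Tuple[float, int]:
    """Sequential permutation test of H0: x carries no information about y."""
    rng = random.Random(seed)
    observed = abs_corr(x, y)

    def draw() -> float:
        x_perm = x[:]          # permute a copy of the variable of interest
        rng.shuffle(x_perm)
        return abs_corr(x_perm, y)

    return sequential_pvalue(observed, draw, h=h, n_max=n_max)


if __name__ == "__main__":
    gen = random.Random(0)
    x = [gen.gauss(0, 1) for _ in range(100)]
    noise = [gen.gauss(0, 1) for _ in range(100)]   # unrelated predictor
    y = [xi + gen.gauss(0, 1) for xi in x]          # y depends on x
    print(permutation_test(noise, y))  # unrelated: usually stops after few draws
    print(permutation_test(x, y))      # related: expect (0.001, 999)
```

For an uninformative predictor, the loop usually ends after a few dozen draws instead of n_max. Only predictors that look informative use the full permutation budget, which is where such a test needs resolution near the significance level.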