Title: The good, the bad, and the ugly: Overconfident Bayesian model selection in molecular phylogenetics
Authors: Tianqi Zhu - Academy of Mathematics and Systems Science, Chinese Academy of Sciences (China)
Ziheng Yang - University College London (United Kingdom) [presenting]
Abstract: Bayesian model selection is widely used to compare species phylogenetic trees in molecular phylogenetics. It has been noted to produce spuriously high posterior probabilities for phylogenies in large datasets, but the precise reasons for this overconfidence are unknown. These empirical observations prompted us to study the asymptotic behavior of Bayesian model selection when the models under comparison have the same number of parameters and are equally wrong. We found that in such cases, Bayesian model selection exhibits surprising and polarized behavior in large datasets, supporting one model with full force in some datasets and the other model in others. If one model is slightly less wrong than the other, the less wrong model will eventually win as the amount of data increases, but the method tends to become overconfident before it becomes reliable. This extreme behavior appears to be part of the reason for the spuriously high posterior probabilities for evolutionary trees. We discuss a few strategies suggested in the literature, but the question of "what should one do" remains open.
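The polarized behavior described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: the data are drawn from N(0, 1), and the two competing models, N(+mu, 1) and N(-mu, 1), have no free parameters and are equally wrong. The function names, the value of `mu`, and the 0.99 cutoff are hypothetical choices made only for illustration.

```python
import math
import random


def posterior_model1(n, mu=0.5, rng=random):
    """Posterior probability of model 1 under equal prior model probabilities.

    Assumed toy setup (not from the abstract):
      truth:   x_i ~ N(0, 1)
      model 1: x_i ~ N(+mu, 1)
      model 2: x_i ~ N(-mu, 1)
    Both models are equally distant from the truth.
    """
    s = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    # Log likelihood ratio of model 1 to model 2 simplifies to 2 * mu * sum(x).
    log_bf = 2.0 * mu * s
    # Numerically stable logistic transform of the log Bayes factor.
    if log_bf >= 0:
        return 1.0 / (1.0 + math.exp(-log_bf))
    e = math.exp(log_bf)
    return e / (1.0 + e)


def fraction_extreme(n, reps=500, cutoff=0.99, seed=1):
    """Fraction of replicate datasets in which either model gets posterior > cutoff."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        p = posterior_model1(n, rng=rng)
        if p > cutoff or p < 1.0 - cutoff:
            extreme += 1
    return extreme / reps


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, fraction_extreme(n))
```

In this setup the log Bayes factor is normally distributed with mean zero and variance 4 mu^2 n, so its spread grows with the sample size and the posterior probability is pushed toward 0 or 1 in a growing share of datasets, even though neither model is better than the other.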