B1389
Title: Training energy-based models with diffusion contrastive divergence
Authors: Weijian Luo - Peking University (China)
Hao Jiang - Harbin Institute of Technology (China)
Tianyang Hu - Huawei Noah's Ark Lab (China) [presenting]
Jiacheng Sun - Huawei Noah's Ark Lab (China)
Zhenguo Li - Huawei Noah's Ark Lab (China)
Zhihua Zhang - Peking University (China)
Abstract: Energy-Based Models (EBMs) have been widely used for generative modeling. Most training methods rely on sampling from the EBM by running Markov Chain Monte Carlo (MCMC) chains, e.g., Langevin dynamics. However, there seems to be an irreconcilable trade-off between the computational burden and the validity of the training objective. On one hand, running MCMC until convergence is computationally intensive. On the other hand, short-run MCMC, as in Contrastive Divergence (CD) and its various extensions, is hindered by an extra, non-negligible entropy term that is hard to estimate in high dimensions. This dilemma calls for more efficient training paradigms for EBMs. Inspired by the efficient forward process of diffusion probabilistic models, we consider diffusing both the data and the EBM distribution with general processes, which yields a novel unified divergence family that we call Diffusion Contrastive Divergence (DCD). DCD includes CD as a special case, recovered by taking the EBM-induced Langevin dynamics as the diffusion process. Our DCD framework not only provides a deeper understanding of CD and other existing methods, but also facilitates new learning objectives. Specifically, by considering the VP and VE diffusion processes, we uncover two new divergences that can be more efficient than CD for training EBMs. Extensive experiments on many benchmark tasks verify that our methods are more efficient than CD.
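For context on the baseline the abstract contrasts against, the following is a minimal PyTorch-style sketch of standard contrastive divergence with short-run Langevin dynamics; it is not the authors' DCD method or code. The names energy_net, short_run_langevin, and cd_loss, along with the step size and step count, are illustrative assumptions. Note that the gradient contribution through the sampler, i.e., the entropy term the abstract describes as hard to estimate, is simply dropped here, which is exactly the shortcoming the DCD framework targets.

    # Minimal illustrative sketch (not the paper's method): vanilla CD with short-run Langevin MCMC.
    # `energy_net` is an assumed PyTorch module mapping a batch of inputs to per-sample scalar energies.
    import torch

    def short_run_langevin(energy_net, x_init, n_steps=20, step_size=1e-2):
        # Short-run Langevin dynamics induced by the EBM, started from data (the CD chain).
        x = x_init.clone().detach().requires_grad_(True)
        for _ in range(n_steps):
            grad, = torch.autograd.grad(energy_net(x).sum(), x)
            noise = torch.randn_like(x)
            # Gradient step on the energy plus Gaussian noise (unadjusted Langevin update).
            x = (x - 0.5 * step_size * grad + (step_size ** 0.5) * noise).detach().requires_grad_(True)
        return x.detach()

    def cd_loss(energy_net, x_data):
        # CD objective: lower the energy of data, raise the energy of short-run MCMC samples.
        # The gradient through the sampler (the entropy term mentioned in the abstract) is ignored.
        x_neg = short_run_langevin(energy_net, x_data)
        return energy_net(x_data).mean() - energy_net(x_neg).mean()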