Fast training and sampling of Restricted Boltzmann Machines
Abstract
Restricted Boltzmann Machines (RBMs) are effective tools for modeling complex systems and deriving insights from data. However, training these models on highly structured data is challenging because Markov Chain Monte Carlo (MCMC) sampling mixes slowly. In this study, we build upon recent theoretical advances in RBM training, which describe how data patterns are gradually encoded into the singular vectors of the coupling matrix, to significantly reduce the computational cost of training (on highly clustered datasets) and of evaluating and sampling RBMs in general. The learning process is analogous to the continuous thermodynamic phase transitions observed in ferromagnetic models, where new modes of the probability measure emerge continuously. Such continuous transitions are associated with critical slowing down, which degrades the accuracy of gradient estimates, particularly during the initial stages of training on clustered data. To mitigate this issue, we propose a pre-training phase that encodes the principal components into a low-rank RBM through a convex optimization process. This approach enables efficient static Monte Carlo sampling and accurate computation of the partition function. Furthermore, we exploit the continuous and smooth nature of the parameter annealing trajectory to obtain reliable and computationally efficient log-likelihood estimates, which enable online assessment during training, and we propose a novel sampling strategy, termed parallel trajectory tempering, that outperforms previously optimized MCMC methods.
Our results demonstrate that this training strategy enables RBMs to effectively model highly structured datasets that conventional methods struggle with. Additionally, we provide evidence that, in controlled scenarios, our log-likelihood estimation is more accurate than traditional, more computationally intensive approaches. Moreover, the parallel trajectory tempering algorithm significantly accelerates MCMC sampling compared to existing methods.
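
The abstract does not spell out how chains are exchanged in parallel trajectory tempering. As a rough illustration only, the sketch below assumes binary {0,1} units and the standard replica-exchange Metropolis criterion, applied to two RBMs saved at different points of the training trajectory. All function names and the parameter layout `(W, b, c)` are hypothetical and not taken from the paper.

```python
import numpy as np

def free_energy(v, W, b, c):
    """Marginal energy F(v) of a binary RBM with E(v,h) = -v.b - h.c - v.W.h.
    v: (n_chains, n_visible), W: (n_visible, n_hidden)."""
    pre = v @ W + c  # total input to each hidden unit
    return -(v @ b) - np.logaddexp(0.0, pre).sum(axis=1)

def gibbs_step(v, W, b, c, rng):
    """One block Gibbs sweep: sample h given v, then v given h."""
    p_h = 1.0 / (1.0 + np.exp(-(v @ W + c)))
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = 1.0 / (1.0 + np.exp(-(h @ W.T + b)))
    return (rng.random(p_v.shape) < p_v).astype(float)

def swap_log_ratio(v_a, v_b, params_a, params_b):
    """Log Metropolis ratio for exchanging chains between models a and b
    (assumed rule: generic replica exchange on the visible marginals)."""
    f_aa = free_energy(v_a, *params_a)
    f_bb = free_energy(v_b, *params_b)
    f_ab = free_energy(v_b, *params_a)  # v_b scored under model a
    f_ba = free_energy(v_a, *params_b)  # v_a scored under model b
    return (f_aa + f_bb) - (f_ab + f_ba)

def exchange(v_a, v_b, params_a, params_b, rng):
    """Propose a swap for every pair of chains; accept each independently."""
    log_r = swap_log_ratio(v_a, v_b, params_a, params_b)
    accept = np.log(rng.random(log_r.shape)) < log_r
    new_a = np.where(accept[:, None], v_b, v_a)
    new_b = np.where(accept[:, None], v_a, v_b)
    return new_a, new_b, accept
```

In such a scheme one would alternate `gibbs_step` within each saved model and `exchange` between neighbouring models along the trajectory, so that fast-mixing early models feed configurations to the slow-mixing final one. Whether this matches the authors' implementation in detail is an assumption.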
Origin | Files produced by the author(s) |
---|---|