Exploring the Inefficiency of Heavy Ball as Momentum Parameter Approaches 1

Xiaoge Deng, Tao Sun, Dongsheng Li, Xicheng Lu

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 3899-3907. https://doi.org/10.24963/ijcai.2024/431

The heavy ball momentum method is a widely used technique for accelerating training in the machine learning community. However, empirical evidence suggests that the convergence of stochastic gradient descent (SGD) with heavy ball may slow down when the momentum hyperparameter approaches 1. Despite this observation, there are no established theories or solutions that explain and address this issue. In this study, we provide the first theoretical result that elucidates why momentum slows down SGD as the momentum parameter tends to 1. To better understand this inefficiency, we focus our analysis on a quadratic convex objective. Our findings show that momentum accelerates SGD when the momentum parameter is not very close to 1. Conversely, when the momentum parameter approaches 1, momentum impairs SGD and degrades its stability. Based on these theoretical findings, we propose a descending warmup technique for heavy ball momentum, which exploits the advantages of the heavy ball method while overcoming the inefficiency that arises when the momentum tends to 1. Numerical results demonstrate the effectiveness of the proposed SHB-DW algorithm.
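
For reference, the classical stochastic heavy ball (SHB) update is x_{k+1} = x_k - lr * g_k + beta * (x_k - x_{k-1}), where g_k is a stochastic gradient and beta is the momentum parameter. The sketch below illustrates this update on a quadratic objective, together with one plausible reading of a descending warmup in which beta decays from a large initial value toward a smaller target; the schedule, parameter names, and values are illustrative assumptions, not the paper's exact SHB-DW specification.

# Minimal sketch: heavy-ball SGD on f(x) = 0.5 x^T A x - b^T x with noisy gradients,
# plus a hypothetical descending warmup for the momentum parameter beta.
import numpy as np

def shb_descending_warmup(A, b, x0, lr=0.05, beta_start=0.9, beta_end=0.5,
                          warmup_steps=100, n_steps=500, noise_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x_prev, x = x0.copy(), x0.copy()
    for k in range(n_steps):
        # Hypothetical descending warmup: beta moves linearly from beta_start to beta_end.
        t = min(k / warmup_steps, 1.0)
        beta = beta_start + t * (beta_end - beta_start)
        # Stochastic gradient of the quadratic objective (Gaussian noise models sampling error).
        grad = A @ x - b + noise_std * rng.standard_normal(x.shape)
        # Heavy-ball update: gradient step plus momentum term beta * (x_k - x_{k-1}).
        x_next = x - lr * grad + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Usage: a 2-D quadratic with condition number 10.
A = np.diag([1.0, 10.0])
b = np.zeros(2)
x_hat = shb_descending_warmup(A, b, x0=np.array([5.0, 5.0]))

Keeping beta well below 1 after the warmup reflects the paper's observation that values of the momentum parameter very close to 1 can impair SGD rather than accelerate it.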
Keywords:
Machine Learning: ML: Optimization
Machine Learning: ML: Applications
Machine Learning: ML: Learning theory