The Origin of Edge of Stability

Elon Litman

✨ TL;DR

This paper explains why full-batch gradient descent on neural networks drives the largest Hessian eigenvalue toward the threshold 2/η (the Edge of Stability phenomenon) through a novel functional called edge coupling. The work provides a unified theoretical explanation for this self-regulating behavior that has previously resisted complete understanding.

01 · Problem

The Edge of Stability is an empirically observed phenomenon in which full-batch gradient descent on neural networks drives the largest Hessian eigenvalue up to, and then keeps it hovering near, 2/η, where η is the learning rate. Previous work has established that the system self-regulates near this edge, but there has been no unified explanation for why the trajectory is forced toward this specific threshold from arbitrary initialization. This gap limits our ability to predict and control neural network training dynamics.
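To make the threshold concrete, here is a minimal sketch on a one-dimensional quadratic loss rather than a neural network. The function name `run_gd` and the chosen constants are illustrative assumptions, not taken from the paper. On L(x) = ½·λ·x², each gradient descent step multiplies x by (1 − ηλ), so iterates shrink when λ < 2/η and grow when λ > 2/η.

```python
def run_gd(curvature, lr, x0=1.0, steps=50):
    """Run gradient descent on L(x) = 0.5 * curvature * x**2.

    Returns |x| after `steps` updates. Hypothetical helper for illustration.
    """
    x = x0
    for _ in range(steps):
        x = x - lr * curvature * x  # gradient of L is curvature * x
    return abs(x)


lr = 0.1
threshold = 2.0 / lr  # curvature at which |1 - lr * curvature| equals 1

below = run_gd(0.95 * threshold, lr)  # per-step multiplier is about -0.9
above = run_gd(1.05 * threshold, lr)  # per-step multiplier is about -1.1

print(f"curvature below 2/lr: |x| = {below:.3e}")
print(f"curvature above 2/lr: |x| = {above:.3e}")
```

In both runs the sign of x flips every step; only the magnitude distinguishes the stable side from the unstable side of 2/η.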

02 · Approach

The authors introduce edge coupling, a functional defined on consecutive iterate pairs whose coefficient is uniquely determined by the gradient descent update rule. By analyzing the criticality conditions of this functional, they derive a step recurrence relation with stability boundary at 2/η and a loss-change formula whose telescoping sum forces curvature toward 2/η. The key insight is using the mean value theorem to localize the different Hessian averages appearing in these formulas to the true Hessian at interior points, eliminating gaps in the forcing argument. The framework also classifies fixed points and period-two orbits by setting both gradients of the edge coupling to zero.
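As a hedged illustration of the two formulas mentioned above, the following uses only standard Taylor identities with integral remainder for a twice continuously differentiable loss. The notation (s_t, \bar H_t, \tilde H_t) is assumed here, and the paper's own statements, including the definition of edge coupling itself, may take a different form.

```latex
% Assumed notation: s_t = x_{t+1} - x_t = -\eta \nabla L(x_t).
% Two different averages of the Hessian along the segment from x_t to x_{t+1}:
\[
\bar H_t = \int_0^1 \nabla^2 L(x_t + \tau s_t)\, d\tau ,
\qquad
\tilde H_t = 2 \int_0^1 (1 - \tau)\, \nabla^2 L(x_t + \tau s_t)\, d\tau .
\]
% Step recurrence, from \nabla L(x_{t+1}) - \nabla L(x_t) = \bar H_t s_t:
\[
s_{t+1} = -\eta \nabla L(x_{t+1})
        = -\eta \bigl( \nabla L(x_t) + \bar H_t s_t \bigr)
        = (I - \eta \bar H_t)\, s_t .
\]
% Loss change, using \nabla L(x_t)^\top s_t = -\|s_t\|^2 / \eta:
\[
L(x_{t+1}) - L(x_t)
  = -\frac{1}{\eta} \|s_t\|^2 + \frac{1}{2}\, s_t^\top \tilde H_t s_t
  = -\frac{1}{2} \|s_t\|^2 \left( \frac{2}{\eta}
      - \frac{s_t^\top \tilde H_t s_t}{\|s_t\|^2} \right) .
\]
```

Along an eigendirection of \bar H_t with eigenvalue λ, the step is multiplied by (1 − ηλ), which has magnitude below one exactly when 0 < λ < 2/η. The loss decreases on a step exactly when the Rayleigh quotient of \tilde H_t along s_t is below 2/η. The two formulas involve different averages of the Hessian, which is the kind of gap the summary says the mean value theorem is used to close.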

03 · Key insights

What the paper shows.

01 Edge coupling is a functional whose criticality conditions directly yield the forcing mechanism toward 2/η without requiring additional assumptions.

02 The mean value theorem bridges different Hessian averages to the true Hessian at interior points, providing exact forcing with no theoretical gap.

03 The problem reduces to a function of half-amplitude alone near fixed points, determining which directions support period-two orbits.

04 The framework unifies self-regulation behavior and explains why the trajectory is forced toward 2/η from arbitrary initialization.

04 · Results

The paper establishes that the step recurrence derived from edge coupling criticality has a stability boundary at exactly 2/η, and the loss-change formula's telescoping sum forces the largest Hessian eigenvalue toward this threshold. The analysis of fixed points and period-two orbits reveals which directions support oscillatory behavior and on which side of the critical learning rate they appear, providing a complete characterization of the Edge of Stability phenomenon.
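A toy sketch of a period-two orbit, under assumptions that are not from the paper: the one-dimensional loss L(x) = x²/2 + x⁴/4 and the helper names `gd_step` and `half_amplitude` are hypothetical. For a symmetric loss, an orbit a → −a → a requires a − η·L′(a) = −a, i.e. L′(a)/a = 2/η, a condition on the half-amplitude a alone. Here L′(a)/a = 1 + a², so the orbit exists only for η < 2, which is below the critical learning rate 2/λ for the curvature λ = 1 at the minimum.

```python
import math


def gd_step(x, lr):
    """One gradient descent step on the toy loss L(x) = x**2 / 2 + x**4 / 4."""
    return x - lr * (x + x**3)  # L'(x) = x + x**3


def half_amplitude(lr):
    """Half-amplitude a of the symmetric orbit a -> -a -> a, or None.

    Solves L'(a) / a = 2 / lr, i.e. 1 + a**2 = 2 / lr, for a > 0.
    """
    value = 2.0 / lr - 1.0
    return math.sqrt(value) if value > 0 else None


lr = 1.5
a = half_amplitude(lr)
once = gd_step(a, lr)      # should land at -a
twice = gd_step(once, lr)  # should return to a

print(f"a = {a:.6f}, after one step = {once:.6f}, after two steps = {twice:.6f}")
print("orbit for lr = 2.5:", half_amplitude(2.5))
```

This only checks that the orbit exists in the toy model; it says nothing about whether the orbit is attracting, and it does not reproduce the paper's classification.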

05 · Limitations

The analysis focuses on full-batch gradient descent on neural networks; applicability to stochastic variants or other optimization algorithms is not addressed. The paper neither discusses how to verify its theoretical predictions empirically nor provides extensive numerical validation. Extensions to practical settings with regularization, batch normalization, or other common training techniques are not explored.

✨ Generated by Claude · Apr 25, 2026 · Read the PDF for authoritative content.
