✨ TL;DR
This paper shows that different language model architectures (Transformers, RNNs, LSTMs) converge on similar periodic number representations, with periods of 2, 5, and 10, despite differences in how they are trained. The authors identify a two-tiered hierarchy of these features and explain when models learn geometrically separable representations useful for modular arithmetic.
Language models learn to represent numbers using periodic features, but it is unclear why different architectures and training approaches converge on similar representations. It is also unclear what distinguishes features that merely show periodicity in the Fourier domain from features that are geometrically separable and can therefore support modular arithmetic tasks. Understanding this convergence matters for explaining how neural networks learn structured mathematical concepts from unstructured text.
The authors conduct a systematic empirical and theoretical analysis across multiple model types (Transformers, Linear RNNs, LSTMs, classical embeddings) trained with different methods. They use Fourier analysis to characterize periodic features, and they test for geometric separability by linearly classifying numbers according to their value modulo T. Theoretically, they prove that Fourier domain sparsity is necessary but not sufficient for mod-T geometric separability. They investigate which factors (data, architecture, optimizer, tokenizer) determine when models learn separable features, and identify two acquisition routes: co-occurrence signals in natural language and multi-token arithmetic problems.
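The two measurements described above can be sketched on toy data. The following is a minimal illustration under stated assumptions, not the paper's code: the embeddings are synthetic (hypothetical `embed_cos_only` and `embed_cos_sin` functions, not features taken from a trained model), the spectrum is a naive DFT, and a nearest-class-mean classifier stands in for whatever linear probe the authors use. It shows one way a feature can be Fourier-sparse at period T without being separable mod T: a lone cosine is an even function, so residues r and T − r land on the same value and no classifier can tell them apart, whereas the cosine/sine pair places each residue at a distinct point on a circle.

```python
import cmath
import math

def embed_cos_only(n, T):
    # Hypothetical 1-D feature: a single cosine of period T (rounded to avoid float noise).
    return (round(math.cos(2 * math.pi * n / T), 9),)

def embed_cos_sin(n, T):
    # Hypothetical 2-D feature: the matching cosine/sine pair, a point on the unit circle.
    a = 2 * math.pi * n / T
    return (round(math.cos(a), 9), round(math.sin(a), 9))

def dominant_period(signal):
    """Period of the strongest non-constant component of a naive DFT."""
    N = len(signal)
    power = [abs(sum(x * cmath.exp(-2j * math.pi * k * t / N)
                     for t, x in enumerate(signal))) ** 2
             for k in range(N // 2 + 1)]
    k_best = max(range(1, len(power)), key=lambda k: power[k])
    return N / k_best

def centroid_probe_accuracy(embed, T, n_max):
    """Accuracy of a nearest-class-mean classifier (linear in x) at predicting n mod T."""
    points = [(embed(n, T), n % T) for n in range(n_max)]
    dim = len(points[0][0])
    means = []
    for r in range(T):
        members = [x for x, label in points if label == r]
        means.append([sum(x[d] for x in members) / len(members) for d in range(dim)])

    def score(x, mu):
        # Maximizing x.mu - |mu|^2 / 2 is the same as minimizing squared distance to mu.
        return sum(a * b for a, b in zip(x, mu)) - 0.5 * sum(b * b for b in mu)

    correct = sum(max(range(T), key=lambda r: score(x, means[r])) == label
                  for x, label in points)
    return correct / len(points)

T, N = 10, 100
period_1d = dominant_period([embed_cos_only(n, T)[0] for n in range(N)])
acc_1d = centroid_probe_accuracy(embed_cos_only, T, N)
acc_2d = centroid_probe_accuracy(embed_cos_sin, T, N)
print(period_1d, acc_1d, acc_2d)
```

Both toy embeddings have their spectral peak at period 10, but only the two-dimensional one is classified perfectly. In practice one would replace the synthetic functions with rows of a model's number-token embedding matrix and evaluate the probe on held-out numbers.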
What the paper shows.
The paper demonstrates that while all tested model architectures learn period-T features visible in Fourier analysis, only some achieve geometric separability for mod-T classification. The authors show that multi-token addition problems reliably induce separable features, whereas single-token problems do not. They identify specific co-occurrence patterns in natural language that correlate with learning separable representations, and demonstrate that architectural and optimization choices substantially affect feature quality despite convergence on similar periodic patterns.
The study focuses primarily on periods T=2, 5, 10 and may not generalize to other moduli. The analysis is limited to relatively standard architectures and training regimes; the findings may not extend to very large models or novel training paradigms. The paper relies on post-hoc analysis of learned features rather than direct intervention during training, which limits causal claims about feature acquisition mechanisms. The connection between geometric separability and downstream task performance is not thoroughly explored.
✨ Generated by Claude · Apr 25, 2026 · Read the PDF for authoritative content.