Convergent Evolution: How Different Language Models Learn Similar Number Representations

Deqing Fu; Tianyi Zhou; Mikhail Belkin; Vatsal Sharan; Robin Jia

✨ TL;DR

This paper shows that different language model architectures (Transformers, Linear RNNs, LSTMs) converge on similar periodic number representations with dominant periods of 2, 5, and 10, despite differences in training. The authors identify a two-tiered hierarchy of these features (Fourier-domain periodicity versus mod-T geometric separability) and explain when models learn the geometrically separable representations useful for modular arithmetic.

01 · Problem

Language models learn to represent numbers using periodic features, but it is unclear why different architectures and training approaches converge on similar representations. It is also unclear what distinguishes features that are merely periodic in the Fourier domain from features that are geometrically separable and therefore usable for modular arithmetic tasks. Understanding this convergence matters for explaining how neural networks learn structured mathematical concepts from unstructured text.

02 · Approach

The authors conduct a systematic empirical and theoretical analysis across multiple model types (Transformers, Linear RNNs, LSTMs, classical embeddings) trained with different methods. They use Fourier analysis to characterize periodic features and test for geometric separability through linear classification of numbers modulo T. Theoretically, they prove that Fourier domain sparsity is necessary but insufficient for mod-T geometric separability. They investigate which factors (data, architecture, optimizer, tokenizer) determine when models learn separable features, identifying two acquisition routes: from co-occurrence signals in natural language and from multi-token arithmetic problems.
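The two measurements described above (a Fourier spectrum over the number axis, and a linear probe for n mod T) can be sketched as follows. This is a minimal illustration, not the authors' code: the synthetic table `emb`, the helper names `fourier_power`, `top_periods`, and `mod_probe_accuracy`, the least-squares probe, and the 50/50 train/test split are all assumptions made for the example.

```python
import numpy as np

def fourier_power(emb):
    """Power spectrum along the number axis, summed over embedding dimensions."""
    centered = emb - emb.mean(axis=0, keepdims=True)
    spec = np.fft.rfft(centered, axis=0)           # shape (N // 2 + 1, d)
    return (np.abs(spec) ** 2).sum(axis=1)

def top_periods(emb, k=3):
    """Return the k periods (in integers per cycle) carrying the most power."""
    n = emb.shape[0]
    power = fourier_power(emb)
    power[0] = 0.0                                 # ignore the constant (DC) bin
    idx = np.argsort(power)[::-1][:k]
    return [float(n / i) for i in idx]

def mod_probe_accuracy(emb, T, train_frac=0.5, seed=0):
    """Fit a least-squares linear probe for n mod T; return held-out accuracy."""
    rng = np.random.default_rng(seed)
    n = emb.shape[0]
    labels = np.arange(n) % T
    perm = rng.permutation(n)
    cut = int(train_frac * n)
    train, test = perm[:cut], perm[cut:]
    X = np.hstack([emb, np.ones((n, 1))])          # append a bias column
    Y = np.eye(T)[labels]                          # one-hot targets
    W, *_ = np.linalg.lstsq(X[train], Y[train], rcond=None)
    pred = (X[test] @ W).argmax(axis=1)
    return float((pred == labels[test]).mean())

# Hypothetical stand-in for a learned embedding table: row n represents integer n.
N = 1000
n = np.arange(N)
rng = np.random.default_rng(0)
cols = []
for T in (2, 5, 10):
    cols += [np.cos(2 * np.pi * n / T), np.sin(2 * np.pi * n / T)]
emb = np.stack(cols, axis=1) + 0.05 * rng.standard_normal((N, 6))

print(sorted(top_periods(emb, k=3)))
print(mod_probe_accuracy(emb, T=10))
```

On this synthetic table the spectrum should peak at periods 2, 5, and 10 by construction. With a real model, `emb` would be replaced by the rows of the embedding matrix (or hidden states) for the number tokens, which the sketch does not attempt to load.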

03 · Key insights

What the paper shows.

01 Different model architectures and training procedures exhibit convergent evolution, learning similar periodic number representations with dominant periods at T=2, 5, 10.

02 Fourier domain sparsity is necessary but not sufficient for geometric separability in modular arithmetic, establishing a theoretical hierarchy of feature quality.

03 Models can acquire geometrically separable features through two distinct pathways: from natural language co-occurrence patterns (text-number and cross-number interactions) or from multi-token addition problems.

04 Data, architecture, optimizer, and tokenizer design all significantly influence whether models learn separable versus merely periodic features.
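A toy example may help make the gap between Fourier sparsity and geometric separability concrete. The sketch below is an illustration under stated assumptions, not the paper's construction: "sparse" is taken to mean that spectral power is concentrated in a single frequency bin, and the helpers `spectral_concentration` and `probe_train_accuracy` are hypothetical names. A lone cosine feature with period 10 is perfectly sparse, yet it assigns the same value to residues 1 and 9, so no probe can tell them apart. Adding the matching sine places the ten residues on distinct points of a circle, where a linear probe separates them.

```python
import numpy as np

def spectral_concentration(X):
    """Fraction of total (non-DC) spectral power held by the strongest bin."""
    centered = X - X.mean(axis=0, keepdims=True)
    power = (np.abs(np.fft.rfft(centered, axis=0)) ** 2).sum(axis=1)
    return float(power.max() / power.sum())

def probe_train_accuracy(X, labels, num_classes):
    """Least-squares linear probe, scored on the same points it was fit on."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    Y = np.eye(num_classes)[labels]                # one-hot targets
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return float(((Xb @ W).argmax(axis=1) == labels).mean())

T, N = 10, 1000
n = np.arange(N)
labels = n % T
angle = 2 * np.pi * n / T

cos_only = np.cos(angle)[:, None]                       # one periodic feature
cos_sin = np.stack([np.cos(angle), np.sin(angle)], 1)   # full circle feature

# Both feature sets put essentially all of their power at period 10.
print(spectral_concentration(cos_only), spectral_concentration(cos_sin))

# Residues 1 and 9 collide under the cosine-only feature.
print(np.isclose(cos_only[1, 0], cos_only[9, 0]))

# Only the cosine-plus-sine features support a perfect mod-10 linear probe.
print(probe_train_accuracy(cos_only, labels, T))
print(probe_train_accuracy(cos_sin, labels, T))
```

The probe is scored on its own training points on purpose: the question here is whether any separating linear map exists, not how well it generalizes.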

04 · Results

The paper demonstrates that while all tested model architectures learn period-T features visible in Fourier analysis, only some achieve geometric separability for mod-T classification. The authors show that multi-token addition problems reliably induce separable features, whereas single-token problems do not. They identify specific co-occurrence patterns in natural language that correlate with learning separable representations, and they show that architectural and optimization choices substantially affect feature quality even though the models converge on similar periodic patterns.

05 · Limitations

The study focuses primarily on periods T=2, 5, 10 and may not generalize to other moduli. The analysis is limited to relatively standard architectures and training regimes; the findings may not extend to very large models or novel training paradigms. The paper relies on post-hoc analysis of learned features rather than direct intervention during training, which limits causal claims about feature acquisition mechanisms. The connection between geometric separability and downstream task performance is not thoroughly explored.

✨ Generated by Claude · Apr 25, 2026 · Read the PDF for authoritative content.
