Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering

Manan Gupta; Dhruv Kumar

✨ TL;DR

LPSR is an inference-time error correction method that monitors internal model activations to detect reasoning mistakes, then rolls back generation and steers the model with pre-computed steering vectors, with no training required. It improves an 8B model's MATH-500 accuracy from 28.8% to 44.0%, outperforming prompted self-correction, Best-of-16 sampling, and a standard 70B model.

01 · Problem

Large language models frequently make unrecoverable reasoning errors during text generation. Once a model takes a wrong reasoning step, subsequent tokens tend to compound the mistake rather than self-correct, leading to cascading failures in multi-step tasks like mathematical reasoning. Existing inference-time correction methods like prompted self-correction often perform worse than standard generation, and scaling approaches like best-of-N sampling require prohibitively high token budgets. The fundamental challenge is detecting when an error occurs mid-generation and intervening effectively without requiring model retraining or expensive additional forward passes.

02 · Approach

Latent Phase-Shift Rollback (LPSR) operates at inference time by monitoring the residual stream at a critical layer during each generation step. A dual-gate mechanism combining cosine similarity and entropy detects abrupt directional reversals (phase shifts) in activation space that signal reasoning errors. When a phase shift is detected, LPSR rolls back the key-value cache to a point before the error and injects a pre-computed steering vector to guide the model toward correct reasoning. The steering vectors are computed offline and cached, requiring no fine-tuning, gradient computation, or additional forward passes during inference. The paper also reports a layer sweep to identify the best monitoring depth.
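The detect, roll back, and steer loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not say which vectors the cosine similarity compares, so the sketch assumes consecutive residual-stream states at the monitored layer; it also assumes the entropy gate fires on high next-token entropy. The function names, the thresholds `cos_thresh` and `ent_thresh`, the rollback depth `k`, and the scale `alpha` are all hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def phase_shift(prev_h, curr_h, probs, cos_thresh=-0.2, ent_thresh=2.0):
    """Assumed dual gate: fire only if the residual direction reverses
    AND the next-token distribution is uncertain. Thresholds are hypothetical."""
    return cosine(prev_h, curr_h) < cos_thresh and entropy(probs) > ent_thresh

def rollback(kv_cache, tokens, k):
    """Drop the last k positions from every layer's (K, V) pair and from the tokens.
    kv_cache is modeled as a list of (K, V) arrays shaped (seq_len, d)."""
    keep = max(len(tokens) - k, 0)
    trimmed = [(K[:keep], V[:keep]) for K, V in kv_cache]
    return trimmed, tokens[:keep]

def steer(hidden, steer_vec, alpha=1.0):
    """Add a pre-computed steering vector to a hidden state (alpha is hypothetical)."""
    return hidden + alpha * steer_vec
```

In a real decoder the cache would be the framework's own key-value structure and the steering vector would be added at the monitored layer during regeneration; the toy arrays here only show the shape of the control flow.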

03 · Key insights

What the paper shows.

01 Detection-correction dissociation: The optimal layer for detecting errors (layer 14, AUC 0.718) differs from the optimal layer for maximizing task accuracy (layer 16, 44.0%), revealing that error detection and correction require monitoring at different representational depths

02 Prompted self-correction actively harms performance (19.8% vs 28.8% baseline), falling 24.2 percentage points below LPSR, demonstrating that naive self-correction strategies can be counterproductive

03 LPSR is more efficient than scaling: an 8B model with LPSR surpasses a standard 70B model (35.2%) with 8.75× fewer parameters, at approximately 3× the token budget, and outperforms Best-of-16 sampling at 5.4× lower token cost

04 Phase shifts in the residual stream serve as error indicators: the cosine-similarity plus entropy dual gate identifies reasoning failures without requiring ground-truth labels at inference time
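The detection-correction dissociation in the first insight comes from a layer sweep that ranks layers by two different criteria. A minimal sketch of such a sweep follows, assuming per-layer detector scores and step-level error labels are available offline. The helper names `roc_auc` and `sweep` are hypothetical, and any scores passed in are the caller's own; the paper's sweep procedure may differ.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the fraction of (error, non-error) pairs ranked correctly;
    ties count as half (the Mann-Whitney U identity)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

def sweep(layer_scores, labels, layer_accuracy):
    """Return (best detection layer, best task layer, per-layer AUC).
    layer_scores: {layer: detector scores per step}
    layer_accuracy: {layer: end-task accuracy when monitoring that layer}"""
    auc = {layer: roc_auc(s, labels) for layer, s in layer_scores.items()}
    best_detect = max(auc, key=auc.get)
    best_task = max(layer_accuracy, key=layer_accuracy.get)
    return best_detect, best_task, auc
```

The two `max` calls are deliberately independent: nothing forces the layer with the highest AUC to also give the highest end-task accuracy, which is the pattern the paper reports for layers 14 and 16.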

04 · Results

LPSR achieves 44.0% accuracy on MATH-500 with an 8B model, a statistically significant +15.2 percentage point improvement over standard autoregressive generation (28.8%; McNemar χ² = 66.96, p < 10⁻¹⁵). It outperforms prompted self-correction by +24.2 pp (χ² = 89.4, p ≈ 0) and exceeds Best-of-16 sampling by +7.8 pp while using 5.4× fewer tokens. Notably, the 8B model with LPSR surpasses a standard 70B model (35.2%) with 8.75× fewer parameters, at approximately 3× the token budget. The layer sweep shows that error-detection AUC peaks at layer 14 (0.718), while task accuracy peaks at layer 16 (44.0%, versus 29.2% at layer 14).
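For readers unfamiliar with the McNemar statistic quoted above, it compares two methods on the same problems using only the discordant pairs: problems that exactly one of the two methods solves. A minimal sketch follows. The function name is hypothetical, and this summary does not state whether the paper applies the continuity correction or what the discordant counts are.

```python
def mcnemar_chi2(b, c, continuity=True):
    """McNemar chi-square statistic for paired binary outcomes.
    b: problems solved only by method A; c: problems solved only by method B.
    With continuity=True this is (|b - c| - 1)^2 / (b + c)."""
    if b + c == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    diff = abs(b - c) - (1 if continuity else 0)
    return diff * diff / (b + c)
```

As an illustration only: on the 500 problems of MATH-500, a 15.2 pp gain is a net 76 problems, so b - c = 76. The hypothetical counts b = 80 and c = 4 satisfy that constraint, and `mcnemar_chi2(80, 4)` evaluates to about 66.96 with the continuity correction. These counts are not taken from the paper; they are one pair that happens to be consistent with the reported figures.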

05 · Limitations

The paper does not extensively discuss failure modes or cases where LPSR incorrectly identifies phase shifts, leading to unnecessary rollbacks or missed errors. The method requires pre-computed steering vectors, which may need task-specific tuning and could limit generalization to novel problem types. The detection-correction dissociation suggests that a single monitoring layer may not be optimal for all scenarios, but the paper does not explore adaptive or multi-layer monitoring strategies. Computational overhead from continuous residual stream monitoring is not thoroughly analyzed. The evaluation focuses primarily on mathematical reasoning (MATH-500), leaving open questions about performance on other reasoning domains or generation tasks where error patterns may differ.

✨ Generated by Claude · Apr 21, 2026 · Read the PDF for authoritative content.
