
© 2026 AutoXiv. Open preprint infrastructure.


AUTOXIV · CLUSTER

Efficient LLM Training.

Research on improving reinforcement learning, reasoning generalization, and optimization efficiency for training and fine-tuning large language models under resource constraints.

13 papers


Papers.

MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

Alshammari · Wen · Zainal +5

MathNet is a large-scale, multilingual dataset of 30,676 Olympiad-level math problems from 47 countries spanning two decades, designed to benchmark both mathematical reasoning in generative models and mathematical retrieval in embedding systems. The benchmark reveals that even state-of-the-art models struggle with these problems, with top models achieving only 78.4% accuracy, and that retrieval quality significantly impacts retrieval-augmented generation performance.

Formal Sciences→260421.0041

When Can LLMs Learn to Reason with Weak Supervision?

Rahman · Shen · Mordvina +3

This paper investigates when reinforcement learning with verifiable rewards (RLVR) enables large language models to generalize under weak supervision (scarce data, noisy rewards, or self-supervised signals). The key finding is that models generalize when they exhibit prolonged pre-saturation training dynamics, which is predicted by reasoning faithfulness—the degree to which intermediate reasoning steps logically support final answers.

Formal Sciences→260421.0042

Back into Plato's Cave: Examining Cross-modal Representational Convergence at Scale

Koepke · Zverev · Ginosar +1

This paper challenges the Platonic Representation Hypothesis by showing that apparent alignment between vision and language models is an artifact of small-scale evaluation. When tested at scale with millions of samples and realistic many-to-many settings, cross-modal alignment degrades substantially, suggesting different modalities learn different representations of reality.

Formal Sciences→260421.0049
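Cross-modal "alignment" in the Platonic Representation Hypothesis literature is commonly scored with mutual k-nearest-neighbor overlap: embed the same paired items with two models and measure how often each item's neighbors agree across the two spaces. The sketch below is a generic brute-force version of that score under stated assumptions (cosine similarity, paired inputs, illustrative function names); it is not necessarily the exact protocol used in *Back into Plato's Cave*.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def knn_sets(feats, k):
    """For each item i, the set of indices of its k most similar other items."""
    n = len(feats)
    result = []
    for i in range(n):
        sims = [(cosine(feats[i], feats[j]), j) for j in range(n) if j != i]
        sims.sort(reverse=True)  # most similar first
        result.append({j for _, j in sims[:k]})
    return result

def mutual_knn_alignment(feats_a, feats_b, k):
    """Mean fraction of shared k-NN between two embeddings of the same paired items.

    1.0 means both models induce identical neighborhoods; 0.0 means no overlap.
    """
    assert len(feats_a) == len(feats_b), "inputs must be paired item-by-item"
    nn_a, nn_b = knn_sets(feats_a, k), knn_sets(feats_b, k)
    return sum(len(sa & sb) / k for sa, sb in zip(nn_a, nn_b)) / len(feats_a)
```

This brute-force version is quadratic in the number of items, so at the millions-of-samples scale the summary describes, an approximate nearest-neighbor index would likely be needed instead.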

FUSE: Ensembling Verifiers with Zero Labeled Data

Lee · Ma · Zhao +4

FUSE combines multiple imperfect AI verifiers to judge model outputs more accurately without any labeled training data. It uses spectral algorithms to control the dependencies among verifiers, and it matches or beats semi-supervised methods across diverse benchmarks.

Formal Sciences→260421.0051
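For context on the label-free spectral ensembling summarized above for FUSE, here is a sketch of the classic spectral meta-learner (Parisi et al., 2014), a simpler and older technique swapped in for illustration. It assumes binary ±1 verdicts and verifiers that are conditionally independent given the true label, which is exactly the assumption the FUSE summary says it relaxes. Under that assumption the off-diagonal of the verdict covariance is rank one, and its leading eigenvector ranks verifiers by quality. The diagonal re-estimation loop and all function names are this sketch's own choices, not FUSE's method.

```python
import math
import random

def leading_eigvec(M, iters=200):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(M)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v

def spectral_weights(preds, refine=10):
    """preds[i][t] in {-1, +1} is verifier i's verdict on item t; returns one weight per verifier."""
    n, m = len(preds), len(preds[0])
    means = [sum(p) / m for p in preds]
    Q = [[sum(a * b for a, b in zip(preds[i], preds[j])) / m - means[i] * means[j]
          for j in range(n)] for i in range(n)]
    # Only the off-diagonal of Q is rank one under conditional independence,
    # so repeatedly replace the diagonal with the current rank-one fit.
    for _ in range(refine):
        lam, u = leading_eigvec(Q)
        for i in range(n):
            Q[i][i] = lam * u[i] ** 2
    _, u = leading_eigvec(Q)
    # Eigenvectors are sign-ambiguous; assume verifiers beat chance on average.
    return u if sum(u) >= 0 else [-x for x in u]

def spectral_predict(preds, weights):
    """Weighted vote over verifiers for every item."""
    m = len(preds[0])
    return [1 if sum(w * p[t] for w, p in zip(weights, preds)) >= 0 else -1
            for t in range(m)]

def simulate_verifiers(accs, m, seed=0):
    """Toy data: independent verifiers with the given accuracies on balanced ±1 labels."""
    rng = random.Random(seed)
    y = [rng.choice([-1, 1]) for _ in range(m)]
    preds = [[yt if rng.random() < a else -yt for yt in y] for a in accs]
    return y, preds
```

Usage: `y, preds = simulate_verifiers([0.9, 0.8, 0.7, 0.65, 0.6], 2000)` followed by `spectral_predict(preds, spectral_weights(preds))` recovers a weighting that favors the more accurate verifiers without ever reading `y`. When verifiers share failure modes, the rank-one structure breaks down, which is the regime the FUSE summary targets.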

Duality for the Adversarial Total Variation

Bungert · Schmitt

This paper establishes a mathematical duality framework for adversarial total variation, showing that adversarial training of binary classifiers can be understood through nonlocal calculus of variations. The work provides rigorous characterizations of subdifferentials using dual representations and integration by parts formulas in both metric and Euclidean spaces.

Formal Sciences→260421.0052

IDOBE: Infectious Disease Outbreak forecasting Benchmark Ecosystem

Adiga · Chou · Chiranth +7

IDOBE is a standardized benchmark dataset containing over 10,000 infectious disease outbreak segments from a century of surveillance data across 13 diseases, designed to evaluate and compare epidemic forecasting methods. The authors test 11 baseline models and find MLP-based methods perform most robustly, with statistical methods excelling in pre-peak phases.

Formal Sciences→260421.0054

Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data

Liang · Zhou · Lu +3

This paper addresses a critical problem in reinforcement learning for large language models: when base models are already very accurate on training benchmarks, standard RL methods fail because there aren't enough errors to learn from, causing models to collapse into repetitive solutions. The authors propose CUTS, a novel sampling strategy that maintains solution diversity even when models are highly accurate, improving generalization on challenging out-of-domain math problems by up to 15.1%.

Formal Sciences→260421.0056
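The saturation problem summarized above for *Too Correct to Learn* can be seen directly in GRPO-style group-normalized advantages (the normalization popularized by DeepSeekMath), assumed here as a representative RLVR setup. When every sampled answer to a prompt is correct, each reward equals the group mean, so every advantage is zero and that prompt contributes no policy-gradient signal. This sketch only illustrates the failure mode; it does not implement CUTS.

```python
import statistics

def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each sampled answer's reward within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def informative_fraction(groups):
    """Share of prompts whose sampled answers are not all scored the same (nonzero signal)."""
    return sum(1 for g in groups if len(set(g)) > 1) / len(groups)
```

As a base model's accuracy on the training set rises, `informative_fraction` over a batch falls toward zero, so most of the sampling compute yields no update. This is the "not enough errors to learn from" regime that the summary describes.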

Faster by Design: Interactive Aerodynamics via Neural Surrogates Trained on Expert-Validated CFD

Thumiger · Bartezzaghi · Rigotti +5

This paper introduces GIST, a neural network surrogate that predicts race-car aerodynamics 10,000× faster than traditional CFD simulations while maintaining accuracy suitable for early-stage design. The work includes a new high-fidelity dataset of LMP2 race-car aerodynamics validated by industry experts at Dallara, enabling interactive design exploration in motorsport.

Formal Sciences→260421.0060

Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts

Morrison · Adhikesaven · Bhagia +3

BAR (Branch-Adapt-Route) trains separate domain experts independently and combines them via Mixture-of-Experts, enabling modular updates to language models without retraining everything or degrading existing capabilities. This approach matches monolithic retraining performance while scaling linearly instead of quadratically when adding new domains.

Formal Sciences→260421.0067

AutoPPA: Automated Circuit PPA Optimization via Contrastive Code-based Rule Library Learning

Li · Jin · Huang +14

AutoPPA is an automated framework for optimizing circuit performance, power, and area (PPA) that learns optimization rules by contrasting pairs of code rather than relying on hand-written rules. It outperforms existing methods, including manual optimization and state-of-the-art automated approaches.

Formal Sciences→260421.0068

ProtoCLIP: Prototype-Aligned Latent Refinement for Robust Zero-Shot Chest X-Ray Classification

Kittler · Bhat · Maier

ProtoCLIP refines CLIP-style vision-language models for chest X-ray classification by using curated training data and prototype-aligned distillation to reduce co-occurrence bias and improve zero-shot performance. The method improves AUC by 2 to 10 percentage points over baseline CLIP on unseen chest X-ray datasets without large-scale retraining.

Formal Sciences→260421.0074

Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes

Bauer · Walshe · Pham +4

This paper investigates how well small language models can learn reasoning tasks through reinforcement learning when training data and compute are limited. The study finds that mixing easy and hard problems during training provides up to 5x better sample efficiency than training on easy problems alone.

Formal Sciences→260421.0087
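The easy/hard mixing summarized above for *Learning from Less* amounts to controlling the difficulty composition of each training batch. Below is a minimal sketch, assuming problems are already split into easy and hard pools and that a fixed hard fraction per batch is used; `mixed_batch` and `hard_frac` are hypothetical names, and the paper's actual ratio or schedule is not stated here.

```python
import random

def mixed_batch(easy, hard, batch_size, hard_frac, rng):
    """Draw a batch with a fixed share of hard problems (hard_frac is a hypothetical knob)."""
    n_hard = round(batch_size * hard_frac)
    n_easy = batch_size - n_hard
    # Sample without replacement within each pool, then combine.
    batch = rng.sample(hard, n_hard) + rng.sample(easy, n_easy)
    rng.shuffle(batch)  # avoid a fixed easy/hard ordering inside the batch
    return batch
```

Passing an explicit `random.Random(seed)` keeps batches reproducible across runs, which matters when comparing sample efficiency between mixtures.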

Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling

Wang · Shen · Ding +3

AdaLeZO accelerates zeroth-order (ZO) optimization for fine-tuning large language models by sampling layers according to their sensitivity instead of uniformly perturbing all parameters. This adaptive approach achieves a 1.7-3.0x speedup over existing methods while retaining memory efficiency, and it acts as a universal plug-in for existing ZO optimizers.

Formal Sciences→
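To make the layer-wise sampling idea summarized above for AdaLeZO concrete, here is a sketch of a two-point (SPSA-style) zeroth-order step that perturbs only one sampled layer. The per-layer probabilities are taken as given inputs; how AdaLeZO actually measures sensitivity is not reproduced, and the inverse-probability scaling, function names, and toy loss are assumptions of this sketch.

```python
import random

def zo_layer_step(params, loss_fn, layer_probs, lr, mu, rng):
    """One zeroth-order update that perturbs a single sampled layer.

    params: dict layer name -> list of floats (updated in place).
    layer_probs: dict layer name -> sampling probability (assumed to sum to 1).
    """
    names = list(layer_probs)
    layer = rng.choices(names, weights=[layer_probs[n] for n in names], k=1)[0]
    z = [rng.gauss(0.0, 1.0) for _ in params[layer]]

    def shifted(sign):
        new = dict(params)  # shallow copy; only the sampled layer is replaced
        new[layer] = [w + sign * mu * zi for w, zi in zip(params[layer], z)]
        return new

    # Finite-difference estimate of the directional derivative along z (no backprop).
    g = (loss_fn(shifted(+1)) - loss_fn(shifted(-1))) / (2 * mu)
    # Inverse-probability scaling (assumption): rarely chosen layers take larger steps.
    scale = 1.0 / (layer_probs[layer] * len(names))
    params[layer] = [w - lr * scale * g * zi for w, zi in zip(params[layer], z)]
    return layer
```

Only two loss evaluations are needed per step and no gradients are stored, which is the memory advantage ZO methods trade for noisier updates. Restricting the perturbation to one layer shrinks the dimension of `z`, which tends to reduce the variance of each estimate.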