Research on improving reinforcement learning, reasoning generalization, and optimization efficiency for training and fine-tuning large language models under resource constraints.
13 papers