Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting • arXiv 2510.08696 • Published Oct 9, 2025
Rethinking Thinking Tokens: LLMs as Improvement Operators • arXiv 2510.01123 • Published Oct 1, 2025
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT • arXiv 2509.19284 • Published Sep 23, 2025
PILAF: Optimal Human Preference Sampling for Reward Modeling • arXiv 2502.04270 • Published Feb 6, 2025