Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning
Paper • 2605.06241 • Published • 5
Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?