Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation Paper • 2604.13010 • Published 14 days ago • 12 • 7
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation Paper • 2604.13010 • Published 14 days ago • 12
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation Paper • 2604.13010 • Published 14 days ago • 12 • 7
A Survey of On-Policy Distillation for Large Language Models Paper • 2604.00626 • Published 27 days ago • 12
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning Paper • 2603.21065 • Published Mar 22 • 77
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning Paper • 2601.09088 • Published Jan 14 • 63
Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b-Logprob Viewer • Updated Jan 15 • 435k • 1.23k • 58
Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b-Logprob Viewer • Updated Jan 15 • 435k • 1.23k • 58
Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation Paper • 2512.20908 • Published Dec 24, 2025 • 29