Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation · Paper 2604.13010 · Published 5 days ago
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe · Paper 2604.13016 · Published 5 days ago
Distilling 100B+ Models 40x Faster with TRL: TRL distillation for 100B+ teachers, 40x faster