Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle • Paper • 2508.05612 • Published Aug 7, 2025
Shuffle-R1 • Collection • Shuffle-R1 checkpoints and training/evaluation datasets. • 5 items
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models • Paper • 2505.22617 • Published May 28, 2025