Jiunsong/supergemma4-26b-abliterated-multimodal-mlx-4bit Image-Text-to-Text • 5B • Updated 28 days ago • 7.93k • 52
MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference Paper • 2605.07363 • Published 8 days ago • 12
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key Paper • 2605.06638 • Published 9 days ago • 14
AcademiClaw: When Students Set Challenges for AI Agents Paper • 2605.02661 • Published 12 days ago • 16
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models Paper • 2605.05204 • Published 10 days ago • 25
SkillOS: Learning Skill Curation for Self-Evolving Agents Paper • 2605.06614 • Published 9 days ago • 42
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling Paper • 2605.08083 • Published 8 days ago • 64
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning Paper • 2605.06130 • Published 9 days ago • 106
Flow-OPD: On-Policy Distillation for Flow Matching Models Paper • 2605.08063 • Published 8 days ago • 93
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration Paper • 2605.03042 • Published 12 days ago • 114
Heterogeneous Scientific Foundation Model Collaboration Paper • 2604.27351 • Published 16 days ago • 213
ibm-granite/granite-speech-4.1-2b Automatic Speech Recognition • 2B • Updated 16 days ago • 194k • 93
The ultimate guide to RL environments: building and scaling them in the LLM era 📝 Running • 157 • Building and scaling RL environments for LLM training