Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models Paper • 2601.14004 • Published 2 days ago • 42
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 211
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping Paper • 2510.18927 • Published Oct 21, 2025 • 83
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth? Paper • 2510.08189 • Published Oct 9, 2025 • 26
Better Process Supervision with Bi-directional Rewarding Signals Paper • 2503.04618 • Published Mar 6, 2025
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning Paper • 2509.08755 • Published Sep 10, 2025 • 56
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications Paper • 2509.26490 • Published Sep 30, 2025 • 20
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision Paper • 2411.16579 • Published Nov 25, 2024 • 3
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling Paper • 2411.00750 • Published Nov 1, 2024 • 1
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning Paper • 2402.05808 • Published Feb 8, 2024