wangbing1416's Collections: Reasoning Papers
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving
Clipping Policy Optimization
Paper
• 2508.07629
• Published
• 43
Less Is More: Training-Free Sparse Attention with Global Locality for
Efficient Reasoning
Paper
• 2508.07101
• Published
• 14
Compressing Chain-of-Thought in LLMs via Step Entropy
Paper
• 2508.03346
• Published
• 8
Train Long, Think Short: Curriculum Learning for Efficient Reasoning
Paper
• 2508.08940
• Published
• 27
Sample More to Think Less: Group Filtered Policy Optimization for
Concise Reasoning
Paper
• 2508.09726
• Published
• 15
Pass@k Training for Adaptively Balancing Exploration and Exploitation of
Large Reasoning Models
Paper
• 2508.10751
• Published
• 29
Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning
Models to Ask for Information
Paper
• 2508.11252
• Published
• 3
Deep Think with Confidence
Paper
• 2508.15260
• Published
• 90
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains
RLVR
Paper
• 2508.14029
• Published
• 118
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated
Chain-of-Thought-based Reinforced Fine-Tuning
Paper
• 2508.15868
• Published
• 3
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement
Learning for General LLM Reasoning
Paper
• 2508.16949
• Published
• 24
TreePO: Bridging the Gap of Policy Optimization and Efficacy and
Inference Efficiency with Heuristic Tree-based Modeling
Paper
• 2508.17445
• Published
• 80
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large
Language Models
Paper
• 2508.18773
• Published
• 16
StepWiser: Stepwise Generative Judges for Wiser Reasoning
Paper
• 2508.19229
• Published
• 20
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task
Arithmetic
Paper
• 2509.01363
• Published
• 59
Implicit Actor Critic Coupling via a Supervised Learning Framework for
RLVR
Paper
• 2509.02522
• Published
• 26
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
Paper
• 2509.03059
• Published
• 25
Reverse-Engineered Reasoning for Open-Ended Generation
Paper
• 2509.06160
• Published
• 149
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper
• 2509.07980
• Published
• 105
Staying in the Sweet Spot: Responsive Reasoning Evolution via
Capability-Adaptive Hint Scaffolding
Paper
• 2509.06923
• Published
• 22
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Paper
• 2509.03646
• Published
• 33
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
• 2509.08827
• Published
• 190
The Majority is not always right: RL training for solution aggregation
Paper
• 2509.06870
• Published
• 15
The Choice of Divergence: A Neglected Key to Mitigating Diversity
Collapse in Reinforcement Learning with Verifiable Reward
Paper
• 2509.07430
• Published
• 3
Reasoning-Aware GRPO using Process Mining
Paper
• 2510.25065
• Published
• 42
Scaling Latent Reasoning via Looped Language Models
Paper
• 2510.25741
• Published
• 229
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable
Reasoning
Paper
• 2510.22543
• Published
• 14
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise
Reasoning
Paper
• 2510.25992
• Published
• 48
SemCoT: Accelerating Chain-of-Thought Reasoning through
Semantically-Aligned Implicit Tokens
Paper
• 2510.24940
• Published
• 18
MR-Align: Meta-Reasoning Informed Factuality Alignment for Large
Reasoning Models
Paper
• 2510.24794
• Published
• 32
Data-Efficient RLVR via Off-Policy Influence Guidance
Paper
• 2510.26491
• Published
• 11
Black-Box On-Policy Distillation of Large Language Models
Paper
• 2511.10643
• Published
• 52
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
Paper
• 2511.08577
• Published
• 108
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper
• 2511.22570
• Published
• 91
REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance
Paper
• 2511.20233
• Published
• 3
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation
Paper
• 2512.05033
• Published
• 17
LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning
Paper
• 2512.05325
• Published
• 4
Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision
Paper
• 2512.15489
• Published
• 10
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper
• 2512.23988
• Published
• 18
RelayLLM: Efficient Reasoning via Collaborative Decoding
Paper
• 2601.05167
• Published
• 31
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs
Paper
• 2601.03559
• Published
• 14
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
Paper
• 2601.06002
• Published
• 56
Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation
Paper
• 2512.20908
• Published
• 29
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
Paper
• 2601.09088
• Published
• 63
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment
Paper
• 2601.14249
• Published
• 12
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
Paper
• 2601.18778
• Published
• 40
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
Paper
• 2601.20614
• Published
• 119
DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment
Paper
• 2601.20218
• Published
• 15
Memorization Dynamics in Knowledge Distillation for Language Models
Paper
• 2601.15394
• Published
• 3