Xiaoyu Tan
WIlliam1900
AI & ML interests
None yet
Recent Activity
authored a paper less than a minute ago
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
authored a paper less than a minute ago
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
authored a paper 1 minute ago
AURORA: Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification