daixuancheng/Qwen3-VL-8B-Thinking_stage3_MixAllRL_and_dataMixRatio_and_easy2hard 9B • Updated 12 days ago • 20
daixuancheng/Qwen3-VL-8B-Thinking_multisub_kaiyuanTiankong_resplen8192_sp2_gentp2_step20 9B • Updated Nov 29 • 4
daixuancheng/Qwen3-VL-8B-Thinking_multisub_kaiyuanTiankong_resplen8192_sp2_gentp2_step36 9B • Updated Nov 29 • 115
daixuancheng/Qwen3-VL-8B-Thinking_multisub_kaiyuanTiankong_resplen8192_sp2_gentp2_step4 9B • Updated Nov 29 • 4
daixuancheng/Qwen3-VL-8B-Thinking_multisub_kaiyuanTiankong_resplen8192_sp2_gentp2_step10 9B • Updated Nov 29 • 4
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-80_actor Text Generation • 8B • Updated Jun 26 • 5
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-20_actor Text Generation • 8B • Updated Jun 26 • 5
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-80_critic Text Generation • 8B • Updated Jun 25 • 6
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-20_critic Text Generation • 8B • Updated Jun 25 • 6
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-60_critic Text Generation • 8B • Updated Jun 25 • 8
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-120_critic Text Generation • 8B • Updated Jun 25 • 8
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step120_crtic Text Generation • 8B • Updated Jun 25 • 7
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-120_actor Text Generation • 8B • Updated Jun 25 • 8
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-60_actor Text Generation • 8B • Updated Jun 25 • 7
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step120_actor Text Generation • 8B • Updated Jun 25 • 6
daixuancheng/zero_7b_base_useTokenLoss_clipHigh_KLcoeff0_step80 Text Generation • 8B • Updated Jun 25 • 7
daixuancheng/zero_7b_base_useTokenLoss_clipHigh_KLcoeff0_step60 Text Generation • 8B • Updated Jun 25 • 6