rl-rag/rar_cb_bs_16_rollout_8__1__1759453746_checkpoints_step_100 • 333k • Updated • 5
rl-rag/qwen3-8B-sft-mix-v20250921-plus-v20251001-onpolicy-rs-longform_0921 • Text Generation • 8B • Updated • 29
rl-rag/qwen3-8B-sft-mix-v20250921_long_form_only • Text Generation • 8B • Updated • 15
rl-rag/qwen3-8B-sft-mix-v20250921_05 • Text Generation • 8B • Updated • 9
rl-rag/qwen3-8B-sft-mix-v20250921_short_form_only • Text Generation • 8B • Updated • 9
rl-rag/qwen3-8B-sft-mix-v20250921_005 • Text Generation • 8B • Updated • 33
rl-rag/qwen3-8B-sft-mix-v20250921_02 • Text Generation • 8B • Updated • 9
rl-rag/qwen3-8B-sft-mix-v20250921_01 • Text Generation • 8B • Updated • 8
rl-rag/qwen3-8B-sft_0921_no_simple_short_form • Text Generation • 8B • Updated • 11
rl-rag/qwen3-8B-sft_0921_no_search_arena • Text Generation • 8B • Updated • 12
rl-rag/qwen3-8B-sft_0921_no_OS • Text Generation • 8B • Updated • 9
rl-rag/qwen3-8B-sft_0921_no_browse_comp • Text Generation • 8B • Updated • 7
rl-rag/qwen3-8B-v20250915_sampled_ablations • Text Generation • 8B • Updated • 6
rl-rag/qwen3-8B-sft_0915_webthinker_lfrs • Text Generation • 8B • Updated • 5
rl-rag/qwen3-8B-sft-mix-v20250921 • Text Generation • 8B • Updated • 675
rl-rag/qwen3-8B-sft-mix-v20250921_rubric_lfrs • Text Generation • 8B • Updated • 9
rl-rag/qwen3-8B-sft_0915_likert_lfrs • Text Generation • 8B • Updated • 5
rl-rag/qwen3-8B-sft_0915_likert_lfrs_sampled • Text Generation • 8B • Updated • 5
rl-rag/qwen3-8B-lf_sft_0915_webthinker_lfrs_sampled • Text Generation • 8B • Updated • 5
rl-rag/qwen3-8B-sft-mix-v20250915 • Text Generation • 8B • Updated • 13
rl-rag/qwen3-8b-base-combined-sft-training-data-v20250824_MiroSystemPrompt • Text Generation • 8B • Updated • 7
rl-rag/qwen3-8b-combined-sft-training-data-v20250824_MiroSystemPrompt • Text Generation • 8B • Updated • 4
rl-rag/qwen3-4b-it-combined-sft-training-data-v20250824_MiroSystemPrompt • Text Generation • 4B • Updated • 6
rl-rag/qwen2.5-7b-combined-sft-training-data-v20250824_MiroSystemPrompt • Text Generation • 8B • Updated • 3