Sean13/llama-8b-instruct-v0.2-cpo-full-label_smoothing-0.1 • Text Generation • 266k • Updated Nov 21 • 8
Sean13/mistral-7b-instruct-v0.2-cpo-full-label_smoothing-0.1 • Text Generation • 266k • Updated Nov 21 • 6
Sean13/mistral-7b-instruct-v0.2-simpo-full-label_smoothing-0.1 • Text Generation • 266k • Updated Nov 21 • 4
Sean13/llama-8b-instruct-rdpo-full-multipref-init-eta-0.99 • Text Generation • 266k • Updated Nov 20 • 3
Sean13/llama-8b-instruct-rdpo-full-multipref-init-eta-0.80 • Text Generation • 266k • Updated Nov 20 • 4