multilingual-reward-bench

community

AI & ML interests

None defined yet.

Recent Activity

seungone authored a paper about 2 hours ago

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

seungone authored a paper about 2 hours ago

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

amphora submitted a paper 1 day ago

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs


Models (0)

None public yet

Datasets (9)

multilingual-reward-bench/m-arena-sampled

Updated Mar 25, 2025 • 128 rows • 13 downloads

multilingual-reward-bench/m-arena

Updated Mar 25, 2025 • 2.16k rows • 6 downloads

multilingual-reward-bench/MRB-Preview-1013

Updated Oct 13, 2024 • 5.09k rows • 2 downloads

multilingual-reward-bench/code-en

Updated Oct 12, 2024 • 80 rows • 17 downloads

multilingual-reward-bench/code-python

Updated Oct 12, 2024 • 1.84k rows • 23 downloads

multilingual-reward-bench/safetyx1_prefx05_sky_x05_small

Updated Oct 10, 2024 • 13.4k rows • 9 downloads

multilingual-reward-bench/safetyx2_prefx1_sky_x1_small

Updated Oct 10, 2024 • 26.8k rows • 7 downloads

multilingual-reward-bench/safetyx2_prefx1_sky_x1

Updated Oct 10, 2024 • 40.3k rows • 17 downloads

multilingual-reward-bench/open-assistant-sampled-new

Updated Oct 7, 2024 • 444 rows • 97 downloads