admarcosai's Collections Datasets
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Paper
• 2312.06585
• Published • 29
TinyGSM: achieving >80% on GSM8k with small language models
Paper
• 2312.09241
• Published • 40
Viewer
• Updated • 70k • 1.45k
• 92
Paper
• 2309.17425
• Published • 7
jondurbin/gutenberg-dpo-v0.1
Viewer
• Updated • 918 • 1.27k
• 164
garage-bAInd/Open-Platypus
Viewer
• Updated • 24.9k • 10.6k
• 418
Viewer
• Updated • 243k • 732
• 220
Viewer
• Updated • 58.7k • 989
• 46
Viewer
• Updated • 1.49M • 1.47k
• 154
Viewer
• Updated • 166k • 1.42k
• 119
Viewer
• Updated • 198k • 147
• 112
Viewer
• Updated • 2.75M • 18.9k
• 393
Viewer
• Updated • 6.2M • 2.85k
• 105
open-web-math/open-web-math
Viewer
• Updated • 6.32M • 44.6k
• 342
Viewer
• Updated • 4.04k • 62.2k
• 224
Viewer
• Updated • 14.3k • 1.62k
• 51
Viewer
• Updated • 44.8k • 294
• 54
Viewer
• Updated • 6.14k • 21k
• 217
Viewer
• Updated • 262k • 8.13k
• 303
argilla/ultrafeedback-binarized-preferences-cleaned
Viewer
• Updated • 60.9k • 14.7k
• 162
WhiteRabbitNeo/Code-Functions-Level-Cyber
Viewer
• Updated • 8.44k • 33
• 32
WhiteRabbitNeo/Code-Functions-Level-General
Viewer
• Updated • 8.69k • 12
• 20
Viewer
• Updated • 317k • 22.1k
• 33
Updated • 4.69k
• 138
Viewer
• Updated • 183k • 1.12k
• 295
selfrag/selfrag_train_data
Viewer
• Updated • 146k • 242
• 77
Viewer
• Updated • 463k • 126
• 18
Locutusque/UltraTextbooks
Viewer
• Updated • 5.52M • 2.84k
• 199
Undi95/ConversationChronicles-sharegpt-SHARDED
Viewer
• Updated • 787k • 302
• 10
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Paper
• 2402.10176
• Published • 38
Viewer
• Updated • 31.1M • 17k
• 704
togethercomputer/RedPajama-Data-1T
Viewer
• Updated • 1.73M • 2.57k
• 1.16k
Viewer
• Updated • 968M • 20.9k
• 914
Viewer
• Updated • 276M • 13.4k
• 168
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
Paper
• 2412.14475
• Published • 59