Tokenization in Transformers v5: Simpler, Clearer, and More Modular Article • 20 days ago • 100
NVIDIA Nemotron v3 Collection Open, Production-ready Enterprise Models • 6 items • Updated 7 days ago • 113
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 244
Ministral 3 Collection A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: 3B, 8B and 14B. All with vision capabilities. • 9 items • Updated Dec 2, 2025 • 136
Mistral Large 3 Collection A state-of-the-art, open-weight, general-purpose multimodal model with a granular Mixture-of-Experts architecture. • 4 items • Updated Dec 2, 2025 • 81
Transformers v5: Simple model definitions powering the AI ecosystem Article • Dec 1, 2025 • 265
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 132
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats Paper • 2510.25602 • Published Oct 29, 2025 • 77
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation Paper • 2510.22115 • Published Oct 25, 2025 • 83
Scaling Latent Reasoning via Looped Language Models Paper • 2510.25741 • Published Oct 29, 2025 • 221
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17, 2025 • 89
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning Paper • 2510.19338 • Published Oct 22, 2025 • 114
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published Oct 13, 2025 • 165
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13, 2025 • 176