The Christmas holidays are here! 🎄 Thinking about learning something new in AI?
@huggingface offers 12 FREE courses covering all the relevant topics, for every level of experience. A great challenge for the holidays (and worth saving for later 🙄)
Following up on LLaDA 2.0, the paper is now out on Daily Papers 🔥 It has sparked a lot of discussion in the community for showing how discrete diffusion LLMs can scale to 100B and run faster than traditional AR models. LLaDA2.0: Scaling Up Diffusion Language Models to 100B (2512.15745)
Nvidia is on a roll lately. Nemotron 3 Nano is my new fav local model, but here's the real flex: they published the entire evaluation setup. Configs, prompts, logs, all of it. This is how you do open models 🔥
✨ Built from real enterprise data (Enron + financial institutions), not synthetic tasks
✨ Tests end-to-end finance workflows
✨ Multimodal & cross-file reasoning
✨ Expert annotated (700+ hours) and genuinely hard
ICYMI, you can fine-tune open LLMs using Claude Code
just tell it: “Fine-tune Qwen3-0.6B on open-r1/codeforces-cots”
and Claude submits a real training job on HF GPUs using TRL.
it handles everything:
> dataset validation
> GPU selection
> training + Trackio monitoring
> job submission + cost estimation
when it’s done, your model is on the Hub, ready to use
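for a feel of what the "dataset validation" step could involve, here is a minimal sketch in plain Python. It is not the actual check Claude or TRL runs. It assumes the dataset uses the conversational format TRL accepts for SFT (a `messages` list of `role`/`content` dicts), and the helper names `validate_chat_example` and `validate_dataset` are hypothetical.

```python
from typing import Any, Iterable

# Roles commonly seen in chat-format datasets (an assumption for this sketch).
VALID_ROLES = {"system", "user", "assistant", "tool"}


def validate_chat_example(example: dict[str, Any]) -> list[str]:
    """Return the problems found in one conversational example (empty if none)."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not a dict")
            continue
        if msg.get("role") not in VALID_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty or non-string content")

    # SFT needs at least one assistant turn to learn from.
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant turn to train on")
    return problems


def validate_dataset(rows: Iterable[dict[str, Any]]) -> dict[int, list[str]]:
    """Map row index -> problems, listing only the rows that have problems."""
    report = {}
    for idx, row in enumerate(rows):
        problems = validate_chat_example(row)
        if problems:
            report[idx] = problems
    return report
```

running `validate_dataset` over a small sample of rows before submitting a job is a cheap way to catch format issues before paying for GPU time.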
It comes packed with updates:
> Agent training with tools in GRPO
> New CISPO & SAPO losses + reasoning rewards
> vLLM quantization in colocate mode
> Dataset shuffling in SFT
> Lots of NEW examples
> Tons of fixes and documentation improvements