DIAL-TFM

Trainable selective sampling and sparse attention kernels are indispensable in the era of context engineering. We hope our work will be helpful to everyone! 🤗

Trainable Dynamic Mask Sparse Attention (2508.02124)

JingzeShi

authored a paper 7 months ago

Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting

Paper • 2505.19716 • Published May 26, 2025

JingzeShi

posted an update 9 months ago

SmallTalks (SmallDoge/SmallTalks) from @SmallDoge is a synthetic dataset designed for supervised fine-tuning of language models. It covers a variety of conversational content, including daily conversations, tool use, Python programming, encyclopedia Q&A, exam problem-solving, logical reasoning, and more. Every task is provided in both English and Chinese.

JingzeShi

posted an update 10 months ago

We distilled a more accurate and concise dataset from DeepSeek R1 and also released the code repository for the distillation pipeline. 🤗

Dataset: SmallDoge/SmallThoughts
Code: https://github.com/SmallDoges/small-thoughts

JingzeShi

posted an update 11 months ago

🤗 Welcome to Doge, the small language model for edge devices!

SmallDoge/Doge-160M-Instruct

JingzeShi

posted an update 11 months ago

Welcome to the Doge Face Open Source Community! 🚀
Over the next two years, our goal is to explore an indispensable foundation of embodied intelligence: small language models. 🔬
We aim to open-source code and documentation to give everyone more time to slack off while working or studying! 🤗
👉 GitHub repository: https://github.com/SmallDoges/small-doge
👉 Hugging Face organization: SmallDoge

JingzeShi

posted an update 12 months ago

🤩 Warmup -> stable -> decay learning rate scheduler:
😎 Use the stable-phase checkpoints to continue training the model on any new dataset without training-loss spikes!
SmallDoge/Doge-20M-checkpoint
SmallDoge/Doge-60M-checkpoint
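The warmup -> stable -> decay idea above can be sketched as a per-step learning-rate function. This is a minimal illustration, not the SmallDoge training code: the function name `wsd_lr`, the linear warmup and linear decay shapes, and the step counts and learning rates in the usage lines are all assumptions. It shows why a stable-phase checkpoint is a convenient restart point: during the stable phase the learning rate sits at its constant peak, so a continuation run that skips warmup resumes at that same rate and anneals only at its own end.

```python
def wsd_lr(step: int, total_steps: int, warmup_steps: int, decay_steps: int,
           peak_lr: float, min_lr: float = 0.0) -> float:
    """Hypothetical warmup -> stable -> decay schedule.

    Assumption: linear warmup and linear decay; real WSD setups may use
    other decay shapes (e.g. cosine).
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from near 0 up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    decay_start = total_steps - decay_steps
    if step < decay_start:
        # Stable phase: constant peak learning rate. Checkpoints saved here
        # can be resumed at this rate without any jump.
        return peak_lr
    # Decay phase: anneal linearly from peak_lr down to min_lr.
    progress = min(1.0, (step - decay_start) / max(1, decay_steps))
    return peak_lr + (min_lr - peak_lr) * progress


# Hypothetical original run: 1000 steps, 100 warmup, last 200 steps decay.
sched = [wsd_lr(s, 1000, 100, 200, 3e-3, 3e-4) for s in range(1000)]

# Hypothetical continuation from a stable-phase checkpoint on a new dataset:
# no re-warmup, so the first step already uses the stable-phase peak rate.
resume = [wsd_lr(s, 500, 0, 100, 3e-3, 3e-4) for s in range(500)]
```

Because the decay happens only in the final `decay_steps`, a single stable-phase checkpoint could seed several continuation runs, each with its own dataset and its own short decay.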