DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels • Paper • 2602.11715 • Published 6 days ago • 5
Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm • Paper • 2602.11543 • Published 6 days ago • 4
LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation • Paper • 2602.11451 • Published 6 days ago • 15
NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models • Paper • 2602.06694 • Published 11 days ago • 15
SimpleGPT: Improving GPT via A Simple Normalization Strategy • Paper • 2602.01212 • Published 16 days ago • 3
Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection • Paper • 2601.19375 • Published 22 days ago • 5
TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors • Paper • 2601.17958 • Published 23 days ago • 3
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability • Paper • 2601.18778 • Published 22 days ago • 40
HeartMuLa: A Family of Open Sourced Music Foundation Models • Paper • 2601.10547 • Published Jan 15 • 42
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors • Paper • 2601.07226 • Published Jan 12 • 32
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation • Paper • 2601.02204 • Published Jan 5 • 62
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits • Paper • 2512.20578 • Published Dec 23, 2025 • 86