See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis • Paper • 2602.20951 • Published 6 days ago • 13
SIMSPINE: A Biomechanics-Aware Simulation Framework for 3D Spine Motion Annotation and Benchmarking • Paper • 2602.20792 • Published 6 days ago • 2
SoulX-FlashHead: Oracle-guided Generation of Infinite Real-time Streaming Talking Heads • Paper • 2602.07449 • Published 23 days ago • 3
Optimizing Few-Step Generation with Adaptive Matching Distillation • Paper • 2602.07345 • Published 23 days ago • 9
Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection • Paper • 2602.03216 • Published 27 days ago • 12
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text • Paper • 2601.22975 • Published about 1 month ago • 107
PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing • Paper • 2601.21957 • Published Jan 29 • 19
Scaling Embeddings Outperforms Scaling Experts in Language Models • Paper • 2601.21204 • Published Jan 29 • 100
One-step Latent-free Image Generation with Pixel Mean Flows • Paper • 2601.22158 • Published Jan 29 • 18
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text • Paper • 2601.10355 • Published Jan 15 • 39
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning • Paper • 2505.08617 • Published May 13, 2025 • 42
Towards Pixel-Level VLM Perception via Simple Points Prediction • Paper • 2601.19228 • Published Jan 27 • 18