SpatialTree: How Spatial Abilities Branch Out in MLLMs Paper • 2512.20617 • Published about 20 hours ago
QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models Paper • 2512.19526 • Published 2 days ago
StoryMem: Multi-shot Long Video Storytelling with Memory Paper • 2512.19539 • Published 2 days ago
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding Paper • 2512.19693 • Published 2 days ago
MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments Paper • 2512.19432 • Published 2 days ago
GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators Paper • 2512.19682 • Published 2 days ago
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows Paper • 2512.16969 • Published 6 days ago
SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories Paper • 2512.17419 • Published 5 days ago
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience Paper • 2512.17260 • Published 5 days ago
LLaDA2.0: Scaling Up Diffusion Language Models to 100B Paper • 2512.15745 • Published 14 days ago
Is Nano Banana Pro a Low-Level Vision All-Rounder? A Comprehensive Evaluation on 14 Tasks and 40 Datasets Paper • 2512.15110 • Published 7 days ago