When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels Paper • 2605.06652 • Published 9 days ago • 5
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 Sentence Similarity • 0.1B • Updated Jan 28 • 48M • • 1.23k
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 503
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 324
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 628
Scaling Teams or Scaling Time? Memory Enabled Lifelong Learning in LLM Multi-Agent Systems Paper • 2604.03295 • Published Mar 27 • 10
daaxila/twitter-qiqsp11-2026.03.14-2032736898289193270-QdIUlmm_a78ZlsNI-part1 Viewer • Updated Apr 1 • 1 • 34 • 1
InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published Mar 17 • 311
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions Paper • 2603.15612 • Published Mar 16 • 153
Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning Paper • 2603.04597 • Published Mar 4 • 210