Vectorizing the Trie: Efficient Constrained Decoding for LLM-based Generative Retrieval on Accelerators Paper • 2602.22647 • Published 5 days ago
AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning Paper • 2602.23258 • Published 4 days ago
Fast KVzip: Efficient and Accurate LLM Inference with Gated KV Eviction Paper • 2601.17668 • Published Jan 25
The Responsibility Vacuum: Organizational Failure in Scaled Agent Systems Paper • 2601.15059 • Published Jan 21
A BERTology View of LLM Orchestrations: Token- and Layer-Selective Probes for Efficient Single-Pass Classification Paper • 2601.13288 • Published Jan 19
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models Paper • 2601.14004 • Published Jan 20
Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models Paper • 2601.14152 • Published Jan 20
Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models Paper • 2601.15220 • Published Jan 21
Entropy Sentinel: Continuous LLM Accuracy Monitoring from Decoding Entropy Traces in STEM Paper • 2601.09001 • Published Jan 13
Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance Paper • 2601.14171 • Published Jan 20
Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind Paper • 2601.15715 • Published Jan 22
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers Paper • 2601.17367 • Published Jan 24
Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow Paper • 2601.14243 • Published Jan 20