Wenxuan Huang

Osilly

Recent Activity

authored a paper about 2 months ago

Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models

upvoted a paper about 2 months ago

Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models

commented on a paper about 2 months ago

Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models

authored a paper about 2 months ago

Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models

Paper • 2511.01618 • Published Nov 3 • 10

authored 5 papers 3 months ago

CompBench: Benchmarking Complex Instruction-guided Image Editing

Paper • 2505.12200 • Published May 18

Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback

Paper • 2507.20766 • Published Jul 28 • 1

IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?

Paper • 2509.24709 • Published Sep 29 • 6

Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models

Paper • 2510.01304 • Published Oct 1 • 10

Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning?

Paper • 2510.06036 • Published Oct 7 • 6

authored a paper 4 months ago

Interleaving Reasoning for Better Text-to-Image Generation

Paper • 2509.06945 • Published Sep 8 • 14

authored 2 papers 8 months ago

Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification

Paper • 2412.00876 • Published Dec 1, 2024

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

Paper • 2503.06749 • Published Mar 9 • 31

authored a paper 9 months ago

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

Paper • 2504.07956 • Published Apr 10 • 46