Min-Hung Chen

cmhungsteve

https://minhungchen.netlify.app/

AI & ML interests

Multimodal AI, Transfer Learning, Unsupervised Learning, Video Understanding, Vision Transformer, Computer Vision, Deep Learning

Recent Activity

upvoted a paper 2 days ago

One Model, Many Latencies: Universal Speech Enhancement for Diverse Real-Time Applications

Paper • 2606.25621 • Published 4 days ago • 13

upvoted a paper 10 days ago

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

Paper • 2606.18216 • Published 12 days ago • 63

upvoted a collection 15 days ago

Cosmos3

Collection • Omnimodal World Models for Physical AI • 16 items • Updated 1 day ago • 132

authored a paper 15 days ago

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Paper • 2606.13673 • Published 17 days ago • 108

upvoted a paper 15 days ago

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Paper • 2606.13673 • Published 17 days ago • 108

submitted a paper to Daily Papers 15 days ago

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Paper • 2606.13673 • Published 17 days ago • 108

authored 3 papers 22 days ago

FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

Paper • 2605.19846 • Published May 20 • 3

DVSM: Decoder-only View Synthesis Model Done Right

Paper • 2605.29891 • Published about 1 month ago • 2

Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

Paper • 2606.06361 • Published 24 days ago • 16

upvoted 3 papers 22 days ago

FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

Paper • 2605.19846 • Published May 20 • 3

DVSM: Decoder-only View Synthesis Model Done Right

Paper • 2605.29891 • Published about 1 month ago • 2

Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

Paper • 2606.06361 • Published 24 days ago • 16

New activity in nvidia/4D-RGPT-8B 25 days ago

fix links

#1 opened 25 days ago by cmhungsteve

liked a model 25 days ago

nvidia/4D-RGPT-8B

Video-Text-to-Text • Updated 25 days ago • 253 • 15

upvoted a paper 29 days ago

Why Far Looks Up: Probing Spatial Representation in Vision-Language Models

Paper • 2605.30161 • Published about 1 month ago • 60

upvoted a paper about 1 month ago

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

Paper • 2605.28774 • Published May 27 • 93

upvoted an article about 1 month ago

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

Article • nvidia • May 18 • 21

liked a dataset about 1 month ago

nvidia/PhysicalAI-VANTAGE-Bench

Viewer • Updated about 21 hours ago • 7.64k • 5.02k • 14

liked a model about 2 months ago

nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16

Any-to-Any • 33B • Updated May 8 • 752k • 360

New activity in MINT-SJTU/RoboFAC-dataset 2 months ago

License for RoboFAC?

#6 opened 2 months ago by cmhungsteve
