nyu-visionx/scale-rae-data
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding