arxiv:2512.14614

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Published on Dec 16 · Submitted by taesiri on Dec 17 · #2 Paper of the day

Abstract

AI-generated summary: WorldPlay is a streaming video diffusion model that achieves real-time, interactive world modeling with long-term geometric consistency by using a Dual Action Representation, Reconstituted Context Memory, and Context Forcing.

This paper presents WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling with long-term geometric consistency, resolving the trade-off between speed and memory that limits current methods. WorldPlay draws power from three key innovations. 1) We use a Dual Action Representation to enable robust action control in response to the user's keyboard and mouse inputs. 2) To enforce long-term consistency, our Reconstituted Context Memory dynamically rebuilds context from past frames and uses temporal reframing to keep geometrically important but long-past frames accessible, effectively alleviating memory attenuation. 3) We also propose Context Forcing, a novel distillation method designed for memory-aware models. Aligning memory context between the teacher and student preserves the student's capacity to use long-range information, enabling real-time speeds while preventing error drift. Taken together, WorldPlay generates long-horizon streaming 720p video at 24 FPS with superior consistency, comparing favorably with existing techniques and showing strong generalization across diverse scenes. The project page and online demo can be found at https://3d-models.hunyuan.tencent.com/world/ and https://3d.hunyuan.tencent.com/sceneTo3D.
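The abstract does not give implementation details for the Reconstituted Context Memory, so the following is only a toy sketch of the general idea it describes: rebuild the context at each step from a few recent frames plus older frames that are geometrically relevant, then re-index them so long-past frames sit close to the current step. Everything here is an assumption made for illustration, not the authors' method: the `Frame` type, the `rebuild_context` function, the use of camera-position distance as the relevance measure, and the "compact consecutive slots" form of temporal reframing.

```python
import math
from dataclasses import dataclass


@dataclass
class Frame:
    index: int        # absolute position of the frame in the generated stream
    position: tuple   # hypothetical camera position (x, y, z) for that frame


def rebuild_context(history, current_position, n_recent=2, n_spatial=2):
    """Toy context rebuild: recent frames plus spatially nearest older frames.

    Returns a list of (slot, frame) pairs, where `slot` is a compact temporal
    index standing in for "temporal reframing" (an assumed interpretation).
    """
    # Most recent frames keep short-term continuity.
    recent = history[-n_recent:] if n_recent > 0 else []
    older = history[:-n_recent] if n_recent > 0 else list(history)

    # Older frames are ranked by camera distance to the current viewpoint,
    # a stand-in for whatever geometric relevance score a real system uses.
    nearest = sorted(
        older, key=lambda f: math.dist(f.position, current_position)
    )[:n_spatial]

    # Keep chronological order, then replace absolute indices with
    # consecutive slots so old frames are not "far away" in time.
    selected = sorted(nearest + recent, key=lambda f: f.index)
    return list(enumerate(selected))


# The camera walks out along x (0..5) and then walks back toward the origin.
xs = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1]
history = [Frame(index=i, position=(float(x), 0.0, 0.0)) for i, x in enumerate(xs)]

context = rebuild_context(history, current_position=(0.0, 0.0, 0.0))
for slot, frame in context:
    print(slot, frame.index, frame.position)
```

In this toy run the camera has returned to where it started, so the two earliest frames are pulled back into the context alongside the two most recent ones, even though they lie far back in the stream.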

Models citing this paper 1

Datasets citing this paper 0

Spaces citing this paper 3

Collections including this paper 4