🎬 Wan-NVFP4-4Steps Models

NVFP4 Quantization-Aware Step Distillation for Blackwell Architecture

✨ Features

  • ⚡ 4-Step Inference: End-to-end generation in 6.54s (T2V-1.3B) and 17.65s (I2V-14B-480P) on a single RTX 5090, a 12.8x and 28x speedup over the original models
  • 🎯 NVFP4 Quantization: Reduced memory and bandwidth usage, optimized for Blackwell architecture
  • 🔧 LightX2V Integration: Best performance and stability when run on the official LightX2V framework
  • 🚀 High-Quality Generation: Retains the video quality of Wan2.1 at a fraction of the generation time
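
To make the NVFP4 bullet concrete, here is a minimal pure-Python sketch of block-scaled 4-bit quantization. It is an illustration only, not the kernel LightX2V ships. It assumes 16-element blocks, the eight E2M1 magnitudes listed in the code, and one scale per block. The real format stores that scale in FP8 alongside a per-tensor scale; this sketch keeps it as a plain float. All function names are hypothetical.

```python
# Illustrative NVFP4-style block quantization (not the LightX2V kernel).
# Assumptions: 16-element blocks, E2M1 magnitudes, one float scale per block.

E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # representable magnitudes
BLOCK_SIZE = 16


def quantize_block(block):
    """Return (scale, codes); codes are signed E2M1 values."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    scale = amax / E2M1_VALUES[-1]  # largest magnitude in the block maps to 6.0
    codes = []
    for x in block:
        target = abs(x) / scale
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - target))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes


def dequantize_block(scale, codes):
    """Map codes back to real values using the block scale."""
    return [scale * c for c in codes]


def fake_quantize(values):
    """Quantize then dequantize a flat list, one block at a time."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        scale, codes = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(dequantize_block(scale, codes))
    return out
```

Passing a list of weights through `fake_quantize` shows the rounding error each block incurs. The error is largest for values that fall between 4.0 and 6.0 on the scaled grid, where the representable points are furthest apart.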

🚀 Quick Start

# 1. Install LightX2V (requires uv; run `pip install uv` first if it is missing)
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
uv pip install -v .

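# 2. Build and install the NVFP4 kernel (run from the LightX2V root)
pip install scikit_build_core uv
git clone https://github.com/NVIDIA/cutlass.git   # use this checkout's absolute path as CUTLASS_PATH below
cd lightx2v_kernel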

MAX_JOBS=$(nproc) CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) \
uv build --wheel \
  -Cbuild-dir=build . \
  -Ccmake.define.CUTLASS_PATH=/path/to/cutlass \
  --verbose --color=always --no-build-isolation

pip install dist/*.whl --force-reinstall --no-deps

# 3. Run inference (go back up from lightx2v_kernel to the LightX2V root first)
cd ../examples/wan
python wan_i2v_nvfp4.py   # Image-to-Video
python wan_t2v_nvfp4.py   # Text-to-Video

🎬 Generation Results

"A cinematic, hyper-realistic 3D animation, in the somber and beautiful style of Sekiro: Shadows Die Twice. In a vast field of silvery-white pampas grass, under a luminous full moon, the shinobi Wolf stands ready for a final duel..."

Input Image | Wan2.1-I2V-14B-480P | wan2.1_i2v_480p_nvfp4_lightx2v_4step

"High contrast, high saturation, short-side composition, sunset, medium focal length, soft light, backlight, warm tones, rim light, medium close-up, daylight, sunny-day lighting, a close-up of a foreign Caucasian woman wearing a yellow plaid dress and earrings. As the low-angle camera rises, the woman lifts her head, her eyes filled with tears, and speaks while looking ahead..." (translated from the original Chinese prompt)

Wan2.1-T2V-1.3B | wan2.1_t2v_1_3b_nvfp4_lightx2v_4step

⚑ Performance Comparison

Test Environment: RTX 5090 Single GPU | LightX2V Framework

📸 Image-to-Video (I2V-14B-480P)

| Metric | Original Model | Optimized Model | Speedup |
|---|---|---|---|
| Single-step Denoising | 12.10s | 3.40s | 3.5x |
| End-to-End | 498.90s | 17.65s | 28x |

🎬 Text-to-Video (T2V-1.3B-480P)

| Metric | Original Model | Optimized Model | Speedup |
|---|---|---|---|
| Single-step Denoising | 2.00s | 0.70s | 2.9x |
| End-to-End | 83.50s | 6.54s | 12.8x |
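
The end-to-end speedup is much larger than the single-step speedup. The short sketch below recomputes both ratios from the timings above and divides one by the other to show how much of the gain is left over once the faster step is accounted for. That remainder presumably comes mainly from running only 4 denoising steps, but the source does not state the original step count, so treat it as a rough indicator. Ratios recomputed from the rounded timings can differ in the last digit from the reported Speedup column. The names `RESULTS`, `speedup`, and `summarize` are illustrative.

```python
# Timings in seconds, copied from the tables above: (original, optimized).
RESULTS = {
    "I2V-14B-480P": {"step": (12.10, 3.40), "end_to_end": (498.90, 17.65)},
    "T2V-1.3B-480P": {"step": (2.00, 0.70), "end_to_end": (83.50, 6.54)},
}


def speedup(original_s, optimized_s):
    """Ratio of original to optimized wall-clock time."""
    return original_s / optimized_s


def summarize(results):
    """Compute per-step and end-to-end speedups plus the leftover factor."""
    rows = {}
    for model, timings in results.items():
        step = speedup(*timings["step"])
        total = speedup(*timings["end_to_end"])
        rows[model] = {
            "step_speedup": step,
            "end_to_end_speedup": total,
            # Part of the end-to-end gain not explained by the faster step.
            "remaining_factor": total / step,
        }
    return rows


for model, row in summarize(RESULTS).items():
    print(model, {key: round(value, 2) for key, value in row.items()})
```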

⚠️ Notes

System Requirements

  • Required Hardware: NVIDIA RTX 50-series GPUs (RTX 5090/5080/5070/5060) or other Blackwell architecture GPUs
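
A quick way to check the hardware requirement before building the kernel is to read the GPU's CUDA compute capability. The sketch below assumes that Blackwell-generation GPUs report a major version of 10 or higher (for example 12.0 on the RTX 50 series) and that the installed driver supports the `compute_cap` query field of `nvidia-smi`. The threshold and helper names are assumptions for illustration; confirm the value for your card in NVIDIA's compute capability table.

```python
import subprocess

# Assumption: Blackwell-generation GPUs report compute capability 10.x or higher.
MIN_BLACKWELL_MAJOR = 10


def is_blackwell_or_newer(compute_cap: str) -> bool:
    """Return True if a 'major.minor' capability string meets the threshold."""
    major = int(compute_cap.strip().split(".")[0])
    return major >= MIN_BLACKWELL_MAJOR


def list_gpus():
    """Return (name, compute_cap) pairs reported by nvidia-smi."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    gpus = []
    for line in output.splitlines():
        if line.strip():
            # Split on the last comma so the capability is always the final field.
            name, cap = line.rsplit(",", 1)
            gpus.append((name.strip(), cap.strip()))
    return gpus
```

On a machine with the NVIDIA driver installed, iterating over `list_gpus()` and passing each capability to `is_blackwell_or_newer` shows whether any installed GPU meets the requirement.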

Dependencies

  • Prepare the T5, CLIP, and VAE components yourself, laid out in the same structure as Wan2.x

Performance Tips

  • Use Blackwell + NVFP4 for best performance
  • Enable CPU offload for GPUs with limited memory
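
To judge whether CPU offload is likely to be needed, a rough estimate of weight memory can help. The sketch below assumes every weight is stored in NVFP4 at about 4.5 bits each (a 4-bit value plus one 8-bit scale shared by 16 weights) and uses the nominal parameter counts from the model names. Real checkpoints usually keep some layers in higher precision, and activations plus the T5, CLIP, and VAE components need memory on top, so these figures are a lower bound.

```python
# Rough weight-memory estimate. Assumes all weights are quantized, which is
# unlikely to hold exactly for a real checkpoint.
NVFP4_BITS = 4 + 8 / 16  # 4-bit value + one 8-bit scale per 16 weights
BF16_BITS = 16

# Nominal parameter counts taken from the model names.
MODELS = {"T2V-1.3B": 1.3e9, "I2V-14B": 14e9}


def weight_gib(num_params, bits_per_weight):
    """Weight storage in GiB for a given parameter count and bit width."""
    return num_params * bits_per_weight / 8 / 2**30


for name, params in MODELS.items():
    bf16 = weight_gib(params, BF16_BITS)
    nvfp4 = weight_gib(params, NVFP4_BITS)
    print(f"{name}: BF16 ~{bf16:.1f} GiB, NVFP4 ~{nvfp4:.1f} GiB")
```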

🀝 Community


If you find this project helpful, please give us a ⭐ on GitHub
