ERNIE-Image-INT8

ERNIE-Image-INT8 is a publishable INT8 derivative of Baidu/ERNIE-Image, prepared for local deployment, packaging, and reproducible benchmarking. The default release profile quantizes the transformer to INT8, while text_encoder and pe may remain in bfloat16 when quality checks show that full INT8 introduces unacceptable degradation.

What Is Included

  • Diffusers-compatible model folder layout.
  • Component-wise precision manifest and quantization metadata.

Precision Matrix

| Component | Backend | Precision | Enabled |
|---|---|---|---|
| transformer | quanto | int8 | True |
| text_encoder | none | bfloat16 | False |
| pe | none | bfloat16 | False |
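The component-wise manifest can be checked programmatically. The sketch below assumes a hypothetical JSON manifest mirroring the matrix above; the actual file name and schema in the packaged release may differ.

```python
import json

# Hypothetical manifest mirroring the precision matrix above; the real
# file name and schema in the release may differ.
manifest = json.loads("""
{
  "transformer":  {"backend": "quanto", "precision": "int8",     "enabled": true},
  "text_encoder": {"backend": null,     "precision": "bfloat16", "enabled": false},
  "pe":           {"backend": null,     "precision": "bfloat16", "enabled": false}
}
""")

def quantized_components(m):
    """Return the names of components that ship with quantization enabled."""
    return [name for name, entry in m.items() if entry["enabled"]]

print(quantized_components(manifest))  # only the transformer is quantized
```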

Recommended Runtime

  • NVIDIA GPU with 24 GB+ VRAM for practical generation.
  • CPU is supported only for loading validation, metadata inspection, and smoke tests.
  • Recommended image sizes follow the original ERNIE-Image guidance: 1024x1024, 848x1264, 1264x848, 1200x896.
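Because quality degrades away from the trained resolutions, a small guard before calling the pipeline can help. The size list comes from the guidance above; the helper itself is an illustrative sketch, not part of the release.

```python
# Recommended (width, height) pairs from the ERNIE-Image guidance above.
RECOMMENDED_SIZES = {(1024, 1024), (848, 1264), (1264, 848), (1200, 896)}

def check_size(width: int, height: int) -> None:
    """Warn when a requested size is not one of the recommended resolutions."""
    if (width, height) not in RECOMMENDED_SIZES:
        print(f"warning: {width}x{height} is outside the recommended sizes")

check_size(848, 1264)  # recommended portrait size, no warning
check_size(512, 512)   # prints a warning
```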

Quick Start

import torch
from diffusers import ErnieImagePipeline

# Load the INT8-packaged pipeline; non-quantized components run in bfloat16
# per the precision matrix above.
pipe = ErnieImagePipeline.from_pretrained(
    "ixim/ERNIE-Image-INT8",
    torch_dtype=torch.bfloat16,
).to("cuda")

# 848x1264 is one of the recommended portrait resolutions.
image = pipe(
    prompt="A premium event poster with readable bilingual typography and strong layout hierarchy.",
    width=848,
    height=1264,
    num_inference_steps=50,
    guidance_scale=4.0,
    use_pe=True,  # toggles the pe component listed in the precision matrix
).images[0]

image.save("output.png")
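The benchmark variants later in this card differ only in a few pipeline arguments, so a comparison run can be sketched as plain parameter dicts. The variant names and settings below mirror the benchmark tables in this card; the loop itself is a sketch, not a confirmed script from the release.

```python
# Parameter grids mirroring two of the benchmark variants in this card.
VARIANTS = [
    {"name": "transformer-int8 + pe-bf16 + use_pe=true",
     "num_inference_steps": 50, "guidance_scale": 4.0, "use_pe": True},
    {"name": "transformer-int8 + use_pe=false",
     "num_inference_steps": 50, "guidance_scale": 4.0, "use_pe": False},
]

def run_grid(pipe, prompt, variants):
    """Generate one image per variant; returns (variant name, image) pairs."""
    results = []
    for v in variants:
        image = pipe(
            prompt=prompt,
            width=848,
            height=1264,
            num_inference_steps=v["num_inference_steps"],
            guidance_scale=v["guidance_scale"],
            use_pe=v["use_pe"],
        ).images[0]
        results.append((v["name"], image))
    return results
```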

Benchmark Snapshot

Benchmark context:

  • Prompts: 7, seed=42 — zh_portrait_studio_east_asian, zh_poster_dense_text, zh_infographic_wide, zh_browser_ui_article, en_storyboard_dialogue, zh_sticker_grid, en_backlit_street_photo.
  • Primary comparison: transformer-int8 + pe-bf16 + use_pe=true, transformer-int8 + pe-int8 + use_pe=true, and transformer-int8 + use_pe=false; variant-specific steps, guidance_scale, and use_pe are listed in the tables below.
  • Supplementary reference: ERNIE-Image-Turbo Reference.
  • The pe-int8 row is a runtime-quantized benchmark variant used for comparison only; it does not change the packaged release precision matrix shown above.
  • Peak VRAM reports the peak reserved CUDA memory of the current PyTorch process during each generation call.

| Group | Variant | Prompt Count | Avg Latency (ms) | Avg Peak VRAM (MiB) | Steps | CFG | Use PE |
|---|---|---|---|---|---|---|---|
| primary | transformer-int8 + pe-bf16 + use_pe=true | 7 | 78053 | 28516 | 50 | 4.0 | True |
| primary | transformer-int8 + pe-int8 + use_pe=true | 7 | 81412 | 28721 | 50 | 4.0 | True |
| primary | transformer-int8 + use_pe=false | 7 | 60287 | 28339 | 50 | 4.0 | False |
| supplementary | ERNIE-Image-Turbo Reference | 7 | 32535 | 35255 | 8 | 1.0 | True |

Prompt-by-Prompt Comparison

zh_portrait_studio_east_asian

| Variant | Steps | CFG | Use PE | Latency (ms) | Peak VRAM (MiB) |
|---|---|---|---|---|---|
| transformer-int8 + pe-bf16 + use_pe=true | 50 | 4.0 | True | 70842 | 28968 |
| transformer-int8 + pe-int8 + use_pe=true | 50 | 4.0 | True | 76522 | 29172 |
| transformer-int8 + use_pe=false | 50 | 4.0 | False | 60728 | 28790 |
| ERNIE-Image-Turbo Reference | 8 | 1.0 | True | 20393 | 35708 |

zh_poster_dense_text

| Variant | Steps | CFG | Use PE | Latency (ms) | Peak VRAM (MiB) |
|---|---|---|---|---|---|
| transformer-int8 + pe-bf16 + use_pe=true | 50 | 4.0 | True | 71753 | 27912 |
| transformer-int8 + pe-int8 + use_pe=true | 50 | 4.0 | True | 73674 | 28118 |
| transformer-int8 + use_pe=false | 50 | 4.0 | False | 60753 | 27738 |
| ERNIE-Image-Turbo Reference | 8 | 1.0 | True | 22736 | 34654 |

zh_infographic_wide

| Variant | Steps | CFG | Use PE | Latency (ms) | Peak VRAM (MiB) |
|---|---|---|---|---|---|
| transformer-int8 + pe-bf16 + use_pe=true | 50 | 4.0 | True | 72318 | 27914 |
| transformer-int8 + pe-int8 + use_pe=true | 50 | 4.0 | True | 80810 | 28120 |
| transformer-int8 + use_pe=false | 50 | 4.0 | False | 60691 | 27738 |
| ERNIE-Image-Turbo Reference | 8 | 1.0 | True | 22987 | 34654 |

zh_browser_ui_article

| Variant | Steps | CFG | Use PE | Latency (ms) | Peak VRAM (MiB) |
|---|---|---|---|---|---|
| transformer-int8 + pe-bf16 + use_pe=true | 50 | 4.0 | True | 74728 | 27916 |
| transformer-int8 + pe-int8 + use_pe=true | 50 | 4.0 | True | 76007 | 28120 |
| transformer-int8 + use_pe=false | 50 | 4.0 | False | 61152 | 27738 |
| ERNIE-Image-Turbo Reference | 8 | 1.0 | True | 21412 | 34654 |

en_storyboard_dialogue

| Variant | Steps | CFG | Use PE | Latency (ms) | Peak VRAM (MiB) |
|---|---|---|---|---|---|
| transformer-int8 + pe-bf16 + use_pe=true | 50 | 4.0 | True | 93052 | 28968 |
| transformer-int8 + pe-int8 + use_pe=true | 50 | 4.0 | True | 89542 | 29172 |
| transformer-int8 + use_pe=false | 50 | 4.0 | False | 59642 | 28790 |
| ERNIE-Image-Turbo Reference | 8 | 1.0 | True | 55379 | 35706 |

zh_sticker_grid

| Variant | Steps | CFG | Use PE | Latency (ms) | Peak VRAM (MiB) |
|---|---|---|---|---|---|
| transformer-int8 + pe-bf16 + use_pe=true | 50 | 4.0 | True | 92627 | 28968 |
| transformer-int8 + pe-int8 + use_pe=true | 50 | 4.0 | True | 99956 | 29172 |
| transformer-int8 + use_pe=false | 50 | 4.0 | False | 59401 | 28790 |
| ERNIE-Image-Turbo Reference | 8 | 1.0 | True | 62985 | 35706 |

en_backlit_street_photo

| Variant | Steps | CFG | Use PE | Latency (ms) | Peak VRAM (MiB) |
|---|---|---|---|---|---|
| transformer-int8 + pe-bf16 + use_pe=true | 50 | 4.0 | True | 71049 | 28968 |
| transformer-int8 + pe-int8 + use_pe=true | 50 | 4.0 | True | 73374 | 29172 |
| transformer-int8 + use_pe=false | 50 | 4.0 | False | 59643 | 28790 |
| ERNIE-Image-Turbo Reference | 8 | 1.0 | True | 21852 | 35706 |
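As a quick arithmetic sanity check, the per-prompt latencies above average back to the summary row. For the pe-bf16 variant:

```python
# Per-prompt latencies (ms) for transformer-int8 + pe-bf16 + use_pe=true,
# copied from the seven prompt tables above.
latencies_ms = [70842, 71753, 72318, 74728, 93052, 92627, 71049]

avg = round(sum(latencies_ms) / len(latencies_ms))
print(avg)  # 78053, matching Avg Latency in the Benchmark Snapshot table
```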

Example Prompt Set

See example_prompts.json for the curated prompt suite used during packaging and regression checks. When scripts/build_release.py is given an --examples-dir benchmark folder, the prompt-grouped benchmark tables above are also rendered with preview images from those outputs automatically.
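Loading the suite is straightforward. The sketch below assumes a simple list-of-objects schema with id and prompt fields, which may not match the actual file exactly; it writes a stand-in file so the example is self-contained, rather than touching the repo's real example_prompts.json.

```python
import json
from pathlib import Path

# Assumed schema: a JSON array of {"id": ..., "prompt": ...} objects.
# A stand-in file keeps the sketch runnable anywhere; point `path` at the
# repo's real example_prompts.json in actual use.
sample = [
    {"id": "zh_poster_dense_text", "prompt": "Premium event poster with dense bilingual text"},
    {"id": "en_backlit_street_photo", "prompt": "Backlit street photo at dusk"},
]
path = Path("example_prompts.sample.json")
path.write_text(json.dumps(sample, ensure_ascii=False), encoding="utf-8")

prompts = json.loads(path.read_text(encoding="utf-8"))
for entry in prompts:
    print(entry["id"], "->", entry["prompt"][:40])
```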

Intended Use

  • Local image generation tools and controlled packaging workflows.
  • Quantization research on large open-weight text-to-image models.
  • Internal demo services where image history, prompt reproducibility, and artifact packaging matter.

Limitations

  • Full CPU generation is not a practical primary target for this release.
  • Text rendering, dense layouts, and long structured prompts should always be rechecked after quantization.
  • Experimental all-INT8 variants can degrade typography, object counting, and layout adherence.

License

This release inherits the Apache-2.0 terms of the base model. Review the included LICENSE and make sure your downstream usage also complies with the original ERNIE-Image terms and any third-party dependencies you add around it.
