ERNIE-Image-INT8

ERNIE-Image-INT8 is a publishable INT8 derivative of Baidu/ERNIE-Image, prepared for local deployment, packaging, and reproducible benchmarking. The default release profile quantizes the transformer to INT8, while text_encoder and pe may remain in bfloat16 when quality checks show that full INT8 introduces unacceptable degradation.

What Is Included

  • Diffusers-compatible model folder layout.
  • Component-wise precision manifest and quantization metadata.

Precision Matrix

| Component | Backend | Precision | Enabled |
|---|---|---|---|
| transformer | quanto | int8 | True |
| text_encoder | none | bfloat16 | False |
| pe | none | bfloat16 | False |
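The component-wise manifest can be checked programmatically. The sketch below assumes a hypothetical JSON manifest mirroring the matrix above; the actual file name and schema in the packaged release may differ.

```python
import json

# Hypothetical manifest mirroring the precision matrix above; the real
# file name and schema in the release may differ.
manifest = json.loads("""
{
  "transformer":  {"backend": "quanto", "precision": "int8",     "enabled": true},
  "text_encoder": {"backend": null,     "precision": "bfloat16", "enabled": false},
  "pe":           {"backend": null,     "precision": "bfloat16", "enabled": false}
}
""")

def quantized_components(m):
    """Return the names of components that ship with quantization enabled."""
    return [name for name, entry in m.items() if entry["enabled"]]

print(quantized_components(manifest))  # only the transformer is quantized
```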

Recommended Runtime

  • NVIDIA GPU with 24 GB+ VRAM for practical generation.
  • CPU is supported only for loading validation, metadata inspection, and smoke tests.
  • Recommended image sizes follow the original ERNIE-Image guidance: 1024x1024, 848x1264, 1264x848, 1200x896.
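Because quality degrades away from the trained resolutions, a small guard before calling the pipeline can help. The size list comes from the guidance above; the helper itself is an illustrative sketch, not part of the release.

```python
# Recommended (width, height) pairs from the ERNIE-Image guidance above.
RECOMMENDED_SIZES = {(1024, 1024), (848, 1264), (1264, 848), (1200, 896)}

def check_size(width: int, height: int) -> None:
    """Warn when a requested size is not one of the recommended resolutions."""
    if (width, height) not in RECOMMENDED_SIZES:
        print(f"warning: {width}x{height} is outside the recommended sizes")

check_size(848, 1264)  # recommended portrait size, no warning
check_size(512, 512)   # prints a warning
```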

Quick Start

import torch
from diffusers import ErnieImagePipeline

# Load the INT8-packaged pipeline; non-quantized components run in bfloat16
# per the precision matrix above.
pipe = ErnieImagePipeline.from_pretrained(
    "ixim/ERNIE-Image-INT8",
    torch_dtype=torch.bfloat16,
).to("cuda")

# 848x1264 is one of the recommended portrait resolutions.
image = pipe(
    prompt="A premium event poster with readable bilingual typography and strong layout hierarchy.",
    width=848,
    height=1264,
    num_inference_steps=50,
    guidance_scale=4.0,
    use_pe=True,  # toggles the pe component listed in the precision matrix
).images[0]

image.save("output.png")
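The benchmark variants later in this card differ only in a few pipeline arguments, so a comparison run can be sketched as plain parameter dicts. The variant names and settings below mirror the benchmark tables in this card; the loop itself is a sketch, not a confirmed script from the release.

```python
# Parameter grids mirroring two of the benchmark variants in this card.
VARIANTS = [
    {"name": "transformer-int8 + pe-bf16 + use_pe=true",
     "num_inference_steps": 50, "guidance_scale": 4.0, "use_pe": True},
    {"name": "transformer-int8 + use_pe=false",
     "num_inference_steps": 50, "guidance_scale": 4.0, "use_pe": False},
]

def run_grid(pipe, prompt, variants):
    """Generate one image per variant; returns (variant name, image) pairs."""
    results = []
    for v in variants:
        image = pipe(
            prompt=prompt,
            width=848,
            height=1264,
            num_inference_steps=v["num_inference_steps"],
            guidance_scale=v["guidance_scale"],
            use_pe=v["use_pe"],
        ).images[0]
        results.append((v["name"], image))
    return results
```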

Benchmark Snapshot

Benchmark context:

  • Prompts: 7, seed=42 — zh_portrait_studio_east_asian, zh_poster_dense_text, zh_infographic_wide, zh_browser_ui_article, en_storyboard_dialogue, zh_sticker_grid, en_backlit_street_photo.
  • Primary comparison: transformer-int8 + pe-bf16 + use_pe=true, transformer-int8 + pe-int8 + use_pe=true, and transformer-int8 + use_pe=false; variant-specific steps, guidance_scale, and use_pe are listed in the tables below.
  • Supplementary reference: ERNIE-Image-Turbo Reference.
  • The pe-int8 row is a runtime-quantized benchmark variant used for comparison only; it does not change the packaged release precision matrix shown above.
  • Peak VRAM reports the peak reserved CUDA memory of the current PyTorch process during each generation call.

| Group | Variant | Prompt Count | Avg Latency (ms) | Avg Peak VRAM (MiB) | Steps | CFG | Use PE |
|---|---|---|---|---|---|---|---|
| primary | transformer-int8 + pe-bf16 + use_pe=true | 7 | 78053 | 28516 | 50 | 4.0 | True |
| primary | transformer-int8 + pe-int8 + use_pe=true | 7 | 81412 | 28721 | 50 | 4.0 | True |
| primary | transformer-int8 + use_pe=false | 7 | 60287 | 28339 | 50 | 4.0 | False |
| supplementary | ERNIE-Image-Turbo Reference | 7 | 32535 | 35255 | 8 | 1.0 | True |

Prompt-by-Prompt Comparison

zh_portrait_studio_east_asian

| Variant | Steps | CFG | Use PE | Latency (ms) | Peak VRAM (MiB) |
|---|---|---|---|---|---|
| transformer-int8 + pe-bf16 + use_pe=true | 50 | 4.0 | True | 70842 | 28968 |
| transformer-int8 + pe-int8 + use_pe=true | 50 | 4.0 | True | 76522 | 29172 |
| transformer-int8 + use_pe=false | 50 | 4.0 | False | 60728 | 28790 |
| ERNIE-Image-Turbo Reference | 8 | 1.0 | True | 20393 | 35708 |

zh_poster_dense_text

| Variant | Steps | CFG | Use PE | Latency (ms) | Peak VRAM (MiB) |
|---|---|---|---|---|---|
| transformer-int8 + pe-bf16 + use_pe=true | 50 | 4.0 | True | 71753 | 27912 |
| transformer-int8 + pe-int8 + use_pe=true | 50 | 4.0 | True | 73674 | 28118 |
| transformer-int8 + use_pe=false | 50 | 4.0 | False | 60753 | 27738 |
| ERNIE-Image-Turbo Reference | 8 | 1.0 | True | 22736 | 34654 |

zh_infographic_wide

| Variant | Steps | CFG | Use PE | Latency (ms) | Peak VRAM (MiB) |
|---|---|---|---|---|---|
| transformer-int8 + pe-bf16 + use_pe=true | 50 | 4.0 | True | 72318 | 27914 |
| transformer-int8 + pe-int8 + use_pe=true | 50 | 4.0 | True | 80810 | 28120 |
| transformer-int8 + use_pe=false | 50 | 4.0 | False | 60691 | 27738 |
| ERNIE-Image-Turbo Reference | 8 | 1.0 | True | 22987 | 34654 |

zh_browser_ui_article

| Variant | Steps | CFG | Use PE | Latency (ms) | Peak VRAM (MiB) |
|---|---|---|---|---|---|
| transformer-int8 + pe-bf16 + use_pe=true | 50 | 4.0 | True | 74728 | 27916 |
| transformer-int8 + pe-int8 + use_pe=true | 50 | 4.0 | True | 76007 | 28120 |
| transformer-int8 + use_pe=false | 50 | 4.0 | False | 61152 | 27738 |
| ERNIE-Image-Turbo Reference | 8 | 1.0 | True | 21412 | 34654 |

en_storyboard_dialogue

| Variant | Steps | CFG | Use PE | Latency (ms) | Peak VRAM (MiB) |
|---|---|---|---|---|---|
| transformer-int8 + pe-bf16 + use_pe=true | 50 | 4.0 | True | 93052 | 28968 |
| transformer-int8 + pe-int8 + use_pe=true | 50 | 4.0 | True | 89542 | 29172 |
| transformer-int8 + use_pe=false | 50 | 4.0 | False | 59642 | 28790 |
| ERNIE-Image-Turbo Reference | 8 | 1.0 | True | 55379 | 35706 |

zh_sticker_grid

| Variant | Steps | CFG | Use PE | Latency (ms) | Peak VRAM (MiB) |
|---|---|---|---|---|---|
| transformer-int8 + pe-bf16 + use_pe=true | 50 | 4.0 | True | 92627 | 28968 |
| transformer-int8 + pe-int8 + use_pe=true | 50 | 4.0 | True | 99956 | 29172 |
| transformer-int8 + use_pe=false | 50 | 4.0 | False | 59401 | 28790 |
| ERNIE-Image-Turbo Reference | 8 | 1.0 | True | 62985 | 35706 |

en_backlit_street_photo

| Variant | Steps | CFG | Use PE | Latency (ms) | Peak VRAM (MiB) |
|---|---|---|---|---|---|
| transformer-int8 + pe-bf16 + use_pe=true | 50 | 4.0 | True | 71049 | 28968 |
| transformer-int8 + pe-int8 + use_pe=true | 50 | 4.0 | True | 73374 | 29172 |
| transformer-int8 + use_pe=false | 50 | 4.0 | False | 59643 | 28790 |
| ERNIE-Image-Turbo Reference | 8 | 1.0 | True | 21852 | 35706 |
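As a quick arithmetic sanity check, the per-prompt latencies above average back to the summary row. For the pe-bf16 variant:

```python
# Per-prompt latencies (ms) for transformer-int8 + pe-bf16 + use_pe=true,
# copied from the seven prompt tables above.
latencies_ms = [70842, 71753, 72318, 74728, 93052, 92627, 71049]

avg = round(sum(latencies_ms) / len(latencies_ms))
print(avg)  # 78053, matching Avg Latency in the Benchmark Snapshot table
```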

Example Prompt Set

See example_prompts.json for the curated prompt suite used during packaging and regression checks. When scripts/build_release.py is given an --examples-dir benchmark folder, the prompt-grouped benchmark tables above are also rendered with preview images from those outputs automatically.
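Loading the suite is straightforward. The sketch below assumes a simple list-of-objects schema with id and prompt fields, which may not match the actual file exactly; it writes a stand-in file so the example is self-contained, rather than touching the repo's real example_prompts.json.

```python
import json
from pathlib import Path

# Assumed schema: a JSON array of {"id": ..., "prompt": ...} objects.
# A stand-in file keeps the sketch runnable anywhere; point `path` at the
# repo's real example_prompts.json in actual use.
sample = [
    {"id": "zh_poster_dense_text", "prompt": "Premium event poster with dense bilingual text"},
    {"id": "en_backlit_street_photo", "prompt": "Backlit street photo at dusk"},
]
path = Path("example_prompts.sample.json")
path.write_text(json.dumps(sample, ensure_ascii=False), encoding="utf-8")

prompts = json.loads(path.read_text(encoding="utf-8"))
for entry in prompts:
    print(entry["id"], "->", entry["prompt"][:40])
```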

Intended Use

  • Local image generation tools and controlled packaging workflows.
  • Quantization research on large open-weight text-to-image models.
  • Internal demo services where image history, prompt reproducibility, and artifact packaging matter.

Limitations

  • Full CPU generation is not a practical primary target for this release.
  • Text rendering, dense layouts, and long structured prompts should always be rechecked after quantization.
  • Experimental all-INT8 variants can degrade typography, object counting, and layout adherence.

License

This release inherits the Apache-2.0 terms of the base model. Review the included LICENSE and make sure your downstream usage also complies with the original ERNIE-Image terms and any third-party dependencies you add around it.
