Update model card for InfinityCC: Spherical Leech Quantization

#5
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +43 -27
README.md CHANGED
@@ -1,46 +1,62 @@
  ---
- license: mit
  language:
  - en
  ---
- # Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

- <div align="center">

- [![demo platform](https://img.shields.io/badge/Play%20with%20Infinity%21-Infinity%20demo%20platform-lightblue)](https://opensource.bytedance.com/gmpt/t2i/invite)&nbsp;
- [![arXiv](https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages)](https://foundationvision.github.io/infinity.project/)&nbsp;
- [![arXiv](https://img.shields.io/badge/arXiv%20paper-2412.04431-b31b1b.svg)](https://arxiv.org/abs/2412.04431)&nbsp;
- [![huggingface weights](https://img.shields.io/badge/%F0%9F%A4%97%20Weights-FoundationVision/Infinity-yellow)](https://huggingface.co/FoundationVision/infinity)&nbsp;
- [![code](https://img.shields.io/badge/%F0%9F%A4%96%20Code-FoundationVision/Infinity-green)](https://github.com/FoundationVision/Infinity)&nbsp;

- </div>
- <p align="center" style="font-size: larger;">
- <a href="https://arxiv.org/abs/2412.04431">Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis</a>
- </p>

- ## 📖 Introduction
- We present Infinity, a Bitwise Visual AutoRegressive Modeling capable of generating high-resolution and photorealistic images. Infinity redefines visual autoregressive model under a bitwise token prediction framework with an infinite-vocabulary tokenizer & classifier and bitwise self-correction. Theoretically scaling the tokenizer vocabulary size to infinity and concurrently scaling the transformer size, our method significantly unleashes powerful scaling capabilities. Infinity sets a new record for autoregressive text-to-image models, outperforming top-tier diffusion models like SD3-Medium and SDXL. Notably, Infinity surpasses SD3-Medium by improving the GenEval benchmark score from 0.62 to 0.73 and the ImageReward benchmark score from 0.87 to 0.96, achieving a win rate of 66%. Without extra optimization, Infinity generates a high-quality 1024×1024 image in 0.8 seconds, making it 2.6× faster than SD3-Medium and establishing it as the fastest text-to-image model.

- ## 📌 Note
- This repo is used for hosting Infinity's checkpoints. For more details, please refer to [![code](https://img.shields.io/badge/%F0%9F%A4%96%20Code-FoundationVision/Infinity-green)](https://github.com/FoundationVision/Infinity)&nbsp;

- ## 📖 Citation
  If our work assists your research, feel free to give us a star ⭐ or cite us using:

- ```
- @misc{han2024infinityscalingbitwiseautoregressive,
- title={Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis},
- author={Jian Han and Jinlai Liu and Yi Jiang and Bin Yan and Yuqi Zhang and Zehuan Yuan and Bingyue Peng and Xiaobing Liu},
- year={2024},
- eprint={2412.04431},
- archivePrefix={arXiv},
- primaryClass={cs.CV},
- url={https://arxiv.org/abs/2412.04431},
  }
- ```

  ---
  language:
  - en
+ license: mit
+ pipeline_tag: text-to-image
  ---

+ # InfinityCC: Spherical Leech Quantization for Visual Tokenization and Generation

+ This repository hosts **InfinityCC**, a working example showcasing the power of [Non-Parametric Quantization (NPQ)](https://cs.stanford.edu/~yzz/npq/) for ImageNet-1k class-conditioned image generation.

+ The model is based on the paper: [**Spherical Leech Quantization for Visual Tokenization and Generation**](https://huggingface.co/papers/2512.14697)
+ Yue Zhao, Hanwen Jiang, Zhenlin Xu, Chutong Yang, Ehsan Adeli, Philipp Krähenbühl.

+ Project Page: [https://cs.stanford.edu/~yzz/npq/](https://cs.stanford.edu/~yzz/npq/)
+ Code: [https://github.com/zhaoyue-zephyrus/InfinityCC](https://github.com/zhaoyue-zephyrus/InfinityCC)

+ <img src="https://github.com/zhaoyue-zephyrus/InfinityCC/raw/main/assets/npq.png" width="640">

+ ## Introduction
+ In this work, we explore Spherical Leech Quantization ($\Lambda_{24}$-SQ), a non-parametric quantization method rooted in lattice coding. This approach simplifies the training recipe and improves the reconstruction-compression tradeoff, thanks to its high symmetry and even distribution on the hypersphere. It has demonstrated better reconstruction quality than prior art in image tokenization and compression tasks, with improvements extending to state-of-the-art auto-regressive image generation frameworks. InfinityCC serves as a practical demonstration of this powerful quantization technique for visual generation.
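As a rough illustration of what non-parametric spherical quantization can look like, the sketch below snaps a latent vector to the nearest codeword in a fixed, untrained codebook of unit-norm lattice vectors. It is not the InfinityCC implementation. It assumes the codebook is formed from normalized minimal lattice vectors, and for brevity it swaps in the 24 minimal vectors of the 4-dimensional $D_4$ lattice as a toy stand-in for the 196,560 minimal vectors of the Leech lattice $\Lambda_{24}$. The helper names `d4_codebook` and `quantize` are hypothetical.

```python
import itertools
import math


def d4_codebook():
    """Toy fixed codebook: the 24 minimal vectors of the D4 lattice, scaled to unit norm.

    Assumption: this stands in for the Leech-lattice codebook, which is far larger.
    """
    codes = []
    scale = math.sqrt(2.0)  # every minimal vector (+-1, +-1, 0, 0) has norm sqrt(2)
    for i, j in itertools.combinations(range(4), 2):
        for si, sj in itertools.product((1.0, -1.0), repeat=2):
            v = [0.0] * 4
            v[i], v[j] = si, sj
            codes.append(tuple(x / scale for x in v))
    return codes


def quantize(z, codebook):
    """Project z onto the unit sphere and return (index, codeword) of the closest code.

    On the unit sphere, the nearest neighbour is the codeword with the largest
    inner product, so no learned parameters are involved.
    """
    norm = math.sqrt(sum(x * x for x in z))
    if norm == 0.0:
        raise ValueError("cannot project the zero vector onto the sphere")
    u = [x / norm for x in z]
    scores = [sum(a * b for a, b in zip(u, c)) for c in codebook]
    idx = max(range(len(codebook)), key=scores.__getitem__)
    return idx, codebook[idx]


codebook = d4_codebook()
index, code = quantize([0.9, 0.8, 0.1, -0.05], codebook)
bits_per_token = math.log2(len(codebook))  # about 4.58 bits for 24 codewords
```

Because the codebook is fixed by the lattice rather than learned, there is nothing to train or regularize in the quantizer itself. The same lookup over the 196,560 minimal vectors of the Leech lattice would carry about 17.58 bits per token.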

+ ## Installation

+ We use [uv](https://docs.astral.sh/uv/) to manage all dependencies.
+
+ ```bash
+ uv sync
+ source .venv/bin/activate
+ ```

+ To evaluate ImageNet generation with the ADM evaluator, first fetch the evaluation code and the ImageNet 256×256 reference batch:
+ ```bash
+ mkdir third_party/ && cd third_party/
+ git clone https://${GIT_TOKEN}@github.com/openai/guided-diffusion.git
+ cd guided-diffusion/evaluations
+ wget https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/imagenet/256/VIRTUAL_imagenet256_labeled.npz
+ ```
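Once the reference batch is in place, the upstream `evaluator.py` script takes the reference `.npz` followed by a sample `.npz` and reports metrics including FID. A minimal sketch of wrapping that call is shown below. It assumes the directory layout produced by the commands above; the helper names and the sample file name `samples_50k.npz` are hypothetical and are not part of the InfinityCC repository.

```python
import subprocess
import sys
from pathlib import Path

# Layout created by the setup commands (relative to the repository root).
EVAL_DIR = Path("third_party/guided-diffusion/evaluations")
REF_BATCH = EVAL_DIR / "VIRTUAL_imagenet256_labeled.npz"


def build_eval_command(sample_npz, eval_dir=EVAL_DIR, ref_batch=REF_BATCH):
    """Return the argv list: python evaluator.py <reference.npz> <samples.npz>."""
    return [
        sys.executable,
        str(Path(eval_dir) / "evaluator.py"),
        str(ref_batch),
        str(sample_npz),
    ]


def run_eval(sample_npz):
    """Check that all input files exist, then run the evaluator as a subprocess."""
    cmd = build_eval_command(sample_npz)
    for path in cmd[1:]:
        if not Path(path).is_file():
            raise FileNotFoundError(path)
    subprocess.run(cmd, check=True)


# Example (hypothetical sample batch produced by your own sampling script):
# run_eval("samples_50k.npz")
```

Checking the paths before launching avoids waiting for the evaluator to start only to fail on a missing file.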

+ ## Results

+ ### InfinityCC Performance
+ | Model | Resolution | #Layers | Tokenizer (HF weights🤗) | VAR Model (HF weights🤗) | FID |
+ |:----------:|:-----:|:--------:|:---------:|:-----------------------------------------------------------------------------------:|:----:|
+ | InfinityCC | 256 | 12 | [bitvae_l24_xl](https://huggingface.co/zhaoyue-zephyrus/InfinityCC_L24SQ/tree/main/tokenization/infinity_l24_stage1_xl) | [infinitycc_12layer_weights](https://huggingface.co/zhaoyue-zephyrus/InfinityCC_L24SQ/tree/main/generation/infinitycc_12layer_256x256_l24_xl_ep50_cce_zloss_improved_schedule_dion) | 6.66 |
+ | InfinityCC | 256 | 24 | [bitvae_l24_xl_vf](https://huggingface.co/zhaoyue-zephyrus/InfinityCC_L24SQ/tree/main/tokenization/infinity_l24_stage1_xl_vf) | [infinitycc_24layer_weights](https://huggingface.co/zhaoyue-zephyrus/InfinityCC_L24SQ/tree/main/generation/infinitycc_24layer_256x256_l24_xl_vf_ep350_cce_zloss_improved_schedule_dion_unsharedaln) | 2.21 |
+ | InfinityCC-2B | 256 | 32 | TBD | TBD | 1.80 |
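The tokenizer and generator links in the table point to subfolders of a single Hugging Face repository, so only the needed folders have to be fetched. The sketch below shows one way to do that with `snapshot_download` from `huggingface_hub`. It assumes that package is installed, which the README does not state; the helper names and the `checkpoints` target directory are hypothetical.

```python
REPO_ID = "zhaoyue-zephyrus/InfinityCC_L24SQ"

# Subfolders taken from the 12-layer row of the results table.
SUBFOLDERS_12_LAYER = [
    "tokenization/infinity_l24_stage1_xl",
    "generation/infinitycc_12layer_256x256_l24_xl_ep50_cce_zloss_improved_schedule_dion",
]


def checkpoint_patterns(subfolders):
    """Turn repository subfolders into glob patterns matching the files inside them."""
    return [f"{folder.strip('/')}/*" for folder in subfolders]


def download_checkpoints(repo_id, subfolders, local_dir="checkpoints"):
    """Download only the listed subfolders and return the local snapshot path."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=checkpoint_patterns(subfolders),
        local_dir=local_dir,
    )


# Example (requires network access):
# path = download_checkpoints(REPO_ID, SUBFOLDERS_12_LAYER)
```

Restricting the download with `allow_patterns` avoids pulling the 24-layer weights when only the 12-layer model is needed.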

+ ## Citation
  If our work assists your research, feel free to give us a star ⭐ or cite us using:

+ ```bibtex
+ @article{zhao2025spherical,
+ title={Spherical Leech Quantization for Visual Tokenization and Generation},
+ author={Zhao, Yue and Jiang, Hanwen and Xu, Zhenlin and Yang, Chutong and Adeli, Ehsan and Krähenbühl, Philipp},
+ journal={arXiv preprint arXiv:2512.14697},
+ year={2025}
  }
+ ```
+
+ ## License
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.