Add pipeline tag, license metadata and improve model card

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +13 -52
README.md CHANGED
@@ -1,3 +1,8 @@
1
  <div align="center">
2
  <img src="assets/teaser.png" width="100%">
3
 
@@ -9,7 +14,7 @@ Robbyant Team
9
 
10
  <div align="center">
11
 
12
- [![Paper](https://img.shields.io/static/v1?label=Paper&message=arXiv&color=red&logo=arxiv)](https://arxiv.org/abs/2604.14141)
13
  [![PDF](https://img.shields.io/static/v1?label=Paper&message=PDF&color=red&logo=adobeacrobatreader)](lingbot-map_paper.pdf)
14
  [![Project](https://img.shields.io/badge/Project-Website-blue)](https://technology.robbyant.com/lingbot-map)
15
  [![HuggingFace](https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Model&message=HuggingFace&color=orange)](https://huggingface.co/robbyant/lingbot-map)
@@ -24,8 +29,9 @@ https://github.com/user-attachments/assets/fe39e095-af2c-4ec9-b68d-a8ba97e505ab
24
 
25
  ### 🗺️ Meet LingBot-Map! We've built a feed-forward 3D foundation model for streaming 3D reconstruction! 🏗️🌍
26
 
27
- LingBot-Map has focused on:
28
 
 
29
  - **Geometric Context Transformer**: Architecturally unifies coordinate grounding, dense geometric cues, and long-range drift correction within a single streaming framework through anchor context, pose-reference window, and trajectory memory.
30
  - **High-Efficiency Streaming Inference**: A feed-forward architecture with paged KV cache attention, enabling stable inference at ~20 FPS on 518×378 resolution over long sequences exceeding 10,000 frames.
31
  - **State-of-the-Art Reconstruction**: Superior performance on diverse benchmarks compared to both existing streaming and iterative optimization-based approaches.
@@ -49,8 +55,6 @@ conda activate lingbot-map
49
  pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128
50
  ```
51
 
52
- > For other CUDA versions, see [PyTorch Get Started](https://pytorch.org/get-started/locally/).
53
-
54
  **3. Install lingbot-map**
55
 
56
  ```bash
@@ -66,21 +70,6 @@ FlashInfer provides paged KV cache attention for efficient streaming inference:
66
  pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
67
  ```
68
 
69
- > For other CUDA/PyTorch combinations, see [FlashInfer installation](https://docs.flashinfer.ai/installation.html).
70
- > If FlashInfer is not installed, the model falls back to SDPA (PyTorch native attention) via `--use_sdpa`.
71
-
72
- **5. Visualization dependencies (optional)**
73
-
74
- ```bash
75
- pip install -e ".[vis]"
76
- ```
77
-
78
- # 📦 Model Download
79
-
80
- | Model Name | Huggingface Repository | ModelScope Repository | Description |
81
- | :--- | :--- | :--- | :--- |
82
- | lingbot-map | [robbyant/lingbot-map](https://huggingface.co/robbyant/lingbot-map) | [Robbyant/lingbot-map](https://www.modelscope.cn/models/Robbyant/lingbot-map) | Base model checkpoint (4.63 GB) |
83
-
84
  # 🎬 Demo
85
 
86
  ### Streaming Inference from Images
@@ -99,37 +88,23 @@ python demo.py --model_path /path/to/checkpoint.pt \
99
 
100
  ### Streaming with Keyframe Interval
101
 
102
- Use `--keyframe_interval` to reduce KV cache memory by only keeping every N-th frame as a keyframe. Non-keyframe frames still produce predictions but are not stored in the cache. This is useful for long sequences
103
- which exceed 320 frames.
104
 
105
  ```bash
106
  python demo.py --model_path /path/to/checkpoint.pt \
107
  --image_folder /path/to/images/ --keyframe_interval 6
108
  ```
109
 
110
- ### Windowed Inference (for long sequences, >3000 frames)
111
- ```bash
112
- python demo.py --model_path /path/to/checkpoint.pt \
113
- --video_path video.mp4 --fps 10 \
114
- --mode windowed --window_size 64
115
- ```
116
-
117
-
118
  ### Sky Masking
119
 
120
- Sky masking uses an ONNX sky segmentation model to filter out sky points from the reconstructed point cloud, which improves visualization quality for outdoor scenes.
121
 
122
  **Setup:**
123
 
124
  ```bash
125
- # Install onnxruntime (required)
126
- pip install onnxruntime # CPU
127
- # or
128
- pip install onnxruntime-gpu # GPU (faster for large image sets)
129
  ```
130
 
131
- The sky segmentation model (`skyseg.onnx`) will be automatically downloaded from [HuggingFace](https://huggingface.co/JianyuanWang/skyseg/resolve/main/skyseg.onnx) on first use.
132
-
133
  **Usage:**
134
 
135
  ```bash
@@ -137,15 +112,6 @@ python demo.py --model_path /path/to/checkpoint.pt \
137
  --image_folder /path/to/images/ --mask_sky
138
  ```
139
 
140
- Sky masks are cached in `<image_folder>_sky_masks/` so subsequent runs skip regeneration.
141
-
142
- ### Without FlashInfer (SDPA fallback)
143
-
144
- ```bash
145
- python demo.py --model_path /path/to/checkpoint.pt \
146
- --image_folder /path/to/images/ --use_sdpa
147
- ```
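Choosing between the two attention backends can be automated in a small launcher script. The sketch below, under the assumption that the FlashInfer wheel is importable as the `flashinfer` module, appends `--use_sdpa` to the `demo.py` command only when that module cannot be found. `build_demo_command` is a hypothetical helper, not part of lingbot-map; the flags are the ones shown in this README diff.

```python
import importlib.util
import shlex


def build_demo_command(model_path: str, image_folder: str) -> list[str]:
    """Build the demo.py command, adding --use_sdpa when FlashInfer is absent.

    Hypothetical helper: assumes the FlashInfer package is importable
    under the module name `flashinfer`.
    """
    cmd = [
        "python", "demo.py",
        "--model_path", model_path,
        "--image_folder", image_folder,
    ]
    # find_spec returns None when the top-level module is not installed.
    if importlib.util.find_spec("flashinfer") is None:
        cmd.append("--use_sdpa")
    return cmd


command = build_demo_command("/path/to/checkpoint.pt", "/path/to/images/")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root.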
148
-
149
  # 📜 License
150
 
151
  This project is released under the Apache License 2.0. See [LICENSE](LICENSE.txt) file for details.
@@ -163,12 +129,7 @@ This project is released under the Apache License 2.0. See [LICENSE](LICENSE.txt
163
 
164
  # ✨ Acknowledgments
165
 
166
- We thank Shangzhan Zhang, Jianyuan Wang, Yudong Jin, Christian Rupprecht, and Xun Cao for their helpful discussions and support.
167
-
168
- This work builds upon several excellent open-source projects:
169
-
170
  - [VGGT](https://github.com/facebookresearch/vggt)
171
  - [DINOv2](https://github.com/facebookresearch/dinov2)
172
- - [Flashinfer](https://github.com/flashinfer-ai/flashinfer)
173
-
174
- ---
 
1
+ ---
2
+ license: apache-2.0
3
+ pipeline_tag: image-to-3d
4
+ ---
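The block added above is YAML front matter, which the Hugging Face Hub reads from the top of `README.md` to populate model card metadata such as the license and pipeline tag. As a minimal sketch (not the Hub's actual parser), the following standard-library Python pulls flat `key: value` pairs out of such a block. The function name `parse_front_matter` is hypothetical, and nested YAML is deliberately not handled.

```python
def parse_front_matter(text: str) -> dict[str, str]:
    """Extract flat key/value pairs from a leading '---' delimited block.

    Simplified illustration: handles only single-line `key: value`
    entries, not nested YAML, lists, or quoting.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # opening delimiter without a closing one


readme = """---
license: apache-2.0
pipeline_tag: image-to-3d
---

<div align="center">
"""

print(parse_front_matter(readme))
# {'license': 'apache-2.0', 'pipeline_tag': 'image-to-3d'}
```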
5
+
6
  <div align="center">
7
  <img src="assets/teaser.png" width="100%">
8
 
 
14
 
15
  <div align="center">
16
 
17
+ [![Paper](https://img.shields.io/static/v1?label=Paper&message=arXiv&color=red&logo=arxiv)](https://huggingface.co/papers/2604.14141)
18
  [![PDF](https://img.shields.io/static/v1?label=Paper&message=PDF&color=red&logo=adobeacrobatreader)](lingbot-map_paper.pdf)
19
  [![Project](https://img.shields.io/badge/Project-Website-blue)](https://technology.robbyant.com/lingbot-map)
20
  [![HuggingFace](https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Model&message=HuggingFace&color=orange)](https://huggingface.co/robbyant/lingbot-map)
 
29
 
30
  ### 🗺️ Meet LingBot-Map! We've built a feed-forward 3D foundation model for streaming 3D reconstruction! 🏗️🌍
31
 
32
+ LingBot-Map is a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture.
33
 
34
+ Key features include:
35
  - **Geometric Context Transformer**: Architecturally unifies coordinate grounding, dense geometric cues, and long-range drift correction within a single streaming framework through anchor context, pose-reference window, and trajectory memory.
36
  - **High-Efficiency Streaming Inference**: A feed-forward architecture with paged KV cache attention, enabling stable inference at ~20 FPS on 518×378 resolution over long sequences exceeding 10,000 frames.
37
  - **State-of-the-Art Reconstruction**: Superior performance on diverse benchmarks compared to both existing streaming and iterative optimization-based approaches.
 
55
  pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128
56
  ```
57
 
 
 
58
  **3. Install lingbot-map**
59
 
60
  ```bash
 
70
  pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
71
  ```
72
 
73
  # 🎬 Demo
74
 
75
  ### Streaming Inference from Images
 
88
 
89
  ### Streaming with Keyframe Interval
90
 
91
+ Use `--keyframe_interval` to reduce KV cache memory by only keeping every N-th frame as a keyframe.
 
92
 
93
  ```bash
94
  python demo.py --model_path /path/to/checkpoint.pt \
95
  --image_folder /path/to/images/ --keyframe_interval 6
96
  ```
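To illustrate the effect of `--keyframe_interval`, here is a small sketch of which frame indices would be retained in the KV cache for a given interval. It assumes that frame 0 and every N-th frame after it are the keyframes; the selection rule actually used by `demo.py` is not shown in this README and may differ. `select_keyframes` is a hypothetical helper, not part of lingbot-map.

```python
def select_keyframes(num_frames: int, keyframe_interval: int) -> list[int]:
    """Return the frame indices kept as keyframes.

    Assumed rule (for illustration only): a frame is a keyframe when its
    index is a multiple of `keyframe_interval`.
    """
    if keyframe_interval < 1:
        raise ValueError("keyframe_interval must be >= 1")
    return [i for i in range(num_frames) if i % keyframe_interval == 0]


kept = select_keyframes(num_frames=20, keyframe_interval=6)
print(kept)            # [0, 6, 12, 18]
print(len(kept) / 20)  # 0.2, the fraction of frames stored in the cache
```

Under this assumption, an interval of 6 stores roughly one sixth of the frames, which is where the memory saving comes from.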
97
 
98
  ### Sky Masking
99
 
100
+ Sky masking filters out sky points from the reconstructed point cloud.
101
 
102
  **Setup:**
103
 
104
  ```bash
105
+ pip install onnxruntime
106
  ```
107
 
 
 
108
  **Usage:**
109
 
110
  ```bash
 
112
  --image_folder /path/to/images/ --mask_sky
113
  ```
114
115
  # 📜 License
116
 
117
  This project is released under the Apache License 2.0. See [LICENSE](LICENSE.txt) file for details.
 
129
 
130
  # ✨ Acknowledgments
131
 
132
+ This work builds upon several open-source projects:
 
 
 
133
  - [VGGT](https://github.com/facebookresearch/vggt)
134
  - [DINOv2](https://github.com/facebookresearch/dinov2)
135
+ - [Flashinfer](https://github.com/flashinfer-ai/flashinfer)