InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields
Abstract
InfiniDepth represents depth as neural implicit fields using a local implicit decoder, enabling continuous 2D coordinate querying for arbitrary-resolution depth estimation and superior performance in fine-detail regions.
Existing depth estimation methods are fundamentally limited to predicting depth on discrete image grids. Such representations restrict their scalability to arbitrary output resolutions and hinder the recovery of geometric detail. This paper introduces InfiniDepth, which represents depth as neural implicit fields. Through a simple yet effective local implicit decoder, we can query depth at continuous 2D coordinates, enabling arbitrary-resolution and fine-grained depth estimation. To better assess our method's capabilities, we curate a high-quality 4K synthetic benchmark from five different games, spanning diverse scenes with rich geometric and appearance details. Extensive experiments demonstrate that InfiniDepth achieves state-of-the-art performance on both synthetic and real-world benchmarks across relative and metric depth estimation tasks, particularly excelling in fine-detail regions. It also benefits the task of novel view synthesis under large viewpoint shifts, producing high-quality results with fewer holes and artifacts.
Community
Depth Beyond Pixels
We introduce InfiniDepth: casting monocular depth estimation as a neural implicit field.
- Arbitrary-Resolution
- Accurate Metric Depth
- Single-View NVS under large viewpoint shifts
arXiv: https://arxiv.org/abs/2601.03252
Project page: https://zju3dv.github.io/InfiniDepth
arXiv Explained breakdown of this paper: https://arxivexplained.com/papers/infinidepth-arbitrary-resolution-and-fine-grained-depth-estimation-with-neural-implicit-fields
InfiniDepth: Main Results and Key Findings
Overview
InfiniDepth introduces a new approach to monocular depth estimation by representing depth as neural implicit fields rather than discrete grids. This enables arbitrary-resolution and fine-grained depth prediction, addressing fundamental limitations of existing methods.
Key Innovations and Results
1. Neural Implicit Field Representation
Figure 1 showcases InfiniDepth's three main capabilities:
- (a) Arbitrary-resolution depth estimation - can query depth at any continuous coordinate
- (b) Fine-grained point clouds with geometric detail preservation
- (c) Enhanced novel view synthesis with fewer holes and artifacts
The core insight is modeling depth as a continuous function:
d_I(x, y) = N_θ(I, (x, y))
where any 2D coordinate can be mapped to a depth value, breaking free from grid constraints.
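A minimal sketch of what this interface looks like in practice, in PyTorch-style code (the `encoder` / `implicit_decoder` names and the [0, 1]² coordinate convention are illustrative assumptions, not the authors' actual API): the image is encoded once, and depth can then be decoded on a query grid of any resolution.

```python
import torch

def make_query_grid(height, width, device="cpu"):
    """Continuous (x, y) query coordinates in [0, 1]^2 for an arbitrary output grid."""
    ys = (torch.arange(height, device=device, dtype=torch.float32) + 0.5) / height
    xs = (torch.arange(width, device=device, dtype=torch.float32) + 0.5) / width
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack([grid_x, grid_y], dim=-1).reshape(-1, 2)  # (H*W, 2)

# Hypothetical usage: encode once, then query at a resolution 4x the input.
# feats  = encoder(image)                                # (1, C, h, w) feature maps
# coords = make_query_grid(4 * H, 4 * W)                 # denser than the input grid
# depth  = implicit_decoder(feats, coords).reshape(4 * H, 4 * W)
```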
2. Multi-Scale Local Implicit Decoder
Figure 2 illustrates the two-module architecture:
Feature Query (a):
- Extracts multi-scale features from ViT encoder layers
- Constructs feature pyramid with different spatial resolutions
- Uses bilinear interpolation to query features at continuous coordinates (sketched below)
Depth Decoding (b):
- Hierarchically fuses features from high-to-low resolution
- Employs residual gated fusion blocks
- Predicts depth through lightweight MLP head
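The feature-query step (a) can be sketched with standard bilinear sampling at continuous coordinates; the snippet below is a simplified illustration (the residual gated fusion blocks and MLP head of step (b) are omitted), not the paper's exact implementation:

```python
import torch
import torch.nn.functional as F

def query_pyramid_features(pyramid, coords):
    """Bilinearly sample every pyramid level at continuous query coordinates.

    pyramid: list of (1, C_l, H_l, W_l) feature maps at different resolutions
    coords:  (N, 2) xy coordinates in [0, 1]^2
    Returns a list of (N, C_l) per-level feature vectors for the decoder to fuse.
    """
    grid = (coords * 2.0 - 1.0).view(1, 1, -1, 2)  # grid_sample expects [-1, 1]
    sampled_levels = []
    for feat in pyramid:
        sampled = F.grid_sample(feat, grid, mode="bilinear", align_corners=False)
        sampled_levels.append(sampled.view(feat.shape[1], -1).t())  # (N, C_l)
    return sampled_levels
```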
3. Infinite Depth Query Strategy
Figure 3 illustrates a key observation: traditional per-pixel depth prediction creates density imbalance due to perspective projection and surface orientation effects. InfiniDepth's adaptive query strategy:
- Computes adaptive weights: w(x, y) = d_I(x, y)² / (|n(x, y) · v(x, y)| + ε) (sketched below)
- Allocates sub-pixel query budgets proportionally to 3D surface area
- Generates uniformly distributed 3D points on object surfaces
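A minimal sketch of how such adaptive weights and per-pixel query budgets could be computed (tensor shapes, the ε value, and the rounding of budgets are assumptions for illustration):

```python
import torch

def adaptive_query_budget(depth, normals, view_dirs, total_queries, eps=1e-4):
    """Allocate sub-pixel query budgets proportionally to the 3D surface area
    each pixel covers, using w = d^2 / (|n . v| + eps).

    depth:     (H, W) depth at pixel centers
    normals:   (H, W, 3) unit surface normals
    view_dirs: (H, W, 3) unit viewing directions
    """
    cos = (normals * view_dirs).sum(dim=-1).abs()   # |n . v|: foreshortening term
    weights = depth ** 2 / (cos + eps)              # adaptive per-pixel weight
    budget = weights / weights.sum() * total_queries
    return budget.round().long()                    # sub-pixel queries per pixel
```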
4. High-Quality Internal Geometry
Figure 4 shows that the model learns high-quality internal geometry, with normal maps computed through autograd revealing detailed surface structure.
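Because depth is a differentiable function of the query coordinates, its spatial gradients (which drive the normal maps in Figure 4) can be obtained directly with autograd. A minimal sketch, where `depth_fn` is a placeholder for the trained field and the intrinsics-dependent conversion from gradients to normals is omitted:

```python
import torch

def depth_and_gradients(depth_fn, coords):
    """Query the continuous depth field and its gradients w.r.t. the coordinates.

    coords: (N, 2) xy query coordinates
    Returns depth of shape (N,) and d(depth)/d(x, y) of shape (N, 2).
    """
    coords = coords.clone().requires_grad_(True)
    depth = depth_fn(coords)                              # (N,) depth values
    (grads,) = torch.autograd.grad(depth.sum(), coords)   # analytic spatial gradients
    return depth.detach(), grads
```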
Quantitative Results
Synthetic Benchmark (Synth4K)
The paper introduces Synth4K, a new 4K synthetic benchmark from five games with diverse scenes and geometric details.
Relative Depth Estimation (Table 1):
- InfiniDepth achieves state-of-the-art performance across all metrics
- Particularly strong in high-frequency (HF) masked regions
- δ₁ accuracy significantly outperforms baselines
Metric Depth Estimation (Table 2):
- Combined with sparse depth inputs ("Ours-Metric")
- Superior performance at stricter δ thresholds
Real-World Benchmarks
Relative Depth (Table 3):
- Competitive performance on KITTI, ETH3D, NYUv2, ScanNet, DIODE
- On par with current SOTA methods
Metric Depth (Table 4):
- Clear improvements over existing metric depth methods
- Outperforms Marigold-DC, Omni-DC, PriorDA, PromptDA
Qualitative Comparisons
Depth Map Quality
Figure 5 shows:
- First two rows: Synth4K predictions with superior detail preservation
- Bottom row: Real-world data with low-resolution input
- Highlighted boxes demonstrate fine-detail recovery capabilities
Metric Depth Results
Figure 6 highlights geometric detail recovery in high-frequency regions, showing cleaner edges and better preservation of fine structures.
Novel View Synthesis
Figure 8 demonstrates superior novel view synthesis under large viewpoint shifts:
- InfiniDepth produces complete, stable results
- ADGaussian baseline shows noticeable geometric holes and artifacts
- The infinite depth query strategy ensures uniform point distribution (unprojection sketched below)
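The queried (possibly sub-pixel) depths are turned into 3D points with a standard pinhole unprojection before rendering; a minimal sketch under the usual (fx, fy, cx, cy) intrinsics convention (an assumption about the pipeline, not the authors' exact code):

```python
import torch

def unproject(coords, depth, intrinsics):
    """Lift continuous pixel coordinates and queried depths to camera-space 3D points.

    coords:     (N, 2) pixel-space xy coordinates (may be sub-pixel)
    depth:      (N,) depth values queried from the implicit field
    intrinsics: (fx, fy, cx, cy) pinhole parameters
    """
    fx, fy, cx, cy = intrinsics
    x = (coords[:, 0] - cx) / fx * depth
    y = (coords[:, 1] - cy) / fy * depth
    return torch.stack([x, y, depth], dim=-1)  # (N, 3) points in the camera frame
```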
Key Ablation Studies
Depth Representation Effectiveness (Table 5)
- Neural implicit fields significantly outperform discrete grid representations
- Gains more pronounced in metric depth estimation with sparse inputs
Multi-Scale Feature Query
- Multi-scale mechanism brings substantial improvements
- Single-scale baseline performs considerably worse
Computational Efficiency (Table 6)
- Decoder has the lowest parameter count among compared methods
- Competitive computational efficiency despite superior detail preservation
Impact and Significance
- Resolution Independence: Breaks free from training resolution constraints
- Fine Detail Preservation: Excels in geometrically complex regions
- Multi-Task Versatility: Effective for both relative and metric depth estimation
- Downstream Applications: Benefits novel view synthesis, 3D reconstruction pipelines
- Benchmark Contribution: Synth4K enables better evaluation of high-resolution depth estimation
Limitations and Future Work
- No explicit temporal consistency for video applications
- Future work: extend to multi-view settings for improved temporal stability and 3D consistency
InfiniDepth represents a fundamental shift in depth estimation, moving from discrete grid representations to continuous neural implicit fields, enabling resolution scalability and fine-detail preservation beyond what grid-based methods offer.
This is an automated message from Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- Long-LRM++: Preserving Fine Details in Feed-Forward Wide-Coverage Reconstruction (2025)
- Depth Anything 3: Recovering the Visual Space from Any Views (2025)
- CloseUpShot: Close-up Novel View Synthesis from Sparse-views via Point-conditioned Diffusion Model (2025)
- 360-GeoGS: Geometrically Consistent Feed-Forward 3D Gaussian Splatting Reconstruction for 360 Images (2026)
- GVSynergy-Det: Synergistic Gaussian-Voxel Representations for Multi-View 3D Object Detection (2025)
- Re-Depth Anything: Test-Time Depth Refinement via Self-Supervised Re-lighting (2025)
- Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation (2025)