arxiv:2601.03252

InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields

Published on Jan 6 · Submitted by Haotong Lin on Jan 7
#1 Paper of the day

Abstract

InfiniDepth represents depth as neural implicit fields using a local implicit decoder, enabling continuous 2D coordinate querying for arbitrary-resolution depth estimation and superior performance in fine-detail regions.

AI-generated summary

Existing depth estimation methods are fundamentally limited to predicting depth on discrete image grids. Such representations restrict scalability to arbitrary output resolutions and hinder geometric detail recovery. This paper introduces InfiniDepth, which represents depth as neural implicit fields. Through a simple yet effective local implicit decoder, we can query depth at continuous 2D coordinates, enabling arbitrary-resolution and fine-grained depth estimation. To better assess our method's capabilities, we curate a high-quality 4K synthetic benchmark from five different games, spanning diverse scenes with rich geometric and appearance detail. Extensive experiments demonstrate that InfiniDepth achieves state-of-the-art performance on both synthetic and real-world benchmarks across relative and metric depth estimation tasks, particularly excelling in fine-detail regions. It also benefits novel view synthesis under large viewpoint shifts, producing high-quality results with fewer holes and artifacts.

Community

Paper author · Paper submitter

Depth Beyond Pixels 🚀
We introduce InfiniDepth, casting monocular depth estimation as a neural implicit field.
🔍 Arbitrary-Resolution
📏 Accurate Metric Depth
📷 Single-View NVS under large viewpoint shifts
arXiv: https://arxiv.org/abs/2601.03252
Project page: https://zju3dv.github.io/InfiniDepth


InfiniDepth: Main Results and Key Findings

Overview

InfiniDepth rethinks monocular depth estimation by representing depth as neural implicit fields rather than discrete grids. This enables arbitrary-resolution, fine-grained depth prediction and addresses a fundamental limitation of existing grid-based methods.

Key Innovations and Results

1. Neural Implicit Field Representation

Figure 1: InfiniDepth capabilities

Figure 1 showcases InfiniDepth's three main capabilities:

  • (a) Arbitrary-resolution depth estimation - can query depth at any continuous coordinate
  • (b) Fine-grained point clouds with geometric detail preservation
  • (c) Enhanced novel view synthesis with fewer holes and artifacts

The core insight is modeling depth as a continuous function:

d_I(x, y) = N_θ(I, (x, y))

where any 2D coordinate can be mapped to a depth value, breaking free from grid constraints.
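To make the interface concrete, here is a minimal PyTorch sketch of such a field. Everything here (the `DepthField` name, the encoder placeholder, the toy MLP decoder) is an illustrative assumption, not the paper's actual architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthField(nn.Module):
    """Toy continuous depth field d_I(x, y) = N_theta(I, (x, y)):
    encode the image once, then decode depth at any continuous
    query coordinate. Illustrative sketch, not the paper's model."""

    def __init__(self, encoder: nn.Module, feat_dim: int = 256):
        super().__init__()
        self.encoder = encoder                        # any image backbone
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, image, coords):
        # image: (B, 3, H, W); coords: (B, N, 2) in [-1, 1]
        feats = self.encoder(image)                   # (B, C, h, w)
        # Bilinearly sample features at the continuous coordinates.
        sampled = F.grid_sample(feats, coords.unsqueeze(2),
                                align_corners=False)  # (B, C, N, 1)
        sampled = sampled.squeeze(-1).transpose(1, 2) # (B, N, C)
        return self.mlp(torch.cat([sampled, coords], -1)).squeeze(-1)
```

Because `coords` need not lie on the pixel grid, the same trained model can be queried at 1x, 4x, or any other output resolution.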

2. Multi-Scale Local Implicit Decoder

Figure 2: Method architecture

Figure 2 illustrates the two-module architecture:

Feature Query (a):

  • Extracts multi-scale features from ViT encoder layers
  • Constructs feature pyramid with different spatial resolutions
  • Uses bilinear interpolation to query features at continuous coordinates

Depth Decoding (b):

  • Hierarchically fuses features from high-to-low resolution
  • Employs residual gated fusion blocks
  • Predicts depth through a lightweight MLP head (see the sketch after this list)
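The two modules can be sketched roughly as follows; `query_pyramid` and `GatedFusion` are my guesses at the spirit of the design (assuming all pyramid levels are projected to a shared channel width), not the paper's exact blocks:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def query_pyramid(feats, coords):
    """Bilinearly sample every pyramid level at continuous coordinates.
    feats: list of (B, C, h_l, w_l) maps; coords: (B, N, 2) in [-1, 1].
    Returns one (B, N, C) feature set per level."""
    grid = coords.unsqueeze(2)                        # (B, N, 1, 2)
    return [F.grid_sample(f, grid, align_corners=False)
              .squeeze(-1).transpose(1, 2) for f in feats]

class GatedFusion(nn.Module):
    """Residual gated fusion: blend an incoming (lower-resolution)
    feature into the running state with a learned gate."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, state, incoming):
        g = torch.sigmoid(self.gate(torch.cat([state, incoming], -1)))
        return state + g * self.proj(incoming)        # gated residual

# Fusing high-to-low resolution, a small MLP head then regresses depth:
#   state = levels[0]
#   for f in levels[1:]: state = fuse(state, f)
```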

3. Infinite Depth Query Strategy

Figure 3: Point distribution comparison

Figure 3 demonstrates a key insight: traditional per-pixel depth prediction creates density imbalance due to perspective projection and surface orientation effects. InfiniDepth's adaptive query strategy (sketched after this list):

  • Computes adaptive weights: w(x, y) = d_I(x, y)² / (|n(x, y) · v(x, y)| + ε)
  • Allocates sub-pixel query budgets proportionally to 3D surface area
  • Generates uniformly distributed 3D points on object surfaces
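Under my reading of that formula (with ε inside the denominator to guard against grazing angles), the allocation could look like the sketch below; tensor shapes and the `total_queries` parameter are illustrative:

```python
import torch

def query_budget(depth, normal, view_dir, total_queries, eps=1e-4):
    """Per-pixel sub-pixel query counts proportional to estimated
    3D surface area. depth: (H, W); normal, view_dir: (H, W, 3)
    unit vectors. Returns integer budgets of shape (H, W)."""
    cos = (normal * view_dir).sum(-1).abs()           # |n · v| foreshortening
    w = depth ** 2 / (cos + eps)                      # area grows with d^2
    w = w / w.sum()                                   # normalize weights
    return torch.round(w * total_queries).long()      # queries per pixel
```

Pixels that are far away or seen at grazing angles cover more 3D surface, so they receive more sub-pixel queries and the resulting point cloud stays roughly uniform.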

4. High-Quality Internal Geometry

Figure 4: Normal maps from implicit fields

Figure 4 shows that the model learns high-quality internal geometry, with normal maps computed through autograd revealing detailed surface structure.
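One plausible way to compute such normals, assuming a differentiable per-point field `depth_fn` and pinhole intrinsics `K` (both hypothetical names): unproject each query to a 3D point, differentiate it with respect to the 2D coordinates via autograd, and take the cross product of the two tangent vectors:

```python
import torch
import torch.nn.functional as F

def normals_from_field(depth_fn, coords, K):
    """Analytic normals from a differentiable depth field.
    depth_fn: (N, 2) pixel coords -> (N,) depths; K: 3x3 intrinsics."""
    coords = coords.detach().requires_grad_(True)
    d = depth_fn(coords)
    x = (coords[:, 0] - K[0, 2]) / K[0, 0] * d        # unproject:
    y = (coords[:, 1] - K[1, 2]) / K[1, 1] * d        # P = d * K^-1 [u, v, 1]
    P = torch.stack([x, y, d], dim=-1)                # (N, 3)
    # Jacobian dP/d(x, y): one backward pass per 3D component.
    J = torch.stack([
        torch.autograd.grad(P[:, i].sum(), coords, create_graph=True)[0]
        for i in range(3)
    ], dim=1)                                         # (N, 3, 2)
    n = torch.cross(J[..., 0], J[..., 1], dim=-1)     # tangents' cross product
    return F.normalize(n, dim=-1)                     # unit normals
```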

Quantitative Results

Synthetic Benchmark (Synth4K)

The paper introduces Synth4K, a new 4K synthetic benchmark from five games with diverse scenes and geometric details.

Relative Depth Estimation (Table 1):

  • InfiniDepth achieves state-of-the-art performance across all metrics
  • Particularly strong in high-frequency (HF) masked regions
  • ฮดโ‚ accuracy significantly outperforms baselines

Metric Depth Estimation (Table 2):

  • Combined with sparse depth inputs ("Ours-Metric")
  • Superior performance on stricter thresholds (δ₀.₀₁, δ₀.₀₂, δ₀.₀₄); see the metric sketch below
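For context, δ-threshold accuracy is the standard metric behind these tables: the fraction of valid pixels whose prediction-to-ground-truth ratio stays below a threshold. A sketch, assuming the conventional 1.25 for δ₁ and reading δ₀.₀₁/δ₀.₀₂/δ₀.₀₄ as thresholds 1.01/1.02/1.04 (my interpretation of the notation, not confirmed by the paper):

```python
import torch

def delta_accuracy(pred, gt, thresh=1.25, mask=None):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < thresh."""
    if mask is None:
        mask = gt > 0                                 # keep valid depths only
    p, g = pred[mask], gt[mask]
    ratio = torch.maximum(p / g, g / p)
    return (ratio < thresh).float().mean().item()
```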

Real-World Benchmarks

Relative Depth (Table 3):

  • Competitive performance on KITTI, ETH3D, NYUv2, ScanNet, DIODE
  • On par with current SOTA methods

Metric Depth (Table 4):

  • Clear improvements over existing metric depth methods
  • Outperforms Marigold-DC, Omni-DC, PriorDA, PromptDA

Qualitative Comparisons

Depth Map Quality

Figure 5: Qualitative depth comparisons

Figure 5 shows:

  • First two rows: Synth4K predictions with superior detail preservation
  • Bottom row: Real-world data with low-resolution input
  • Highlighted boxes demonstrate fine-detail recovery capabilities

Metric Depth Results

Figure 6: Metric depth comparisons

Figure 6 highlights geometric detail recovery in high-frequency regions, showing cleaner edges and better preservation of fine structures.

Novel View Synthesis

Figure 8: NVS results

Figure 8 demonstrates superior novel view synthesis under large viewpoint shifts:

  • InfiniDepth produces complete, stable results
  • ADGaussian baseline shows noticeable geometric holes and artifacts
  • The infinite depth query strategy ensures uniform point distribution (see the sketch below)
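A toy sketch of how the per-pixel budgets from the adaptive weighting above could be expanded into jittered sub-pixel queries before unprojection; the paper's actual sampling scheme may differ:

```python
import torch

def subpixel_queries(budget):
    """Expand integer per-pixel budgets (H, W) into jittered sub-pixel
    (x, y) coordinates: pixels covering more 3D surface emit more
    queries, keeping the lifted point cloud roughly uniform."""
    ys, xs = torch.nonzero(budget > 0, as_tuple=True)
    reps = budget[ys, xs]                             # queries per pixel
    base = torch.stack([xs, ys], dim=-1).float()
    base = base.repeat_interleave(reps, dim=0)        # one row per query
    return base + torch.rand_like(base)               # jitter in [0, 1)
```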

Key Ablation Studies

Depth Representation Effectiveness (Table 5)

  • Neural implicit fields significantly outperform discrete grid representations
  • Gains more pronounced in metric depth estimation with sparse inputs

Multi-Scale Feature Query

  • Multi-scale mechanism brings substantial improvements
  • Single-scale baseline performs considerably worse

Computational Efficiency (Table 6)

  • Decoder has the lowest parameter count among compared methods
  • Competitive computational efficiency despite superior detail preservation

Impact and Significance

  1. Resolution Independence: Breaks free from training resolution constraints
  2. Fine Detail Preservation: Excels in geometrically complex regions
  3. Multi-Task Versatility: Effective for both relative and metric depth estimation
  4. Downstream Applications: Benefits novel view synthesis, 3D reconstruction pipelines
  5. Benchmark Contribution: Synth4K enables better evaluation of high-resolution depth estimation

Limitations and Future Work

  • No explicit temporal consistency for video applications
  • Future work: extend to multi-view settings for improved temporal stability and 3D consistency

InfiniDepth marks a shift in depth estimation from discrete grid representations to continuous neural implicit fields, enabling resolution scalability and detail preservation that grid-based methods struggle to match.

