Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction

Sungkyunkwan University
*Equal Contribution

Our method selectively densifies (a) coarse Gaussians generated by generalized feed-forward models. (c) The top K Gaussians with large view-space positional gradients are selected, and (d-e) their fine Gaussians are generated in each densification layer. (g) The final Gaussians are obtained by combining (b) the remaining (non-selected) Gaussians with (f) the union of each layer's output Gaussians.

Abstract

Generalized feed-forward Gaussian models have achieved significant progress in sparse-view 3D reconstruction by leveraging prior knowledge from large multi-view datasets. However, these models often struggle to represent high-frequency details due to the limited number of Gaussians. While the densification strategy used in per-scene 3D Gaussian splatting (3D-GS) optimization can be adapted to the feed-forward models, it may not be ideally suited for generalized scenarios. In this paper, we propose Generative Densification, an efficient and generalizable method to densify Gaussians generated by feed-forward models. Unlike the 3D-GS densification strategy, which iteratively splits and clones raw Gaussian parameters, our method up-samples feature representations from the feed-forward models and generates their corresponding fine Gaussians in a single forward pass, leveraging the embedded prior knowledge for enhanced generalization. Experimental results on both object-level and scene-level reconstruction tasks demonstrate that our method outperforms state-of-the-art approaches with comparable or smaller model sizes, achieving notable improvements in representing fine details.

Method

Generative Densification Overview

Our method selectively densifies coarse Gaussians generated by feed-forward Gaussian models:
  1. The top K coarse Gaussians with large view-space positional gradients are selected.
  2. The selected Gaussians are processed through the densification module to generate fine Gaussians.
  3. The final Gaussians are obtained by combining the fine Gaussians with the remaining coarse Gaussians.
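The select-densify-merge procedure above can be sketched in a few lines of NumPy. This is a hypothetical illustration only: the function name `densify`, the random jitter, and `upsample_factor` are stand-ins for the learned densification layers described in the paper.

```python
import numpy as np

def densify(gaussians, grads, K, upsample_factor=4):
    """Sketch of selective densification (not the actual learned module).

    gaussians: (N, D) array of coarse Gaussian parameters.
    grads:     (N, 3) view-space positional gradients, one per Gaussian.
    """
    # Rank coarse Gaussians by gradient magnitude and take the top K.
    order = np.argsort(-np.linalg.norm(grads, axis=1))
    selected, remaining = order[:K], order[K:]

    # Placeholder for the learned densification layers: here each selected
    # Gaussian is simply replaced by `upsample_factor` jittered fine copies.
    fine = np.repeat(gaussians[selected], upsample_factor, axis=0)
    fine = fine + 0.01 * np.random.randn(*fine.shape)

    # Final set: fine Gaussians plus the untouched (non-selected) remainder.
    return np.concatenate([fine, gaussians[remaining]], axis=0)
```

With `N` coarse Gaussians, the output contains `K * upsample_factor + (N - K)` Gaussians, matching the merge step described above.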

Object-level & Scene-level Pipelines

We present two models incorporating our densification method, based on LaRa and MVSplat. For object-level reconstruction, fine Gaussians are generated using the Gaussians and volume features produced by the LaRa backbone (top row). For scene-level reconstruction, fine Gaussians are generated per view by utilizing the pixel-aligned Gaussians and image features extracted from the MVSplat backbone (bottom row).
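As a rough illustration of the scene-level branch, the sketch below up-samples a per-view feature map and decodes one fine Gaussian per up-sampled pixel. The function `decode_fine_gaussians`, the weight matrix `W_head`, and the nearest-neighbour up-sampling are simplifying assumptions for exposition, not the actual MVSplat-based architecture.

```python
import numpy as np

def decode_fine_gaussians(feat, W_head, upsample=2):
    """Sketch: up-sample pixel-aligned features, decode fine Gaussians.

    feat:   (H, W, C) per-view image features from the backbone.
    W_head: (C, P) stand-in for a learned linear decoding head, where P is
            the number of Gaussian parameters (position, scale, etc.).
    """
    H, W, C = feat.shape
    # Nearest-neighbour up-sampling of the feature map.
    up = feat.repeat(upsample, axis=0).repeat(upsample, axis=1)
    # Decode one fine Gaussian per up-sampled pixel.
    return up.reshape(-1, C) @ W_head
```

An `(H, W, C)` feature map thus yields `H * upsample * W * upsample` fine Gaussians for that view.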


Object-level Reconstruction

Quantitative Comparisons

| Method | #Param (M) | Gobjaverse PSNR↑ | SSIM↑ | LPIPS↓ | GSO PSNR↑ | SSIM↑ | LPIPS↓ | Co3D PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| MVSNeRF | 0.52 | 14.48 | 0.896 | 0.185 | 15.21 | 0.912 | 0.154 | 12.94 | 0.841 | 0.241 |
| MuRF | 15.7 | 14.05 | 0.877 | 0.301 | 12.89 | 0.885 | 0.279 | 11.60 | 0.815 | 0.393 |
| LGM | 415 | 19.67 | 0.867 | 0.157 | 23.67 | 0.917 | 0.063 | 13.81 | 0.739 | 0.414 |
| GS-LRM | 300 | - | - | - | 30.52 | 0.952 | 0.050 | - | - | - |
| LaRa | 125 | 27.49 | 0.938 | 0.093 | 29.70 | 0.959 | 0.060 | 21.18 | 0.862 | 0.216 |
| Ours | 134 | 28.58 | 0.945 | 0.080 | 31.06 | 0.966 | 0.058 | 21.72 | 0.865 | 0.209 |
| Ours (w/ residual) | 134 | 28.75 | 0.946 | 0.078 | 31.23 | 0.967 | 0.058 | 22.08 | 0.867 | 0.206 |
Quantitative comparison results. Our model achieves the highest PSNR in both in-domain reconstruction and cross-dataset generalization tasks, even outperforming GS-LRM (the current state-of-the-art model for object-level reconstruction).

Qualitative Comparisons

Qualitative comparison (columns: GT, LaRa, Ours, Coarse Gaussians, Fine Gaussians).

Video Results (Gobjaverse)



Video Results (Google Scanned Objects)


Scene-level Reconstruction

Quantitative Comparisons

| Method | #Param (M) | RealEstate10K PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|
| pixelNeRF | 250 | 20.43 | 0.589 | 0.550 |
| GPNR | 27 | 24.11 | 0.793 | 0.255 |
| MuRF | 15.7 | 26.10 | 0.858 | 0.143 |
| pixelSplat | 125.1 | 25.89 | 0.858 | 0.142 |
| MVSplat | 12 | 26.39 | 0.869 | 0.128 |
| MVSplat-finetune | 12 | 26.46 | 0.870 | 0.127 |
| DepthSplat (small) | 37 | 26.76 | 0.877 | 0.123 |
| Ours | 27.8 | 27.08 | 0.879 | 0.120 |
Quantitative comparisons on the RealEstate10K dataset. Our model outperforms all baselines, including DepthSplat (small).
ACID

| Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| pixelSplat | 27.64 | 0.830 | 0.160 |
| MVSplat | 28.15 | 0.841 | 0.147 |
| Ours | 28.61 | 0.847 | 0.141 |

DTU

| Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| pixelSplat | 12.89 | 0.382 | 0.560 |
| MVSplat | 13.94 | 0.473 | 0.385 |
| Ours | 14.05 | 0.477 | 0.380 |
Cross-dataset generalization results on the ACID and DTU datasets. Our model consistently achieves the best performance.

Qualitative Comparisons

Qualitative comparison (columns: GT, MVSplat, Ours).

Additional Comparisons


BibTeX

@article{GenerativeDensification,
    title={Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction}, 
    author={Nam, Seungtae and Sun, Xiangyu and Kang, Gyeongjin and Lee, Younggeun and Oh, Seungjun and Park, Eunbyung},
    journal={arXiv preprint arXiv:2412.06234},
    year={2024}
}