GAUDI: A Neural Architect for Immersive 3D Scene Generation
Progress in generative modeling is needed for learning systems to understand and create 3D spaces. A recent paper on arXiv.org proposes GAUDI, a generative model named after the famous architect. It can capture the distribution of 3D scenes and render views from scenes sampled from the learned distribution.
The model uses a scalable two-stage approach. First, a latent representation that disentangles radiance fields and camera poses is learned. Then the distribution of the disentangled latent representations is modeled with a powerful prior.
The researchers introduce a novel denoising optimization objective to find the latent representations, and they model the radiance field and the camera poses in a disentangled way. The method achieves state-of-the-art generation performance on multiple datasets and can be used for both conditional and unconditional problems.
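The two stages described above can be illustrated with a deliberately tiny toy, written here in plain Python. It is a sketch under stated assumptions, not the paper's implementation: the "decoders" are identity maps rather than neural radiance fields, the denoising objective is read simply as adding Gaussian noise to the latent before decoding, and a per-dimension Gaussian stands in for the powerful prior. All names (`optimize_latent`, `fit_prior`, `sample_scene`, `SCENE_DIM`, `POSE_DIM`) are hypothetical.

```python
import random
import statistics

SCENE_DIM, POSE_DIM = 4, 2  # hypothetical latent sizes, chosen for illustration


def optimize_latent(scene_target, pose_target, rng, steps=200, lr=0.1, noise_std=0.05):
    """Stage 1 (toy): fit one latent per scene by gradient descent.

    The latent is the concatenation of a scene part and a camera-pose part.
    Each toy decoder reads only its own part, which is the (assumed) sense
    in which the two are disentangled. Noise is added to the latent before
    decoding, so the fit must tolerate small perturbations of the latent.
    """
    target = list(scene_target) + list(pose_target)
    z = [0.0] * len(target)
    for _ in range(steps):
        noisy = [zi + rng.gauss(0.0, noise_std) for zi in z]
        # Identity decoders: the squared-error gradient is 2 * (noisy - target).
        z = [zi - lr * 2.0 * (ni - ti) for zi, ni, ti in zip(z, noisy, target)]
    return z[:SCENE_DIM], z[SCENE_DIM:]


def fit_prior(latents):
    """Stage 2 (toy): fit an independent Gaussian to each latent dimension."""
    dims = list(zip(*latents))
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]
    return means, stds


def sample_scene(prior, rng):
    """Draw a new latent from the prior and split it into scene and pose parts."""
    means, stds = prior
    z = [rng.gauss(m, s) for m, s in zip(means, stds)]
    return z[:SCENE_DIM], z[SCENE_DIM:]


rng = random.Random(0)

# Made-up "training scenes": one radiance target and one pose target per scene.
targets = [
    ([rng.uniform(-1, 1) for _ in range(SCENE_DIM)],
     [rng.uniform(-1, 1) for _ in range(POSE_DIM)])
    for _ in range(16)
]

# Stage 1: one latent per scene.
latents = []
for scene_t, pose_t in targets:
    z_scene, z_pose = optimize_latent(scene_t, pose_t, rng)
    latents.append(z_scene + z_pose)

# Stage 2: model the distribution of latents, then sample a new scene from it.
prior = fit_prior(latents)
new_scene, new_pose = sample_scene(prior, rng)
```

The point of the split is that the second stage never touches images: it only models the distribution of the latents found in the first stage, so the prior could be swapped for a stronger model without changing how the latents are optimized.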
We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera. We tackle this challenging problem with a scalable yet powerful approach, where we first optimize a latent representation that disentangles radiance fields and camera poses. This latent representation is then used to learn a generative model that enables both unconditional and conditional generation of 3D scenes. Our model generalizes previous works that focus on single objects by removing the assumption that the camera pose distribution can be shared across samples. We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets and allows for conditional generation of 3D scenes given conditioning variables such as sparse image observations or text that describes the scene.
Research article: Bautista, M. A., et al., “GAUDI: A Neural Architect for Immersive 3D Scene Generation”, 2022. Link: https://arxiv.org/abs/2207.13751
Project site: https://github.com/apple/ml-gaudi