3D concept grounding on neural fields
To answer questions about images, current methods rely on 2D segmentation masks extracted by supervised models, and concepts are grounded on these masks. The mismatch between such 2D masks and the underlying 3D structure of the scene can result in incorrect segmentations and answers. A recent arXiv paper therefore proposes a new concept-grounding framework that uses an intermediate 3D representation, which more closely resembles the way humans reason: grounding concepts in the 3D structure underlying images.
The researchers propose to leverage the continuous, differentiable nature of neural fields as an intermediate 3D representation, on which segmentation and concept learning can be performed through question answering. On top of the neural field, a set of neural operators is defined, which also enables visual reasoning.
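The core grounding step can be illustrated with a short sketch. This is a hypothetical, simplified version (the function names and the sigmoid temperature are assumptions, not the paper's implementation): each 3D coordinate carries a descriptor vector, and a soft segmentation mask emerges from the similarity between those descriptors and a language concept's embedding.

```python
import numpy as np

def ground_concept(descriptors, concept_embedding):
    """Soft concept grounding (illustrative sketch, not the paper's code).

    descriptors: (N, D) array, one descriptor per queried 3D coordinate.
    concept_embedding: (D,) embedding of a language concept, e.g. "red".
    Returns an (N,) soft mask in (0, 1): how strongly each coordinate
    matches the concept.
    """
    # Normalize both sides so the dot product is a cosine similarity.
    d = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
    c = concept_embedding / np.linalg.norm(concept_embedding)
    sim = d @ c                                # cosine similarity in [-1, 1]
    return 1.0 / (1.0 + np.exp(-10.0 * sim))   # sigmoid -> soft mask
```

Because every step is differentiable, gradients from a downstream question-answering loss can flow back into both the descriptors and the concept embeddings, which is what lets segmentation and concept learning be trained jointly.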
The proposed method outperforms the baseline models on both segmentation and reasoning tasks. It also generalizes well to unseen shape categories and real scans.
In this paper, we tackle the challenging problem of 3D concept grounding (i.e., segmenting and learning visual concepts) by looking at RGBD images and reasoning about paired questions and answers. Existing visual reasoning approaches typically use supervised methods to extract 2D segmentation masks on which concepts are grounded. In contrast, humans are capable of grounding concepts on the underlying 3D representation of images. However, traditionally inferred 3D representations (e.g., point clouds, voxel grids, and meshes) cannot capture continuous 3D features flexibly, making it challenging to ground concepts to 3D regions based on the language description of the referred object. To address both problems, we propose to leverage the continuous, differentiable nature of neural fields to segment and learn concepts. Specifically, each 3D coordinate in a scene is represented with a high-dimensional descriptor. Concept grounding can then be performed by computing the similarity between the descriptor vector of a 3D coordinate and the vector embedding of a language concept, which enables segmentation and concept learning to be jointly learned on neural fields in a differentiable fashion. As a result, both semantic and instance segmentations can emerge directly from question-answering supervision, using a set of neural operators defined on top of the neural fields (e.g., filtering and counting). Experimental results show that our proposed framework outperforms unsupervised/language-mediated segmentation models on semantic and instance segmentation tasks, and also outperforms existing models on challenging 3D-aware visual reasoning tasks. Furthermore, our framework generalizes well to unseen shape categories and real scans.
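The filtering and counting operators mentioned above can be sketched with soft set operations over the grounded masks. This is a minimal illustration under assumed fuzzy-logic semantics (min for intersection, max for membership); the names `filter_op` and `count_op` and the exact formulas are assumptions, not the paper's definitions.

```python
import numpy as np

def filter_op(scene_mask, concept_mask):
    """Soft 'and': keep only the coordinates selected by both masks.

    Both inputs are (N,) arrays of per-coordinate scores in [0, 1].
    """
    return np.minimum(scene_mask, concept_mask)

def count_op(instance_masks, concept_mask):
    """Soft count of object instances that match a concept.

    instance_masks: list of (N,) soft masks, one per candidate instance.
    concept_mask: (N,) soft mask produced by concept grounding.
    Each instance's score is its best per-coordinate overlap with the
    concept mask; the soft count is the sum of those scores.
    """
    scores = [np.max(np.minimum(m, concept_mask)) for m in instance_masks]
    return float(np.sum(scores))
```

Since both operators are built from differentiable (or subdifferentiable) primitives, a loss on the final answer, e.g. a predicted count, can supervise the underlying segmentations, which is how segmentation emerges from question answering alone.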
Research paper: Hong, Y., Du, Y., Lin, C., Tenenbaum, J. B., and Gan, C., "3D Concept Grounding on Neural Fields", 2022. Link: https://arxiv.org/abs/2207.06403
Project page: https://3d-cg.csail.mit.edu/