Panoptic Scene Graph Generation – Technology Organization

news7g 07/27/2022

Scene graph generation (a. SGG task) vs. panoptic scene graph generation (b. PSG task). Image credit: arXiv:2207.11247 [cs.CV]

Scene graph generation builds a graph-structured representation of a given image that abstracts its objects, localized by bounding boxes, and their pairwise relationships. It has diverse applications, such as visual reasoning, image captioning, and robotics.
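To make the representation concrete, here is a minimal sketch of a box-based scene graph as a data structure. The class and method names (`SceneObject`, `SceneGraph`, `add_relation`, `triplets`) are hypothetical and chosen only for illustration; they are not taken from the paper or its code.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """One detected object (hypothetical structure, for illustration only)."""
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class SceneGraph:
    """Objects are nodes; relations are directed, labeled edges between them."""
    objects: list = field(default_factory=list)
    relations: list = field(default_factory=list)  # (subject_idx, predicate, object_idx)

    def add_object(self, label, box):
        """Add a node and return its index."""
        self.objects.append(SceneObject(label, box))
        return len(self.objects) - 1

    def add_relation(self, subject_idx, predicate, object_idx):
        """Add a directed edge; both endpoints must already exist."""
        for idx in (subject_idx, object_idx):
            if not 0 <= idx < len(self.objects):
                raise IndexError(f"no object with index {idx}")
        self.relations.append((subject_idx, predicate, object_idx))

    def triplets(self):
        """Return relations as (subject label, predicate, object label)."""
        return [
            (self.objects[s].label, p, self.objects[o].label)
            for s, p, o in self.relations
        ]


# Example: a person riding a horse (made-up boxes).
graph = SceneGraph()
person = graph.add_object("person", (40, 10, 90, 120))
horse = graph.add_object("horse", (20, 60, 140, 200))
graph.add_relation(person, "riding", horse)
print(graph.triplets())
```

Each relation is a subject-predicate-object triplet, which is the form downstream tasks such as captioning typically consume.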

However, a recent paper on arXiv.org argues that this bounding box-based paradigm is not ideal for the problem. Bounding boxes provide only coarse localization of objects and cannot cover the full scene of an image.

The researchers propose panoptic scene graph generation (PSG), which builds the scene graph representation on panoptic segmentations instead of rigid bounding boxes. They create a large PSG dataset with high-quality annotations and propose both two-stage and single-stage PSG baselines.
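The difference between the two kinds of localization can be sketched in a few lines. A panoptic segmentation assigns every pixel to exactly one segment, so background regions are covered as well. The toy map and the helper names (`segment_box`, `box_purity`) below are hypothetical and only illustrate the idea; they are not part of the PSG dataset tooling.

```python
def segment_box(panoptic_map, segment_id):
    """Tight bounding box (x_min, y_min, x_max, y_max) around one segment."""
    coords = [
        (r, c)
        for r, row in enumerate(panoptic_map)
        for c, value in enumerate(row)
        if value == segment_id
    ]
    if not coords:
        raise ValueError(f"segment {segment_id} not found")
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(cols), min(rows), max(cols), max(rows)


def box_purity(panoptic_map, segment_id):
    """Fraction of pixels inside the tight box that belong to the segment."""
    x0, y0, x1, y1 = segment_box(panoptic_map, segment_id)
    inside = sum(
        1
        for r in range(y0, y1 + 1)
        for c in range(x0, x1 + 1)
        if panoptic_map[r][c] == segment_id
    )
    area = (x1 - x0 + 1) * (y1 - y0 + 1)
    return inside / area


# Toy 4x4 panoptic map: 0 = sky, 1 = person, 2 = grass (made-up labels).
toy_map = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [2, 2, 2, 2],
]

for segment_id in (0, 1, 2):
    print(segment_id, segment_box(toy_map, segment_id), box_purity(toy_map, segment_id))
```

A purity below 1.0 means the tight box also contains pixels of other segments, which is the coarse localization the paper objects to. The segment map itself has no such ambiguity, since each pixel carries a single segment id.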

Evaluation on the new dataset shows that the single-stage models, despite their simplified training paradigm, achieve competitive results.

Existing research addresses scene graph generation (SGG), a critical technology for scene understanding in images, from a detection perspective, i.e., objects are detected using bounding boxes, followed by prediction of their pairwise relationships. We argue that such a paradigm causes several problems that impede the progress of the field. For instance, bounding box-based labels in current datasets usually contain redundant classes like hairs, and leave out background information that is crucial to the understanding of context. In this work, we introduce panoptic scene graph generation (PSG), a new problem task that requires the model to generate a more comprehensive scene graph representation based on panoptic segmentations rather than rigid bounding boxes. A high-quality PSG dataset, which contains 49k well-annotated overlapping images from COCO and Visual Genome, is created for the community to keep track of its progress. For benchmarking, we build four two-stage baselines, which are modified from classic methods in SGG, and two one-stage baselines called PSGTR and PSGFormer, which are based on the efficient Transformer-based detector, i.e., DETR. While PSGTR uses a set of queries to directly learn triplets, PSGFormer separately models the objects and relations in the form of queries from two Transformer decoders, followed by a prompting-like relation-object matching mechanism. In the end, we share insights on open challenges and future directions.

Research article: Yang, J., Ang, Y. Z., Guo, Z., Zhou, K., Zhang, W. and Liu, Z., “Panoptic Scene Graph Generation”, 2022. Link: https://arxiv.org/abs/2207.11247
Project page: https://psgdataset.org/
