XMem: Long-term video object segmentation with the Atkinson-Shiffrin memory model

news7g, 07/16/2022

Video object segmentation aims to highlight specified target objects in a given video. A recent paper on arXiv.org focuses on the semi-supervised setting, in which the user provides an annotation for the first frame and the method segments the objects in all other frames as accurately as possible.

Image credit: arXiv: 2207.07115 [cs.CV]

The researchers propose a unified memory architecture. Inspired by the Atkinson-Shiffrin memory model, it maintains three independent but deeply connected feature memory stores: rapidly updated sensory memory, high-resolution working memory and compact long-term memory.
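As a rough illustration of how three stores with different update rates might sit side by side, here is a minimal Python sketch. It is not the authors' implementation: the class name `ThreeStoreMemory`, the `working_every` interval and the use of plain lists as features are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List

Feature = List[float]  # stand-in for a per-frame feature vector (assumption)


@dataclass
class ThreeStoreMemory:
    """Toy container for the three stores; every name here is hypothetical."""
    working_every: int = 5  # assumed interval: store every 5th frame in working memory
    sensory: Feature = field(default_factory=list)
    working: List[Feature] = field(default_factory=list)
    long_term: List[Feature] = field(default_factory=list)

    def observe(self, frame_index: int, feature: Feature) -> None:
        # Sensory memory is overwritten on every frame (the "rapidly updated" store).
        self.sensory = list(feature)
        # Working memory grows more slowly and keeps full-detail features.
        if frame_index % self.working_every == 0:
            self.working.append(list(feature))
        # Long-term memory is only filled by a separate consolidation step,
        # which is left out of this sketch.


mem = ThreeStoreMemory(working_every=5)
for t in range(12):
    mem.observe(t, [float(t), float(t) * 2.0])

print(mem.sensory)       # feature of the most recent frame
print(len(mem.working))  # frames 0, 5 and 10 were stored
```

The point of the split is that only the working store grows with video length, so it is the one that needs compressing on long videos.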

The memory consolidation algorithm selects representative prototypes from working memory, while the memory potentiation algorithm enriches these prototypes into a compact representation for long-term memory storage. Combining the three memory stores allows long videos to be handled with high accuracy while keeping GPU memory usage low.
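The selection-then-enrichment step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article gives neither the selection criterion nor the aggregation rule, so ranking by a usage count and softmax-weighted averaging are choices made for this example, and `consolidate`, `usage` and `num_prototypes` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def consolidate(working: List[Vector], usage: List[float],
                num_prototypes: int) -> List[Vector]:
    """Compress working memory into at most `num_prototypes` long-term entries."""
    # Step 1 (selection, assumed rule): keep the most-used elements as representatives.
    ranked = sorted(range(len(working)), key=lambda i: usage[i], reverse=True)
    chosen = ranked[:num_prototypes]

    compact = []
    for p in chosen:
        # Step 2 (enrichment, assumed rule): each representative absorbs
        # information from all working-memory elements, weighted by a softmax
        # over dot-product similarity.
        scores = [dot(working[p], w) for w in working]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(working[p])
        compact.append([
            sum(wt * w[d] for wt, w in zip(weights, working)) / total
            for d in range(dim)
        ])
    return compact


# Four working-memory elements forming two loose groups; usage counts are made up.
working = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
usage = [5.0, 1.0, 4.0, 1.0]
long_term = consolidate(working, usage, num_prototypes=2)
print(len(long_term))  # two compact entries replace four working-memory elements
```

Whatever the exact rules, the size of the compact store is set by `num_prototypes` rather than by the number of frames seen, which is what keeps memory usage bounded on long videos.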

We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically uses only one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply connected feature memory stores: a rapidly updated sensory memory, a high-resolution working memory, and a compact and thus sustained long-term memory. Crucially, we develop a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay for long-term prediction. Combined with a new memory reading mechanism, XMem greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (that do not work on long videos) on short-video datasets. Code is available at this https URL

Research article: Cheng, H. K. and Schwing, A. G., “XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model”, 2022. Link: https://arxiv.org/abs/2207.07115
