Video Content Analysis

Efficient Spatio-Temporal Video Grounding with Semantic-Guided Feature Decomposition

Spatio-temporal video grounding (STVG) aims to localize the spatiotemporal object tube in a video according to a given text query.....

Sequence as a Whole: A Unified Framework for Video Action Localization With Long-Range Text Query

Comprehensive understanding of video content requires both spatial and temporal localization. However, .....