Video Content Analysis

KA-MIN: Knowledge-Aware Multimodal Interaction Network for Emotion Recognition in Conversation

Emotion recognition in conversations (ERC) has garnered significant attention for its critical role in human-computer interaction systems.

Aggregate and Discriminate: Pseudo Clips-Guided Boundary Perception for Video Moment Retrieval

Video moment retrieval (VMR) aims to localize a video segment in an untrimmed video that is semantically relevant to a language query.
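To make the VMR task formulation concrete, here is a minimal sketch of a generic similarity-based baseline. It is not the pseudo clips-guided boundary perception method of the paper above. It assumes that per-clip video features and the query feature have already been embedded in a shared space by some encoder (not specified here) and that neither contains zero vectors. The function name `retrieve_moment` and the relevance threshold `tau` are hypothetical. The sketch scores each clip by its cosine similarity to the query, subtracts `tau`, and returns the contiguous segment with the largest total score. This is a maximum-subarray (Kadane) search.

```python
import numpy as np

def retrieve_moment(clip_feats: np.ndarray, query_feat: np.ndarray, tau: float = 0.0):
    """Hypothetical baseline: return (start, end) clip indices, end exclusive.

    Assumes clip_feats has shape (T, D), query_feat has shape (D,), both in a
    shared embedding space, with no zero-norm vectors.
    """
    # L2-normalize so that dot products are cosine similarities.
    clips = clip_feats / np.linalg.norm(clip_feats, axis=1, keepdims=True)
    q = query_feat / np.linalg.norm(query_feat)
    # Per-clip gain: positive when a clip is more query-relevant than tau.
    gains = clips @ q - tau

    # Kadane's algorithm: find the contiguous run of clips with maximal total gain.
    best_sum, best = -np.inf, (0, 1)
    run_sum, run_start = 0.0, 0
    for t, g in enumerate(gains):
        if run_sum <= 0:
            # A non-positive prefix cannot help, so start a new run at clip t.
            run_sum, run_start = g, t
        else:
            run_sum += g
        if run_sum > best_sum:
            best_sum, best = run_sum, (run_start, t + 1)
    return best
```

Usage example: with toy 2-D features where only clips 2 to 4 point in the query direction, `retrieve_moment(np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [1, 0]], dtype=float), np.array([0.0, 1.0]), tau=0.5)` returns `(2, 5)`. A fixed threshold like this is a crude stand-in for the learned boundary prediction that dedicated VMR models perform.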

Efficient Spatio-Temporal Video Grounding with Semantic-Guided Feature Decomposition

Spatio-temporal video grounding (STVG) aims to localize the spatio-temporal object tube in a video according to a given text query...