Entropy-Gated Prediction Agreement for Two-View Video Action Recognition
Young-Jin Park, Hui-Sup Cho
30 June 2026

Abstract

Human action recognition (HAR) models that rely on a single sampled clip often fail to capture important temporal cues distributed across the entire video. To overcome this limitation, this study proposes a framework that constructs two temporal views from the same video and explicitly learns the prediction consistency between them. Specifically, a prediction-level agreement (AG) loss is introduced to align the class probability distributions of the two views. In addition, conditional gating is applied to adaptively control the contribution of the AG loss according to the sample-wise prediction confidence, thereby reducing unstable alignment in temporally ambiguous or information-insufficient segments. The proposed framework is evaluated with both convolutional neural network (CNN)-based and Transformer-based backbones on three representative action-recognition benchmark datasets, and it generally improves performance over the single-view baseline across backbone–dataset combinations. Further empirical analyses, covering training behavior, motion magnitude, temporal prediction stability, and qualitative case studies, examine the effectiveness and behavior of the proposed two-view framework from multiple perspectives.
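
The abstract describes the AG loss and its gating only in words, so the following is a minimal sketch of how such a term could look, not the authors' formulation. It assumes a symmetric KL divergence as the agreement measure, a hard gate that keeps a sample only when the mean normalized entropy of the two views is below a threshold, and averaging over the kept samples. The function name `gated_agreement_loss` and the parameter `tau` are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of each row of p."""
    return -(p * np.log(p + eps)).sum(axis=-1)


def gated_agreement_loss(logits_a, logits_b, tau=0.5, eps=1e-12):
    """Hypothetical entropy-gated agreement loss between two views.

    Assumptions (not taken from the paper): symmetric KL as the
    agreement measure, a hard 0/1 gate, and threshold `tau` on the
    mean entropy normalized by log(num_classes).
    """
    p = softmax(logits_a)
    q = softmax(logits_b)

    # Symmetric KL divergence between the two class distributions.
    log_p = np.log(p + eps)
    log_q = np.log(q + eps)
    kl_pq = (p * (log_p - log_q)).sum(axis=-1)
    kl_qp = (q * (log_q - log_p)).sum(axis=-1)
    sym_kl = 0.5 * (kl_pq + kl_qp)

    # Normalized entropy in [0, 1]; low values mean confident predictions.
    num_classes = logits_a.shape[-1]
    h = 0.5 * (entropy(p) + entropy(q)) / np.log(num_classes)

    # Hard gate: align only samples on which both views are confident.
    gate = (h < tau).astype(float)

    # Average over gated samples; return 0 if no sample passes the gate.
    loss = (gate * sym_kl).sum() / max(gate.sum(), 1.0)
    return loss, gate


# Sample 0: both views confident but disagreeing; sample 1: both uniform.
view_a = np.array([[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
view_b = np.array([[0.0, 10.0, 0.0], [0.0, 0.0, 0.0]])
loss, gate = gated_agreement_loss(view_a, view_b)
```

In this sketch the uniform (maximally uncertain) sample is excluded from alignment, which mirrors the stated goal of avoiding unstable alignment on ambiguous segments. A soft gate, for example a weight that decreases smoothly with entropy, would be an equally plausible reading of the abstract.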

IPC Classification

G06H04

Keywords

entropy-gated, prediction, agreement, two-view, video, action, recognition, electronics, human, often, struggles, capture, important, temporal, cues, distributed, across, entire, when, relying, solely, single, sampled, clip