Entropy-Gated Prediction Agreement for Two-View Video Action Recognition


Young-Jin Park, Hui-Sup Cho

30 June 2026

Abstract

Human action recognition (HAR) often fails to capture important temporal cues distributed across an entire video when it relies on a single sampled clip. To overcome this limitation, this study proposes a framework that constructs two temporal views from the same video and explicitly learns prediction consistency between them. Specifically, a prediction-level agreement (AG) loss is introduced to align the class probability distributions of the two views. In addition, conditional gating adaptively controls the contribution of the AG loss according to sample-wise prediction confidence, thereby reducing unstable alignment in temporally ambiguous or information-insufficient segments. The proposed framework was evaluated with both convolutional neural network (CNN)-based and Transformer-based backbones on three representative action-recognition benchmark datasets, and it generally improved performance over the single-view baseline across backbone–dataset combinations. Further empirical analyses, including training behavior, motion magnitude, temporal prediction stability, and qualitative case studies, examine the effectiveness and behavior of the two-view framework from multiple perspectives.
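The mechanism the abstract describes can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the divergence or the gating rule, so three things here are assumptions chosen for illustration: the symmetric KL divergence used as the agreement term, the hard gate on normalized prediction entropy, and the threshold `tau`. The function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def normalized_entropy(p, eps=1e-12):
    # Shannon entropy divided by log(num_classes), so values lie in [0, 1].
    h = -(p * np.log(p + eps)).sum(axis=-1)
    return h / np.log(p.shape[-1])

def gated_agreement_loss(logits_a, logits_b, tau=0.5, eps=1e-12):
    """Illustrative entropy-gated agreement loss between two temporal views.

    Assumptions (not taken from the paper): symmetric KL as the agreement
    term, and a hard gate that keeps a sample only when both views have
    normalized entropy below tau (i.e. both predictions are confident).
    """
    p, q = softmax(logits_a), softmax(logits_b)
    log_p, log_q = np.log(p + eps), np.log(q + eps)
    kl_pq = (p * (log_p - log_q)).sum(axis=-1)
    kl_qp = (q * (log_q - log_p)).sum(axis=-1)
    sym_kl = 0.5 * (kl_pq + kl_qp)

    # 1.0 where both views are confident, 0.0 otherwise.
    gate = ((normalized_entropy(p) < tau) &
            (normalized_entropy(q) < tau)).astype(float)

    # Average only over gated-in samples; avoid division by zero.
    loss = (gate * sym_kl).sum() / max(gate.sum(), 1.0)
    return loss, gate

# Two samples, three classes: the first pair is confident but disagrees,
# the second pair is uniform (maximally uncertain) and is gated out.
view_a = np.array([[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
view_b = np.array([[0.0, 5.0, 0.0], [0.0, 0.0, 0.0]])
loss, gate = gated_agreement_loss(view_a, view_b)
```

In training, a term of this kind would typically be added to the per-view classification loss with a weighting coefficient. A soft gate, for example one that weights each sample by one minus its normalized entropy, is an equally plausible reading of "conditional gating".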

Metadata

DOI: 10.3390/electronics15132844

License: CC BY 4.0

IPC Classification

G06H04

Keywords

entropy-gated, prediction, agreement, two-view, video, action, recognition, electronics, human, often, struggles, capture, important, temporal, cues, distributed, across, entire, when, relying, solely, single, sampled, clip
