MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning

Abdelfattah, Mohamed Ossama Ahmed; Hassan, Mariam; Alahi, Alexandre

2024

Abstract

Current transformer-based skeletal action recognition models tend to focus on a limited set of joints and low-level motion patterns to predict action classes. This results in significant performance degradation under small skeleton perturbations or when the pose estimator changes between training and testing. In this work, we introduce MaskCLR, a new Masked Contrastive Learning approach for Robust skeletal action recognition. We propose an Attention-Guided Probabilistic Masking strategy to occlude the most important joints and encourage the model to explore a larger set of discriminative joints. Furthermore, we propose a Multi-Level Contrastive Learning paradigm to enforce the representations of standard and occluded skeletons to be class-discriminative, i.e., more compact within each class and more dispersed across different classes. Our approach helps the model capture high-level action semantics instead of low-level joint variations, and can be conveniently incorporated into transformer-based models. Without loss of generality, we combine MaskCLR with three transformer backbones: the vanilla transformer, DSTFormer, and STTFormer. Extensive experiments on NTU60, NTU120, and Kinetics400 show that MaskCLR consistently outperforms previous state-of-the-art methods on standard and perturbed skeletons from different pose estimators, demonstrating improved accuracy, generalization, and robustness.
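
The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the softmax sampling rule, the `mask_ratio` and `temperature` parameters, the zero-fill occlusion, and the single cross-view loss are all assumptions chosen for illustration, and the paper's actual masking and multi-level losses may differ.

```python
import numpy as np


def attention_guided_mask(joints, attn_scores, mask_ratio=0.3, temperature=1.0, rng=None):
    """Occlude joints sampled with probability softmax(attn_scores / temperature).

    joints: (J, C) joint coordinates for one frame (hypothetical layout).
    attn_scores: (J,) per-joint importance, e.g. averaged attention weights.
    Returns the masked copy and the sorted indices of the occluded joints.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_joints = joints.shape[0]
    num_masked = max(1, int(round(mask_ratio * num_joints)))

    # Softmax over joints, shifted by the max for numerical stability.
    logits = np.asarray(attn_scores, dtype=float) / temperature
    logits = logits - logits.max()
    probs = np.exp(logits)
    probs = probs / probs.sum()

    # Probabilistic rather than top-k: important joints are likely,
    # but not guaranteed, to be occluded.
    masked_idx = rng.choice(num_joints, size=num_masked, replace=False, p=probs)
    masked = joints.copy()
    masked[masked_idx] = 0.0  # assumption: occlusion is modelled as zero-fill
    return masked, np.sort(masked_idx)


def class_contrastive_loss(z_standard, z_masked, labels, temperature=0.1):
    """Cross-view contrastive loss: same-class pairs are pulled together.

    z_standard, z_masked: (B, D) embeddings of standard and occluded skeletons.
    labels: (B,) integer class labels.
    """
    labels = np.asarray(labels)
    eps = 1e-12  # guards against division by zero for all-zero embeddings
    z1 = z_standard / (np.linalg.norm(z_standard, axis=1, keepdims=True) + eps)
    z2 = z_masked / (np.linalg.norm(z_masked, axis=1, keepdims=True) + eps)

    # Cosine similarities between every standard/occluded pair, as log-softmax rows.
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positives: pairs sharing a class label (the diagonal is always positive).
    positives = labels[:, None] == labels[None, :]
    per_sample = (log_prob * positives).sum(axis=1) / positives.sum(axis=1)
    return -per_sample.mean()
```

In this sketch, a training step would embed both the original skeleton and the output of `attention_guided_mask`, then add `class_contrastive_loss` to the usual classification loss. How the attention scores are extracted from a given backbone is left open here.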

Details

Title MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning

Author(s) Abdelfattah, Mohamed Ossama Ahmed ; Hassan, Mariam ; Alahi, Alexandre

Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Pagination 8

Conference IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, Washington, USA, June 17-21, 2024

Date 2024-06-17

Laboratories VITA

Record Appears in Scientific production and competences > ENAC - School of Architecture, Civil and Environmental Engineering > IIC - Civil Engineering Institute > VITA - Visual Intelligence for Transportation
Peer-reviewed publications
Conference Papers
Work produced at EPFL

Record creation date 2024-04-12