Sports and human action recognition in videos.
Video MAE (Masked Autoencoder) is a video classification network built on a ViT (Vision Transformer) backbone.
Model checkpoint: Kinetics-400
Input resolution: 224x224
Number of parameters: 87.7M
Model size (float): 335 MB
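
The parameter count and the float model size listed above can be roughly cross-checked against each other. The sketch below assumes that "float" means 32-bit weights (4 bytes per parameter) and that the size is reported in binary megabytes (2**20 bytes); neither is stated in this summary, and the helper name `float32_size_mib` is hypothetical.

```python
def float32_size_mib(num_params: float) -> float:
    """Approximate storage for the weights, in MiB (1 MiB = 2**20 bytes)."""
    bytes_per_param = 4  # assumption: weights stored as 32-bit floats
    return num_params * bytes_per_param / 2**20


# 87.7M parameters, as listed in the model details above.
size = float32_size_mib(87.7e6)
print(f"Estimated float32 weight size: {size:.1f} MiB")
```

Under these assumptions the estimate lands within about 1 MB of the listed 335 MB; the small gap is expected because the parameter count is itself rounded and a serialized model file also carries some metadata.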
Camera
Action Recognition
Source Model: CC-BY-4.0
Deployable Model: AI-HUB-MODELS-LICENSE