Video-MAE

Introduction

Recognition of sports and other human actions in videos.
Video-MAE (Video Masked Autoencoder) is a video classification network built on a ViT (Vision Transformer) backbone.

Demo video

Specifications and downloads

Technical details

Model checkpoint: Kinetics-400
Input resolution: 224x224
Number of parameters: 87.7M
Model size (float): 335 MB
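
The parameter count and the float model size listed above can be cross-checked with simple arithmetic. The sketch below assumes that "float" means 32-bit floats (4 bytes per parameter) and that "MB" is counted in units of 1024 x 1024 bytes; neither assumption is stated on this page, and the helper name estimate_model_size_mib is hypothetical.

```python
# Rough size check, assuming 4-byte (float32) weights.
# Assumption: "MB" in the listing means MiB (1024 * 1024 bytes).
BYTES_PER_FLOAT32 = 4


def estimate_model_size_mib(num_params, bytes_per_param=BYTES_PER_FLOAT32):
    """Return the approximate weight storage in MiB for a parameter count."""
    return num_params * bytes_per_param / (1024 ** 2)


# 87.7M parameters, as listed in the technical details.
size_mib = estimate_model_size_mib(87.7e6)
print(f"{size_mib:.1f} MiB")  # prints "334.5 MiB"
```

Under these assumptions the estimate rounds to 335, which is consistent with the listed model size. The actual file may differ slightly because of metadata and non-weight tensors.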

Application areas

Camera
Action Recognition

License information

Source Model: CC-BY-4.0
Deployable Model: AI-HUB-MODELS-LICENSE