UCF101 - Action Recognition¶

UCF101 is an action recognition data set of realistic action videos, collected from YouTube, having 101 action categories. This data set is an extension of UCF50 data set which has 50 action categories.

With 13320 videos from 101 action categories, UCF101 gives the largest diversity in terms of actions and with the presence of large variations in camera motion, object appearance and pose, object scale, viewpoint, cluttered background, illumination conditions, etc, it is the most challenging data set to date.

The videos in 101 action categories are grouped into 25 groups, where each group can consist of 4-7 videos of an action.

Use cases¶

Human action recognition in videos.

Properties¶

name: ucf_101
keywords: image_processing, recognition, activity, human, single_person
dataset size: 6,9 GB
is downloadable: yes
tasks:
- recognition: (default)
  
  primary use: action recognition in videos
  
  description: Contains videos and action label annotations for action recognition
  
  sets: train01, train02, train03, test01, test02, test03
  
  metadata file size in disk: 14,5 MB
  
  has annotations: yes
  
  which:
  
  activity labels for each video.

Metadata structure (HDF5)¶

Task: recognition¶

/
├── train01/
│   ├── activities        # dtype=np.uint8, shape=(101,19)        (note: string in ASCII format)
│   ├── image_filenames   # dtype=np.uint8, shape=(1788425,113)   (note: string in ASCII format)
│   ├── total_frames      # dtype=np.int32, shape=(9537,)
│   ├── video_filenames   # dtype=np.uint8, shape=(9537,60)
│   ├── videos            # dtype=np.uint8, shape=(9537,29)       (note: string in ASCII format)
│   ├── object_fields     # dtype=np.uint8, shape=(5,31)          (note: string in ASCII format)
│   ├── object_ids        # dtype=np.int32, shape=(9537,5)
│   ├── list_image_filenames_per_video   # dtype=np.int32, shape=(9537,1776)
│   └── list_videos_per_activity         # dtype=np.int32, shape=(101,121)
│
├── test01/
│   ├── activities        # dtype=np.uint8, shape=(101,19)       (note: string in ASCII format)
│   ├── image_filenames   # dtype=np.uint8, shape=(697865,113)   (note: string in ASCII format)
│   ├── total_frames      # dtype=np.int32, shape=(3783,)
│   ├── video_filenames   # dtype=np.uint8, shape=(3783,60)
│   ├── videos            # dtype=np.uint8, shape=(3783,29)      (note: string in ASCII format)
│   ├── object_fields     # dtype=np.uint8, shape=(5,31)         (note: string in ASCII format)
│   ├── object_ids        # dtype=np.int32, shape=(3783,5)
│   ├── list_image_filenames_per_video   # dtype=np.int32, shape=(3783,900)
│   └── list_videos_per_activity         # dtype=np.int32, shape=(101,49)
│
├── train02/
│   ├── activities        # dtype=np.uint8, shape=(101,19)        (note: string in ASCII format)
│   ├── image_filenames   # dtype=np.uint8, shape=(1791290,113)   (note: string in ASCII format)
│   ├── total_frames      # dtype=np.int32, shape=(9586,)
│   ├── video_filenames   # dtype=np.uint8, shape=(9586,60)
│   ├── videos            # dtype=np.uint8, shape=(9586,29)       (note: string in ASCII format)
│   ├── object_fields     # dtype=np.uint8, shape=(5,31)          (note: string in ASCII format)
│   ├── object_ids        # dtype=np.int32, shape=(9586,5)
│   ├── list_image_filenames_per_video   # dtype=np.int32, shape=(9586,1776)
│   └── list_videos_per_activity         # dtype=np.int32, shape=(101,122)
│
├── test02/
│   ├── activities        # dtype=np.uint8, shape=(101,19)       (note: string in ASCII format)
│   ├── image_filenames   # dtype=np.uint8, shape=(695000,113)   (note: string in ASCII format)
│   ├── total_frames      # dtype=np.int32, shape=(3734,)
│   ├── video_filenames   # dtype=np.uint8, shape=(3734,60)
│   ├── videos            # dtype=np.uint8, shape=(3734,29)      (note: string in ASCII format)
│   ├── object_fields     # dtype=np.uint8, shape=(5,31)         (note: string in ASCII format)
│   ├── object_ids        # dtype=np.int32, shape=(3734,5)
│   ├── list_image_filenames_per_video   # dtype=np.int32, shape=(3734,833)
│   └── list_videos_per_activity         # dtype=np.int32, shape=(101,49)
│
├── train03/
│   ├── activities        # dtype=np.uint8, shape=(101,19)        (note: string in ASCII format)
│   ├── image_filenames   # dtype=np.uint8, shape=(1786111,113)   (note: string in ASCII format)
│   ├── total_frames      # dtype=np.int32, shape=(9624,)
│   ├── video_filenames   # dtype=np.uint8, shape=(9624,60)
│   ├── videos            # dtype=np.uint8, shape=(9624,29)       (note: string in ASCII format)
│   ├── object_fields     # dtype=np.uint8, shape=(5,31)          (note: string in ASCII format)
│   ├── object_ids        # dtype=np.int32, shape=(9624,5)
│   ├── list_image_filenames_per_video   # dtype=np.int32, shape=(9624,900)
│   └── list_videos_per_activity         # dtype=np.int32, shape=(101,124)
│
└── test03/
    ├── activities        # dtype=np.uint8, shape=(101,19)       (note: string in ASCII format)
    ├── image_filenames   # dtype=np.uint8, shape=(700157,113)   (note: string in ASCII format)
    ├── total_frames      # dtype=np.int32, shape=(3696,)
    ├── video_filenames   # dtype=np.uint8, shape=(3696,60)
    ├── videos            # dtype=np.uint8, shape=(3696,29)      (note: string in ASCII format)
    ├── object_fields     # dtype=np.uint8, shape=(5,31)         (note: string in ASCII format)
    ├── object_ids        # dtype=np.int32, shape=(3696,5)
    ├── list_image_filenames_per_video   # dtype=np.int32, shape=(3696,1776)
    └── list_videos_per_activity         # dtype=np.int32, shape=(101,48)

Fields¶

activities: activity names
- available in: train01, train02, train03, test01, test02, test03
- dtype: np.uint8
- is padded: True
- fill value: 0
- note: strings stored in ASCII format
image_filenames: image file path+name
- available in: train01, train02, train03, test01, test02, test03
- dtype: np.uint8
- is padded: True
- fill value: 0
- note: strings stored in ASCII format
total_frames: number of frames per video
- available in: train01, train02, train03, test01, test02, test03
- dtype: np.int32
- is padded: False
- fill value: -1
videos: video name
- available in: train01, train02, train03, test01, test02, test03
- dtype: np.uint8
- is padded: True
- fill value: 0
- note: strings stored in ASCII format
video_filenames: video file path+name
- available in: train01, train02, train03, test01, test02, test03
- dtype: np.uint8
- is padded: True
- fill value: 0
- note: strings stored in ASCII format
object_fields: list of field names of the object id list
- available in: train01, train02, train03, test01, test02, test03
- dtype: np.uint8
- is padded: True
- fill value: 0
- note: strings stored in ASCII format
- note: key field (field name aggregator)
object_ids: list of field ids
- available in: train01, train02, train03, test01, test02, test03
- dtype: np.int32
- is padded: False
- fill value: -1
- note: key field (field id aggregator)
list_image_filenames_per_video: list of image ids per video
- available in: train01, train02, train03, test01, test02, test03
- dtype: np.int32
- is padded: True
- fill value: -1
- note: pre-ordered list
list_videos_per_activity: list of video ids per activity
- available in: train01, train02, train03, test01, test02, test03
- dtype: np.int32
- is padded: True
- fill value: -1
- note: pre-ordered list

Disclaimer¶

For information about the dataset and its terms of use, please see this link.