MPII Human Pose¶

MPII Human Pose dataset is a state of the art benchmark for evaluation of articulated human pose estimation. The dataset includes around 25K images containing over 40K people with annotated body joints. The images were systematically collected using an established taxonomy of every day human activities. Overall the dataset covers 410 human activities and each image is provided with an activity label.

Use cases¶

Human body joint detection.

Properties¶

name: mpii_pose
keywords: image_processing, detection, human_pose, keypoints
dataset size: 12,1 GB
is downloadable: yes
tasks:
- keypoints (default)
- keypoints_clean (TODO)

Tasks¶

keypoints (default)¶

How to use
Properties
HDF5 file structure
Fields

How to use¶

>>> # import the package
>>> import dbcollection as dbc
>>>
>>> # load the dataset
>>> mnist = dbc.load('mpii_pose', 'keypoints')
>>> mnist
DataLoader: "mpii_pose" (keypoints task)

Properties¶

primary use: body joint prediction / classification
description: Contains single human body pose annotations for body joint prediction / classification.
sets: train, train01, val01, test
metadata file size in disk: 7 MB
has annotations: yes
- which:
  
  labels for each activity.
  
  frame position of an image from the original video
  
  head bounding box coordinates
  
  body joint coordinates (x, y) and visibility
  
  center coordinates (x, y) of a single person detection
  
  scale of the person detection w.r.t. 200px height detections
  
  is detection of sufficiently separated individuals?
  
  acivities
  
  category names
  
  keypoint labels
  
  video names / ids
available fields:

HDF5 file structure¶

/
├── train/
│   ├── activity_id       # dtype=np.int32, shape=(29116,),
│   ├── activity_name     # dtype=np.uint8, shape=(29116,101)  (note: string in ASCII format)
│   ├── category_name     # dtype=np.uint8, shape=(29116,23)   (note: string in ASCII format)
│   ├── frame_sec         # dtype=np.int32, shape=(29116,)
│   ├── head_bbox         # dtype=np.float, shape=(29116,4)
│   ├── image_filenames   # dtype=np.uint8, shape=(29116,21)   (note: string in ASCII format)
│   ├── keypoint_labels   # dtype=np.uint8, shape=(16,15)      (note: string in ASCII format)
│   ├── keypoints         # dtype=np.float, shape=(29116,16,3)
│   ├── object_fields     # dtype=np.uint8, shape=(13,16)       (note: string in ASCII format)
│   ├── object_ids        # dtype=np.int32, shape=(29116,13)
│   ├── objpos            # dtype=np.float, shape=(29116,2)
│   ├── scales            # dtype=np.float, shape=(29116,)
│   ├── video_ids         # dtype=np.int32, shape=(29116,)
│   ├── video_names       # dtype=np.uint8, shape=(29116,12)    (note: string in ASCII format)
│   ├── list_keypoints_per_image       # dtype=np.int32, shape=(18079,17)
│   └── list_single_person_per_image   # dtype=np.int32, shape=(18079,1))
│
├── train01/
│   ├── activity_id       # dtype=np.int32, shape=(20310,),
│   ├── activity_name     # dtype=np.uint8, shape=(20310,101)  (note: string in ASCII format)
│   ├── category_name     # dtype=np.uint8, shape=(20310,23)   (note: string in ASCII format)
│   ├── frame_sec         # dtype=np.int32, shape=(20310,)
│   ├── head_bbox         # dtype=np.float, shape=(20310,4)
│   ├── image_filenames   # dtype=np.uint8, shape=(20310,21)   (note: string in ASCII format)
│   ├── keypoint_labels   # dtype=np.uint8, shape=(16,15)      (note: string in ASCII format)
│   ├── keypoints         # dtype=np.float, shape=(20310,16,3)
│   ├── object_fields     # dtype=np.uint8, shape=(13,16)       (note: string in ASCII format)
│   ├── object_ids        # dtype=np.int32, shape=(20310,13)
│   ├── objpos            # dtype=np.float, shape=(20310,2)
│   ├── scales            # dtype=np.float, shape=(20310,)
│   ├── video_ids         # dtype=np.int32, shape=(20310,)
│   ├── video_names       # dtype=np.uint8, shape=(20310,12)    (note: string in ASCII format)
│   ├── list_keypoints_per_image       # dtype=np.int32, shape=(12656,17)
│   └── list_single_person_per_image   # dtype=np.int32, shape=(12656,1))
│
├── val01/
│   ├── activity_id       # dtype=np.int32, shape=(8806,),
│   ├── activity_name     # dtype=np.uint8, shape=(8806,101)  (note: string in ASCII format)
│   ├── category_name     # dtype=np.uint8, shape=(8806,23)   (note: string in ASCII format)
│   ├── frame_sec         # dtype=np.int32, shape=(8806,)
│   ├── head_bbox         # dtype=np.float, shape=(8806,4)
│   ├── image_filenames   # dtype=np.uint8, shape=(8806,21)   (note: string in ASCII format)
│   ├── keypoint_labels   # dtype=np.uint8, shape=(16,15)      (note: string in ASCII format)
│   ├── keypoints         # dtype=np.float, shape=(8806,16,3)
│   ├── object_fields     # dtype=np.uint8, shape=(13,16)       (note: string in ASCII format)
│   ├── object_ids        # dtype=np.int32, shape=(8806,13)
│   ├── objpos            # dtype=np.float, shape=(8806,2)
│   ├── scales            # dtype=np.float, shape=(8806,)
│   ├── video_ids         # dtype=np.int32, shape=(8806,)
│   ├── video_names       # dtype=np.uint8, shape=(8806,12)    (note: string in ASCII format)
│   ├── list_keypoints_per_image       # dtype=np.int32, shape=(5423,17)
│   └── list_single_person_per_image   # dtype=np.int32, shape=(5423,7))
│
└── test/
    ├── activity_id       # dtype=np.int32, shape=(11776,),
    ├── activity_name     # dtype=np.uint8, shape=(11776,101)  (note: string in ASCII format)
    ├── category_name     # dtype=np.uint8, shape=(11776,23)   (note: string in ASCII format)
    ├── frame_sec         # dtype=np.int32, shape=(11776,)
    ├── image_filenames   # dtype=np.uint8, shape=(11776,21)   (note: string in ASCII format)
    ├── keypoint_labels   # dtype=np.uint8, shape=(16,15)      (note: string in ASCII format)
    ├── object_fields     # dtype=np.uint8, shape=(13,16)       (note: string in ASCII format)
    ├── object_ids        # dtype=np.int32, shape=(11776,13)
    ├── objpos            # dtype=np.float, shape=(11776,2)
    ├── scales            # dtype=np.float, shape=(11776,)
    ├── video_ids         # dtype=np.int32, shape=(11776,)
    ├── video_names       # dtype=np.uint8, shape=(11776,12)    (note: string in ASCII format)
    └── list_single_person_per_image   # dtype=np.int32, shape=(6908,7))

Fields¶

activity_id: activity ids
- available in: train, train01, val01, test
- dtype: np.int32
- is padded: False
- fill value: -1

activity_name: activity names
- available in: train, train01, val01, test
- dtype: np.uint8
- is padded: True
- fill value: 0
- note: strings stored in ASCII format

category_name: category names
- available in: train, train01, val01, test
- dtype: np.uint8
- is padded: True
- fill value: 0
- note: strings stored in ASCII format

frame_sec: image position in video, in seconds
- available in: train, train01, val01, test
- dtype: np.int32
- is padded: False
- fill value: -1

head_bbox: head bounding box coordinates
- available in: train, train01, val01
- dtype: np.float
- is padded: False
- fill value: -1
- note: bbox format [x1,y1,x2,y2]

image_filenames: image file name + path
- available in: train, train01, val01, test
- dtype: np.uint8
- is padded: True
- fill value: 0
- note: strings stored in ASCII format

keypoint_labels: body joint names
- available in: train, train01, val01, test
- dtype: np.uint8
- is padded: True
- fill value: 0
- note: strings stored in ASCII format

keypoints: body joint coordinates (x, y)
- available in: train, train01, val01
- dtype: np.float
- is padded: False
- fill value: -1
- note: keypoint format [x1, y1, is_visible]

object_fields: list of field names of the object id list
- available in: train, train01, val01, test
- dtype: np.uint8
- is padded: True
- fill value: 0
- note: strings stored in ASCII format
- note: key field (field name aggregator)

object_ids: list of field ids
- available in: train, train01, val01, test
- dtype: np.int32
- is padded: False
- fill value: -1
- note: key field (field id aggregator)

objpos: person / detection center coordinates
- available in: train, train01, val01, test
- dtype: np.float
- is padded: False
- fill value: -1
- note: position format [x, y]

scale: person scale w.r.t. 200px height
- available in: train, train01, val01, test
- dtype: np.float
- is padded: False
- fill value: -1

video_id: video index
- available in: train, train01, val01, test
- dtype: np.int32
- is padded: True
- fill value: -1

video_name: video name
- available in: train, train01, val01, test
- dtype: np.uint8
- is padded: True
- fill value: 0
- note: strings stored in ASCII format

list_keypoints_per_image: list of available body joints ids per image
- available in: train, train01, val01
- dtype: np.int32
- is padded: True
- fill value: -1
- note: pre-ordered list

list_single_person_per_image: list of single person detection ids per image
- available in: train, train01, val01, test
- dtype: np.int32
- is padded: True
- fill value: -1
- note: pre-ordered list