MPII Human Pose¶
The MPII Human Pose dataset is a state-of-the-art benchmark for evaluating articulated human pose estimation. The dataset includes around 25K images containing over 40K people with annotated body joints. The images were systematically collected using an established taxonomy of everyday human activities. Overall, the dataset covers 410 human activities, and each image is provided with an activity label.
Use cases¶
Human body joint detection.
Properties¶
name
: mpii_pose
keywords
: image_processing, detection, human_pose, keypoints
dataset size
: 12.1 GB
is downloadable
: yes
tasks
:
- keypoints (default)
- keypoints_clean (TODO)
Tasks¶
keypoints (default)¶
- How to use
- Properties
- HDF5 file structure
- Fields
How to use¶
>>> # import the package
>>> import dbcollection as dbc
>>>
>>> # load the dataset
>>> mpii_pose = dbc.load('mpii_pose')
>>> mpii_pose
DataLoader: "mpii_pose" (keypoints task)
Properties¶
primary use
: body joint prediction / classification
description
: Contains single human body pose annotations for body joint prediction / classification.
sets
: train, train01, val01, test
metadata file size on disk
: 7 MB
has annotations
: yes
which
:
- labels for each activity
- frame position of an image in the original video
- head bounding box coordinates
- body joint coordinates (x, y) and visibility
- center coordinates (x, y) of a single person detection
- scale of the person detection w.r.t. 200px height detections
- whether the detection is of sufficiently separated individuals
- activities
- category names
- keypoint labels
- video names / ids
HDF5 file structure¶
/
├── train/
│ ├── activity_id # dtype=np.int32, shape=(29116,)
│ ├── activity_name # dtype=np.uint8, shape=(29116,101) (note: string in ASCII format)
│ ├── category_name # dtype=np.uint8, shape=(29116,23) (note: string in ASCII format)
│ ├── frame_sec # dtype=np.int32, shape=(29116,)
│ ├── head_bbox # dtype=np.float, shape=(29116,4)
│ ├── image_filenames # dtype=np.uint8, shape=(29116,21) (note: string in ASCII format)
│ ├── keypoint_labels # dtype=np.uint8, shape=(16,15) (note: string in ASCII format)
│ ├── keypoints # dtype=np.float, shape=(29116,16,3)
│ ├── object_fields # dtype=np.uint8, shape=(13,16) (note: string in ASCII format)
│ ├── object_ids # dtype=np.int32, shape=(29116,13)
│ ├── objpos # dtype=np.float, shape=(29116,2)
│ ├── scales # dtype=np.float, shape=(29116,)
│ ├── video_ids # dtype=np.int32, shape=(29116,)
│ ├── video_names # dtype=np.uint8, shape=(29116,12) (note: string in ASCII format)
│ ├── list_keypoints_per_image # dtype=np.int32, shape=(18079,17)
│ └── list_single_person_per_image # dtype=np.int32, shape=(18079,1)
│
├── train01/
│ ├── activity_id # dtype=np.int32, shape=(20310,)
│ ├── activity_name # dtype=np.uint8, shape=(20310,101) (note: string in ASCII format)
│ ├── category_name # dtype=np.uint8, shape=(20310,23) (note: string in ASCII format)
│ ├── frame_sec # dtype=np.int32, shape=(20310,)
│ ├── head_bbox # dtype=np.float, shape=(20310,4)
│ ├── image_filenames # dtype=np.uint8, shape=(20310,21) (note: string in ASCII format)
│ ├── keypoint_labels # dtype=np.uint8, shape=(16,15) (note: string in ASCII format)
│ ├── keypoints # dtype=np.float, shape=(20310,16,3)
│ ├── object_fields # dtype=np.uint8, shape=(13,16) (note: string in ASCII format)
│ ├── object_ids # dtype=np.int32, shape=(20310,13)
│ ├── objpos # dtype=np.float, shape=(20310,2)
│ ├── scales # dtype=np.float, shape=(20310,)
│ ├── video_ids # dtype=np.int32, shape=(20310,)
│ ├── video_names # dtype=np.uint8, shape=(20310,12) (note: string in ASCII format)
│ ├── list_keypoints_per_image # dtype=np.int32, shape=(12656,17)
│ └── list_single_person_per_image # dtype=np.int32, shape=(12656,1)
│
├── val01/
│ ├── activity_id # dtype=np.int32, shape=(8806,)
│ ├── activity_name # dtype=np.uint8, shape=(8806,101) (note: string in ASCII format)
│ ├── category_name # dtype=np.uint8, shape=(8806,23) (note: string in ASCII format)
│ ├── frame_sec # dtype=np.int32, shape=(8806,)
│ ├── head_bbox # dtype=np.float, shape=(8806,4)
│ ├── image_filenames # dtype=np.uint8, shape=(8806,21) (note: string in ASCII format)
│ ├── keypoint_labels # dtype=np.uint8, shape=(16,15) (note: string in ASCII format)
│ ├── keypoints # dtype=np.float, shape=(8806,16,3)
│ ├── object_fields # dtype=np.uint8, shape=(13,16) (note: string in ASCII format)
│ ├── object_ids # dtype=np.int32, shape=(8806,13)
│ ├── objpos # dtype=np.float, shape=(8806,2)
│ ├── scales # dtype=np.float, shape=(8806,)
│ ├── video_ids # dtype=np.int32, shape=(8806,)
│ ├── video_names # dtype=np.uint8, shape=(8806,12) (note: string in ASCII format)
│ ├── list_keypoints_per_image # dtype=np.int32, shape=(5423,17)
│ └── list_single_person_per_image # dtype=np.int32, shape=(5423,7)
│
└── test/
├── activity_id # dtype=np.int32, shape=(11776,)
├── activity_name # dtype=np.uint8, shape=(11776,101) (note: string in ASCII format)
├── category_name # dtype=np.uint8, shape=(11776,23) (note: string in ASCII format)
├── frame_sec # dtype=np.int32, shape=(11776,)
├── image_filenames # dtype=np.uint8, shape=(11776,21) (note: string in ASCII format)
├── keypoint_labels # dtype=np.uint8, shape=(16,15) (note: string in ASCII format)
├── object_fields # dtype=np.uint8, shape=(13,16) (note: string in ASCII format)
├── object_ids # dtype=np.int32, shape=(11776,13)
├── objpos # dtype=np.float, shape=(11776,2)
├── scales # dtype=np.float, shape=(11776,)
├── video_ids # dtype=np.int32, shape=(11776,)
├── video_names # dtype=np.uint8, shape=(11776,12) (note: string in ASCII format)
└── list_single_person_per_image # dtype=np.int32, shape=(6908,7)
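All string fields in the tree above (e.g. `activity_name`, `image_filenames`) are stored as zero-padded rows of ASCII codes in `np.uint8` arrays. A minimal decoding sketch, using a synthetic row in place of real HDF5 data:

```python
import numpy as np

def ascii_to_str(arr):
    """Decode one zero-padded np.uint8 row of ASCII codes into a Python string."""
    return bytes(arr).decode("ascii").rstrip("\x00")

# Illustrative stand-in for one row of e.g. 'activity_name' (shape=(N, 101)):
row = np.zeros(101, dtype=np.uint8)
row[:7] = np.frombuffer(b"running", dtype=np.uint8)

print(ascii_to_str(row))  # -> running
```

The same helper applies row-by-row to any of the `(note: string in ASCII format)` datasets.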
Fields¶
activity_id
: activity ids
available in
: train, train01, val01, test
dtype
: np.int32
is padded
: False
fill value
: -1
activity_name
: activity names
available in
: train, train01, val01, test
dtype
: np.uint8
is padded
: True
fill value
: 0
note
: strings stored in ASCII format
category_name
: category names
available in
: train, train01, val01, test
dtype
: np.uint8
is padded
: True
fill value
: 0
note
: strings stored in ASCII format
frame_sec
: image position in video, in seconds
available in
: train, train01, val01, test
dtype
: np.int32
is padded
: False
fill value
: -1
head_bbox
: head bounding box coordinates
available in
: train, train01, val01
dtype
: np.float
is padded
: False
fill value
: -1
note
: bbox format [x1, y1, x2, y2]
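The head box is what MPII's standard PCKh metric normalizes distances by. A small sketch of deriving a head size from the `[x1, y1, x2, y2]` format (the 0.6 bias factor is the constant commonly used for PCKh; treat it as an assumption here):

```python
import numpy as np

def head_size(bbox, bias=0.6):
    """Head size from an [x1, y1, x2, y2] box: the diagonal length scaled by a
    bias factor (0.6 is the value commonly used by MPII's PCKh metric)."""
    x1, y1, x2, y2 = bbox
    return bias * np.hypot(x2 - x1, y2 - y1)

print(head_size([10.0, 20.0, 40.0, 60.0]))  # 0.6 * 50.0 = 30.0
```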
image_filenames
: image file name + path
available in
: train, train01, val01, test
dtype
: np.uint8
is padded
: True
fill value
: 0
note
: strings stored in ASCII format
keypoint_labels
: body joint names
available in
: train, train01, val01, test
dtype
: np.uint8
is padded
: True
fill value
: 0
note
: strings stored in ASCII format
keypoints
: body joint coordinates (x, y) and visibility
available in
: train, train01, val01
dtype
: np.float
is padded
: False
fill value
: -1
note
: keypoint format [x, y, is_visible]
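With the `[x, y, is_visible]` layout and -1 as the fill value for missing joints, selecting the usable joints of one person can be sketched as (the masking logic is illustrative, not part of the dataset API):

```python
import numpy as np

def visible_joints(kps):
    """Keep only annotated, visible joints from rows of [x, y, is_visible]
    (missing joints use the fill value -1)."""
    kps = np.asarray(kps)
    mask = (kps[:, 2] == 1) & (kps[:, 0] >= 0)
    return kps[mask, :2]

sample = np.array([
    [120.0, 310.0, 1.0],   # visible
    [130.0, 250.0, 0.0],   # annotated but occluded
    [-1.0, -1.0, -1.0],    # missing (fill value)
])
print(visible_joints(sample))  # [[120. 310.]]
```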
object_fields
: list of field names of the object id list
available in
: train, train01, val01, test
dtype
: np.uint8
is padded
: True
fill value
: 0
note
: strings stored in ASCII format
note
: key field (field name aggregator)
object_ids
: list of field ids
available in
: train, train01, val01, test
dtype
: np.int32
is padded
: False
fill value
: -1
note
: key field (field id aggregator)
objpos
: person / detection center coordinates
available in
: train, train01, val01, test
dtype
: np.float
is padded
: False
fill value
: -1
note
: position format [x, y]
scales
: person scale w.r.t. 200px height
available in
: train, train01, val01, test
dtype
: np.float
is padded
: False
fill value
: -1
video_ids
: video index
available in
: train, train01, val01, test
dtype
: np.int32
is padded
: True
fill value
: -1
video_names
: video name
available in
: train, train01, val01, test
dtype
: np.uint8
is padded
: True
fill value
: 0
note
: strings stored in ASCII format
list_keypoints_per_image
: list of available body joint ids per image
available in
: train, train01, val01
dtype
: np.int32
is padded
: True
fill value
: -1
note
: pre-ordered list
list_single_person_per_image
: list of single person detection ids per image
available in
: train, train01, val01, test
dtype
: np.int32
is padded
: True
fill value
: -1
note
: pre-ordered list
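Because these per-image lists are pre-ordered and padded with -1, recovering the valid ids from one row is a simple mask, e.g. (the sample row is made up for illustration):

```python
import numpy as np

def valid_ids(padded_row):
    """Strip the -1 padding from one row of a pre-ordered id list such as
    list_single_person_per_image."""
    row = np.asarray(padded_row)
    return row[row >= 0]

print(valid_ids([4, 9, -1, -1, -1, -1, -1]))  # [4 9]
```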