
Commit 2ad80f2 ("init")
Author: Bin Xiao
Parent: 9a3b4fe


45 files changed, 12850 insertions(+), 0 deletions(-)

.gitignore (new file, +95 lines)

```
# IntelliJ project files
.idea
*.iml
out
gen

### Vim template
[._]*.s[a-w][a-z]
[._]s[a-w][a-z]
*.un~
Session.vim
.netrwhist
*~

### IPythonNotebook template
# Temporary data
.ipynb_checkpoints/

### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
#lib/
#lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover

# Translations
*.mo
*.pot

# Django stuff:
*.log

# Sphinx documentation
docs/_build/

# PyBuilder
target/

*.ipynb
*.params
*.json
.vscode/

lib/pycocotools/_mask.c
lib/nms/cpu_nms.c

output/*
models/*
log/*
data/*
external/

draws/
plot/
```

README.md (new file, +231 lines)

# Deep High-Resolution Representation Learning for Human Pose Estimation (to appear in CVPR 2019)

## Introduction
This is an official PyTorch implementation of [*Deep High-Resolution Representation Learning for Human Pose Estimation*](https://arxiv.org/).
In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods **recover high-resolution representations from low-resolution representations** produced by a high-to-low resolution network. Instead, our proposed network **maintains high-resolution representations** through the whole process.
We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks **in parallel**. We conduct **repeated multi-scale fusions** such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.

![Illustrating the architecture of the proposed HRNet](/figures/hrnet.png)
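
To make the repeated fusion concrete, here is a minimal PyTorch sketch of a single exchange step between two parallel streams. It only illustrates the idea and is not the repository's implementation: the channel widths (32/64), the nearest-neighbor upsampling, and the absence of BatchNorm/ReLU are simplifying assumptions.

```
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchFusion(nn.Module):
    """One multi-scale fusion step between a high-resolution and a 1/2-resolution stream."""
    def __init__(self, c_high=32, c_low=64):
        super().__init__()
        # low -> high: 1x1 conv to match channels, then upsample to the high-resolution size
        self.low_to_high = nn.Conv2d(c_low, c_high, kernel_size=1, bias=False)
        # high -> low: strided 3x3 conv to downsample and match channels
        self.high_to_low = nn.Conv2d(c_high, c_low, kernel_size=3, stride=2, padding=1, bias=False)

    def forward(self, x_high, x_low):
        up = F.interpolate(self.low_to_high(x_low), size=x_high.shape[-2:], mode='nearest')
        down = self.high_to_low(x_high)
        # each stream aggregates information from the other before the next stage
        return x_high + up, x_low + down

x_high = torch.randn(1, 32, 64, 48)  # high-resolution feature map
x_low = torch.randn(1, 64, 32, 24)   # 1/2-resolution feature map
y_high, y_low = TwoBranchFusion()(x_high, x_low)
```

In the full network this exchange is repeated across several stages and up to four resolutions, which is what keeps the high-resolution representation informed by the coarser ones.
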
## Main Results
### Results on MPII val
| Arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1 |
|--------------------|------|----------|-------|-------|------|------|-------|------|----------|
| pose_resnet_50 | 96.4 | 95.3 | 89.0 | 83.2 | 88.4 | 84.0 | 79.6 | 88.5 | 34.0 |
| pose_resnet_101 | 96.9 | 95.9 | 89.5 | 84.4 | 88.4 | 84.5 | 80.7 | 89.1 | 34.0 |
| pose_resnet_152 | 97.0 | 95.9 | 90.0 | 85.0 | 89.2 | 85.3 | 81.3 | 89.6 | 35.0 |
| **pose_hrnet_w32** | 97.1 | 95.9 | 90.7 | 86.5 | 89.1 | 87.0 | 83.5 | 90.4 | 37.7 |

### Note:
- Flip test is used.
- Input size is 256x256.
- pose_resnet_[50,101,152] is our previous work of [*Simple Baselines for Human Pose Estimation and Tracking*](http://openaccess.thecvf.com/content_ECCV_2018/html/Bin_Xiao_Simple_Baselines_for_ECCV_2018_paper.html).

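Mean is PCKh@0.5 and Mean@0.1 is PCKh@0.1: the fraction of labeled joints whose predicted location falls within 0.5 (or 0.1) of the head-segment length from the ground truth. A minimal NumPy sketch of the metric, for illustration only (the official MPII evaluation fixes the exact head-size normalization and joint selection):

```
import numpy as np

def pckh(pred, gt, head_size, visible, thr=0.5):
    """pred, gt: (N, K, 2) joint coordinates; head_size: (N,) per-image
    head-segment lengths; visible: (N, K) boolean mask of labeled joints."""
    # distance to ground truth, normalized by the head-segment length
    dist = np.linalg.norm(pred - gt, axis=-1) / head_size[:, None]
    return float((dist[visible] <= thr).mean())
```
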
### Results on COCO val2017, using a person detector with 56.4 AP on COCO val2017
| Arch | Input size | #Params | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|--------------------|------------|---------|--------|-------|-------|--------|--------|--------|-------|-------|--------|--------|--------|
| pose_resnet_50 | 256x192 | 34.0M | 8.9 | 0.704 | 0.886 | 0.783 | 0.671 | 0.772 | 0.763 | 0.929 | 0.834 | 0.721 | 0.824 |
| pose_resnet_50 | 384x288 | 34.0M | 20.0 | 0.722 | 0.893 | 0.789 | 0.681 | 0.797 | 0.776 | 0.932 | 0.838 | 0.728 | 0.846 |
| pose_resnet_101 | 256x192 | 53.0M | 12.4 | 0.714 | 0.893 | 0.793 | 0.681 | 0.781 | 0.771 | 0.934 | 0.840 | 0.730 | 0.832 |
| pose_resnet_101 | 384x288 | 53.0M | 27.9 | 0.736 | 0.896 | 0.803 | 0.699 | 0.811 | 0.791 | 0.936 | 0.851 | 0.745 | 0.858 |
| pose_resnet_152 | 256x192 | 68.6M | 15.7 | 0.720 | 0.893 | 0.798 | 0.687 | 0.789 | 0.778 | 0.934 | 0.846 | 0.736 | 0.839 |
| pose_resnet_152 | 384x288 | 68.6M | 35.3 | 0.743 | 0.896 | 0.811 | 0.705 | 0.816 | 0.797 | 0.937 | 0.858 | 0.751 | 0.863 |
| **pose_hrnet_w32** | 256x192 | 28.5M | 7.1 | 0.744 | 0.905 | 0.819 | 0.708 | 0.810 | 0.798 | 0.942 | 0.865 | 0.757 | 0.858 |
| **pose_hrnet_w32** | 384x288 | 28.5M | 16.0 | 0.758 | 0.906 | 0.825 | 0.720 | 0.827 | 0.809 | 0.943 | 0.869 | 0.767 | 0.871 |
| **pose_hrnet_w48** | 256x192 | 63.6M | 14.6 | 0.751 | 0.906 | 0.822 | 0.715 | 0.818 | 0.804 | 0.943 | 0.867 | 0.762 | 0.864 |
| **pose_hrnet_w48** | 384x288 | 63.6M | 32.9 | 0.763 | 0.908 | 0.829 | 0.723 | 0.834 | 0.812 | 0.942 | 0.871 | 0.767 | 0.876 |

### Note:
- Flip test is used.
- Person detector has person AP of 56.4 on COCO val2017 dataset.
- pose_resnet_[50,101,152] is our previous work of [*Simple Baselines for Human Pose Estimation and Tracking*](http://openaccess.thecvf.com/content_ECCV_2018/html/Bin_Xiao_Simple_Baselines_for_ECCV_2018_paper.html).
- GFLOPs is for convolution and linear layers only.

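The AP/AR columns follow the COCO keypoint evaluation: each predicted pose is scored against ground truth with Object Keypoint Similarity (OKS), and precision/recall are averaged over OKS thresholds 0.50:0.05:0.95 (AP .5 and AP .75 are single-threshold values; M/L are medium/large objects). A minimal sketch of OKS for one pose pair; in practice the numbers come from the COCO evaluation API, not this snippet:

```
import numpy as np

def oks(pred, gt, vis, area, sigmas):
    """pred, gt: (K, 2) keypoint coordinates; vis: (K,) visibility flags;
    area: object area; sigmas: (K,) per-keypoint falloff constants."""
    d2 = np.sum((pred - gt) ** 2, axis=-1)      # squared distance per keypoint
    k2 = (2 * sigmas) ** 2                      # per-keypoint tolerance
    e = d2 / (2 * k2 * (area + np.spacing(1)))  # normalized error
    return float(np.exp(-e)[vis > 0].mean())    # average over labeled keypoints
```
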
### Results on COCO test-dev2017, using a person detector with 60.9 AP on COCO test-dev2017
| Arch | Input size | #Params | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|--------------------|------------|---------|--------|-------|-------|--------|--------|--------|-------|-------|--------|--------|--------|
| pose_resnet_152 | 384x288 | 68.6M | 35.3 | 0.737 | 0.919 | 0.828 | 0.713 | 0.800 | 0.790 | 0.952 | 0.856 | 0.748 | 0.849 |
| **pose_hrnet_w48** | 384x288 | 63.6M | 32.9 | 0.755 | 0.925 | 0.833 | 0.719 | 0.815 | 0.805 | 0.957 | 0.874 | 0.763 | 0.863 |
| **pose_hrnet_w48\*** | 384x288 | 63.6M | 32.9 | 0.770 | 0.927 | 0.845 | 0.734 | 0.831 | 0.820 | 0.960 | 0.886 | 0.778 | 0.877 |

### Note:
- Flip test is used.
- Person detector has person AP of 60.9 on COCO test-dev2017 dataset.
- pose_resnet_152 is our previous work of [*Simple Baselines for Human Pose Estimation and Tracking*](http://openaccess.thecvf.com/content_ECCV_2018/html/Bin_Xiao_Simple_Baselines_for_ECCV_2018_paper.html).
- GFLOPs is for convolution and linear layers only.
- pose_hrnet_w48\* means using additional data from [AI challenger](https://challenger.ai/dataset/keypoint) for training.

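As noted above, the GFLOPs column counts only convolution and linear (fully connected) layers. A rough sketch of that counting convention; the layer shape below is purely illustrative, and whether a multiply-accumulate counts as one or two operations is a convention the table does not restate:

```
def conv2d_macs(c_in, c_out, k, h_out, w_out):
    # multiply-accumulates of a k x k convolution producing a c_out x h_out x w_out map
    return c_in * c_out * k * k * h_out * w_out

def linear_macs(n_in, n_out):
    return n_in * n_out

# Example: a 3x3, stride-2 convolution taking a 3-channel 256x192 input to 64 channels
print(conv2d_macs(3, 64, 3, 128, 96) / 1e9)  # ~0.02 GMACs for this single layer
```
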
## Environment
The code is developed using Python 3.6 on Ubuntu 16.04. NVIDIA GPUs are needed. The code is developed and tested using 4 NVIDIA P100 GPU cards. Other platforms or GPU cards are not fully tested.

## Quick start
### Installation
1. Install pytorch >= v1.0.0 following the [official instructions](https://pytorch.org/).
   **Note that if you use a PyTorch version < v1.0.0, you should follow the instructions at <https://github.com/Microsoft/human-pose-estimation.pytorch> to disable cudnn's implementation of the BatchNorm layer. We encourage you to use a newer PyTorch version (>= v1.0.0).**
2. Clone this repo, and we'll refer to the directory that you cloned as ${POSE_ROOT}.
3. Install dependencies:
   ```
   pip install -r requirements.txt
   ```
4. Make libs:
   ```
   cd ${POSE_ROOT}/lib
   make
   ```
5. Install [COCOAPI](https://github.com/cocodataset/cocoapi):
   ```
   # COCOAPI=/path/to/clone/cocoapi
   git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
   cd $COCOAPI/PythonAPI
   # Install into global site-packages
   make install
   # Alternatively, if you do not have permissions or prefer
   # not to install the COCO API into global site-packages
   python3 setup.py install --user
   ```
   Note that instructions like `# COCOAPI=/path/to/install/cocoapi` indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (COCOAPI in this case) accordingly.
6. Init the output (training model output directory) and log (TensorBoard log directory) directories:

   ```
   mkdir output
   mkdir log
   ```

   Your directory tree should look like this:

   ```
   ${POSE_ROOT}
   ├── data
   ├── experiments
   ├── lib
   ├── log
   ├── models
   ├── output
   ├── tools
   ├── README.md
   └── requirements.txt
   ```

7. Download pretrained models from our model zoo ([GoogleDrive](https://drive.google.com/drive/folders/1hOTihvbyIxsm5ygDpbUuJ7O_tzv4oXjC?usp=sharing) or [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blW231MH2krnmLq5kkQ)):
   ```
   ${POSE_ROOT}
    `-- models
        `-- pytorch
            |-- imagenet
            |   |-- hrnet_w32-36af842e.pth
            |   |-- hrnet_w48-8ef0771d.pth
            |   |-- resnet50-19c8e357.pth
            |   |-- resnet101-5d3b4d8f.pth
            |   `-- resnet152-b121ed2d.pth
            |-- pose_coco
            |   |-- pose_hrnet_w32_256x192.pth
            |   |-- pose_hrnet_w32_384x288.pth
            |   |-- pose_hrnet_w48_256x192.pth
            |   |-- pose_hrnet_w48_384x288.pth
            |   |-- pose_resnet_101_256x192.pth
            |   |-- pose_resnet_101_384x288.pth
            |   |-- pose_resnet_152_256x192.pth
            |   |-- pose_resnet_152_384x288.pth
            |   |-- pose_resnet_50_256x192.pth
            |   `-- pose_resnet_50_384x288.pth
            `-- pose_mpii
                |-- pose_hrnet_w32_256x256.pth
                |-- pose_hrnet_w48_256x256.pth
                |-- pose_resnet_101_256x256.pth
                |-- pose_resnet_152_256x256.pth
                `-- pose_resnet_50_256x256.pth
   ```

### Data preparation
**For MPII data**, please download from [MPII Human Pose Dataset](http://human-pose.mpi-inf.mpg.de/). The original annotation files are in MATLAB format. We have converted them into JSON format; you also need to download them from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blW00SqrairNetmeVu4) or [GoogleDrive](https://drive.google.com/drive/folders/1En_VqmStnsXMdldXA6qpqEyDQulnmS3a?usp=sharing).
Extract them under ${POSE_ROOT}/data, and make them look like this:
```
${POSE_ROOT}
|-- data
`-- |-- mpii
    `-- |-- annot
        |   |-- gt_valid.mat
        |   |-- test.json
        |   |-- train.json
        |   |-- trainval.json
        |   `-- valid.json
        `-- images
            |-- 000001163.jpg
            |-- 000003072.jpg
```

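After extracting, a quick sanity check that the converted annotations are readable (illustrative only; it assumes the JSON file holds one record per sample, and the exact field names are whatever the downloaded files provide):

```
import json

with open('data/mpii/annot/valid.json') as f:
    annots = json.load(f)

print(len(annots))   # number of validation records
print(annots[0])     # peek at the fields of the first record
```
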
**For COCO data**, please download from [COCO download](http://cocodataset.org/#download); 2017 Train/Val is needed for COCO keypoints training and validation. We also provide person detection results on COCO val2017 to reproduce our multi-person pose estimation results. Please download from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzDXoz5BeFl8sWM-) or [GoogleDrive](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk?usp=sharing).
Download and extract them under ${POSE_ROOT}/data, and make them look like this:
```
${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ...
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ...
```

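Similarly, a quick check that the COCO annotations and the provided detection results are in place, using the COCO API installed earlier (paths assume the directory tree above):

```
import json
from pycocotools.coco import COCO

# ground-truth keypoint annotations for val2017
coco = COCO('data/coco/annotations/person_keypoints_val2017.json')
print(len(coco.getImgIds()), 'val2017 images')

# person detection boxes, used when TEST.USE_GT_BBOX is False
with open('data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json') as f:
    dets = json.load(f)
print(len(dets), 'person detections')
```
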
### Training and Testing

#### Testing on MPII dataset using model zoo's models ([OneDrive] or [GoogleDrive])

```
python tools/test.py \
    --cfg experiments/mpii/hrnet/w32_256x256_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pytorch/pose_mpii/pose_hrnet_w32_256x256.pth
```

#### Training on MPII dataset

```
python tools/train.py \
    --cfg experiments/mpii/hrnet/w32_256x256_adam_lr1e-3.yaml
```

#### Testing on COCO val2017 dataset using model zoo's models

```
python tools/test.py \
    --cfg experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pytorch/pose_coco/pose_hrnet_w32_256x192.pth \
    TEST.USE_GT_BBOX False
```

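The trailing KEY VALUE pairs (TEST.MODEL_FILE, TEST.USE_GT_BBOX) override entries of the experiment's YAML config from the command line. A minimal sketch of how such overrides behave, assuming a yacs-style config (the repository's actual defaults live in its config module):

```
from yacs.config import CfgNode as CN

# illustrative defaults only
cfg = CN()
cfg.TEST = CN()
cfg.TEST.MODEL_FILE = ''
cfg.TEST.USE_GT_BBOX = True

# KEY VALUE pairs from the command line are merged on top of the YAML config
cfg.merge_from_list(['TEST.MODEL_FILE', 'models/pytorch/pose_coco/pose_hrnet_w32_256x192.pth',
                     'TEST.USE_GT_BBOX', 'False'])
print(cfg.TEST.USE_GT_BBOX)  # False
```
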
#### Training on COCO train2017 dataset

```
python tools/train.py \
    --cfg experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml
```

### Citation
If you use our code or models in your research, please cite with:
```
@inproceedings{SunXLWang2019,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang},
  booktitle={CVPR},
  year={2019}
}

@inproceedings{xiao2018simple,
  author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
  title={Simple Baselines for Human Pose Estimation and Tracking},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2018}
}
```
