# Deep High-Resolution Representation Learning for Human Pose Estimation (to appear in CVPR 2019)

## Introduction
This is an official pytorch implementation of [*Deep High-Resolution Representation Learning for Human Pose Estimation*](https://arxiv.org/).
In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods **recover high-resolution representations from low-resolution representations** produced by a high-to-low resolution network. Instead, our proposed network **maintains high-resolution representations** through the whole process.
We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks **in parallel**. We conduct **repeated multi-scale fusions** such that each of the high-to-low resolution representations receives information from the other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through superior pose estimation results on two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.
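
The repeated multi-scale fusion can be pictured as an exchange step in which every resolution branch is resized to every other branch's resolution and the results are summed. The snippet below is a minimal, simplified sketch of that idea in PyTorch for two branches (bilinear upsampling and a strided convolution as the resize operations); it is illustrative only and does not reproduce the exact layers of the released network.
```
import torch
from torch import nn
import torch.nn.functional as F

class TwoBranchFusion(nn.Module):
    """Illustrative exchange unit: fuse a high-resolution and a low-resolution branch."""
    def __init__(self, c_high=32, c_low=64):
        super().__init__()
        # low -> high: 1x1 conv to match channels, then upsample to the high-res size
        self.low_to_high = nn.Conv2d(c_low, c_high, kernel_size=1, bias=False)
        # high -> low: strided 3x3 conv halves the resolution and matches channels
        self.high_to_low = nn.Conv2d(c_high, c_low, kernel_size=3, stride=2, padding=1, bias=False)

    def forward(self, x_high, x_low):
        up = F.interpolate(self.low_to_high(x_low), size=x_high.shape[2:],
                           mode="bilinear", align_corners=False)
        down = self.high_to_low(x_high)
        # each branch receives information from the other while keeping its own resolution
        return x_high + up, x_low + down

# toy usage: a 64x48 high-resolution map fused with a 32x24 low-resolution map
fuse = TwoBranchFusion()
high_out, low_out = fuse(torch.randn(1, 32, 64, 48), torch.randn(1, 64, 32, 24))
print(high_out.shape, low_out.shape)  # torch.Size([1, 32, 64, 48]) torch.Size([1, 64, 32, 24])
```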

## Main Results
### Results on MPII val
| Arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1 |
|--------------------|------|----------|-------|-------|------|------|-------|------|----------|
| pose_resnet_50 | 96.4 | 95.3 | 89.0 | 83.2 | 88.4 | 84.0 | 79.6 | 88.5 | 34.0 |
| pose_resnet_101 | 96.9 | 95.9 | 89.5 | 84.4 | 88.4 | 84.5 | 80.7 | 89.1 | 34.0 |
| pose_resnet_152 | 97.0 | 95.9 | 90.0 | 85.0 | 89.2 | 85.3 | 81.3 | 89.6 | 35.0 |
| **pose_hrnet_w32** | 97.1 | 95.9 | 90.7 | 86.5 | 89.1 | 87.0 | 83.5 | 90.4 | 37.7 |

### Note:
- Flip test is used (see the sketch below).
- Input size is 256x256.
- pose_resnet_[50,101,152] are from our previous work, [*Simple Baselines for Human Pose Estimation and Tracking*](http://openaccess.thecvf.com/content_ECCV_2018/html/Bin_Xiao_Simple_Baselines_for_ECCV_2018_paper.html).
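
For reference, flip testing averages the heatmaps predicted for the original image and for its horizontally flipped copy: the flipped prediction is mirrored back and its left/right joint channels are swapped before averaging. The snippet below is a minimal sketch of this procedure, assuming a `model` that maps images to per-joint heatmaps and an MPII-style left/right pairing (`FLIP_PAIRS` here is hypothetical); it is illustrative, not the exact code in this repo.
```
import torch

# Hypothetical left/right joint pairs (an MPII-style joint ordering is assumed).
FLIP_PAIRS = [(0, 5), (1, 4), (2, 3), (10, 15), (11, 14), (12, 13)]

def flip_test(model, image):
    """Average heatmaps predicted from the original and the horizontally flipped image."""
    heatmaps = model(image)                       # (N, num_joints, H, W)
    flipped = model(torch.flip(image, dims=[3]))  # run on the left-right flipped input
    flipped = torch.flip(flipped, dims=[3])       # mirror the heatmaps back
    for left, right in FLIP_PAIRS:                # swap symmetric joint channels
        flipped[:, [left, right]] = flipped[:, [right, left]]
    return (heatmaps + flipped) / 2.0
```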

### Results on COCO val2017 with a detector having human AP of 56.4 on COCO val2017
| Arch | Input size | #Params | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|--------------------|------------|---------|--------|-------|-------|--------|--------|--------|-------|-------|--------|--------|--------|
| pose_resnet_50 | 256x192 | 34.0M | 8.9 | 0.704 | 0.886 | 0.783 | 0.671 | 0.772 | 0.763 | 0.929 | 0.834 | 0.721 | 0.824 |
| pose_resnet_50 | 384x288 | 34.0M | 20.0 | 0.722 | 0.893 | 0.789 | 0.681 | 0.797 | 0.776 | 0.932 | 0.838 | 0.728 | 0.846 |
| pose_resnet_101 | 256x192 | 53.0M | 12.4 | 0.714 | 0.893 | 0.793 | 0.681 | 0.781 | 0.771 | 0.934 | 0.840 | 0.730 | 0.832 |
| pose_resnet_101 | 384x288 | 53.0M | 27.9 | 0.736 | 0.896 | 0.803 | 0.699 | 0.811 | 0.791 | 0.936 | 0.851 | 0.745 | 0.858 |
| pose_resnet_152 | 256x192 | 68.6M | 15.7 | 0.720 | 0.893 | 0.798 | 0.687 | 0.789 | 0.778 | 0.934 | 0.846 | 0.736 | 0.839 |
| pose_resnet_152 | 384x288 | 68.6M | 35.3 | 0.743 | 0.896 | 0.811 | 0.705 | 0.816 | 0.797 | 0.937 | 0.858 | 0.751 | 0.863 |
| **pose_hrnet_w32** | 256x192 | 28.5M | 7.1 | 0.744 | 0.905 | 0.819 | 0.708 | 0.810 | 0.798 | 0.942 | 0.865 | 0.757 | 0.858 |
| **pose_hrnet_w32** | 384x288 | 28.5M | 16.0 | 0.758 | 0.906 | 0.825 | 0.720 | 0.827 | 0.809 | 0.943 | 0.869 | 0.767 | 0.871 |
| **pose_hrnet_w48** | 256x192 | 63.6M | 14.6 | 0.751 | 0.906 | 0.822 | 0.715 | 0.818 | 0.804 | 0.943 | 0.867 | 0.762 | 0.864 |
| **pose_hrnet_w48** | 384x288 | 63.6M | 32.9 | 0.763 | 0.908 | 0.829 | 0.723 | 0.834 | 0.812 | 0.942 | 0.871 | 0.767 | 0.876 |

### Note:
- Flip test is used.
- Person detector has person AP of 56.4 on the COCO val2017 dataset.
- pose_resnet_[50,101,152] are from our previous work, [*Simple Baselines for Human Pose Estimation and Tracking*](http://openaccess.thecvf.com/content_ECCV_2018/html/Bin_Xiao_Simple_Baselines_for_ECCV_2018_paper.html).
- GFLOPs are computed for convolution and linear layers only (see the sketch below).
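
As a rough guide to how such numbers can be obtained, the sketch below counts parameters and the multiply-accumulate operations of `nn.Conv2d` and `nn.Linear` layers with forward hooks. It is an illustrative approximation assuming a generic `model` and input size, not the script used to produce the tables.
```
import torch
from torch import nn

def count_params_and_gflops(model, input_size=(1, 3, 256, 192)):
    """Approximate #Params (in millions) and GFLOPs, counting conv and linear layers only."""
    flops = 0

    def hook(module, inputs, output):
        nonlocal flops
        if isinstance(module, nn.Conv2d):
            # multiply-accumulates per output element = in_channels / groups * kH * kW
            kh, kw = module.kernel_size
            flops += output.numel() * (module.in_channels // module.groups) * kh * kw
        elif isinstance(module, nn.Linear):
            flops += output.numel() * module.in_features

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
    with torch.no_grad():
        model(torch.zeros(*input_size))  # one dummy forward pass to trigger the hooks
    for h in handles:
        h.remove()

    params = sum(p.numel() for p in model.parameters())
    return params / 1e6, flops / 1e9
```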

### Results on COCO test-dev2017 with a detector having human AP of 60.9 on COCO test-dev2017
| Arch | Input size | #Params | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|--------------------|------------|---------|--------|-------|-------|--------|--------|--------|-------|-------|--------|--------|--------|
| pose_resnet_152 | 384x288 | 68.6M | 35.3 | 0.737 | 0.919 | 0.828 | 0.713 | 0.800 | 0.790 | 0.952 | 0.856 | 0.748 | 0.849 |
| **pose_hrnet_w48** | 384x288 | 63.6M | 32.9 | 0.755 | 0.925 | 0.833 | 0.719 | 0.815 | 0.805 | 0.957 | 0.874 | 0.763 | 0.863 |
| **pose_hrnet_w48\*** | 384x288 | 63.6M | 32.9 | 0.770 | 0.927 | 0.845 | 0.734 | 0.831 | 0.820 | 0.960 | 0.886 | 0.778 | 0.877 |

### Note:
- Flip test is used.
- Person detector has person AP of 60.9 on the COCO test-dev2017 dataset.
- pose_resnet_152 is from our previous work, [*Simple Baselines for Human Pose Estimation and Tracking*](http://openaccess.thecvf.com/content_ECCV_2018/html/Bin_Xiao_Simple_Baselines_for_ECCV_2018_paper.html).
- GFLOPs are computed for convolution and linear layers only.
- pose_hrnet_w48\* means using additional data from [AI Challenger](https://challenger.ai/dataset/keypoint) for training.

## Environment
The code is developed using Python 3.6 on Ubuntu 16.04 and requires NVIDIA GPUs. It was developed and tested with 4 NVIDIA P100 GPU cards; other platforms or GPU cards have not been fully tested.

## Quick start
### Installation
1. Install pytorch >= v1.0.0 following the [official instructions](https://pytorch.org/).
   **Note that if you use a pytorch version < v1.0.0, you should follow the instructions at <https://github.com/Microsoft/human-pose-estimation.pytorch> to disable cudnn's implementation of the BatchNorm layer. We encourage you to use a newer pytorch version (>= v1.0.0).**
2. Clone this repo, and we'll call the directory that you cloned ${POSE_ROOT}.
3. Install dependencies:
   ```
   pip install -r requirements.txt
   ```
4. Make libs:
   ```
   cd ${POSE_ROOT}/lib
   make
   ```
5. Install [COCOAPI](https://github.com/cocodataset/cocoapi):
   ```
   # COCOAPI=/path/to/clone/cocoapi
   git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
   cd $COCOAPI/PythonAPI
   # Install into global site-packages
   make install
   # Alternatively, if you do not have permissions or prefer
   # not to install the COCO API into global site-packages
   python3 setup.py install --user
   ```
   Note that instructions like `# COCOAPI=/path/to/clone/cocoapi` indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (COCOAPI in this case) accordingly.
6. Init the output (training model output) and log (TensorBoard log) directories:

   ```
   mkdir output
   mkdir log
   ```

   Your directory tree should look like this:

   ```
   ${POSE_ROOT}
   ├── data
   ├── experiments
   ├── lib
   ├── log
   ├── models
   ├── output
   ├── tools
   ├── README.md
   └── requirements.txt
   ```
7. Download pretrained models from our model zoo ([GoogleDrive](https://drive.google.com/drive/folders/1hOTihvbyIxsm5ygDpbUuJ7O_tzv4oXjC?usp=sharing) or [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blW231MH2krnmLq5kkQ)):
   ```
   ${POSE_ROOT}
    `-- models
        `-- pytorch
            |-- imagenet
            |   |-- hrnet_w32-36af842e.pth
            |   |-- hrnet_w48-8ef0771d.pth
            |   |-- resnet50-19c8e357.pth
            |   |-- resnet101-5d3b4d8f.pth
            |   `-- resnet152-b121ed2d.pth
            |-- pose_coco
            |   |-- pose_hrnet_w32_256x192.pth
            |   |-- pose_hrnet_w32_384x288.pth
            |   |-- pose_hrnet_w48_256x192.pth
            |   |-- pose_hrnet_w48_384x288.pth
            |   |-- pose_resnet_101_256x192.pth
            |   |-- pose_resnet_101_384x288.pth
            |   |-- pose_resnet_152_256x192.pth
            |   |-- pose_resnet_152_384x288.pth
            |   |-- pose_resnet_50_256x192.pth
            |   `-- pose_resnet_50_384x288.pth
            `-- pose_mpii
                |-- pose_hrnet_w32_256x256.pth
                |-- pose_hrnet_w48_256x256.pth
                |-- pose_resnet_101_256x256.pth
                |-- pose_resnet_152_256x256.pth
                `-- pose_resnet_50_256x256.pth
   ```

### Data preparation
**For MPII data**, please download from [MPII Human Pose Dataset](http://human-pose.mpi-inf.mpg.de/). The original annotation files are in matlab format; we have converted them into json format, so you also need to download them from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blW00SqrairNetmeVu4) or [GoogleDrive](https://drive.google.com/drive/folders/1En_VqmStnsXMdldXA6qpqEyDQulnmS3a?usp=sharing).
Extract them under ${POSE_ROOT}/data, and make them look like this:
```
${POSE_ROOT}
|-- data
`-- |-- mpii
    `-- |-- annot
        |   |-- gt_valid.mat
        |   |-- test.json
        |   |-- train.json
        |   |-- trainval.json
        |   `-- valid.json
        `-- images
            |-- 000001163.jpg
            |-- 000003072.jpg
```
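
If you want a quick check that the converted annotations are in place, the snippet below simply loads one of the json files and prints how many samples it contains and the fields of the first record. The path is an example relative to ${POSE_ROOT}, and it assumes the file is a json list of per-sample records; adjust as needed.
```
import json

# Hypothetical path, relative to ${POSE_ROOT}; change it to your setup.
with open("data/mpii/annot/valid.json") as f:
    annotations = json.load(f)

print(len(annotations), "validation samples")
print("fields of one record:", sorted(annotations[0].keys()))
```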

**For COCO data**, please download from [COCO download](http://cocodataset.org/#download); 2017 Train/Val is needed for COCO keypoints training and validation. We also provide the person detection results on COCO val2017 to reproduce our multi-person pose estimation results. Please download from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzDXoz5BeFl8sWM-) or [GoogleDrive](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk?usp=sharing).
Download and extract them under ${POSE_ROOT}/data, and make them look like this:
```
${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ...
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ...
```
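
Once the annotations are extracted, the COCO API installed earlier can be used for a quick sanity check. The sketch below (paths relative to ${POSE_ROOT}) just loads the val2017 keypoint annotations and counts the annotated persons; it is a verification aid, not part of the training pipeline.
```
from pycocotools.coco import COCO

coco = COCO("data/coco/annotations/person_keypoints_val2017.json")
person_ids = coco.getCatIds(catNms=["person"])       # category id for "person"
ann_ids = coco.getAnnIds(catIds=person_ids)           # all person annotations
print(len(coco.getImgIds()), "images,", len(ann_ids), "person annotations")
```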

### Training and Testing

#### Testing on MPII dataset using model zoo's models ([GoogleDrive](https://drive.google.com/drive/folders/1hOTihvbyIxsm5ygDpbUuJ7O_tzv4oXjC?usp=sharing) or [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blW231MH2krnmLq5kkQ))

```
python tools/test.py \
    --cfg experiments/mpii/hrnet/w32_256x256_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pytorch/pose_mpii/pose_hrnet_w32_256x256.pth
```
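
The test script evaluates predicted per-joint heatmaps. For intuition, the sketch below shows the standard way keypoint coordinates are read off a heatmap tensor: a plain argmax, without the sub-pixel refinement and the transform back to the original image that a full evaluation performs. Names and shapes here are illustrative assumptions, not the repo's exact post-processing.
```
import torch

def decode_heatmaps(heatmaps):
    """heatmaps: (N, num_joints, H, W) -> (x, y) coords in heatmap pixels, plus peak confidences."""
    n, j, h, w = heatmaps.shape
    flat = heatmaps.reshape(n, j, -1)
    conf, idx = flat.max(dim=2)                               # peak value and flat index per joint
    coords = torch.stack((idx % w, idx // w), dim=2).float()  # (x, y) from the flat index
    return coords, conf

# toy usage with random "heatmaps" for 16 MPII-style joints
coords, conf = decode_heatmaps(torch.rand(1, 16, 64, 64))
print(coords.shape, conf.shape)  # torch.Size([1, 16, 2]) torch.Size([1, 16])
```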

#### Training on MPII dataset

```
python tools/train.py \
    --cfg experiments/mpii/hrnet/w32_256x256_adam_lr1e-3.yaml
```

#### Testing on COCO val2017 dataset using model zoo's models

```
python tools/test.py \
    --cfg experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pytorch/pose_coco/pose_hrnet_w32_256x192.pth \
    TEST.USE_GT_BBOX False
```
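
`TEST.USE_GT_BBOX False` makes the evaluation use the provided detection results (`COCO_val2017_detections_AP_H_56_person.json`) instead of ground-truth person boxes. For orientation, the snippet below peeks at that file, assuming it follows the standard COCO detection-result format (a json list of entries with `image_id`, `category_id`, `bbox`, and `score`); treat the field names as an assumption to verify rather than a specification.
```
import json

# Path relative to ${POSE_ROOT}, as laid out in the data preparation step.
with open("data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json") as f:
    detections = json.load(f)

print(len(detections), "person detections")
print("example entry:", detections[0])  # expected keys: image_id, category_id, bbox, score
```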

#### Training on COCO train2017 dataset

```
python tools/train.py \
    --cfg experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml
```

### Citation
If you use our code or models in your research, please cite:
```
@inproceedings{SunXLWang2019,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
  booktitle={CVPR},
  year={2019}
}

@inproceedings{xiao2018simple,
  author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
  title={Simple Baselines for Human Pose Estimation and Tracking},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2018}
}
```