Revisiting ResNets: Improved Training and Scaling Strategies

Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph

ResNet-RS is a family of simple ResNet architectures, designed with improved training and scaling strategies, that are 1.7x - 2.7x faster than EfficientNets on TPUv3 and 2.1x - 3.3x faster on V100 GPUs.

Improved Scaling Strategies

The scaling strategies introduced in the paper are:

  • (1) Scale the depth if overfitting can be an issue. If not, scale the width.
  • (2) Scale image resolution slowly compared to prior works such as EfficientNet.

The improved scaling strategies also apply to other image classification architectures (e.g. EfficientNet).

Improved Training Strategies

The training strategy is a combination of multiple regularization and training techniques (see configs and Table 1 in the paper). These techniques are typically transferable to different architectures and to different tasks/datasets.

ImageNet Checkpoints

We release configs and checkpoints of the ResNet-RS model family trained on ImageNet in TensorFlow 1.

| Model         | Input Size | V100 Lat (s) | TPU Lat (ms) | Top-1 Accuracy | Download     |
|---------------|------------|--------------|--------------|----------------|--------------|
| ResNet-RS-50  | 160x160    | 0.31         | 70           | 78.8           | config, ckpt |
| ResNet-RS-101 | 160x160    | 0.48         | 120          | 80.3           | config, ckpt |
| ResNet-RS-101 | 192x192    | 0.70         | 170          | 81.2           | config, ckpt |
| ResNet-RS-152 | 192x192    | 0.99         | 240          | 82.0           | config, ckpt |
| ResNet-RS-152 | 224x224    | 1.48         | 320          | 82.2           | config, ckpt |
| ResNet-RS-152 | 256x256    | 1.76         | 410          | 83.0           | config, ckpt |
| ResNet-RS-200 | 256x256    | 2.86         | 570          | 83.4           | config, ckpt |
| ResNet-RS-270 | 256x256    | 3.76         | 780          | 83.8           | config, ckpt |
| ResNet-RS-350 | 256x256    | 4.72         | 1100         | 84.0           | config, ckpt |
| ResNet-RS-350 | 320x320    | 8.48         | 1630         | 84.2           | config, ckpt |
| ResNet-RS-420 | 320x320    | 10.16        | 2090         | 84.4           | config, ckpt |
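
As a rough illustration of how the table can be used, the sketch below picks the most accurate checkpoint that stays within a TPU latency budget. The numbers are copied from the table above; the names `Entry`, `TABLE`, and `best_under_tpu_budget` are made up for this example and are not part of the released code.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    model: str
    input_size: int          # square input, e.g. 160 means 160x160
    v100_latency_s: float    # "V100 Lat (s)" column
    tpu_latency_ms: float    # "TPU Lat (ms)" column
    top1: float              # "Top-1 Accuracy" column


# Values transcribed from the checkpoint table in this README.
TABLE = [
    Entry("ResNet-RS-50", 160, 0.31, 70, 78.8),
    Entry("ResNet-RS-101", 160, 0.48, 120, 80.3),
    Entry("ResNet-RS-101", 192, 0.70, 170, 81.2),
    Entry("ResNet-RS-152", 192, 0.99, 240, 82.0),
    Entry("ResNet-RS-152", 224, 1.48, 320, 82.2),
    Entry("ResNet-RS-152", 256, 1.76, 410, 83.0),
    Entry("ResNet-RS-200", 256, 2.86, 570, 83.4),
    Entry("ResNet-RS-270", 256, 3.76, 780, 83.8),
    Entry("ResNet-RS-350", 256, 4.72, 1100, 84.0),
    Entry("ResNet-RS-350", 320, 8.48, 1630, 84.2),
    Entry("ResNet-RS-420", 320, 10.16, 2090, 84.4),
]


def best_under_tpu_budget(budget_ms: float) -> Optional[Entry]:
    """Return the highest-accuracy entry whose TPU latency fits the budget."""
    candidates = [e for e in TABLE if e.tpu_latency_ms <= budget_ms]
    # max(..., default=None) yields None when no model fits the budget.
    return max(candidates, key=lambda e: e.top1, default=None)
```

For example, `best_under_tpu_budget(500)` selects the ResNet-RS-152 entry at 256x256, while a budget below 70 ms returns `None`.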

Benchmarking details:

  • Latencies on Tesla V100 GPUs are measured with full precision (float32).
  • Latencies on TPUv3 are measured using bfloat16 precision.
  • All latencies are measured with an initial training batch size of 128 images, which is divided by 2 until it fits onto the accelerator.
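
The batch-size rule in the last bullet can be sketched as a simple halving loop. This is a minimal illustration, not the benchmarking script used for the table: `fits` is a hypothetical callback that reports whether a training step at a given batch size fits in accelerator memory (in practice it might wrap a trial step and catch an out-of-memory error).

```python
from typing import Callable


def largest_fitting_batch_size(fits: Callable[[int], bool], initial: int = 128) -> int:
    """Start at `initial` and halve the batch size until `fits` accepts it."""
    batch_size = initial
    while batch_size >= 1:
        if fits(batch_size):
            return batch_size
        batch_size //= 2  # 128 -> 64 -> 32 -> ... -> 1 -> 0 (loop ends)
    raise RuntimeError("no batch size down to 1 fits on the accelerator")
```

With a stand-in check such as `lambda b: b <= 40`, the loop tries 128 and 64, then settles on 32.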

Code and checkpoints are available in TensorFlow 2 at the official TensorFlow Model Garden.

Citation

@article{bello2021revisiting,
  title={Revisiting ResNets: Improved Training and Scaling Strategies},
  author={Irwan Bello and William Fedus and Xianzhi Du and Ekin D. Cubuk and Aravind Srinivas and Tsung-Yi Lin and Jonathon Shlens and Barret Zoph},
  journal={arXiv preprint arXiv:2103.07579},
  year={2021}
}