Revisiting ResNets: Improved Training and Scaling Strategies

Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph

ResNet-RS is a family of simple ResNet architectures designed with improved training and scaling strategies. The models are 1.7x - 2.7x faster than EfficientNets on TPUv3 and 2.1x - 3.3x faster on V100 GPUs.

Improved Scaling Strategies

The scaling strategies introduced in the paper are:

(1) Scale the depth if overfitting can be an issue. If not, scale the width.
(2) Scale image resolution more slowly than prior work such as EfficientNet.

The improved scaling strategies also apply to other image classification architectures (e.g. EfficientNet).

Improved Training Strategies

The training strategy is a combination of multiple regularization and training techniques (see configs and Table 1 in the paper). These techniques are typically transferable to different architectures and to different tasks/datasets.

ImageNet Checkpoints

We release configs and checkpoints of the ResNet-RS model family trained on ImageNet in TensorFlow 1.

Model	Input Size	V100 Latency (s)	TPUv3 Latency (ms)	Top-1 Accuracy (%)	Download
ResNet-RS-50	160x160	0.31	70	78.8	config | ckpt
ResNet-RS-101	160x160	0.48	120	80.3	config | ckpt
ResNet-RS-101	192x192	0.70	170	81.2	config | ckpt
ResNet-RS-152	192x192	0.99	240	82.0	config | ckpt
ResNet-RS-152	224x224	1.48	320	82.2	config | ckpt
ResNet-RS-152	256x256	1.76	410	83.0	config | ckpt
ResNet-RS-200	256x256	2.86	570	83.4	config | ckpt
ResNet-RS-270	256x256	3.76	780	83.8	config | ckpt
ResNet-RS-350	256x256	4.72	1100	84.0	config | ckpt
ResNet-RS-350	320x320	8.48	1630	84.2	config | ckpt
ResNet-RS-420	320x320	10.16	2090	84.4	config | ckpt
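
As a rough illustration of how the table can be used, the sketch below picks the most accurate checkpoint whose TPU latency fits a given budget. The numbers are copied from the table; the `Checkpoint` type and the `most_accurate_within` helper are hypothetical names introduced only for this example and are not part of the released code.

```python
from typing import NamedTuple, Optional


class Checkpoint(NamedTuple):
    """One row of the checkpoint table (hypothetical helper type)."""
    model: str
    input_size: int        # square input resolution, in pixels
    v100_latency_s: float  # V100 latency, seconds
    tpu_latency_ms: int    # TPU latency, milliseconds
    top1: float            # ImageNet top-1 accuracy, percent


# Values copied from the table above.
CHECKPOINTS = [
    Checkpoint("ResNet-RS-50", 160, 0.31, 70, 78.8),
    Checkpoint("ResNet-RS-101", 160, 0.48, 120, 80.3),
    Checkpoint("ResNet-RS-101", 192, 0.70, 170, 81.2),
    Checkpoint("ResNet-RS-152", 192, 0.99, 240, 82.0),
    Checkpoint("ResNet-RS-152", 224, 1.48, 320, 82.2),
    Checkpoint("ResNet-RS-152", 256, 1.76, 410, 83.0),
    Checkpoint("ResNet-RS-200", 256, 2.86, 570, 83.4),
    Checkpoint("ResNet-RS-270", 256, 3.76, 780, 83.8),
    Checkpoint("ResNet-RS-350", 256, 4.72, 1100, 84.0),
    Checkpoint("ResNet-RS-350", 320, 8.48, 1630, 84.2),
    Checkpoint("ResNet-RS-420", 320, 10.16, 2090, 84.4),
]


def most_accurate_within(tpu_budget_ms: float) -> Optional[Checkpoint]:
    """Return the highest-accuracy checkpoint within the TPU latency budget."""
    candidates = [c for c in CHECKPOINTS if c.tpu_latency_ms <= tpu_budget_ms]
    # default=None covers the case where no checkpoint fits the budget.
    return max(candidates, key=lambda c: c.top1, default=None)
```

For example, `most_accurate_within(500)` selects the ResNet-RS-152 checkpoint at 256x256, while a budget below 70 ms returns `None` because no released checkpoint is that fast.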

Benchmarking details:

Latencies on Tesla V100 GPUs are measured with full precision (float32).
Latencies on TPUv3 are measured using bfloat16 precision.
All latencies are measured with an initial training batch size of 128 images, which is divided by 2 until it fits onto the accelerator.
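
The batch-size rule above can be sketched as a simple halving loop. This is a minimal sketch, not the benchmarking script used for the table: the `fits` callable is a hypothetical stand-in for whatever check determines that a batch fits on the accelerator (for instance, running one training step without an out-of-memory error).

```python
from typing import Callable


def largest_fitting_batch_size(fits: Callable[[int], bool],
                               initial: int = 128) -> int:
    """Start from `initial` and halve until `fits(batch_size)` is True.

    `fits` is an assumed, user-supplied predicate; it is not an API of
    TensorFlow or of the released ResNet-RS code.
    """
    batch_size = initial
    while batch_size >= 1:
        if fits(batch_size):
            return batch_size
        batch_size //= 2  # divide by 2, as described in the benchmarking details
    raise RuntimeError("Even a batch size of 1 does not fit on the accelerator.")
```

With a toy predicate such as `lambda b: b <= 40`, the loop tries 128, 64, then 32, and returns 32.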

Code and checkpoints are available in TensorFlow 2 at the official TensorFlow Model Garden.

Citation

@article{bello2021revisiting,
  title={Revisiting ResNets: Improved Training and Scaling Strategies},
  author={Irwan Bello and William Fedus and Xianzhi Du and Ekin D. Cubuk and Aravind Srinivas and Tsung-Yi Lin and Jonathon Shlens and Barret Zoph},
  journal={arXiv preprint arXiv:2103.07579},
  year={2021}
}
