45 | 45 | "### Argument parser\n",
46 | 46 | "\n",
47 | 47 | "* It is good practice to use an argument parser for specifying hyperparameters. An argument parser allows you to start a training run like `python train.py --learning ... --seed ... --hidden_size ...`, etc.\n",
48 | | - "* If you have multiple models to choose from, you will have multiple sets of hyperparameters. A good summary on this can be found in the [PyTorch Lightning documentation](https://pytorch-lightning.readthedocs.io/en/latest/hyperparameters.html#argparser-best-practices), and it applies even without using Lightning. In essence, you can define a static method for each model that returns a parser for its specific hyperparameters. This keeps your code cleaner and lets you define new tasks without copying the whole argument parser (a minimal sketch follows after this cell).\n",
| 48 | + "* If you have multiple models to choose from, you will have multiple sets of hyperparameters. A good summary on this can be found in the [PyTorch Lightning documentation](https://pytorch-lightning.readthedocs.io/en/latest/common/hyperparameters.html#argparser-best-practices), and it applies even without using Lightning. In essence, you can define a static method for each model that returns a parser for its specific hyperparameters. This keeps your code cleaner and lets you define new tasks without copying the whole argument parser (a minimal sketch follows after this cell).\n",
49 | 49 | "* To ensure reproducibility (more details below), it is recommended to save the arguments as a JSON file or similar in your checkpoint folder."
50 | 50 | ]
51 | 51 | },
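A rough sketch of this pattern is shown below. The model class, its hyperparameters, and the file names are made up for illustration and are not part of the original guide:

```python
# Sketch: per-model static method for model-specific arguments, plus saving the
# parsed arguments as JSON in the checkpoint folder for reproducibility.
# All names (MLP, --hidden_size, hparams.json, ...) are illustrative placeholders.
import argparse
import json
import os


class MLP:
    @staticmethod
    def add_model_specific_args(parent_parser):
        # Each model only adds the hyperparameters it actually uses.
        group = parent_parser.add_argument_group("MLP")
        group.add_argument("--hidden_size", type=int, default=256)
        group.add_argument("--num_layers", type=int, default=2)
        return parent_parser


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=1e-3)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--checkpoint_dir", type=str, default="checkpoints/")
    parser = MLP.add_model_specific_args(parser)
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    # Save the arguments next to the checkpoints so every run can be reproduced later.
    os.makedirs(args.checkpoint_dir, exist_ok=True)
    with open(os.path.join(args.checkpoint_dir, "hparams.json"), "w") as f:
        json.dump(vars(args), f, indent=4)
```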
63 | 63 | "### Toolkits\n",
64 | 64 | "\n",
65 | 65 | "* PyTorch Lightning provides several useful tools for hyperparameter search (a short sketch follows after this list), such as:\n",
66 | | - " * [Learning rate finder](https://pytorch-lightning.readthedocs.io/en/latest/lr_finder.html) that plots the learning rate vs. loss for a few initial batches and helps you choose a reasonable learning rate.\n",
67 | | - " * [Autoscaling batch sizes](https://pytorch-lightning.readthedocs.io/en/latest/training_tricks.html#auto-scaling-of-batch-size) which finds the largest possible batch size for your GPU (helpful if you have very deep, large models and it is obvious you need the largest batch size possible).\n",
| 66 | + " * [Learning rate finder](https://pytorch-lightning.readthedocs.io/en/latest/advanced/training_tricks.html?highlight=Learning%20rate%20finder#learning-rate-finder) that plots the learning rate vs. loss for a few initial batches and helps you choose a reasonable learning rate.\n",
| 67 | + " * [Autoscaling batch sizes](https://pytorch-lightning.readthedocs.io/en/latest/advanced/training_tricks.html#batch-size-finder) which finds the largest possible batch size for your GPU (helpful if you have very deep, large models and it is obvious you need the largest batch size possible).\n",
68 | 68 | "* For comparing multiple hyperparameter configurations, you can add them to TensorBoard. This is a clean way of comparing multiple runs. If interested, a blog post on this can be found [here](https://towardsdatascience.com/a-complete-guide-to-using-tensorboard-with-pytorch-53cb2301e8c3).\n",
69 | 69 | "* There are multiple libraries that support automatic hyperparameter search. A good overview of those for PyTorch can be found [here](https://medium.com/pytorch/accelerate-your-hyperparameter-optimization-with-pytorchs-ecosystem-tools-bc17001b9a49).\n",
70 | 70 | "\n",
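A minimal sketch of how these two tools can be invoked is given below. It assumes a PyTorch Lightning 1.x-style Trainer API (`auto_lr_find`, `auto_scale_batch_size`, `trainer.tune`); newer Lightning versions expose the same functionality through a separate `Tuner` object, so check the documentation of your installed version. The toy model and random data are made up purely to keep the sketch self-contained:

```python
# Sketch, not a definitive recipe: runs the learning rate finder and the batch size
# scaler before training. API details differ between PyTorch Lightning versions.
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class TinyClassifier(pl.LightningModule):
    def __init__(self, learning_rate=1e-3, batch_size=64):
        super().__init__()
        self.save_hyperparameters()  # the tuner updates hparams.learning_rate / hparams.batch_size
        self.net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.learning_rate)

    def train_dataloader(self):
        # Random data just to make the sketch runnable on its own.
        ds = TensorDataset(torch.randn(1024, 32), torch.randint(0, 10, (1024,)))
        return DataLoader(ds, batch_size=self.hparams.batch_size)


model = TinyClassifier()
trainer = pl.Trainer(
    auto_lr_find=True,                   # learning rate finder
    auto_scale_batch_size="binsearch",   # search for the largest batch size that fits in memory
    max_epochs=1,
)
trainer.tune(model)   # runs both tools and writes the results into model.hparams
trainer.fit(model)
```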
87 | 87 | "* The learning rate is an important parameter; a good value depends on the optimizer, the model, and many other hyperparameters.\n",
88 | 88 | "* A usual good starting point is 0.1 for SGD and 1e-3 for Adam.\n",
89 | 89 | "* The deeper the model, the lower the learning rate usually has to be. For instance, Transformer models are usually trained with Adam learning rates of 1e-5 to 1e-4.\n",
90 | | - "* The smaller your batch size, the lower the learning rate should be. Consider using [gradient accumulation](https://towardsdatascience.com/what-is-gradient-accumulation-in-deep-learning-ec034122cfa) if your batch size is getting too small (PyTorch Lightning supports this, see [here](https://pytorch-lightning.readthedocs.io/en/latest/training_tricks.html#accumulate-gradients)); a small sketch follows below.\n",
91 | | - "* Consider using the PyTorch Lightning [learning rate finder](https://pytorch-lightning.readthedocs.io/en/latest/lr_finder.html) toolkit for an initial good guess.\n",
| 90 | + "* The smaller your batch size, the lower the learning rate should be. Consider using [gradient accumulation](https://towardsdatascience.com/what-is-gradient-accumulation-in-deep-learning-ec034122cfa) if your batch size is getting too small (PyTorch Lightning supports this, see [here](https://pytorch-lightning.readthedocs.io/en/latest/advanced/training_tricks.html#accumulate-gradients)); a small sketch follows below.\n",
| 91 | + "* Consider using the PyTorch Lightning [learning rate finder](https://pytorch-lightning.readthedocs.io/en/latest/advanced/training_tricks.html?highlight=Learning%20rate%20finder#learning-rate-finder) toolkit for an initial good guess.\n",
92 | 92 | "\n",
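As a rough illustration of gradient accumulation, the loop below shows the plain-PyTorch version of the idea; it assumes `model`, `optimizer`, `loss_fn`, and `train_loader` already exist, and the accumulation factor is an arbitrary choice:

```python
# Sketch of gradient accumulation: losses from several small batches are backpropagated
# before a single optimizer step, emulating a larger effective batch size.
# `model`, `optimizer`, `loss_fn`, and `train_loader` are assumed to be defined elsewhere.
accumulation_steps = 8  # arbitrary; effective batch size = 8 * batch_size

optimizer.zero_grad()
for i, (x, y) in enumerate(train_loader):
    loss = loss_fn(model(x), y) / accumulation_steps  # scale so the accumulated gradient is an average
    loss.backward()                                   # gradients add up in .grad across iterations
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

# In PyTorch Lightning, the same effect is a single Trainer flag:
# trainer = pl.Trainer(accumulate_grad_batches=8)
```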
93 | 93 | "#### LR scheduler\n",
94 | 94 | "\n",
117 | 117 | "\n",
118 | 118 | "### Grid search with SLURM\n",
119 | 119 | "\n",
120 | | - "* SLURM supports grid searches via [job arrays](https://help.rc.ufl.edu/doc/SLURM_Job_Arrays). We have discussed job arrays in the [Lisa guide](https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/tutorial1/Lisa_Cluster.html#Job-Arrays).\n",
| 120 | + "* SLURM supports grid searches via [job arrays](https://uvadlc-notebooks.readthedocs.io/en/latest/common/tutorial_notebooks/tutorial1/Lisa_Cluster.html#Job-Arrays) as discussed in the [Lisa guide](https://uvadlc-notebooks.readthedocs.io/en/latest/common/tutorial_notebooks/tutorial1/Lisa_Cluster.html#Job-Arrays); see also the [job array documentation](https://help.rc.ufl.edu/doc/SLURM_Job_Arrays).\n",
121 | 121 | "* Job arrays allow you to start N jobs in parallel, each running with slightly different settings (a sketch of this pattern follows after this cell).\n",
122 | 122 | "* It is effectively the same as creating N job files and calling `sbatch ...` N times, but doing that by hand quickly becomes tedious and messy."
123 | 123 | ]
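One common pattern, shown here as a sketch rather than as the Lisa guide's own recipe, is to let each array task pick its configuration based on the `SLURM_ARRAY_TASK_ID` environment variable that SLURM sets for every task; the configurations below are illustrative:

```python
# Sketch: each SLURM array task (started e.g. with `sbatch --array=0-3 job.sh`) reads its
# task id from the environment and trains with the corresponding hyperparameter setting.
import os

configs = [
    {"learning_rate": 1e-3, "seed": 0},
    {"learning_rate": 1e-3, "seed": 1},
    {"learning_rate": 1e-4, "seed": 0},
    {"learning_rate": 1e-4, "seed": 1},
]

task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", 0))  # defaults to 0 outside SLURM
config = configs[task_id]
print(f"Task {task_id} trains with {config}")
```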
128 | 128 | "source": [
129 | 129 | "#### PyTorch Lightning\n",
130 | 130 | "\n",
131 | | - "Writing the job array files can sometimes be annoying, so if you have to do this often, it is advised to write a script that generates the hyperparameter files automatically (for instance, repeating every other hyperparameter configuration for 4 different seeds); a sketch of such a script follows below. However, if you are using PyTorch Lightning, you can directly create a job array file. The documentation for this can be found [here](https://pytorch-lightning.readthedocs.io/en/latest/slurm.html#building-slurm-scripts)."
| 131 | + "Writing the job array files can sometimes be annoying, so if you have to do this often, it is advised to write a script that generates the hyperparameter files automatically (for instance, repeating every other hyperparameter configuration for 4 different seeds); a sketch of such a script follows below. However, if you are using PyTorch Lightning, you can directly create a job array file. The documentation for this can be found [here](https://pytorch-lightning.readthedocs.io/en/latest/common/slurm.html#building-slurm-scripts)."
132 | 132 | ]
133 | 133 | },
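Such a generator script could look roughly like the following; the hyperparameter names, values, and the output file name are purely illustrative:

```python
# Sketch: write every combination of a small hyperparameter grid, repeated for several
# seeds, as one line of command-line arguments per configuration.
import itertools

learning_rates = [1e-3, 1e-4]   # illustrative values
hidden_sizes = [128, 256]
seeds = [0, 1, 2, 3]            # every configuration is repeated for 4 seeds

with open("hyperparameters.txt", "w") as f:
    for lr, hidden, seed in itertools.product(learning_rates, hidden_sizes, seeds):
        f.write(f"--learning_rate {lr} --hidden_size {hidden} --seed {seed}\n")

# Each SLURM array task can then read "its" line of hyperparameters.txt
# (indexed by SLURM_ARRAY_TASK_ID) and pass it on to train.py.
```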
134 | 134 | {