|
7 | 7 | "# Guide 1: Working with the Lisa cluster\n",
|
8 | 8 | "**Author:** Phillip Lippe\n",
|
9 | 9 | "\n",
|
10 |
| - "This tutorial explains how to work with the Lisa cluster for the Deep Learning course at the University of Amsterdam. Every student will receive an account to have resources for training deep neural networks and get familiar with working on a cluster. It is recommended to have listened to the presentation by the SURFsara team or the TA team before going through this tutorial.\n", |
| 10 | + "This tutorial explains how to work with the Lisa cluster for the Deep Learning course at the University of Amsterdam. Every student receives an account that provides resources for training deep neural networks and a way to get familiar with working on a cluster. We recommend following the presentation by the SURFsara team or the TA team before going through this tutorial. Furthermore, this tutorial assumes that you are familiar with using the Linux terminal. If you are not, you can find a crash course [here](https://servicedesk.surf.nl/wiki/display/WIKI/First+time+usage+of+Lisa).\n", |
11 | 11 | "\n",
|
12 | 12 | "## The Lisa cluster\n",
|
13 | 13 | "\n",
|
|
289 | 289 | "\n",
|
290 | 290 | "Typing your password every time you connect to Lisa can quickly become annoying. To enable a secure, password-less connection, you can add your public SSH key to the [SURFsara user portal](https://portal.surfsara.nl/sshkeys/). The next time you log in to Lisa from your machine, it will check your SSH key instead of asking for your password.\n",
|
291 | 291 | "\n",
|
| 292 | + "### Remote development with VSCode or PyCharm\n", |
| 293 | + "\n", |
| 294 | + "The common workflow with clusters is to first write code locally and test it with short runs on the CPU or, if available, a local GPU, then sync the code to the cluster (e.g., via git), and finally run the full training (or other long-running process) on a compute node of the cluster. If you prefer to code directly on Lisa, you can do so via remote connections in IDEs like VSCode or PyCharm. These IDEs connect to Lisa via SSH, so editing feels like coding locally, but all files you edit are saved directly on Lisa. Note, though, that to run your code you still need to create a job script and submit it via SLURM. The login nodes, to which these IDEs connect, are not meant for debugging or running code: any process that takes longer than 15 minutes will be killed (this can sometimes include the SSH connection of VSCode/PyCharm). For more details on remote development and how to set it up, see the SURFsara documentation on [VSCode](https://servicedesk.surf.nl/wiki/display/WIKI/Using+Visual+Studio+Code+for+remote+development) and [PyCharm](https://servicedesk.surf.nl/wiki/display/WIKI/Using+PyCharm+for+remote+development).\n", |
| 295 | + "\n", |
292 | 296 | "### Tracking GPU stats\n",
|
293 | 297 | "\n",
|
294 | 298 | "If you are curious whether your job uses the GPU to its full capacity, you can monitor its utilization as follows. First, submit your job and look up its job ID, either with `squeue -u [userid]` (using your username) or from the message printed by `sbatch` after submission. Next, log into the node with `slurm_jobmonitor [jobid]`, using that job ID. This gives you an interactive view of the node. Finally, run `nvtop` to track the GPU utilization. More details can be found [here](https://servicedesk.surf.nl/wiki/display/WIKI/Lisa%3A+Monitor+your+GPU+job).\n",
|
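Besides `nvtop`, the `nvidia-smi` tool can print GPU statistics in machine-readable CSV form, which is handy if you want to log utilization from a script while your job runs. The following is a minimal Python sketch, assuming `nvidia-smi` is available on the compute node; the helper names `GpuStats`, `parse_gpu_stats`, and `query_gpu_stats` are hypothetical and not part of any Lisa tooling.

```python
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class GpuStats:
    index: int
    utilization_pct: int   # GPU core utilization in percent
    memory_used_mib: int   # memory in use, in MiB
    memory_total_mib: int  # total memory, in MiB


# Fields requested from nvidia-smi, in this order.
QUERY = "index,utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text: str) -> List[GpuStats]:
    """Parse the output of
    `nvidia-smi --query-gpu=<QUERY> --format=csv,noheader,nounits`.
    Assumes every field is numeric (one line per GPU)."""
    stats = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip empty lines
        idx, util, used, total = (field.strip() for field in line.split(","))
        stats.append(GpuStats(int(idx), int(util), int(used), int(total)))
    return stats


def query_gpu_stats() -> List[GpuStats]:
    """Call nvidia-smi once and return the stats of all GPUs visible to the job."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_stats(out)
```

On the node, calling `query_gpu_stats()` periodically (e.g., once per epoch) and printing the result gives a rough log of how busy each GPU is; `nvtop` remains the more convenient option for interactive monitoring.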
|
393 | 397 | "|:---------------------:|:-----------:|:------------:|:-------------------------------:|\n",
|
394 | 398 | "| Gold 5118 (2.3GHz) | 24 | 192GB | 4x Titan RTX, 24 GB GDDR6 |\n",
|
395 | 399 | "\n",
|
396 |
| - "The Titan RTX are faster and provide more GPU memory than the 1080Ti's. However, these nodes cost more than twice as many credits (91.2 vs 42.1 for full node per hour, see [accounting](https://servicedesk.surf.nl/wiki/display/WIKI/Lisa+usage+and+accounting)), so use them only if needed. Furthermore, in many cases, you do not need a full node and might only need a single GPU. Similar to the course partition, there exist `gpu_titanrtx_shared` and `gpu_shared` which allow you access to partial node uses, like 1 or 2 GPUs. " |
| 400 | + "The Titan RTX GPUs are faster and provide more GPU memory than the 1080 Ti GPUs. However, these nodes cost more than twice as many credits (91.2 vs. 42.1 per full node per hour, see [accounting](https://servicedesk.surf.nl/wiki/display/WIKI/Lisa+usage+and+accounting)), so use them only if needed. Furthermore, in many cases you do not need a full node and might only need a single GPU. Similar to the course partition, the `gpu_titanrtx_shared` and `gpu_shared` partitions let you use only part of a node, such as 1 or 2 GPUs.\n", |
| 401 | + "\n", |
| 402 | + "If you need to run pure CPU jobs, you can use the `shared` and `normal` partitions." |
| 403 | + ] |
| 404 | + }, |
| 405 | + { |
| 406 | + "cell_type": "markdown", |
| 407 | + "metadata": {}, |
| 408 | + "source": [ |
| 409 | + "### Additional links\n", |
| 410 | + "\n", |
| 411 | + "The SURFsara wiki contains many more details on Lisa, SLURM, etc., as well as a different perspective on the topics discussed in this tutorial. A (non-exhaustive) list of useful links:\n", |
| 412 | + "\n", |
| 413 | + "* [A crash course on Linux commands that you can run on Lisa](https://servicedesk.surf.nl/wiki/display/WIKI/First+time+usage+of+Lisa)\n", |
| 414 | + "* [How to use SSH on Windows/Mac/Linux, authenticate with SSH keys instead of passwords, and transfer files with SCP or FTP](https://servicedesk.surf.nl/wiki/pages/viewpage.action?pageId=30660216)\n", |
| 415 | + "* [How to write your own job script](https://servicedesk.surf.nl/wiki/display/WIKI/Writing+a+job+script)\n", |
| 416 | + "* [How to submit/cancel a job](https://servicedesk.surf.nl/wiki/display/WIKI/Interacting+with+the+job+queue)\n", |
| 417 | + "* [How to see a job's status in the queue](https://servicedesk.surf.nl/wiki/display/WIKI/Monitoring+the+queue)\n", |
| 418 | + "* How to do remote development with [VSCode](https://servicedesk.surf.nl/wiki/display/WIKI/Using+Visual+Studio+Code+for+remote+development) and [PyCharm](https://servicedesk.surf.nl/wiki/display/WIKI/Using+PyCharm+for+remote+development)\n", |
| 419 | + "* [Lisa partitions, hardware, and file system](https://servicedesk.surf.nl/wiki/display/WIKI/Lisa+hardware+and+file+systems)\n", |
| 420 | + "* [Lisa file system sharing](https://servicedesk.surf.nl/wiki/pages/viewpage.action?pageId=30660238)\n", |
| 421 | + "* [Lisa usage and accounting](https://servicedesk.surf.nl/wiki/display/WIKI/Lisa+usage+and+accounting)" |
397 | 422 | ]
|
398 | 423 | }
|
399 | 424 | ],
|
|
413 | 438 | "name": "python",
|
414 | 439 | "nbconvert_exporter": "python",
|
415 | 440 | "pygments_lexer": "ipython3",
|
416 |
| - "version": "3.10.6" |
| 441 | + "version": "3.8.2" |
417 | 442 | }
|
418 | 443 | },
|
419 | 444 | "nbformat": 4,
|
|
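Even when developing remotely in VSCode or PyCharm, code has to be run through a SLURM job script, and the shared partitions (`gpu_shared`, `gpu_titanrtx_shared`) let a job request only part of a node. The sketch below renders such a script as text from Python. It is a minimal illustration under stated assumptions: the function `make_job_script`, the job name, the CPU count, the time limit, and `train.py` are hypothetical placeholders, and the module loading or environment activation your course setup requires is deliberately left out.

```python
def make_job_script(partition: str = "gpu_shared",
                    gpus: int = 1,
                    cpus_per_task: int = 3,
                    hours: int = 4,
                    job_name: str = "dl_train",
                    command: str = "python train.py") -> str:
    """Render an sbatch job script that requests part of a GPU node.

    All default values are hypothetical placeholders; adjust them to your job.
    """
    if gpus < 1:
        raise ValueError("request at least one GPU")
    if hours < 1:
        raise ValueError("request at least one hour of wall time")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",      # e.g. gpu_shared or gpu_titanrtx_shared
        f"#SBATCH --gres=gpu:{gpus}",            # number of GPUs on the node
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        f"#SBATCH --time={hours:02d}:00:00",     # wall-time limit as HH:MM:SS
        "#SBATCH --output=slurm_output_%j.out",  # %j is replaced by the job ID
        "",
        "# Load your modules / activate your environment here (course-specific, omitted).",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"
```

For example, writing `make_job_script(partition="gpu_titanrtx_shared", gpus=2)` to a file `job.sh` and submitting it with `sbatch job.sh` prints a job ID, which you can then use with `squeue` and `slurm_jobmonitor` as described above. Keeping the partition and GPU count as parameters makes it easy to stay on the cheaper shared 1080 Ti partition by default and switch to the Titan RTX partition only when needed.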