# TensorFlow ROCm port #

## Introduction ##

This repository hosts the port of [TensorFlow](https://github.com/tensorflow/tensorflow) to the ROCm platform. It uses ROCm technologies such as HIP and MIOpen. For details on HIP, see the [HIP repository](https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP). Optimized DNN library calls (via [MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen)) are also supported within this codebase.

## Installation ##

For background information on ROCm, see the [ROCm README](https://github.com/RadeonOpenCompute/ROCm/blob/master/README.md).

The project is derived from TensorFlow 1.3.0 and has been verified to work with the latest ROCm 1.7.1 release.

For details on installation and usage, see these links:
* [Basic installation](rocm_docs/tensorflow-install-basic.md)
* [Building from source](rocm_docs/tensorflow-build-from-source.md)
* [Quickstart guide](rocm_docs/tensorflow-quickstart.md)

## Technical details ##
* [Overview of ROCm port](rocm_docs/rocm-port-overview.md)
* [List of supported operators on ROCm](rocm_docs/core_kernels.md)

## Known Issues / Workarounds

### X freezes under load
In ROCm 1.7.1, the kernel parameter `noretry` is set to 1 to improve overall system performance. However, this setting has been shown to destabilize the graphics driver shipped with Ubuntu. This is an ongoing issue and we are looking into it.

Until it is resolved, try working around it by setting `noretry` to 0:

```
echo 0 | sudo tee /sys/module/amdkfd/parameters/noretry
```

Values written under `/sys` are not preserved across reboots, so the command has to be rerun after every boot.

One way to make `noretry=0` persistent is to edit `/etc/modprobe.d/amdkfd.conf` so that it contains:

```
options amdkfd noretry=0
```

Then run `sudo update-initramfs -u`, reboot, and verify that `/sys/module/amdkfd/parameters/noretry` stays at 0.

For more information, see this [issue report](https://github.com/ROCmSoftwarePlatform/tensorflow/issues/13).

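Checking the `noretry` value by hand after every reboot is easy to forget. The following is a minimal shell sketch, not part of the ROCm tooling, that reads the parameter and reports whether the workaround is active. The function name `check_noretry` is hypothetical; the default path is the sysfs file named in this section, and a different path can be passed as an argument.

```shell
#!/bin/sh
# Hypothetical helper: report whether the amdkfd `noretry` parameter is 0.
# The default path is the sysfs file referenced in this section; an alternate
# path may be passed as the first argument (handy for testing).
check_noretry() {
    param_file="${1:-/sys/module/amdkfd/parameters/noretry}"

    # Bail out with a distinct status if the parameter file is not readable.
    if [ ! -r "$param_file" ]; then
        echo "cannot read $param_file (is the amdkfd module loaded?)" >&2
        return 2
    fi

    value="$(cat "$param_file")"
    if [ "$value" = "0" ]; then
        echo "noretry=0: workaround is active"
        return 0
    fi

    echo "noretry=$value: workaround is NOT active"
    return 1
}
```

Calling `check_noretry` with no arguments inspects the live system. The return status (0 for active, 1 for not active, 2 for an unreadable file) makes it usable from a login or startup script.
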
### tensorflow/benchmarks workaround for TF v1.3
Because our current port is based on TF v1.3, some of the newest commits in `tensorflow/benchmarks` cannot be used. One specific error you may see is `ImportError: cannot import name batching`; "batching" was introduced in this [commit](https://github.com/tensorflow/benchmarks/commit/82dd0539c76afa8491e50d8f796e686b4d97b988). In addition, RCCL, the ROCm counterpart of NCCL, is still being implemented. We therefore have to drop back to an earlier point in the commit history and disable NCCL.

```
# Check out the older tensorflow/benchmarks commit on a new branch named sep7
git checkout -b sep7 6a33b4a4b5bda950bb7e45faf13120115cbfdb2f
# Comment out the NCCL import, since RCCL is not yet available
sed -i 's|from tensorflow.contrib import nccl|#from tensorflow.contrib import nccl|g' ./scripts/tf_cnn_benchmarks/variable_mgr.py
```
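After running the `sed` command, it may be worth confirming that no active `nccl` import remains before launching a benchmark. The following is a minimal sketch under the assumption that the file path is the one used above; the function name `nccl_import_disabled` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE contains no uncommented
# `from tensorflow.contrib import nccl` line.
nccl_import_disabled() {
    file="${1:-./scripts/tf_cnn_benchmarks/variable_mgr.py}"

    # A missing file must not be mistaken for a disabled import.
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi

    # grep -q exits 0 when an import is found at the start of a line,
    # optionally preceded by whitespace, i.e. not prefixed with '#'.
    if grep -q '^[[:space:]]*from tensorflow\.contrib import nccl' "$file"; then
        echo "active nccl import still present in $file"
        return 1
    fi

    echo "nccl import is disabled in $file"
    return 0
}
```

Run `nccl_import_disabled` from the root of the `tensorflow/benchmarks` checkout; a non-zero status means the `sed` step has not taken effect or the file was not found.
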