Commit 253e2d7

Add ROCm specific documentation
Ported from TensorFlow 1.3 ROCm port. Still needs to be revised to reflect the status of TensorFlow 1.8.
1 parent b3c7f4e commit 253e2d7

6 files changed, +1732 -0 lines changed

README.ROCm.md

+53
@@ -0,0 +1,53 @@
# TensorFlow ROCm port #

## Introduction ##

This repository hosts the port of [TensorFlow](https://github.com/tensorflow/tensorflow) to the ROCm platform. It uses ROCm technologies such as HIP and MIOpen. For details on HIP, see the [HIP repository](https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP). Optimized DNN library calls (via [MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen)) are also supported within this codebase.

## Installation ##

For further background information on ROCm, see the [ROCm README](https://github.com/RadeonOpenCompute/ROCm/blob/master/README.md).

The project is derived from TensorFlow 1.3.0 and has been verified to work with the latest ROCm 1.7.1 release.

For details on installation and usage, see these links:
* [Basic installation](rocm_docs/tensorflow-install-basic.md)
* [Building from source](rocm_docs/tensorflow-build-from-source.md)
* [Quickstart guide](rocm_docs/tensorflow-quickstart.md)

## Technical details ##
* [Overview of ROCm port](rocm_docs/rocm-port-overview.md)
* [List of supported operators on ROCm](rocm_docs/core_kernels.md)

## Known Issues / Workarounds

### X freezes under load
In ROCm 1.7.1, the kernel parameter `noretry` is set to 1 to improve overall system performance. However, this setting has been found to destabilize the graphics driver shipped with Ubuntu. This is an ongoing issue and we are looking into it.

Until it is resolved, try setting `noretry` to 0:

```
echo 0 | sudo tee /sys/module/amdkfd/parameters/noretry
```

Files under `/sys` are not preserved across reboots, so the command has to be rerun after every boot.

One way to keep `noretry=0` persistent is to edit `/etc/modprobe.d/amdkfd.conf` so that it reads:

```
options amdkfd noretry=0
```

Once that is done, run `sudo update-initramfs -u`. Reboot and verify that `/sys/module/amdkfd/parameters/noretry` stays at 0.
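
The verification step can be scripted. The following is a minimal sketch, not part of the ROCm tooling: `check_noretry` is a hypothetical helper name, and it assumes the parameter file contains a single integer. The path defaults to the sysfs location named above and can be overridden for testing.

```shell
# Hypothetical helper: report whether amdkfd's noretry parameter is 0.
# $1 (optional) overrides the parameter file path.
check_noretry() {
    param="${1:-/sys/module/amdkfd/parameters/noretry}"
    if [ ! -r "$param" ]; then
        # The amdkfd module is not loaded, or the parameter is not exposed.
        echo "missing"
        return 2
    fi
    value=$(cat "$param")
    if [ "$value" = "0" ]; then
        echo "ok"
    else
        echo "noretry=$value"
        return 1
    fi
}
```

Calling `check_noretry` with no argument after a reboot prints `ok` when the setting persisted.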

For more information, see this [issue report](https://github.com/ROCmSoftwarePlatform/tensorflow/issues/13).

### tensorflow/benchmarks workaround for TF v1.3
Since our current port of TF is based on v1.3, some of the newest commits in `tensorflow/benchmarks` cannot be used. One specific error you may see is `ImportError: cannot import name batching`; `batching` was introduced in this [commit](https://github.com/tensorflow/benchmarks/commit/82dd0539c76afa8491e50d8f796e686b4d97b988). In addition, RCCL, a ROCm version of NCCL, is still being implemented. Therefore we have to drop back to an earlier point in the commit history and disable NCCL:

```
git checkout -b sep7 6a33b4a4b5bda950bb7e45faf13120115cbfdb2f
sed -i 's|from tensorflow.contrib import nccl|#from tensorflow.contrib import nccl|g' ./scripts/tf_cnn_benchmarks/variable_mgr.py
```
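
The `sed` command above adds another `#` each time it is rerun, because its pattern also matches the already commented line. The sketch below is one possible variant, not part of the original instructions: `disable_nccl_import` is a hypothetical helper name, and it assumes GNU `sed` and that the import sits at the start of a line with no indentation. Anchoring the pattern with `^` makes repeated runs a no-op.

```shell
# Hypothetical helper: comment out the NCCL import in a Python source file.
# Assumes GNU sed (-i) and an unindented import statement.
disable_nccl_import() {
    file="$1"
    # '&' re-inserts the matched text, so the line is only prefixed with '#'.
    sed -i 's|^from tensorflow.contrib import nccl|#&|' "$file"
    # Succeed only if no active (uncommented) import is left.
    ! grep -q '^from tensorflow.contrib import nccl' "$file"
}
```

For example: `disable_nccl_import ./scripts/tf_cnn_benchmarks/variable_mgr.py`.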
