Commit ede8637 (parent d615ee9): Update inceptionv3 benchmark README

1 file changed: benchmarks/inceptionv3/README.md (+84, -105 lines)
# Inception-v3 speed: Raspberry Pi 3

_Latest update: December 1, 2016; TensorFlow 0.11.0_

## About

This file contains some very basic run-time statistics for [TensorFlow's pre-trained Inception-v3 model](https://www.tensorflow.org/versions/r0.7/tutorials/image_recognition/index.html) running on a [Raspberry Pi 3 Model B](https://www.raspberrypi.org/products/raspberry-pi-3-model-b/), as compared to an [Early 2013 15-inch Retina MacBook Pro](https://support.apple.com/kb/SP669?locale=en_US) with an Intel i7-3740QM CPU, as well as a desktop rig running Ubuntu 14.04 with an NVIDIA Titan X (Maxwell) GPU and an Intel i7-5820K CPU.

To run this benchmark, I use a modified version of the example [classify_image.py script](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/models/image/imagenet). I made minor modifications to collect and print run-time information after processing. The modified file is available here: [classify\_image\_timed.py](classify_image_timed.py).

## Summary

* _warmup_runs_ is the number of calls to `Session.run` made before benchmarking begins, in order to "warm up" the model. TensorFlow makes adjustments on the fly, so the first few runs of a model are slower than subsequent runs.
* A _run_ is the time between the start of a call to `Session.run` and when it returns. We list the best, worst, and average times (averaged over 25 runs).
* _Build_ is the amount of time spent constructing the Inception graph from the protobuf file.
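The warmup-then-measure procedure described above can be sketched as follows. This is a minimal illustration, not the actual `classify_image_timed.py` code; `run_model` is a hypothetical stand-in for the call to `Session.run`:

```python
import time

def benchmark(run_model, num_runs=25, warmup_runs=10):
    """Time run_model over num_runs calls, after warmup_runs untimed
    calls to let the model settle. Returns (best, worst, average)."""
    for _ in range(warmup_runs):
        run_model()  # untimed: the first few runs are slower
    times = []
    for _ in range(num_runs):
        start = time.time()
        run_model()
        times.append(time.time() - start)
    return min(times), max(times), sum(times) / len(times)

# Example with a trivial stand-in workload in place of Session.run:
best, worst, avg = benchmark(lambda: sum(range(10000)))
print("Best: %.4f  Worst: %.4f  Average: %.4f" % (best, worst, avg))
```

With `warmup_runs=0`, the slow first run lands in the timed sample, which is exactly why the worst-case numbers in the second half of the table below are so much larger.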

1517
<table>
1618

1719
<tr>
18-
<td></td>
19-
<th colspan="3">Python (CPU)</th>
20-
<th colspan="3">C++ (CPU)</th>
20+
<th colspan="6">_TensorFlow version 0.11.0_</th>
2121
</tr>
22-
22+
2323
<tr>
24-
<td><i>Model</i></td>
25-
<td><i>Build (sec)</i></td>
26-
<td><i>Eval (sec)</i></td>
27-
<td><i>Total (sec)</i></td>
28-
<td><i>Build (sec)</i></td>
29-
<td><i>Eval (sec)</i></td>
30-
<td><i>Total (sec)</i></td>
24+
<td></td>
25+
<th><i>Model</i></th>
26+
<th><i>Best run (sec)</i></th>
27+
<th><i>Worst run (sec)</i></th>
28+
<th><i>Average run (sec)</i></th>
29+
<th><i>Build time(sec)</i></th>
3130
</tr>

<tr>
<th rowspan="4"><b>warmup_runs=10</b></th>
<td><b>Raspberry Pi 3</b></td>
<td><b>1.8646</b></td>
<td><b>2.1782</b></td>
<td><b>1.9805</b></td>
<td><b>4.8962</b></td>
</tr>

<tr>
<td>Intel i7-3740QM (Early 2013 MacBook Pro)</td>
<td>0.2146</td>
<td>0.2425</td>
<td>0.2272</td>
<td>1.3104</td>
</tr>

<tr>
<td>Intel i7-5820K (Ubuntu 14.04)</td>
<td>0.1397</td>
<td>0.1730</td>
<td>0.1567</td>
<td>0.7064</td>
</tr>

<tr>
<td>NVIDIA Titan X (Maxwell), Intel i7-5820K (Ubuntu 14.04)</td>
<td>0.0240</td>
<td>0.0290</td>
<td>0.0259</td>
<td>0.9566</td>
</tr>

<tr>
<th rowspan="4"><b>warmup_runs=0</b></th>
<td><b>Raspberry Pi 3</b></td>
<td><b>1.8541</b></td>
<td><b>6.3338</b></td>
<td><b>2.0656</b></td>
<td><b>4.9755</b></td>
</tr>

<tr>
<td>Intel i7-3740QM (Early 2013 Retina MacBook Pro)</td>
<td>0.2174</td>
<td>1.3151</td>
<td>0.2662</td>
<td>1.2761</td>
</tr>

<tr>
<td>Intel i7-5820K (Ubuntu 14.04)</td>
<td>0.1435</td>
<td>0.7027</td>
<td>0.1750</td>
<td>0.7103</td>
</tr>

<tr>
<td>NVIDIA Titan X (Maxwell), Intel i7-5820K (Ubuntu 14.04)</td>
<td>0.0232</td>
<td>1.5800</td>
<td>0.0871</td>
<td>0.7659</td>
</tr>

</table>

### Remarks

* Run-time performance has gotten significantly better over the past several releases of TensorFlow, though running Inception on a Raspberry Pi still takes longer than a second when using Python.
* Warming up your `Session` is _crucial_. Many issues opened in this repo ask how to improve performance, so here's the number one thing to start with: keep your `Session` persistent to take advantage of automatic optimization tweaks.
* Along the same lines: do _not_ simply call your Python script from bash every time you want to classify an image. It takes multiple seconds to rebuild the Inception graph from scratch, which can slow your pipeline down severalfold (and this test doesn't include the time it takes to import `tensorflow`, which is another thing worth benchmarking). This goes for pretty much any TensorFlow model you use: keep some sort of rudimentary server running that can respond to requests using a live TensorFlow `Session`.
* Running the [TensorFlow benchmark tool](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/benchmark) shows sub-second (~500-600 ms) average run times for the Raspberry Pi (I'll need to do another write-up with more details). Since that benchmark runs entirely in C++, we'd expect it to be faster than going through Python. The open question is whether the full ~1.5 seconds of difference between these tests is due to the communication layer between Python and the C++ core.
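The build-once-and-reuse advice above can be illustrated with a toy timing sketch. The sleeps below are simulated costs standing in for real TensorFlow graph-building and `Session.run` calls, not actual measurements:

```python
import time

def build_graph():
    """Stand-in for rebuilding the Inception graph from its protobuf
    (the expensive step; several seconds on a Raspberry Pi)."""
    time.sleep(0.05)  # simulated build cost
    return object()   # pretend graph/session handle

def classify(session, image):
    """Stand-in for a single Session.run call."""
    time.sleep(0.001)  # simulated per-image cost
    return "giant panda"

images = ["a.jpg", "b.jpg", "c.jpg"]

# Anti-pattern: rebuild the graph for every image, which is what
# happens when you invoke the script from bash once per image.
start = time.perf_counter()
for image in images:
    session = build_graph()
    classify(session, image)
naive = time.perf_counter() - start

# Better: build once, keep the session live, classify many times.
start = time.perf_counter()
session = build_graph()
for image in images:
    classify(session, image)
persistent = time.perf_counter() - start

print("naive: %.3fs  persistent: %.3fs" % (naive, persistent))
```

The gap grows linearly with the number of images, since the build cost is paid once instead of once per request.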

## About `classify_image_timed.py`

I added two flags to `classify_image_timed.py` that let users easily change the number of test runs (runs that collect timing information) as well as the number of "warmup" runs. Simply pass a number to `--num_runs` or `--warmup_runs` when calling the script:

```bash
# Use a sample size of 100 runs
$ python classify_image_timed.py --num_runs=100

# Don't include any warmup runs
$ python classify_image_timed.py --warmup_runs=0
```
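If you're adapting your own script, flags like these can be wired up with standard `argparse`. This is a sketch under the assumption of an argparse-based CLI; the actual script's flag handling may differ:

```python
import argparse

parser = argparse.ArgumentParser(
    description="Timed Inception-v3 classification (sketch)")
parser.add_argument("--num_runs", type=int, default=25,
                    help="number of timed Session.run calls to average over")
parser.add_argument("--warmup_runs", type=int, default=10,
                    help="number of untimed warmup calls before benchmarking")

# Parse an example argv rather than sys.argv, for illustration:
args = parser.parse_args(["--num_runs=100", "--warmup_runs=0"])
print(args.num_runs, args.warmup_runs)  # → 100 0
```

The defaults match the 25-run, 10-warmup setup used for the table above.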
