Add multithreading, device, score, API key, timestamps #1

Open: wants to merge 5 commits into base: master
95 changes: 90 additions & 5 deletions README.md
@@ -1,5 +1,90 @@
*SYNOPSIS*
1. This script reads URLs from the 'pagespeed.txt' file. Load this file with full URLs.
2. Queries each URL with the Google PageSpeed API.
3. Filters JSON results to only include desired metrics.
4. Metrics are saved to local .csv spreadsheet for analysis.
# Google Pagespeed API Bulk Query

This Python 3 script queries Google's PageSpeed Insights API for a list of URLs, prints selected results, and saves them to a CSV file.

You can specify whether to test for Desktop or Mobile (it defaults to mobile). By default it reports only the performance Score, First Contentful Paint, and First Interactive values. To report other metrics, edit `get_speed` in `pagespeed-api.py`.

## Install

This program requires Python 3.6 or later (it uses f-strings) and the `requests` package, which `pagespeed-api.py` imports. Assuming you have both, simply git clone or download this project and then run it from the command line.

## Use

### Setup

List the URLs, one per line, in a txt file named `pagespeed.txt`. Assuming you're analyzing a single large website, your `sitemap.xml` is a good place to get each URL you want the search engines to care about.
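
As a rough sketch of the sitemap idea, the snippet below pulls every `<loc>` entry out of a sitemap and writes one URL per line. It is not part of this project: `urls_from_sitemap` and `write_url_list` are hypothetical helper names, and it assumes a plain `urlset` sitemap in the sitemaps.org namespace rather than a sitemap index.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return every <loc> URL found in a sitemap document, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def write_url_list(urls, path="pagespeed.txt"):
    """Write one URL per line, the layout the script reads with readlines()."""
    with open(path, "w") as handle:
        handle.write("\n".join(urls) + "\n")
```

You would read your own `sitemap.xml` into a string, pass it to `urls_from_sitemap`, and hand the result to `write_url_list`.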

To avoid running afoul of Google's API rate limits, get an [API key from Google](https://console.developers.google.com/apis/credentials).

Best practice is to add the key to your bash profile if you're on Mac or Linux. For example:

```bash
$ nano ~/.bash_profile
```
and then add the following line:
```
export SPD_API_KEY=YOUR_API_KEY
```
Restart your terminal after you save it.

If you're not a naturally paranoid person, you're not sharing this program, and you're not committing it to any repositories, you can just put the key directly into `pagespeed-api.py` as `SPD_API_KEY`. This is a bad practice and I don't recommend it.
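
One thing to watch: `os.environ.get` returns `None` when the variable is unset, and an f-string would then put the literal text `None` into the request URL. A minimal sketch of a more defensive approach follows. `load_api_key` and `build_params` are hypothetical names, not functions in this script; the resulting dictionary could be passed to `requests.get` through its `params` argument.

```python
import os


def load_api_key(env=os.environ, name="SPD_API_KEY"):
    """Return the API key from the environment, or None if it is unset or blank."""
    key = env.get(name, "").strip()
    return key or None


def build_params(url, strategy, api_key=None):
    """Build the query parameters; the key is only included when one is available."""
    params = {"url": url, "strategy": strategy}
    if api_key:
        params["key"] = api_key
    return params
```

Leaving the `key` parameter out entirely when no key is configured avoids sending a bogus value.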

### Running it

From the project root directory, run either of the following to get Mobile results (the default):
```
$ python3 pagespeed-api.py
```
```
$ python3 pagespeed-api.py mobile
```
To get Desktop results:
```
$ python3 pagespeed-api.py desktop
```
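
The script takes whatever is passed as the first argument and uses it as the strategy. A small sketch of how that argument could be validated before any requests are made is shown here. `parse_strategy` and `VALID_STRATEGIES` are hypothetical names, and rejecting anything other than `mobile` or `desktop` is an assumption about the desired behavior.

```python
VALID_STRATEGIES = ("mobile", "desktop")


def parse_strategy(argv):
    """Return the strategy named in argv[1], defaulting to 'mobile'."""
    if len(argv) < 2:
        return "mobile"
    strategy = argv[1].lower()
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"Expected one of {VALID_STRATEGIES}, got {argv[1]!r}")
    return strategy
```

Called as `parse_strategy(sys.argv)`, this would catch a typo such as `destkop` up front instead of after every URL has been queried.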

You will have something like the following printed to your screen:
```
Requesting https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://www.example.com&strategy=mobile&key=YOUR_API_KEY...
Requesting https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://www.example.com&strategy=mobile&key=YOUR_API_KEY...
URL ~ https://www.example.com/
Score ~ 1.0
First Contentful Paint ~ 0.8 s
First Interactive ~ 0.8 s
URL ~ https://www.example.com/
Score ~ 1.0
First Contentful Paint ~ 0.8 s
First Interactive ~ 0.8 s
```
And you should have a file named `pagespeed-results-mobile-2019-08-21_23:33:59.csv` saved to the "results" directory. It will look like:

```
URL, Score, First Contentful Paint, First Interactive
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
```
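
For the analysis step, a short sketch of reading that CSV back with the standard library may help. `summarize` is a hypothetical helper, not part of this project. It assumes the header shown above (with a space after each comma) and skips any row whose Score is not a number, such as the failure messages the script writes.

```python
import csv
import io


def summarize(csv_text):
    """Return the number of scored pages and their mean performance score."""
    # skipinitialspace drops the space after each comma in the header row.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    scores = []
    for row in reader:
        try:
            scores.append(float(row["Score"]))
        except (KeyError, TypeError, ValueError):
            continue  # failure rows have no numeric Score field
    mean = sum(scores) / len(scores) if scores else None
    return {"pages": len(scores), "mean_score": mean}
```

To use it on a real run, read the results file into a string and pass it in.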

## Credit / References

This is a fork of [ibebeebz pagespeed project](https://github.com/ibebeebz/google-pagespeed-api-script). Many thanks to ibebeebz!

### References

These were helpful to me today:

- Toptal's [Guide to concurrency and parallelism](https://toptal.com/python/beginners-guide-to-concurrency-and-parallelism-in-python)
- Google's [PageSpeed API docs](https://developers.google.com/speed/docs/insights/v5/get-started)

### Fork differences

I forked this project mainly because querying hundreds of pages was taking quite a while, and I wanted to do it several times a day for both mobile and desktop.

So I added multithreading (most of the run time is spent waiting on Google's response), the ability to specify the device, the performance score, API key support, and a timestamp in the CSV filename so each run's output is unique.
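
To illustrate why threads help here, this is a minimal sketch with a stand-in for the network call. `fake_request`, `run_all`, and `stamped_name` are hypothetical names and no real request is made. The timestamp format is the one used in `pagespeed-api.py`.

```python
from concurrent.futures import ThreadPoolExecutor
from time import localtime, sleep, strftime


def fake_request(url):
    """Stand-in for a slow network call; it only sleeps, then returns a tuple."""
    sleep(0.1)
    return url, len(url)


def run_all(urls, max_workers=8):
    """Run fake_request for every URL in a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of the inputs.
        return list(executor.map(fake_request, urls))


def stamped_name(strategy, when=None):
    """Build a CSV filename that includes the strategy and a timestamp."""
    moment = when if when is not None else localtime()
    stamp = strftime("%Y-%m-%d_at_%H.%M.%S", moment)
    return f"{strategy}-{stamp}.csv"
```

Because each worker spends its time waiting rather than computing, the waits overlap and the total run time drops roughly in proportion to the number of workers, up to whatever the API's rate limits allow.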
117 changes: 68 additions & 49 deletions pagespeed-api.py
@@ -1,54 +1,73 @@
import requests
import sys
import os
from time import localtime, strftime
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Documentation: https://developers.google.com/speed/docs/insights/v5/get-started

# JSON paths: https://developers.google.com/speed/docs/insights/v4/reference/pagespeedapi/runpagespeed
# Set in your bash profile, get from Google: https://console.developers.google.com/apis/credentials
SPD_API_KEY = os.environ.get('SPD_API_KEY')

# Populate 'pagespeed.txt' file with URLs to query against API.
with open('pagespeed.txt') as pagespeedurls:
download_dir = 'pagespeed-results.csv'
file = open(download_dir, 'w')
content = pagespeedurls.readlines()
content = [line.rstrip('\n') for line in content]

columnTitleRow = "URL, First Contentful Paint, First Interactive\n"
file.write(columnTitleRow)
# Documentation: https://developers.google.com/speed/docs/insights/v5/get-started
def main(strategy="mobile"):
try:
strategy = sys.argv[1]
except IndexError:
print("You can pass 'mobile' or 'desktop' as parameter. Running mobile by default.")
# Pull URLS from 'pagespeed.txt' to query against API.
with open('pagespeed.txt') as pagespeedurls:
stamp = strftime("%Y-%m-%d_at_%H.%M.%S", localtime())
csv_out = Path("results/")
download_dir = csv_out / f'{strategy}-{stamp}.csv'
file = open(download_dir, 'w')
content = pagespeedurls.readlines()
content = [line.rstrip('\n') for line in content]
columnTitleRow = "URL, Score, First Contentful Paint, First Interactive\n" # CSV header
file.write(columnTitleRow)

# This is the google pagespeed api url structure, using for loop to insert each url in .txt file
for line in content:
# If no "strategy" parameter is included, the query by default returns desktop data.
x = f'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={line}&strategy=mobile'
print(f'Requesting {x}...')
r = requests.get(x)
final = r.json()

try:
urlid = final['id']
split = urlid.split('?') # This splits the absolute url from the api key parameter
urlid = split[0] # This reassigns urlid to the absolute url
ID = f'URL ~ {urlid}'
ID2 = str(urlid)
urlfcp = final['lighthouseResult']['audits']['first-contentful-paint']['displayValue']
FCP = f'First Contentful Paint ~ {str(urlfcp)}'
FCP2 = str(urlfcp)
urlfi = final['lighthouseResult']['audits']['interactive']['displayValue']
FI = f'First Interactive ~ {str(urlfi)}'
FI2 = str(urlfi)
except KeyError:
print(f'<KeyError> One or more keys not found {line}.')

try:
row = f'{ID2},{FCP2},{FI2}\n'
file.write(row)
except NameError:
print(f'<NameError> Failing because of KeyError {line}.')
file.write(f'<KeyError> & <NameError> Failing because of nonexistant Key ~ {line}.' + '\n')

try:
print(ID)
print(FCP)
print(FI)
except NameError:
print(f'<NameError> Failing because of KeyError {line}.')
def get_speed(line):
# Query API.
x = f'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={line}&strategy={strategy}&key={SPD_API_KEY}'
print(f'Requesting {x}...')
r = requests.get(x)
final = r.json()

try:
urlid = final['id']
split = urlid.split('?') # This splits the absolute url from the api key parameter
urlid = split[0] # This reassigns urlid to the absolute url
ID = f'URL ~ {urlid}'
ID2 = str(urlid)
# JSON paths: https://developers.google.com/speed/docs/insights/v4/reference/pagespeedapi/runpagespeed
urlfcp = final['lighthouseResult']['audits']['first-contentful-paint']['displayValue']
FCP = f'First Contentful Paint ~ {str(urlfcp)}'
FCP2 = str(urlfcp[:-2])
urlfi = final['lighthouseResult']['audits']['interactive']['displayValue']
FI = f'First Interactive ~ {str(urlfi)}'
FI2 = str(urlfi[:-2])
urlscore = final['lighthouseResult']['categories']['performance']['score']
SC = f'Score ~ {str(urlscore)}'
SC2 = str(urlscore)
except KeyError:
print(f'<KeyError> One or more keys not found {line}.')

try:
row = f'{ID2},{SC2},{FCP2},{FI2}\n'
file.write(row)
except NameError:
print(f'<NameError> Failing because of KeyError {line}.')
file.write(f'<KeyError> & <NameError> Failing because of nonexistent Key ~ {line}.' + '\n')

try:
print(ID)
print(SC)
print(FCP)
print(FI)
except NameError:
print(f'<NameError> Failing because of KeyError {line}.')
with ThreadPoolExecutor() as executor: # Run requests in threads; default max_workers is 5x the CPU count on Python 3.5-3.7 and min(32, CPU count + 4) on 3.8+
executor.map(get_speed, content)

file.close()
file.close()
if __name__ == '__main__':
main()
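
A possible follow-up, sketched under assumptions: `get_speed` relies on a `NameError` to detect a failed lookup, and several threads write to the same file handle. The snippet below shows one alternative. `extract_row` and `RowWriter` are hypothetical names, not code from this PR. It returns `None` when the response lacks the expected keys and serializes writes with a lock.

```python
import threading


def extract_row(final):
    """Pull the reported metrics out of a decoded API response.

    Returns a list of CSV fields, or None when an expected key is missing.
    """
    try:
        lighthouse = final["lighthouseResult"]
        audits = lighthouse["audits"]
        return [
            final["id"].split("?")[0],  # drop any query string from the URL
            str(lighthouse["categories"]["performance"]["score"]),
            str(audits["first-contentful-paint"]["displayValue"]),
            str(audits["interactive"]["displayValue"]),
        ]
    except (KeyError, TypeError):
        return None


class RowWriter:
    """Write CSV rows to a shared handle, one thread at a time."""

    def __init__(self, handle):
        self._handle = handle
        self._lock = threading.Lock()

    def write_row(self, fields):
        # The lock keeps rows from different threads from interleaving.
        with self._lock:
            self._handle.write(",".join(fields) + "\n")
```

With this shape, the worker can check for `None` once and write a single failure line, instead of wrapping each later step in its own `try` block.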
11 changes: 10 additions & 1 deletion pagespeed.txt
@@ -1 +1,10 @@
https://stores.uscellular.com
https://www.example.com
https://www.example.com
https://www.example.com
https://www.example.com
https://www.example.com
https://www.example.com
https://www.example.com
https://www.example.com
https://www.example.com
https://www.example.com
11 changes: 11 additions & 0 deletions results/pagespeed-results-desktop-2019-08-22_00:20:38.csv
@@ -0,0 +1,11 @@
URL, Score, First Contentful Paint, First Interactive
https://www.example.com/,1.0,0.2 s,0.2 s
https://www.example.com/,1.0,0.2 s,0.2 s
https://www.example.com/,1.0,0.2 s,0.2 s
https://www.example.com/,1.0,0.2 s,0.2 s
https://www.example.com/,1.0,0.2 s,0.2 s
https://www.example.com/,1.0,0.2 s,0.2 s
https://www.example.com/,1.0,0.2 s,0.2 s
https://www.example.com/,1.0,0.2 s,0.2 s
https://www.example.com/,1.0,0.2 s,0.2 s
https://www.example.com/,1.0,0.2 s,0.2 s
11 changes: 11 additions & 0 deletions results/pagespeed-results-mobile-2019-08-22_00:18:16.csv
@@ -0,0 +1,11 @@
URL, Score, First Contentful Paint, First Interactive
<KeyError> & <NameError> Failing because of nonexistant Key ~ https://www.example.com.
<KeyError> & <NameError> Failing because of nonexistant Key ~ https://www.example.com.
<KeyError> & <NameError> Failing because of nonexistant Key ~ https://www.example.com.
<KeyError> & <NameError> Failing because of nonexistant Key ~ https://www.example.com.
<KeyError> & <NameError> Failing because of nonexistant Key ~ https://www.example.com.
<KeyError> & <NameError> Failing because of nonexistant Key ~ https://www.example.com.
<KeyError> & <NameError> Failing because of nonexistant Key ~ https://www.example.com.
<KeyError> & <NameError> Failing because of nonexistant Key ~ https://www.example.com.
<KeyError> & <NameError> Failing because of nonexistant Key ~ https://www.example.com.
<KeyError> & <NameError> Failing because of nonexistant Key ~ https://www.example.com.
11 changes: 11 additions & 0 deletions results/pagespeed-results-mobile-2019-08-22_00:19:56.csv
@@ -0,0 +1,11 @@
URL, Score, First Contentful Paint, First Interactive
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s
https://www.example.com/,1.0,0.8 s,0.8 s