
Commit 308da47

author
Adam M. Wilson
committed
update parallel
1 parent cb96fe6 commit 308da47

File tree

832 files changed

+44351
-84551
lines changed


CS_11_ParallelProcessing.Rmd

Lines changed: 93 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,93 @@
1+
---
2+
title: "Parallel Computing with R"
3+
subtitle: Write a parallel for loop
4+
week: 11
5+
type: Case Study
6+
reading:
7+
- CRAN Task View [High-Performance and Parallel Computing with R](http://cran.r-project.org/web/views/HighPerformanceComputing.html)
8+
- Parallel [Computing with the R Language in a Supercomputing Environment](https://link.springer.com/chapter/10.1007/978-3-642-13872-0_64)
9+
tasks:
10+
- Write parallel for loops to speed up computation time.
11+
---
12+
13+
```{r setup, include=FALSE, purl=F}
14+
source("functions.R")
15+
source("knitr_header.R")
16+
```
17+
18+
# Reading
19+
20+
```{r reading,results='asis',echo=F,purl=F}
21+
md_bullet(rmarkdown::metadata$reading)
22+
```
23+
24+
25+
# Tasks
26+
27+
```{r tasks,results='asis',echo=F, purl=F}
28+
md_bullet(rmarkdown::metadata$tasks)
29+
```
30+
31+
## Background
32+
33+
```{r cache=F, message=F,warning=FALSE}
34+
library(tidyverse)
35+
library(spData)
36+
library(sf)
37+
38+
## New Packages
39+
library(foreach)
40+
library(doParallel)
41+
registerDoParallel()
42+
getDoParWorkers() # check registered cores
43+
```
44+
45+
46+
Write an Rmd script that:
47+
48+
* Loads the `world` dataset in the `spData` package
49+
* Runs a parallel `foreach()` to loop over countries (`name_long`) and:
50+
* `filter` the world object to include only one country at a time.
51+
* use `st_is_within_distance` to find which countries in the `world` object are within 100000m of that country. Set `sparse=F` to return a simple array of `T` for countries within the distance.
52+
* set `.combine=rbind` to return a simple matrix.
53+
* Confirm that you get the same answer without using foreach:
54+
* simply use `st_is_within_distance` with the transformed `world` object as both the `x` and `y` objects.
55+
* compare the results with `identical()`
56+
* you can also check the time difference with `system.time()`.
57+
58+
```{r, echo=F, purl=F}
59+
data("world")
60+
proj="+proj=robin +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs "
61+
dist=100000 # distance in m
62+
world2=st_transform(world,proj)
63+
64+
#system.time(
65+
x_seq<-world2%>%
66+
st_is_within_distance(world2,dist,sparse=F)
67+
#)
68+
69+
#system.time(
70+
x_par <- foreach(i=unique(world$name_long),.combine=rbind) %dopar% {
71+
world2%>%
72+
filter(name_long==i)%>%
73+
st_is_within_distance(world2,dist=dist,sparse = F)
74+
}
75+
#)
76+
77+
#identical(x_seq,x_par)
78+
```
79+
80+
This approach could be used to identify which countries were 'close' to others. For example, these countries are within `r dist`m of Costa Rica:
81+
```{r}
82+
i=which(world2$name_long=="Costa Rica")
83+
# neighbor countries
84+
world2[x_par[i,],]$name_long
85+
```
86+
87+
```{r echo=F}
88+
ggplot()+
89+
geom_sf(data=world2[x_par[i,],])+
90+
geom_sf(data=world2[i,],col="red")
91+
```
92+
93+
Note that in this example the sequential version typically runs faster than the

CS_11_ParallelProcessing.md

Lines changed: 73 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,73 @@
1+
---
2+
title: "Parallel Computing with R"
3+
subtitle: Write a parallel for loop
4+
week: 11
5+
type: Case Study
6+
reading:
7+
- CRAN Task View [High-Performance and Parallel Computing with R](http://cran.r-project.org/web/views/HighPerformanceComputing.html)
8+
- Parallel [Computing with the R Language in a Supercomputing Environment](https://link.springer.com/chapter/10.1007/978-3-642-13872-0_64)
9+
tasks:
10+
- Write parallel for loops to speed up computation time.
11+
---
12+
13+
14+
15+
# Reading
16+
17+
- CRAN Task View [High-Performance and Parallel Computing with R](http://cran.r-project.org/web/views/HighPerformanceComputing.html)
18+
- Parallel [Computing with the R Language in a Supercomputing Environment](https://link.springer.com/chapter/10.1007/978-3-642-13872-0_64)
19+
20+
21+
# Tasks
22+
23+
- Write parallel for loops to speed up computation time.
24+
25+
## Background
26+
27+
28+
```r
29+
library(tidyverse)
30+
library(spData)
31+
library(sf)
32+
33+
## New Packages
34+
library(foreach)
35+
library(doParallel)
36+
registerDoParallel()
37+
getDoParWorkers() # check registered cores
38+
```
39+
40+
```
41+
## [1] 2
42+
```
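The number of registered workers depends on the machine and on how the backend was registered. As a minimal sketch (the explicit cluster and the `n_workers` value of 2 are assumptions for illustration, not part of the case study), the worker count can be set and released explicitly:

```r
library(foreach)
library(doParallel)

# Assumption: two workers; adjust for your machine.
n_workers <- 2
cl <- parallel::makeCluster(n_workers)
registerDoParallel(cl)

# Number of workers that %dopar% will use.
workers <- getDoParWorkers()
print(workers)

# Release the workers once the parallel work is finished.
parallel::stopCluster(cl)
```

Creating the cluster explicitly makes the worker count the same on every operating system, at the cost of having to stop the cluster yourself.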
43+
44+
45+
Write an Rmd script that:
46+
47+
* Loads the `world` dataset in the `spData` package
48+
* Runs a parallel `foreach()` to loop over countries (`name_long`) and:
49+
* `filter` the world object to include only one country at a time.
50+
* use `st_is_within_distance` to find which countries in the `world` object are within 100000m of that country. Set `sparse=F` to return a simple array of `T` for countries within the distance.
51+
* set `.combine=rbind` to return a simple matrix.
52+
* Confirm that you get the same answer without using foreach:
53+
* simply use `st_is_within_distance` with the transformed `world` object as both the `x` and `y` objects.
54+
* compare the results with `identical()`
55+
* you can also check the time difference with `system.time()`.
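The pattern in the steps above (one row per iteration, stacked with `.combine=rbind`, then checked against a sequential result) can be sketched on a toy problem. This is only an illustration under stated assumptions: the loop body `c(i, i^2, i^3)` stands in for the per-country computation, and the two-worker cluster is arbitrary.

```r
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(2)  # assumption: two workers
registerDoParallel(cl)

# Sequential reference: build the matrix one row at a time.
x_seq <- do.call(rbind, lapply(1:10, function(i) c(i, i^2, i^3)))

# Parallel version: each iteration returns one row and
# .combine = rbind stacks the rows into a matrix.
time_par <- system.time(
  x_par <- foreach(i = 1:10, .combine = rbind) %dopar% {
    c(i, i^2, i^3)
  }
)

parallel::stopCluster(cl)

# foreach may attach row names to the combined result;
# drop them so that only the values are compared.
dimnames(x_par) <- NULL
same <- identical(x_seq, x_par)
print(same)
print(time_par)
```

For a task this small, the cost of starting workers and moving data between them is likely to outweigh any gain, so `system.time()` is a useful check before committing to a parallel loop.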
56+
57+
58+
59+
This approach could be used to identify which countries were 'close' to others. For example, these countries are within 10^{5}m of Costa Rica:
60+
61+
```r
62+
i=which(world2$name_long=="Costa Rica")
63+
# neighbor countries
64+
world2[x_par[i,],]$name_long
65+
```
66+
67+
```
68+
## [1] "Panama" "Costa Rica" "Nicaragua"
69+
```
70+
71+
![](CS_11_ParallelProcessing_files/figure-html/unnamed-chunk-4-1.png)<!-- -->
72+
73+
Note that in this example the sequential version typically runs faster than the

Schedule.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,7 @@ Homeworks are due at 5pm on the Friday of the week specified below.
3232
| 8 | 10/16/18 | [<i class='fas fa-desktop'> </i>](presentations/PS_08_repro.html){target='_blank'} | [Create Final Project Webpage](./TK_08.html) | [One Script, Many Products](./CS_08.html) | 6 |
3333
| 9 | 10/23/18 | [<i class='fas fa-desktop'> </i>](presentations/PS_09_weather.html){target='_blank'} | [APIs, time-series, and weather Data](./TK_09.html) | [Tracking Hurricanes!](./CS_09.html) | 7 |
3434
| 10 | 10/30/18 | [<i class='fas fa-desktop'> </i>](presentations/PS_10_RS.html){target='_blank'} | [Remote Sensing](./TK_10.html) | - | 8 |
35-
| 11 | 11/6/18 | | [Project First Draft](./TK_11.html) | - | 9 |
35+
| 11 | 11/6/18 | [<i class='fas fa-desktop'> </i>](presentations/PS_11_ParallelProcessing.html){target='_blank'} | [Project First Draft](./TK_11.html) | [Parallel Computing with R](./CS_11_ParallelProcessing.html) | 9 |
3636
| 12 | 11/13/18 | [<i class='fas fa-desktop'> </i>](presentations/PS_12.html){target='_blank'} | [Project Peer Review](./TK_12.html) | [Dynamic HTML graph of Daily Temperatures](./CS_12.html) | 10 |
3737
| 13 | 11/20/18 | | [Thanksgiving Week (Tuesday Class Optional)](./TK_13.html) | - | |
3838
| 14 | 11/27/18 | | [Final Project 2nd Draft / Building and summarizing models](./TK_14.html) | - | |

TK_11.Rmd

Lines changed: 24 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,11 @@
11
---
22
title: Project First Draft
3-
subtitle: Review project drafts from your peers
3+
subtitle: Submit the first draft of your project for peer review
44
week: 11
55
type: Task
6+
presentation: PS_11_ParallelProcessing.html
67
reading:
7-
- GitHub [Pull Requests](https://help.github.com/articles/about-pull-requests/)
8+
- Documentation for [RMarkdown Websites](https://rmarkdown.rstudio.com/rmarkdown_websites.htm)
89
tasks:
910
- Commit your first draft of your project to GitHub
1011
---
@@ -22,6 +23,12 @@ source("knitr_header.R")
2223
md_bullet(rmarkdown::metadata$reading)
2324
```
2425

26+
# Tasks
27+
28+
```{r tasks,results='asis',echo=F}
29+
md_bullet(rmarkdown::metadata$tasks)
30+
```
31+
2532
### First Draft
2633

2734
The first draft of your project will be assessed by your peers in GitHub. The objectives of the peer evaluation are:
@@ -30,3 +37,18 @@ The first draft of your project will be assessed by your peers in GitHub. The ob
3037
* Provide an opportunity to share your knowledge to improve their project
3138

3239
You should use the project website template (or similar) to generate an HTML version of your project report. If your project requires any data not available in public repositories, you should put it in a folder called `/data` in your project's home directory and then import it into R with `read.csv('data/filename.csv')` or similar so that anyone with a copy of the repository can re-create the HTML output.
40+
41+
## Required components of first draft
42+
43+
1) **Introduction** [~ 200 words]: Clearly stated background and questions / hypotheses / problems being addressed. Sets up the analysis in an interesting and compelling way.
44+
2) **Data**: Script downloads at least one dataset automatically through the internet or loads the data from the `data/` folder. This could use a direct download (e.g. download.file()) or an API (e.g. anything from ROpenSci).
45+
3) **Figure**: The HTML file includes at least one figure of the data.
46+
4) **Reproducibility**: The .Rmd should generate the HTML output when "Build Website" is clicked.
47+
48+
### Confirming 'reproducibility'
49+
50+
After pushing the files to GitHub, try downloading it as a zip file, opening in RStudio, and clicking build website - it should work.
51+
52+
## Common issues
53+
54+
1) Importing data from somewhere on your computer. You should not have any commands such as `read.csv("~/projects/inputdata.csv")` that read any data from your computer other than the `data/` folder in your repository.

TK_11.md

Lines changed: 22 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,11 @@
11
---
22
title: Project First Draft
3-
subtitle: Review project drafts from your peers
3+
subtitle: Submit the first draft of your project for peer review
44
week: 11
55
type: Task
6+
presentation: PS_11_ParallelProcessing.html
67
reading:
7-
- GitHub [Pull Requests](https://help.github.com/articles/about-pull-requests/)
8+
- Documentation for [RMarkdown Websites](https://rmarkdown.rstudio.com/rmarkdown_websites.htm)
89
tasks:
910
- Commit your first draft of your project to GitHub
1011
---
@@ -17,6 +18,10 @@ tasks:
1718

1819
- GitHub [Pull Requests](https://help.github.com/articles/about-pull-requests/)
1920

21+
# Tasks
22+
23+
- Commit your first draft of your project to GitHub
24+
2025
### First Draft
2126

2227
The first draft of your project will be assessed by your peers in GitHub. The objectives of the peer evaluation are:
@@ -25,3 +30,18 @@ The first draft of your project will be assessed by your peers in GitHub. The ob
2530
* Provide an opportunity to share your knowledge to improve their project
2631

2732
You should use the project website template (or similar) to generate an HTML version of your project report. If your project requires any data not available in public repositories, you should put it in a folder called `/data` in your project's home directory and then import it into R with `read.csv('data/filename.csv')` or similar so that anyone with a copy of the repository can re-create the HTML output.
33+
34+
## Required components of first draft
35+
36+
1) **Introduction** [~ 200 words]: Clearly stated background and questions / hypotheses / problems being addressed. Sets up the analysis in an interesting and compelling way.
37+
2) **Data**: Script downloads at least one dataset automatically through the internet or loads the data from the `data/` folder. This could use a direct download (e.g. download.file()) or an API (e.g. anything from ROpenSci).
38+
3) **Figure**: The HTML file includes at least one figure of the data.
39+
4) **Reproducibility**: The .Rmd should generate the HTML output when "Build Website" is clicked.
40+
41+
### Confirming 'reproducibility'
42+
43+
After pushing the files to GitHub, try downloading it as a zip file, opening in RStudio, and clicking build website - it should work.
44+
45+
## Common issues
46+
47+
1) Importing data from somewhere on your computer. You should not have any commands such as `read.csv("~/projects/inputdata.csv")` that read any data from your computer other than the `data/` folder in your repository.
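A minimal sketch of the portable alternative, assuming a hypothetical file `data/example.csv` (the file name and the `mtcars` stand-in are illustrative only; in a real project the file would come from `download.file()` or already be committed to the repository):

```r
# Hypothetical file name, used only for illustration.
data_file <- file.path("data", "example.csv")

# Stand-in for download.file(): create the file if it is not there yet.
if (!file.exists(data_file)) {
  dir.create(dirname(data_file), showWarnings = FALSE)
  write.csv(head(mtcars), data_file, row.names = FALSE)
}

# Relative path: resolves inside the project folder on any machine.
d <- read.csv(data_file)
str(d)
```

Because the path is relative to the project directory, the same script should run unchanged for anyone who downloads the repository and opens it in RStudio.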

TK_12.Rmd

Lines changed: 35 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ reading:
88
- GitHub [Pull Requests](https://help.github.com/articles/about-pull-requests/)
99
- Chapter [28 in R4DS](http://r4ds.had.co.nz)
1010
tasks:
11-
- Review at least two other students' projects and make comments via a _pull request_ in GitHub before next class next week.
11+
- Review at least two other students' projects and make comments via a _pull request_ in GitHub.
1212
- Browse the [Leaflet website](http://rstudio.github.io/leaflet/) and take notes in your readme.md about potential uses in your project. What data could you use? How would you display it?
1313
- Browse the [HTML Widgets page](http://gallery.htmlwidgets.org/) for many more examples. Take notes in your readme.md about potential uses in your project.
1414
---
@@ -25,29 +25,47 @@ source("knitr_header.R")
2525
```{r reading,results='asis',echo=F}
2626
md_bullet(rmarkdown::metadata$reading)
2727
```
28+
2829
# Tasks
2930

30-
```{r reading,results='asis',echo=F}
31+
```{r tasks,results='asis',echo=F}
3132
md_bullet(rmarkdown::metadata$tasks)
3233
```
3334

34-
## Evaluation Instructions
35+
# Project Peer Evaluation
36+
37+
## Instructions
38+
39+
Select two repositories and evaluate them according to the instructions listed in the [Project First Draft task](TK_11.html)
40+
41+
![](project_assets/project_evaluation.png)
3542

36-
Select two repositories and evaluate them according to the instructions and rubric below.
43+
### Download and reproduce the project
3744

38-
1) Explore the final projects in the [class repositor](https://github.com/AdamWilsonLabEDU)
39-
2) Open the repository and check if there have already been two reviews by checking if there are 2 (or more) "Pull Requests". For example, in the image below, there are 0 pull requests, so this repository would be available for you to review. If there are already 2 pull requests, select another repository. ![](assets/pull_reqeust.png)
40-
2) Go to the github page linked in the assignment and download the repository as a zip file (click on the <img src='assets/download.png' width=100> button).
45+
1) Explore the final projects in the [class repository](https://github.com/AdamWilsonLabEDU?q=finalproject)
46+
2) Select two projects that do not already have two evaluations (pull requests). For example, in the image above, there are 0 pull requests, so this repository would be available for you to review. If there are already 2 pull requests, select another repository.
47+
2) Go to the github page linked in the assignment and download the repository as a zip file (click on the <img src='project_assets/download.png' width=100> button).
4148
3) Unzip the file after it downloads
42-
4) Open the project or `index.Rmd` in RStudio and click `knit` or `Build Website` in the `Build` tab in the upper right.
43-
44-
Evaluate the following provide any feedback via pull request.
45-
1) Website
46-
1) **Introduction** [~ 200 words]: Clearly stated background and questions / hypotheses / problems being addressed. Sets up the analysis in an interesting and compelling way.
47-
2) **Data**: Script downloads at least one dataset automatically through the internet. This could use a direct download (e.g. download.file()) or an API (anything from ROpenSci).
48-
3) **Figure**: The HTML file includes at least one figure of the data.
49-
2) **Output:** The .Rmd produces HTML output with
50-
1) section headers for all the major sections of the paper
51-
2) a draft of the complete introduction.
49+
4) Open the project or `index.Rmd` in RStudio and click `Build Website` in the `Build` tab in the upper right.
50+
5) Evaluate whether the project meets the specifications listed in the [Project First Draft task](TK_11.html)
51+
52+
53+
### Provide feedback and evaluation via pull request
54+
55+
After you reproduce the project, you will provide feedback via pull request.
56+
57+
The following video will walk you through the steps of providing feedback via a pull request.
58+
<iframe width="560" height="315" src="https://www.youtube.com/embed/wy9EggBhC-M" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
59+
60+
1) In the "Code" tab of the github page for the project, click on the file you want to provide feedback on (typically this will be `index.Rmd`)
61+
2) Click the pencil icon on the right side to edit the file
62+
3) You can make changes or comment on the code
63+
* To make changes, simply edit the text
64+
* To comment, you must still make some sort of change on the lines where you want to comment. The easiest thing is simply to add a space at the end of the line (as I do in the video above).
65+
4) At the bottom of the file, there is a section called "Commit Changes", select the button for **Create a new branch for this commit and start a pull request.** and name the new branch `project_feedback_githubusername`
66+
5) Click "Propose File change"
67+
6) Click on the button "Files Changed #1" near the middle of the next page
68+
7) Hover over lines you would like to comment on and click the little blue plus button. Then enter your comment and select "Add single comment"
69+
8) Repeat steps 2-7 for any additional files you want to comment on
5270

5371
Be sure to install any required libraries (do not complain if it fails because you don't have a library installed).
