From 4afcac9b7e76451095c3247504907077eb1ca8a5 Mon Sep 17 00:00:00 2001 From: JorisSchut Date: Sun, 22 Mar 2015 18:19:01 +0100 Subject: [PATCH 01/82] codebook template Codebook template that can be used in the Getting and Cleaning data project. --- Codebook template.Rmd | 55 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) create mode 100644 Codebook template.Rmd diff --git a/Codebook template.Rmd b/Codebook template.Rmd new file mode 100644 index 00000000..07eef717 --- /dev/null +++ b/Codebook template.Rmd @@ -0,0 +1,55 @@ +--- +title: "Codebook template" +author: "Joris Schut" +date: "Tuesday, March 05, 2015" +output: + html_document: + keep_md: yes +--- + +## Project Description +Short description of the project + +##Study design and data processing + +###Collection of the raw data +Description of how the data was collected. + +###Notes on the original (raw) data +Some additional notes (if avaialble, otherwise you can leave this section out). + +##Creating the tidy datafile + +###Guide to create the tidy data file +Description on how to create the tidy data file (1. download the data, ...)/ + +###Cleaning of the data +Short, high-level description of what the cleaning script does. [link to the readme document that describes the code in greater detail]() + +##Description of the variables in the tiny_data.txt file +General description of the file including: + - Dimensions of the dataset + - Summary of the data + - Variables present in the dataset + +(you can easily use Rcode for this, just load the dataset and provide the information directly form the tidy data file) + +###Variable 1 (repeat this section for all variables in the dataset) +Short description of what the variable describes. 
+ +Some information on the variable including: + - Class of the variable + - Unique values/levels of the variable + - Unit of measurement (if no unit of measurement list this as well) + - In case names follow some schema, describe how entries were constructed (for example time-body-gyroscope-z has 4 levels of descriptors. Describe these 4 levels). + +(you can easily use Rcode for this, just load the dataset and provide the information directly form the tidy data file) + +####Notes on variable 1: +If available, some additional notes on the variable not covered elsewehere. If no notes are present leave this section out. + +##Sources +Sources you used if any, otherise leave out. + +##Annex +If you used any code in the codebook that had the echo=FALSE attribute post this here (make sure you set the results parameter to 'hide' as you do not want the results to show again) \ No newline at end of file From 31662eb418a39dfb96c9c92680f290ba8c85f4a2 Mon Sep 17 00:00:00 2001 From: seankross Date: Tue, 24 Mar 2015 15:50:39 -0400 Subject: [PATCH 02/82] added births gist --- statinf-exp-distro | 34 ---------------------------------- statinf.md | 1 + 2 files changed, 1 insertion(+), 34 deletions(-) delete mode 100644 statinf-exp-distro diff --git a/statinf-exp-distro b/statinf-exp-distro deleted file mode 100644 index e46f465e..00000000 --- a/statinf-exp-distro +++ /dev/null @@ -1,34 +0,0 @@ -# statinf-exp-distro - -While I was searching for real examples of exponential distribution, I came across this interesting piece. - -"To see an example of a distribution that is approximately exponential, we will look at the interarrival time of babies. -On December 18, 1997, 44 babies were born in a hospital in Brisbane, Australia. The times of birth for all 44 babies were -reported in the local paper; you can download the data from http://thinkstats.com/babyboom.dat. " - -Later, I came across the Centers for Disease Control and Prevention web site that has data (~200 MB !) 
on births in the US. -The User Guide PDF document has the columns for date and time of birth. I want to use this data in a way similar to course -project of 'Statistical Inference' course. The project description as below: - - In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. - The exponential distribution can be simulated in R with rexp(n, λ ) where λ is the rate parameter. - The mean of exponential distribution is 1/λ and the standard deviation is also 1/λ . Set λ = 0.2 for all - of the simulations. You will investigate the distribution of averages of 40 exponentials. - - Note that you will need to do a thousand simulations. - - Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials. - You should - 1. Show the sample mean and compare it to the theoretical mean of the distribution. - 2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution. - 3. Show that the distribution is approximately normal. - -Thus, with the births dataset, to prove the convergence, do the following steps sound valid? - - Establish that the data has an exponential distribution. (How do I find the λ value?) - Take a random sample (size = 10) of the birth inter-arrival time. Calculate its mean. - Repeat this many times (n=1000). Plot it. - Repeat for size=50, size=100, size=250 and size=500. - - Then, the plots (for sizes 50, 100 and 500) should converge around normal distribution. 
- diff --git a/statinf.md b/statinf.md index 821916dc..e9500c8b 100644 --- a/statinf.md +++ b/statinf.md @@ -5,3 +5,4 @@ permalink: /statinf/ --- - [Why degrees of freedom decrease for sample variance](https://github.com/Manu58/bias/blob/master/bias.pdf) +- [Analysis of exponential distribution of births data set from the CDC](https://gist.github.com/ProgramErgoSum/5316008387746fcd84de) \ No newline at end of file From a205ec7ec54397eecefc58a343f754f5724ff9b2 Mon Sep 17 00:00:00 2001 From: seankross Date: Tue, 24 Mar 2015 15:56:05 -0400 Subject: [PATCH 03/82] added codebook gist --- Codebook template.Rmd | 55 ------------------------------------------- getclean.md | 1 + 2 files changed, 1 insertion(+), 55 deletions(-) delete mode 100644 Codebook template.Rmd diff --git a/Codebook template.Rmd b/Codebook template.Rmd deleted file mode 100644 index 07eef717..00000000 --- a/Codebook template.Rmd +++ /dev/null @@ -1,55 +0,0 @@ ---- -title: "Codebook template" -author: "Joris Schut" -date: "Tuesday, March 05, 2015" -output: - html_document: - keep_md: yes ---- - -## Project Description -Short description of the project - -##Study design and data processing - -###Collection of the raw data -Description of how the data was collected. - -###Notes on the original (raw) data -Some additional notes (if avaialble, otherwise you can leave this section out). - -##Creating the tidy datafile - -###Guide to create the tidy data file -Description on how to create the tidy data file (1. download the data, ...)/ - -###Cleaning of the data -Short, high-level description of what the cleaning script does. 
[link to the readme document that describes the code in greater detail]() - -##Description of the variables in the tiny_data.txt file -General description of the file including: - - Dimensions of the dataset - - Summary of the data - - Variables present in the dataset - -(you can easily use Rcode for this, just load the dataset and provide the information directly form the tidy data file) - -###Variable 1 (repeat this section for all variables in the dataset) -Short description of what the variable describes. - -Some information on the variable including: - - Class of the variable - - Unique values/levels of the variable - - Unit of measurement (if no unit of measurement list this as well) - - In case names follow some schema, describe how entries were constructed (for example time-body-gyroscope-z has 4 levels of descriptors. Describe these 4 levels). - -(you can easily use Rcode for this, just load the dataset and provide the information directly form the tidy data file) - -####Notes on variable 1: -If available, some additional notes on the variable not covered elsewehere. If no notes are present leave this section out. - -##Sources -Sources you used if any, otherise leave out. 
- -##Annex -If you used any code in the codebook that had the echo=FALSE attribute post this here (make sure you set the results parameter to 'hide' as you do not want the results to show again) \ No newline at end of file diff --git a/getclean.md b/getclean.md index 9afb3fcd..7e30d4ef 100644 --- a/getclean.md +++ b/getclean.md @@ -13,3 +13,4 @@ permalink: /getclean/ - [Second Codebook sample](https://gist.github.com/kirstenfrank/699abe3e16fd1dc36e5d) - [Query string (and other fields-within-fields) unrolling](http://rpubs.com/schnee/32988) - [Pre-processing Excel files before loading them into R](https://github.com/alkashef/cleaningexceldata) +- [Codebook template that can be used in the Getting and Cleaning Data project](https://gist.github.com/JorisSchut/dbc1fc0402f28cad9b41) From ec64fabe01085cae3d4f2805d81f6e9d145c81e5 Mon Sep 17 00:00:00 2001 From: Xing Su Date: Tue, 24 Mar 2015 13:08:32 -0700 Subject: [PATCH 04/82] Added links to the notes I compiled for all 9 classes --- about.md | 3 ++- ddp.md | 4 ++++ eda.md | 6 +++++- getclean.md | 4 ++++ pml.md | 4 ++++ regmod.md | 4 ++++ repres.md | 4 ++++ rprog.md | 4 ++++ statinf.md | 4 ++++ toolbox.md | 3 +++ 10 files changed, 38 insertions(+), 2 deletions(-) diff --git a/about.md b/about.md index 1685c410..27bf2391 100644 --- a/about.md +++ b/about.md @@ -22,4 +22,5 @@ The [Data Science Specialization](https://www.coursera.org/specialization/jhudat - Michael Sachs - Allan Inocêncio de Souza Costa - [stepds](https://github.com/stepds) -- Bastiaan Quast \ No newline at end of file +- Bastiaan Quast +- [Xing Su](http://sux13.github.io/DataScienceSpCourseNotes/) \ No newline at end of file diff --git a/ddp.md b/ddp.md index b3d009e2..a517b609 100644 --- a/ddp.md +++ b/ddp.md @@ -8,3 +8,7 @@ permalink: /ddp/ - [ggvis and rmarkdown slides with interactive plots](http://qua.st/ggvis-shiny-html5-slides/) - [Shiny app to simulate 401K growth with interactive plots](http://www.mephistosoftware.com/shiny/401k_simulator/) - 
[Shiny Video Tutorials Playlist on Youtube](http://www.youtube.com/playlist?list=PL6wLL_RojB5xNOhe2OTSd-DPkMLVY9DfB) + +## Comprehensive Notes + +- Complete notes for [Developing Data Products](http://sux13.github.io/DataScienceSpCourseNotes/) diff --git a/eda.md b/eda.md index af87ef61..7bcf339c 100644 --- a/eda.md +++ b/eda.md @@ -6,4 +6,8 @@ permalink: /eda/ - [Creating a Kite Graph](http://rpubs.com/thoughtfulbloke/kitegraph) - [Analyzing Top/Green500 Supercomputer Technology Trends](http://github.com/ww44ss/Exascalar-Analysis-) -- [Emissions Choropleth Maps](https://github.com/BillSeliger/ExData_Plotting2) \ No newline at end of file +- [Emissions Choropleth Maps](https://github.com/BillSeliger/ExData_Plotting2) + +## Comprehensive Notes + +- Complete notes for [Exploratory Data Analysis](http://sux13.github.io/DataScienceSpCourseNotes/) diff --git a/getclean.md b/getclean.md index 9afb3fcd..a4da5515 100644 --- a/getclean.md +++ b/getclean.md @@ -13,3 +13,7 @@ permalink: /getclean/ - [Second Codebook sample](https://gist.github.com/kirstenfrank/699abe3e16fd1dc36e5d) - [Query string (and other fields-within-fields) unrolling](http://rpubs.com/schnee/32988) - [Pre-processing Excel files before loading them into R](https://github.com/alkashef/cleaningexceldata) + +## Comprehensive Notes + +- Complete notes for [Getting and Cleaning Data](http://sux13.github.io/DataScienceSpCourseNotes/) diff --git a/pml.md b/pml.md index f6905302..81464e9c 100644 --- a/pml.md +++ b/pml.md @@ -20,3 +20,7 @@ permalink: /pml/ ## Choosing a Machine Learning Model - [Comparing Supervised Learning Algorithms](http://www.dataschool.io/comparing-supervised-learning-algorithms/): Comparing 8 common supervised learning algorithms (for regression and classification) on 13 different dimensions. 
+ +## Comprehensive Notes + +- Complete notes for [Practical Machine Learning](http://sux13.github.io/DataScienceSpCourseNotes/) diff --git a/regmod.md b/regmod.md index a18c17c7..1445c83d 100644 --- a/regmod.md +++ b/regmod.md @@ -7,3 +7,7 @@ permalink: /regmod/ ## Supplementary Videos - [Video lectures from "An Introduction to Statistical Learning"](http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/): Videos for Chapter 3 can help to deepen your understanding of regression. + +## Comprehensive Notes + +- Complete notes for [Regression Models](http://sux13.github.io/DataScienceSpCourseNotes/) diff --git a/repres.md b/repres.md index ead7eac4..41303ddd 100644 --- a/repres.md +++ b/repres.md @@ -7,3 +7,7 @@ permalink: /repres/ - [Turning a RPubs document into a Github website walkthrough](https://github.com/thoughtfulbloke/appleorange) - [Introduction to knitr with rmarkdown](https://sachsmc.github.io/knit-git-markr-guide/knitr/knit.html) - [Trends and severity of Data Breaches](http://rpubs.com/ww44ss/29389) + +## Comprehensive Notes + +- Complete notes for [Reproducible Research](http://sux13.github.io/DataScienceSpCourseNotes/) \ No newline at end of file diff --git a/rprog.md b/rprog.md index 082f8ff5..398aeb3e 100644 --- a/rprog.md +++ b/rprog.md @@ -16,5 +16,9 @@ permalink: /rprog/ - [Some notes on the R Language](http://lopezrj.github.io) ## R language cheatsheet + - [R cheatsheet covering all lectures](https://github.com/startupjing/Tech_Notes/blob/master/R/R_language.md) +## Comprehensive Notes + +- Complete notes for [R Programming](http://sux13.github.io/DataScienceSpCourseNotes/) \ No newline at end of file diff --git a/statinf.md b/statinf.md index 821916dc..5869683a 100644 --- a/statinf.md +++ b/statinf.md @@ -5,3 +5,7 @@ permalink: /statinf/ --- - [Why degrees of freedom decrease for sample variance](https://github.com/Manu58/bias/blob/master/bias.pdf) + +## Comprehensive Notes + +- Complete notes for [Statistical 
Inference](http://sux13.github.io/DataScienceSpCourseNotes/) \ No newline at end of file diff --git a/toolbox.md b/toolbox.md index 2f9a8134..347c8c98 100644 --- a/toolbox.md +++ b/toolbox.md @@ -13,3 +13,6 @@ permalink: /toolbox/ - [Understanding the Relationship Between Git and GitHub](http://www.dataschool.io/github-is-just-dropbox-for-git/) - [Simple Guide to GitHub Forks](http://www.dataschool.io/simple-guide-to-forks-in-github-and-git/) - [Github Repo Tutorial How to fork a repo, download it to your local drive and commit changes ](https://www.youtube.com/watch?v=MY94AIplcaU) + +## Comprehensive Notes +- Complete notes for [The Data Scientist's Toolbox](http://sux13.github.io/DataScienceSpCourseNotes/) \ No newline at end of file From dab179d6e69ceda5e189ef68d68ee1c1dc4b210c Mon Sep 17 00:00:00 2001 From: Dan Killian Date: Wed, 8 Apr 2015 19:09:41 -0400 Subject: [PATCH 05/82] Upload benefit-cost analysis exercise in knitr --- Benefit-cost analysis of park user fee.html | 331 ++++++++++++++++++++ 1 file changed, 331 insertions(+) create mode 100644 Benefit-cost analysis of park user fee.html diff --git a/Benefit-cost analysis of park user fee.html b/Benefit-cost analysis of park user fee.html new file mode 100644 index 00000000..3f029100 --- /dev/null +++ b/Benefit-cost analysis of park user fee.html @@ -0,0 +1,331 @@ + + + + + + + + + + + + + +Cost-Benefit Analysis Lesson 6 Problem + + + + + + + + + + + + + + + + + + + + + +
+ + + + + +

Marginal Benefit (MB) = 200 - Q
Travel Cost (TC) = 20
Congestion Costs (CC) = Q - 100
Marginal Social Cost (MSC) = TC + CC

+
MB <- function(q) {
+    200 - q
+    }
+TC <- 20
+CC <- function(q) {
+    q - 100
+    }
+MSC <- function(q) {
+    TC + CC(q)
+    }
+

How many visits would we expect if there were no entry fee for the park?

+

If we were only dealing with trip cost, then:

+

200 - q = 20
q* = 180
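
The same no-fee quantity can be checked numerically. The sketch below is a minimal illustration in base R, assuming the demand curve and trip cost from the "Given" section; the name `q_no_fee` is introduced here only for illustration and is not part of the original analysis.

```r
# Marginal benefit (demand) and the private trip cost, as given in the problem
MB <- function(q) 200 - q
TC <- 20

# With no fee, visits continue until marginal benefit falls to the trip cost.
# uniroot() searches the interval for the q where MB(q) - TC = 0.
q_no_fee <- uniroot(function(q) MB(q) - TC, interval = c(0, 200))$root
q_no_fee  # should be approximately 180
```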

+
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
+abline(h=20, lty="dashed")
+points(180,20,pch=16)
+

+
dev.off()
+
## null device 
+##           1
+
    +
  1. What is the consumer surplus associated with these visits?
+
+

Consumer surplus is the area below the demand curve and above the price visitors pay (here the $20 trip cost), from 0 to 180 visits.

+
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
+abline(h=20, lty="dashed")
+points(180,20,pch=16)
+xcord <- c(0, 180, 0)
+ycord <- c(20,20, 200)
+polygon(xcord,ycord,col="gray")
+text(60, 70, "Consumer surplus (no entrance fee)")
+

+
dev.off()
+
## null device 
+##           1
+

For linear demand, consumer surplus is the area of a triangle: 1/2 * base * height, here 1/2 * 180 visits * ($200 - $20).

+
1/2 * 180 * 180
+
## [1] 16200
+

$16,200

+

If we were dealing with a nonlinear demand function, we’d need to integrate over q.

+
require(mosaic)
+
## Loading required package: mosaic
+## Loading required package: car
+## Loading required package: dplyr
+## 
+## Attaching package: 'dplyr'
+## 
+## The following object is masked from 'package:stats':
+## 
+##     filter
+## 
+## The following objects are masked from 'package:base':
+## 
+##     intersect, setdiff, setequal, union
+## 
+## Loading required package: lattice
+## Loading required package: ggplot2
+## 
+## Attaching package: 'mosaic'
+## 
+## The following objects are masked from 'package:dplyr':
+## 
+##     count, do, tally
+## 
+## The following object is masked from 'package:car':
+## 
+##     logit
+## 
+## The following objects are masked from 'package:stats':
+## 
+##     binom.test, cor, cov, D, fivenum, IQR, median, prop.test,
+##     quantile, sd, t.test, var
+## 
+## The following objects are masked from 'package:base':
+## 
+##     max, mean, min, prod, range, sample, sum
+
F <- antiD(200 - q ~ q)
+F
+
## function (q, C = 0) 
+## 200 * q - 1/2 * q^2 + C
+
F(200) - F(20)
+
## [1] 16200
+

Same result.
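
As a cross-check that does not depend on the mosaic package, the surplus can also be sketched with base R's `integrate()`. This is one possible formulation, not the author's: it integrates the demand curve over visits and subtracts what visitors pay. The names `price`, `q_star`, and `cs` are illustrative assumptions.

```r
MB <- function(q) 200 - q   # demand curve from the "Given" section
price  <- 20                # trip cost paid per visit when there is no fee
q_star <- 180               # visits at that price

# Area under the demand curve from 0 to q_star, minus total spending on trips
cs <- integrate(MB, lower = 0, upper = q_star)$value - price * q_star
cs
```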

+
    +
  1. What is an efficient fee for the park?
+
+

If we are dealing with total social costs, then:

+

Marginal social cost = trip cost + congestion cost = 20 + (q - 100) = q - 80

+
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
+abline(h=20, lty="dashed")
+segments(80,0,200,120)
+

+
dev.off()
+
## null device 
+##           1
+

Set marginal benefit equal to marginal social cost to solve for the efficient number of visits:

+

200 - q = q - 80
2q = 280
q* = 140
MB* = 60
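
The algebra above can be reproduced in a few lines of base R. This is a hedged sketch assuming the congestion cost from the "Given" section (Q - 100); `q_eff`, `full_price`, and `fee` are hypothetical names used only for this illustration.

```r
MB  <- function(q) 200 - q
TC  <- 20
CC  <- function(q) q - 100       # congestion cost as stated in "Given"
MSC <- function(q) TC + CC(q)    # marginal social cost, which simplifies to q - 80

# Efficient visits: where marginal benefit equals marginal social cost
q_eff      <- uniroot(function(q) MB(q) - MSC(q), interval = c(80, 200))$root
full_price <- MB(q_eff)          # what a visit must cost in total at the optimum
fee        <- full_price - TC    # the part of that price the park has to charge

c(visits = q_eff, full_price = full_price, fee = fee)
```

The fee is the gap between the efficient full price and the trip cost visitors already bear, which is why it comes out below the full price.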

+
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
+abline(h=20, lty="dashed")
+segments(80,0,200,120)
+points(140,60, pch=16)
+segments(140,0,140,60, lty="dotted")
+segments(0,60,140,60, lty="dotted")
+

+
dev.off()
+
## null device 
+##           1
+

$60 is the efficient full price of a visit, but $20 of that is the trip cost, so the efficient entry fee is $40.

+
    +
  1. How many visits would we see with the efficient fee?
+
+

140

+
    +
  1. After a fee is imposed: +
      +
    1. What is the consumer surplus?
+
+
+
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
+abline(h=20, lty="dashed")
+segments(80,0,200,120)
+points(140,60, pch=16)
+segments(140,0,140,60, lty="dotted")
+segments(0,60,140,60, lty="dotted")
+xcord <- c(0,140,0)
+ycord <- c(60, 60, 200)
+polygon(xcord,ycord,col="gray")
+text(45,100, "Consumer surplus (with fee)")
+

+
dev.off()
+
## null device 
+##           1
+
1/2*140*140
+
## [1] 9800
+
F(200) - F(60)
+
## [1] 9800
+

$9,800

+
b.  What is the government revenue?
+

Government revenue is the portion of the original consumer surplus that now goes toward the fee.

+
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
+abline(h=20, lty="dashed")
+points(140,60, pch=16)
+segments(140,0,140,60, lty="dotted")
+segments(0,60,140,60, lty="dotted")
+xcord <- c(0,140,0)
+ycord <- c(60, 60, 200)
+polygon(xcord,ycord,col="gray")
+text(40, 100, "Consumer surplus")
+xcord2 <- c(0,140,140,0)
+ycord2 <- c(60,60,20,20)
+polygon(xcord2, ycord2, col="light blue")
+segments(80,0,200,120)
+text(50, 40, "Government revenue")
+

+
dev.off()
+
## null device 
+##           1
+
40*140
+
## [1] 5600
+

Price $40 * visitors 140 = $5,600

+
c.  What is the deadweight loss that has been avoided?
+

The deadweight loss is the efficiency lost when no fee is charged: for visits beyond 140, up to 180, the marginal social cost exceeds the marginal benefit. Graphically, it is the triangle between the marginal social cost line and the demand curve, from 140 visits (where they cross at $60) out to 180 visits.

+
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
+abline(h=20, lty="dashed")
+points(140,60, pch=16)
+segments(140,0,140,60, lty="dotted")
+segments(0,60,140,60, lty="dotted")
+xcord <- c(0,140,0)
+ycord <- c(60, 60, 200)
+polygon(xcord,ycord,col="gray")
+text(40, 100, "Consumer surplus")
+xcord2 <- c(0,140,140,0)
+ycord2 <- c(60,60,20,20)
+polygon(xcord2, ycord2, col="light blue")
+segments(80,0,200,120)
+text(50, 40, "Government revenue")
+segments(180,20,180,100, lty="dotted")
+xcord3 <- c(180,180,140) 
+ycord3 <- c(20,100,60)
+polygon(xcord3, ycord3, col="darkslategray4")
+text(160, 60, "Loss")
+

+
dev.off()
+
## null device 
+##           1
+
1/2*(180-140)*(100-20)
+
## [1] 1600
+
F(180) - F(140)
+
## [1] 1600
+

$1,600
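
Another way to arrive at this figure, offered as a sketch rather than the author's method, is to integrate the gap between marginal social cost and marginal benefit over the excess visits. It assumes the "Given" definitions; `dwl` is an illustrative name.

```r
MB  <- function(q) 200 - q
MSC <- function(q) 20 + (q - 100)   # trip cost plus congestion cost

# Between 140 and 180 visits each extra visit costs society more than it is worth;
# the accumulated gap is the deadweight loss avoided by the fee.
dwl <- integrate(function(q) MSC(q) - MB(q), lower = 140, upper = 180)$value
dwl
```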

+

In the absence of a fee, visitors in the “Loss” area do not enjoy any surplus (benefit) due to congestion.

+
d.    What are the avoided congestion costs?
+

The avoided congestion costs fall on the visits beyond the efficient level of 140, up to the no-fee level of 180. The first piece is the triangle below the demand curve and above the $20 trip cost over that range.

+
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
+abline(h=20, lty="dashed")
+points(140,60, pch=16)
+segments(140,0,140,60, lty="dotted")
+segments(0,60,140,60, lty="dotted")
+xcord <- c(0,140,0)
+ycord <- c(60, 60, 200)
+polygon(xcord,ycord,col="gray")
+text(40, 100, "Consumer surplus")
+xcord2 <- c(0,140,140,0)
+ycord2 <- c(60,60,20,20)
+polygon(xcord2, ycord2, col="light blue")
+segments(80,0,200,120)
+text(50, 40, "Government revenue")
+segments(180,20,180,100, lty="dotted")
+xcord3 <- c(180,180,140) 
+ycord3 <- c(20,100,60)
+polygon(xcord3, ycord3, col="darkslategray4")
+text(160, 60, "Loss")
+xcord4 <- c(140,180,140) 
+ycord4 <- c(60,20,20)
+polygon(xcord4, ycord4, col="mediumorchid")
+text(151, 30, "Costs")
+

+
dev.off()
+
## null device 
+##           1
+
1/2*(180-140)*(60-20)
+
## [1] 800
+

$800 in congestion costs for visits 140 to 180 below the demand curve, plus $1,600 from the deadweight loss (visits 140 to 180, above the demand curve). So $2,400 in congestion costs are avoided.

+
    +
  1. What are the net benefits from imposing a fee?
+
+

Without fee:

+

Benefit: Consumer surplus $16,200

+

Cost: Congestion $3,200

+

Net benefit: $13,000

+

With fee:

+

Benefit: Consumer surplus $9,800
Government revenue $5,600

+

Costs: Congestion $800

+

Net benefit $14,600

+

The net benefit of imposing a fee is $14,600 - $13,000 = $1,600

+

This is the same as the deadweight loss that was avoided by imposing a fee.
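
The tally above can be laid out as a small calculation. The sketch below simply restates the dollar figures already reported in this analysis; the vector names are illustrative assumptions.

```r
# Benefits are positive, costs negative; all figures are taken from the text above
no_fee   <- c(consumer_surplus = 16200, revenue = 0,    congestion = -3200)
with_fee <- c(consumer_surplus = 9800,  revenue = 5600, congestion = -800)

net  <- c(no_fee = sum(no_fee), with_fee = sum(with_fee))
gain <- unname(net["with_fee"] - net["no_fee"])   # net benefit of imposing the fee

net
gain
```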

+ + +
+ + + + + + + + From 35e0c36929f83a93759094ad2b7c0e5122835646 Mon Sep 17 00:00:00 2001 From: Kevin Markham Date: Wed, 8 Apr 2015 21:14:56 -0400 Subject: [PATCH 06/82] add link to machine learning video on PML page --- pml.md | 1 + 1 file changed, 1 insertion(+) diff --git a/pml.md b/pml.md index 81464e9c..be4defc3 100644 --- a/pml.md +++ b/pml.md @@ -11,6 +11,7 @@ permalink: /pml/ ## Supplementary Videos +- [What is machine learning, and how does it work?](https://www.youtube.com/watch?v=elojMnjn4kk): A high-level overview of machine learning in a 10-minute video - [Video lectures from "An Introduction to Statistical Learning"](http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/): Videos for Chapters 4, 5, 6, 8, and 10 can help to deepen your understanding of the topics presented in this course. ## Machine Learning Competitions From 601d05ca0eb9247e3f87b43853ccb88c903597d8 Mon Sep 17 00:00:00 2001 From: dkillian Date: Thu, 9 Apr 2015 15:58:38 -0400 Subject: [PATCH 07/82] Update repres.md --- repres.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/repres.md b/repres.md index 41303ddd..1ce2f7fa 100644 --- a/repres.md +++ b/repres.md @@ -7,7 +7,8 @@ permalink: /repres/ - [Turning a RPubs document into a Github website walkthrough](https://github.com/thoughtfulbloke/appleorange) - [Introduction to knitr with rmarkdown](https://sachsmc.github.io/knit-git-markr-guide/knitr/knit.html) - [Trends and severity of Data Breaches](http://rpubs.com/ww44ss/29389) +- [Benefit-cost analysis of a park user fee](https://github.com/dkillian/dkillian.github.io) ## Comprehensive Notes -- Complete notes for [Reproducible Research](http://sux13.github.io/DataScienceSpCourseNotes/) \ No newline at end of file +- Complete notes for [Reproducible Research](http://sux13.github.io/DataScienceSpCourseNotes/) From a15e4760b453908083d35843f71af112fdb2e909 Mon Sep 17 00:00:00 2001 From: dkillian Date: Thu, 9 Apr 2015 16:00:49 -0400 Subject: [PATCH 08/82] 
Delete Benefit-cost analysis of park user fee.html --- Benefit-cost analysis of park user fee.html | 331 -------------------- 1 file changed, 331 deletions(-) delete mode 100644 Benefit-cost analysis of park user fee.html diff --git a/Benefit-cost analysis of park user fee.html b/Benefit-cost analysis of park user fee.html deleted file mode 100644 index 3f029100..00000000 --- a/Benefit-cost analysis of park user fee.html +++ /dev/null @@ -1,331 +0,0 @@ - - - - - - - - - - - - - -Cost-Benefit Analysis Lesson 6 Problem - - - - - - - - - - - - - - - - - - - - - -
- - - - - -

Marginal Benefit (MB) = 200 - Q
Travel Cost (TC) = 20
Congestion Costs (CC) = Q - 100
Marginal Social Cost (MSC) = TC + CC

-
MB <- function(q) {
-    200 - q
-    }
-TC <- 20
-CC <- function(q) {
-    q - 100
-    }
-MSC <- function(q) {
-    TC + CC(q)
-    }
-

How many visits would we expect if there were no entry fee for the park?

-

If we were only dealing with trip cost, then:

-

200 - q = 20
q* = 180

-
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
-abline(h=20, lty="dashed")
-points(180,20,pch=16)
-

-
dev.off()
-
## null device 
-##           1
-
    -
  1. What is the consumer surplus associated with these visits?
-
-

Consumer surplus is the area below the demand curve and above the price visitors pay (here the $20 trip cost), from 0 to 180 visits.

-
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
-abline(h=20, lty="dashed")
-points(180,20,pch=16)
-xcord <- c(0, 180, 0)
-ycord <- c(20,20, 200)
-polygon(xcord,ycord,col="gray")
-text(60, 70, "Consumer surplus (no entrance fee)")
-

-
dev.off()
-
## null device 
-##           1
-

For linear demand, consumer surplus is the area of a triangle: 1/2 * base * height, here 1/2 * 180 visits * ($200 - $20).

-
1/2 * 180 * 180
-
## [1] 16200
-

$16,200

-

If we were dealing with a nonlinear demand function, we’d need to integrate over q.

-
require(mosaic)
-
## Loading required package: mosaic
-## Loading required package: car
-## Loading required package: dplyr
-## 
-## Attaching package: 'dplyr'
-## 
-## The following object is masked from 'package:stats':
-## 
-##     filter
-## 
-## The following objects are masked from 'package:base':
-## 
-##     intersect, setdiff, setequal, union
-## 
-## Loading required package: lattice
-## Loading required package: ggplot2
-## 
-## Attaching package: 'mosaic'
-## 
-## The following objects are masked from 'package:dplyr':
-## 
-##     count, do, tally
-## 
-## The following object is masked from 'package:car':
-## 
-##     logit
-## 
-## The following objects are masked from 'package:stats':
-## 
-##     binom.test, cor, cov, D, fivenum, IQR, median, prop.test,
-##     quantile, sd, t.test, var
-## 
-## The following objects are masked from 'package:base':
-## 
-##     max, mean, min, prod, range, sample, sum
-
F <- antiD(200 - q ~ q)
-F
-
## function (q, C = 0) 
-## 200 * q - 1/2 * q^2 + C
-
F(200) - F(20)
-
## [1] 16200
-

Same result.

-
    -
  1. What is an efficient fee for the park?
-
-

If we are dealing with total social costs, then:

-

Marginal social cost = trip cost + congestion cost = 20 + (q - 100) = q - 80

-
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
-abline(h=20, lty="dashed")
-segments(80,0,200,120)
-

-
dev.off()
-
## null device 
-##           1
-

Set marginal benefit equal to marginal social cost to solve for the efficient number of visits:

-

200 - q = q - 80
2q = 280
q* = 140
MB* = 60

-
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
-abline(h=20, lty="dashed")
-segments(80,0,200,120)
-points(140,60, pch=16)
-segments(140,0,140,60, lty="dotted")
-segments(0,60,140,60, lty="dotted")
-

-
dev.off()
-
## null device 
-##           1
-

$60 is the efficient full price of a visit, but $20 of that is the trip cost, so the efficient entry fee is $40.

-
    -
  1. How many visits would we see with the efficient fee?
-
-

140

-
    -
  1. After a fee is imposed: -
      -
    1. What is the consumer surplus?
-
-
-
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
-abline(h=20, lty="dashed")
-segments(80,0,200,120)
-points(140,60, pch=16)
-segments(140,0,140,60, lty="dotted")
-segments(0,60,140,60, lty="dotted")
-xcord <- c(0,140,0)
-ycord <- c(60, 60, 200)
-polygon(xcord,ycord,col="gray")
-text(45,100, "Consumer surplus (with fee)")
-

-
dev.off()
-
## null device 
-##           1
-
1/2*140*140
-
## [1] 9800
-
F(200) - F(60)
-
## [1] 9800
-

$9,800

-
b.  What is the government revenue?
-

Government revenue is the portion of the original consumer surplus that now goes toward the fee.

-
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
-abline(h=20, lty="dashed")
-points(140,60, pch=16)
-segments(140,0,140,60, lty="dotted")
-segments(0,60,140,60, lty="dotted")
-xcord <- c(0,140,0)
-ycord <- c(60, 60, 200)
-polygon(xcord,ycord,col="gray")
-text(40, 100, "Consumer surplus")
-xcord2 <- c(0,140,140,0)
-ycord2 <- c(60,60,20,20)
-polygon(xcord2, ycord2, col="light blue")
-segments(80,0,200,120)
-text(50, 40, "Government revenue")
-

-
dev.off()
-
## null device 
-##           1
-
40*140
-
## [1] 5600
-

Price $40 * visitors 140 = $5,600

-
c.  What is the deadweight loss that has been avoided?
-

The deadweight loss is the efficiency lost when no fee is charged: for visits beyond 140, up to 180, the marginal social cost exceeds the marginal benefit. Graphically, it is the triangle between the marginal social cost line and the demand curve, from 140 visits (where they cross at $60) out to 180 visits.

-
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
-abline(h=20, lty="dashed")
-points(140,60, pch=16)
-segments(140,0,140,60, lty="dotted")
-segments(0,60,140,60, lty="dotted")
-xcord <- c(0,140,0)
-ycord <- c(60, 60, 200)
-polygon(xcord,ycord,col="gray")
-text(40, 100, "Consumer surplus")
-xcord2 <- c(0,140,140,0)
-ycord2 <- c(60,60,20,20)
-polygon(xcord2, ycord2, col="light blue")
-segments(80,0,200,120)
-text(50, 40, "Government revenue")
-segments(180,20,180,100, lty="dotted")
-xcord3 <- c(180,180,140) 
-ycord3 <- c(20,100,60)
-polygon(xcord3, ycord3, col="darkslategray4")
-text(160, 60, "Loss")
-

-
dev.off()
-
## null device 
-##           1
-
1/2*(180-140)*(100-20)
-
## [1] 1600
-
F(180) - F(140)
-
## [1] 1600
-

$1,600

-

In the absence of a fee, visitors in the “Loss” area do not enjoy any surplus (benefit) due to congestion.

-
d.    What are the avoided congestion costs?
-

The avoided congestion costs fall on the visits beyond the efficient level of 140, up to the no-fee level of 180. The first piece is the triangle below the demand curve and above the $20 trip cost over that range.

-
plot(MB, 0, 200, ylim=c(4,200), xlim=c(8,200), xlab=("Trips to the park"), ylab="Benefit ($ cost)")
-abline(h=20, lty="dashed")
-points(140,60, pch=16)
-segments(140,0,140,60, lty="dotted")
-segments(0,60,140,60, lty="dotted")
-xcord <- c(0,140,0)
-ycord <- c(60, 60, 200)
-polygon(xcord,ycord,col="gray")
-text(40, 100, "Consumer surplus")
-xcord2 <- c(0,140,140,0)
-ycord2 <- c(60,60,20,20)
-polygon(xcord2, ycord2, col="light blue")
-segments(80,0,200,120)
-text(50, 40, "Government revenue")
-segments(180,20,180,100, lty="dotted")
-xcord3 <- c(180,180,140) 
-ycord3 <- c(20,100,60)
-polygon(xcord3, ycord3, col="darkslategray4")
-text(160, 60, "Loss")
-xcord4 <- c(140,180,140) 
-ycord4 <- c(60,20,20)
-polygon(xcord4, ycord4, col="mediumorchid")
-text(151, 30, "Costs")
-

-
dev.off()
-
## null device 
-##           1
-
1/2*(180-140)*(60-20)
-
## [1] 800
-

$800 in congestion costs from 140 to 180 trips below the demand curve, plus $1,600 from the deadweight loss (140 to 180 trips, above the demand curve). So $2,400 in congestion costs are avoided.

-
e.  What are the net benefits from imposing a fee?
-

Without fee:

-

Benefit: Consumer surplus $16,200

-

Cost: Congestion $3,200

-

Net benefit: $13,000

-

With fee:

-

Benefit: Consumer surplus $9,800
Government revenue $5,600

-

Costs: Congestion $800

-

Net benefit $14,600

-

The net benefit of imposing a fee is $14,600 - $13,000 = $1,600.

-

This is the same as the deadweight loss that was avoided by imposing a fee.
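
As a quick check, the comparison above can be recomputed in R. This is only a minimal sketch: the variable names are illustrative and do not come from the original analysis, and the dollar figures are the ones quoted in the answers above.

```r
# Figures quoted in the answers above (all in dollars).
# Variable names are hypothetical, chosen for this sketch only.
cs_no_fee   <- 16200                          # consumer surplus without a fee
cong_no_fee <- 3200                           # congestion cost without a fee
cs_fee      <- 9800                           # consumer surplus with the $40 fee
revenue     <- 40 * 140                       # government revenue: fee x visits
cong_fee    <- 1/2 * (180 - 140) * (60 - 20)  # the $800 "Costs" triangle from part d

net_no_fee <- cs_no_fee - cong_no_fee         # net benefit without a fee
net_fee    <- cs_fee + revenue - cong_fee     # net benefit with the fee
gain       <- net_fee - net_no_fee            # net benefit of imposing the fee

print(c(net_no_fee = net_no_fee, net_fee = net_fee, gain = gain))
```

Under these figures, `gain` should equal the deadweight-loss triangle computed in part c, `1/2*(180-140)*(100-20)`.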

- - -
- - - - - - - - From f0aaa43613806cc0662e0794c796a04e6e6f677a Mon Sep 17 00:00:00 2001 From: dkillian Date: Sat, 11 Apr 2015 02:21:05 -0400 Subject: [PATCH 09/82] Update repres.md --- repres.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/repres.md b/repres.md index 1ce2f7fa..20e340dd 100644 --- a/repres.md +++ b/repres.md @@ -7,7 +7,7 @@ permalink: /repres/ - [Turning a RPubs document into a Github website walkthrough](https://github.com/thoughtfulbloke/appleorange) - [Introduction to knitr with rmarkdown](https://sachsmc.github.io/knit-git-markr-guide/knitr/knit.html) - [Trends and severity of Data Breaches](http://rpubs.com/ww44ss/29389) -- [Benefit-cost analysis of a park user fee](https://github.com/dkillian/dkillian.github.io) +- [Benefit-cost analysis of a park user fee](https://rstudio-pubs-static.s3.amazonaws.com/72135_dc45211d976842c2a9a8c8b5f2472ff0.html) ## Comprehensive Notes From d8db89117a7787ebbbe0675c026dfb830616b9e6 Mon Sep 17 00:00:00 2001 From: Aratinga Date: Mon, 13 Apr 2015 09:39:02 -0400 Subject: [PATCH 10/82] Update curated.md Temporary location for a file referenced in Toolbox. Ron Meir has given permission. 
--- curated.md | 1 + 1 file changed, 1 insertion(+) diff --git a/curated.md b/curated.md index 5bad81ff..017d64f3 100644 --- a/curated.md +++ b/curated.md @@ -34,6 +34,7 @@ permalink: /curated/ ### Reproducible Research - [Markdown live demo](http://markdown-here.com/livedemo.html) +- [Boosting Slides by Ron Meir] (https://github.com/Aratinga/Misc/blob/master/BoostingTutorial.pdf) ### Textbooks - [OpenIntro textbook](https://www.openintro.org/stat/textbook.php) From d29df74c7977da2474798633aeda4ef9bb37b8f4 Mon Sep 17 00:00:00 2001 From: seankross Date: Mon, 13 Apr 2015 13:07:23 -0400 Subject: [PATCH 11/82] fixed link --- curated.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/curated.md b/curated.md index 017d64f3..5c500b8e 100644 --- a/curated.md +++ b/curated.md @@ -34,7 +34,7 @@ permalink: /curated/ ### Reproducible Research - [Markdown live demo](http://markdown-here.com/livedemo.html) -- [Boosting Slides by Ron Meir] (https://github.com/Aratinga/Misc/blob/master/BoostingTutorial.pdf) +- [Boosting Slides by Ron Meir](https://github.com/Aratinga/Misc/blob/master/BoostingTutorial.pdf) ### Textbooks - [OpenIntro textbook](https://www.openintro.org/stat/textbook.php) From 0437d0ecafec49cada4c1d392b99e75ff3ed60c0 Mon Sep 17 00:00:00 2001 From: Xiaoning Wang Date: Thu, 23 Apr 2015 15:37:55 -0400 Subject: [PATCH 12/82] Add link to Probability and Statistics Cookbook --- statinf.md | 1 + 1 file changed, 1 insertion(+) diff --git a/statinf.md b/statinf.md index c5a85435..cb79fbb6 100644 --- a/statinf.md +++ b/statinf.md @@ -6,6 +6,7 @@ permalink: /statinf/ - [Why degrees of freedom decrease for sample variance](https://github.com/Manu58/bias/blob/master/bias.pdf) - [Analysis of exponential distribution of births data set from the CDC](https://gist.github.com/ProgramErgoSum/5316008387746fcd84de) +- [Probability and Statistics Cookbook](http://matthias.vallentin.net/probability-and-statistics-cookbook/) ## Comprehensive Notes From 
eee4951f32f6d501782cddab899ebd8d88e6c3cd Mon Sep 17 00:00:00 2001 From: seankross Date: Fri, 24 Apr 2015 10:14:20 -0400 Subject: [PATCH 13/82] modified statinf --- curated.md | 3 +++ statinf.md | 1 - 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/curated.md b/curated.md index 5c500b8e..b44a0662 100644 --- a/curated.md +++ b/curated.md @@ -22,6 +22,9 @@ permalink: /curated/ - [The Lubridate Package](http://www.jstatsoft.org/v40/i03/paper) - [Google Developers R Programming Video Lectures](http://www.r-bloggers.com/google-developers-r-programming-video-lectures/) +### Probability and Statistics + +- [Probability and Statistics Cookbook](http://matthias.vallentin.net/probability-and-statistics-cookbook/) ### GitHub diff --git a/statinf.md b/statinf.md index cb79fbb6..c5a85435 100644 --- a/statinf.md +++ b/statinf.md @@ -6,7 +6,6 @@ permalink: /statinf/ - [Why degrees of freedom decrease for sample variance](https://github.com/Manu58/bias/blob/master/bias.pdf) - [Analysis of exponential distribution of births data set from the CDC](https://gist.github.com/ProgramErgoSum/5316008387746fcd84de) -- [Probability and Statistics Cookbook](http://matthias.vallentin.net/probability-and-statistics-cookbook/) ## Comprehensive Notes From 96a2637e8006bbd73ff3a69903437f37281e72ab Mon Sep 17 00:00:00 2001 From: "Edgar S. 
Hurtado" Date: Fri, 24 Apr 2015 19:11:18 +0200 Subject: [PATCH 14/82] add link to post about how to use bash commands --- toolbox.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/toolbox.md b/toolbox.md index 67cb050f..92d89f02 100644 --- a/toolbox.md +++ b/toolbox.md @@ -6,6 +6,8 @@ permalink: /toolbox/ ## Command Line +- [Working with files in Bash](http://edgarsh.es/ins/working-with-files-in-bash/) + ## Git/GitHub - [Git & GitHub Video Playlist](https://www.youtube.com/playlist?list=PL5-da3qGB5IBLMp7LtN8Nc3Efd4hJq0kD) (also available for [download](https://drive.google.com/folderview?id=0BxRfg0msVmAoRlZFQjJ3T3VTOUE&usp=sharing) as mp4 files) From 130e725f94838f003ca9330b84e18763a3e5c8f7 Mon Sep 17 00:00:00 2001 From: Daniele Pigni Date: Sun, 26 Apr 2015 15:38:48 +0800 Subject: [PATCH 15/82] Tutorial for those struggling with PA2 --- rprog.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rprog.md b/rprog.md index 398aeb3e..360ddc35 100644 --- a/rprog.md +++ b/rprog.md @@ -8,6 +8,7 @@ permalink: /rprog/ - [Tutorial for those struggling with Programming Assignment 1](https://github.com/derekfranks/practice_assignment) - [PA1-test: `testthat`, Unit Tests for Programming Assignment 1](https://github.com/cbryant1000/pa1test) +- [Tutorial for those struggling with Programming Assignment 2](https://github.com/DanieleP/PA2-clarifying_instructions) - [PA3-test: `testthat`, Unit Tests for Programming Assignment 3](https://github.com/cbryant1000/pa3test) @@ -21,4 +22,4 @@ permalink: /rprog/ ## Comprehensive Notes -- Complete notes for [R Programming](http://sux13.github.io/DataScienceSpCourseNotes/) \ No newline at end of file +- Complete notes for [R Programming](http://sux13.github.io/DataScienceSpCourseNotes/) From 71e6a91f9b9e44b3b0a43e816778fcc30cce8605 Mon Sep 17 00:00:00 2001 From: Daniele Pigni Date: Wed, 29 Apr 2015 02:53:17 +0800 Subject: [PATCH 16/82] Tutorial for those struggling with PA3 I have made a quick tutorial with simplified 
functions to manage small data frame that should give insights on how to manage bigger datas like the hospital csv --- rprog.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rprog.md b/rprog.md index 360ddc35..8539101e 100644 --- a/rprog.md +++ b/rprog.md @@ -7,8 +7,9 @@ permalink: /rprog/ ## Programming Assignments - [Tutorial for those struggling with Programming Assignment 1](https://github.com/derekfranks/practice_assignment) -- [PA1-test: `testthat`, Unit Tests for Programming Assignment 1](https://github.com/cbryant1000/pa1test) - [Tutorial for those struggling with Programming Assignment 2](https://github.com/DanieleP/PA2-clarifying_instructions) +- [Tutorial for those struggling with Programming Assignment 3](https://github.com/DanieleP/PA3-tutorial) +- [PA1-test: `testthat`, Unit Tests for Programming Assignment 1](https://github.com/cbryant1000/pa1test) - [PA3-test: `testthat`, Unit Tests for Programming Assignment 3](https://github.com/cbryant1000/pa3test) From bf755514654a3ec03fa92d98d0618aafe2c2ae70 Mon Sep 17 00:00:00 2001 From: elmerehbi Date: Thu, 30 Apr 2015 01:53:03 +0300 Subject: [PATCH 17/82] added awesome lists for R & ML --- curated.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/curated.md b/curated.md index b44a0662..87df403f 100644 --- a/curated.md +++ b/curated.md @@ -21,6 +21,8 @@ permalink: /curated/ - [Try R](http://tryr.codeschool.com/) - [The Lubridate Package](http://www.jstatsoft.org/v40/i03/paper) - [Google Developers R Programming Video Lectures](http://www.r-bloggers.com/google-developers-r-programming-video-lectures/) +- [awesome R](https://github.com/qinwf/awesome-R) - A curated list of awesome R frameworks, packages and software. +- [awesome machine learning](https://github.com/josephmisiti/awesome-machine-learning#r) - A curated list of awesome Machine Learning frameworks, libraries and software. 
### Probability and Statistics From 8ed4584d5e6956e2f3caedb1ee954eb09300681b Mon Sep 17 00:00:00 2001 From: Aratinga Date: Sun, 3 May 2015 18:45:48 -0400 Subject: [PATCH 18/82] Additions to Curated page --- curated.md | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/curated.md b/curated.md index 017d64f3..b5c895c7 100644 --- a/curated.md +++ b/curated.md @@ -10,6 +10,12 @@ permalink: /curated/ - [Diving Into Data Science Flipboard](https://flipboard.com/@thiakx/diving-into-data-science-5823ectuy) - [OLAP Operation in R](http://architects.dzone.com/articles/olap-operation-r) - [Journal of Statistical Software: Tidy data](http://www.jstatsoft.org/v59/i10/paper) +- [Verzani: simpleR – Using R for Introductory Statistics](http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf/) +- [Data Visualization packages](http://www.datavis.ca/R/) +- [Visualization hints: plotting numeric data by groups](http://www.r-bloggers.com/visualization-series-insight-from-cleveland-and-tufte-on-plotting-numeric-data-by-groups/) +- [Matrix rotation for image and contour plots in R](http://blog.snap.uaf.edu/2012/06/08/matrix-rotation-for-image-and-contour-plots-in-r/) +- [Fig Data: 11 Tips on How to Handle Big Data in R (and 1 Bad Pun)](http://theodi.org/blog/fig-data-11-tips-how-handle-big-data-r-and-1-bad-pun) +- [Data from 538](https://github.com/fivethirtyeight/data) ### Command Line @@ -19,8 +25,19 @@ permalink: /curated/ ### R - [Try R](http://tryr.codeschool.com/) +- [The R Book by Michael J. Crawley](https://archive.org/details/TheRBook/) +- [Univ. of Calif. Riverside R Programming](http://manuals.bioinformatics.ucr.edu/home/programming-in-r#TOC-R-Basics) +- [G. 
Sanchez - Strings in R](http://gastonsanchez.com/Handling_and_Processing_Strings_in_R.pdf) - [The Lubridate Package](http://www.jstatsoft.org/v40/i03/paper) - [Google Developers R Programming Video Lectures](http://www.r-bloggers.com/google-developers-r-programming-video-lectures/) +- [Google's R Style Guide](https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml) +- [Tufte-style HTML in rmarkdown](http://sachsmc.github.io/tufterhandout/) +- [Creating an R Package](http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/) +- [Beautiful ggplot2 Cheatsheet](http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/) +- [Intro to Graphics](http://bcb.dfci.harvard.edu/~aedin/courses/Bioconductor/2.Plotting.pdf) +- [data.table cheat sheet](https://s3.amazonaws.com/assets.datacamp.com/img/blog/data+table+cheat+sheet.pdf) +- [Exploratory Data Analysis with data.table](http://varianceexplained.org/RData/lessons/lesson4/) +- [Fast summary statistics in R with data.table](http://blog.yhathq.com/posts/fast-summary-statistics-with-data-dot-table.html) ### GitHub @@ -31,10 +48,14 @@ permalink: /curated/ - [GitHub - Dealing with Multiple Accounts](http://hmkcode.com/git-tutorial/how-to-deal-with-multiple-github-accounts-on-one-computer/) - [Try Git](https://try.github.io/levels/1/challenges/1) - [Learn Git Branching: Interactive Game](http://pcottle.github.com/learnGitBranching/) +- [Atlassian Git Tutorials - Branches](https://www.atlassian.com/git/tutorials/using-branches/) ### Reproducible Research - [Markdown live demo](http://markdown-here.com/livedemo.html) - [Boosting Slides by Ron Meir] (https://github.com/Aratinga/Misc/blob/master/BoostingTutorial.pdf) +- +### Machine Learning +- [UC Irvine Machine Learning Data Repository](http://archive.ics.uci.edu/ml/) ### Textbooks - [OpenIntro textbook](https://www.openintro.org/stat/textbook.php) From 2145f688de8593a151dbde612d1c3a8f1218dc14 Mon Sep 17 00:00:00 2001 From: Aratinga Date: 
Sun, 3 May 2015 18:48:01 -0400 Subject: [PATCH 19/82] Fixed one line --- curated.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/curated.md b/curated.md index b5c895c7..be2bd4f6 100644 --- a/curated.md +++ b/curated.md @@ -52,8 +52,8 @@ permalink: /curated/ ### Reproducible Research - [Markdown live demo](http://markdown-here.com/livedemo.html) -- [Boosting Slides by Ron Meir] (https://github.com/Aratinga/Misc/blob/master/BoostingTutorial.pdf) -- +- [Boosting Slides by Ron Meir](https://github.com/Aratinga/Misc/blob/master/BoostingTutorial.pdf) + ### Machine Learning - [UC Irvine Machine Learning Data Repository](http://archive.ics.uci.edu/ml/) From fe6a2805cba19710bb4541ee63757ee32d8e2113 Mon Sep 17 00:00:00 2001 From: rchampoux Date: Tue, 12 May 2015 10:54:15 -0400 Subject: [PATCH 20/82] Added link to rprog.md for alternative submit script --- rprog.md | 1 + 1 file changed, 1 insertion(+) diff --git a/rprog.md b/rprog.md index 8539101e..42e19df4 100644 --- a/rprog.md +++ b/rprog.md @@ -11,6 +11,7 @@ permalink: /rprog/ - [Tutorial for those struggling with Programming Assignment 3](https://github.com/DanieleP/PA3-tutorial) - [PA1-test: `testthat`, Unit Tests for Programming Assignment 1](https://github.com/cbryant1000/pa1test) - [PA3-test: `testthat`, Unit Tests for Programming Assignment 3](https://github.com/cbryant1000/pa3test) +- [Alternative submit script for Programming Assignment 1 that makes submitting more convenient by allowing selection of multiple parts plus prompting if user wants to submit another part before exiting](https://github.com/rchampoux/coursera/blob/master/rprog-scripts-submitscript1.R) ## R Language From f11116aadb95676287aee90896d5746c82ffa834 Mon Sep 17 00:00:00 2001 From: Randall Shane Date: Fri, 22 May 2015 13:52:47 -0400 Subject: [PATCH 21/82] RPub NOAA Data This is an RPub I posted that uses current NOAA data and goes through a rigorous exercise enforcing data integrity. 
--- repres.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/repres.md b/repres.md index 20e340dd..5fc1ac8e 100644 --- a/repres.md +++ b/repres.md @@ -7,7 +7,8 @@ permalink: /repres/ - [Turning a RPubs document into a Github website walkthrough](https://github.com/thoughtfulbloke/appleorange) - [Introduction to knitr with rmarkdown](https://sachsmc.github.io/knit-git-markr-guide/knitr/knit.html) - [Trends and severity of Data Breaches](http://rpubs.com/ww44ss/29389) -- [Benefit-cost analysis of a park user fee](https://rstudio-pubs-static.s3.amazonaws.com/72135_dc45211d976842c2a9a8c8b5f2472ff0.html) +- [Benefit-cost analysis of a park user fee](https://rstudio-pubs-static.s3.amazonaws.com/72135_dc45211d976842c2a9a8c8b5f2472ff0.html) +- [Data Lake Integrity](http://rpubs.com/rshane/81297) ## Comprehensive Notes From 7d983f1b068470470ec8e2ee9aed2fead85309cf Mon Sep 17 00:00:00 2001 From: Piyush Agarwal Date: Tue, 2 Jun 2015 13:48:51 -0700 Subject: [PATCH 22/82] Updated eda.md to add new link Added link to blog post showing how to use data analysis with Twitter API and Python --- eda.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/eda.md b/eda.md index 7bcf339c..4a0708ec 100644 --- a/eda.md +++ b/eda.md @@ -4,9 +4,9 @@ title: Exploratory Data Analysis permalink: /eda/ --- - [Creating a Kite Graph](http://rpubs.com/thoughtfulbloke/kitegraph) - - [Analyzing Top/Green500 Supercomputer Technology Trends](http://github.com/ww44ss/Exascalar-Analysis-) - [Emissions Choropleth Maps](https://github.com/BillSeliger/ExData_Plotting2) +- [Data Analysis using Twitter API and Python](http://blog.impiyush.me/2015/03/data-analysis-using-twitter-api-and.html) ## Comprehensive Notes From f0d6621e4d2377572e2cc4d5a18c57f64a660006 Mon Sep 17 00:00:00 2001 From: Isvaldo Fernandes de Souza Date: Thu, 4 Jun 2015 14:45:38 -0300 Subject: [PATCH 23/82] Update curated.md Add way to save R files online, it's nice to shared R files --- curated.md | 2 +- 1 
file changed, 1 insertion(+), 1 deletion(-) diff --git a/curated.md b/curated.md index b0ac4dc1..0af7b922 100644 --- a/curated.md +++ b/curated.md @@ -40,7 +40,7 @@ permalink: /curated/ - [data.table cheat sheet](https://s3.amazonaws.com/assets.datacamp.com/img/blog/data+table+cheat+sheet.pdf) - [Exploratory Data Analysis with data.table](http://varianceexplained.org/RData/lessons/lesson4/) - [Fast summary statistics in R with data.table](http://blog.yhathq.com/posts/fast-summary-statistics-with-data-dot-table.html) - +- [R online in r-fiddle.org] (http://www.r-fiddle.org/) ### Probability and Statistics - [Probability and Statistics Cookbook](http://matthias.vallentin.net/probability-and-statistics-cookbook/) From b3a1862c40660c67c8cce0d2891f49b82504978f Mon Sep 17 00:00:00 2001 From: Isvaldo Fernandes de Souza Date: Thu, 4 Jun 2015 14:54:42 -0300 Subject: [PATCH 24/82] Update curated.md fixed space --- curated.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/curated.md b/curated.md index 0af7b922..e475b315 100644 --- a/curated.md +++ b/curated.md @@ -40,7 +40,8 @@ permalink: /curated/ - [data.table cheat sheet](https://s3.amazonaws.com/assets.datacamp.com/img/blog/data+table+cheat+sheet.pdf) - [Exploratory Data Analysis with data.table](http://varianceexplained.org/RData/lessons/lesson4/) - [Fast summary statistics in R with data.table](http://blog.yhathq.com/posts/fast-summary-statistics-with-data-dot-table.html) -- [R online in r-fiddle.org] (http://www.r-fiddle.org/) +- [R online in r-fiddle.org](http://www.r-fiddle.org/) + ### Probability and Statistics - [Probability and Statistics Cookbook](http://matthias.vallentin.net/probability-and-statistics-cookbook/) From 0f69e8be5d07a3b18c25cf2ac6a5d252de74d380 Mon Sep 17 00:00:00 2001 From: seankross Date: Tue, 9 Jun 2015 14:31:25 -0400 Subject: [PATCH 25/82] clean up eda --- eda.md | 1 + 1 file changed, 1 insertion(+) diff --git a/eda.md b/eda.md index 4a0708ec..8e179acb 100644 --- a/eda.md 
+++ b/eda.md @@ -3,6 +3,7 @@ layout: page title: Exploratory Data Analysis permalink: /eda/ --- + - [Creating a Kite Graph](http://rpubs.com/thoughtfulbloke/kitegraph) - [Analyzing Top/Green500 Supercomputer Technology Trends](http://github.com/ww44ss/Exascalar-Analysis-) - [Emissions Choropleth Maps](https://github.com/BillSeliger/ExData_Plotting2) From 54dc7a80ba6ec59f8948f331def5859d3b76e655 Mon Sep 17 00:00:00 2001 From: larspijnappel Date: Wed, 17 Jun 2015 15:56:01 +0200 Subject: [PATCH 26/82] Update curated.md Removed the last slash (after the pdf) from the Verzani link. It caused an 404 error. --- curated.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/curated.md b/curated.md index e475b315..7b9db863 100644 --- a/curated.md +++ b/curated.md @@ -10,7 +10,7 @@ permalink: /curated/ - [Diving Into Data Science Flipboard](https://flipboard.com/@thiakx/diving-into-data-science-5823ectuy) - [OLAP Operation in R](http://architects.dzone.com/articles/olap-operation-r) - [Journal of Statistical Software: Tidy data](http://www.jstatsoft.org/v59/i10/paper) -- [Verzani: simpleR – Using R for Introductory Statistics](http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf/) +- [Verzani: simpleR – Using R for Introductory Statistics](http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf) - [Data Visualization packages](http://www.datavis.ca/R/) - [Visualization hints: plotting numeric data by groups](http://www.r-bloggers.com/visualization-series-insight-from-cleveland-and-tufte-on-plotting-numeric-data-by-groups/) - [Matrix rotation for image and contour plots in R](http://blog.snap.uaf.edu/2012/06/08/matrix-rotation-for-image-and-contour-plots-in-r/) From e8ae917490bad0d4ce6fe6d66bd5f7d918537344 Mon Sep 17 00:00:00 2001 From: Homer White Date: Thu, 9 Jul 2015 14:00:35 -0400 Subject: [PATCH 27/82] add link to Shiny simulation tutorial The Tutorial is an interactive R Markdown document, currently hosted on shinyapps.io. 
It aims to take the reader step-by-step through the construction of a reasonably full-featured simulation app that lets statistics students explore, through simulation, the coverage properties of the classical t-intervals for a population mean. After completing the tutorial the reader will be able to write his/her own simulation apps---hopefully having been spared some of the struggle that I went through when I first learned Shiny in the Spring of 2014. --- ddp.md | 1 + 1 file changed, 1 insertion(+) diff --git a/ddp.md b/ddp.md index a517b609..45a56d9c 100644 --- a/ddp.md +++ b/ddp.md @@ -8,6 +8,7 @@ permalink: /ddp/ - [ggvis and rmarkdown slides with interactive plots](http://qua.st/ggvis-shiny-html5-slides/) - [Shiny app to simulate 401K growth with interactive plots](http://www.mephistosoftware.com/shiny/401k_simulator/) - [Shiny Video Tutorials Playlist on Youtube](http://www.youtube.com/playlist?list=PL6wLL_RojB5xNOhe2OTSd-DPkMLVY9DfB) +- [Tutorial on writing Shiny simulation apps](http://homer.shinyapps.io/sim_tutorial_Rmd) ## Comprehensive Notes From ca0fa60117b85ddb8fa7818e1520c5261ca7d3f4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mateusz=20So=C5=82tysik?= Date: Sat, 11 Jul 2015 09:17:36 +0200 Subject: [PATCH 28/82] Update toolbox.md --- toolbox.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/toolbox.md b/toolbox.md index f8e06208..559d1ad6 100644 --- a/toolbox.md +++ b/toolbox.md @@ -7,6 +7,7 @@ permalink: /toolbox/ ## Command Line - [Working with files in Bash](http://edgarsh.es/ins/working-with-files-in-bash/) +- [Mastering the command line, in one page](https://github.com/jlevy/the-art-of-command-line/blob/master/README.md) ## Git/GitHub @@ -17,4 +18,5 @@ permalink: /toolbox/ - [Github Repo Tutorial How to fork a repo, download it to your local drive and commit changes ](https://www.youtube.com/watch?v=MY94AIplcaU) ## Comprehensive Notes -- Complete notes for [The Data Scientist's 
Toolbox](http://sux13.github.io/DataScienceSpCourseNotes/) \ No newline at end of file + +- Complete notes for [The Data Scientist's Toolbox](http://sux13.github.io/DataScienceSpCourseNotes/) From ca48b7e7bb6c13d61fbf35f6fc8a7987bd31dc02 Mon Sep 17 00:00:00 2001 From: seankross Date: Mon, 13 Jul 2015 13:57:01 -0400 Subject: [PATCH 29/82] added CL to curated --- curated.md | 1 + toolbox.md | 1 - 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/curated.md b/curated.md index 7b9db863..779fb08a 100644 --- a/curated.md +++ b/curated.md @@ -21,6 +21,7 @@ permalink: /curated/ - [explainshell.com - match command-line arguments to their help text](http://explainshell.com/) - [The Command Line Crash Course - Quick course in using the command line](http://cli.learncodethehardway.org/book/) +- [Mastering the command line, in one page](https://github.com/jlevy/the-art-of-command-line/blob/master/README.md) ### R diff --git a/toolbox.md b/toolbox.md index 559d1ad6..ed0755ac 100644 --- a/toolbox.md +++ b/toolbox.md @@ -7,7 +7,6 @@ permalink: /toolbox/ ## Command Line - [Working with files in Bash](http://edgarsh.es/ins/working-with-files-in-bash/) -- [Mastering the command line, in one page](https://github.com/jlevy/the-art-of-command-line/blob/master/README.md) ## Git/GitHub From b62cea84e44c263bae49fc5a0e579bf776770e54 Mon Sep 17 00:00:00 2001 From: Gustavo Paterno Date: Thu, 6 Aug 2015 10:14:09 +0200 Subject: [PATCH 30/82] add link to Hadley online book: R Packages http://r-pkgs.had.co.nz/ --- curated.md | 1 + 1 file changed, 1 insertion(+) diff --git a/curated.md b/curated.md index 779fb08a..33ff6ca7 100644 --- a/curated.md +++ b/curated.md @@ -36,6 +36,7 @@ permalink: /curated/ - [Google's R Style Guide](https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml) - [Tufte-style HTML in rmarkdown](http://sachsmc.github.io/tufterhandout/) - [Creating an R Package](http://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/) +- [R Packages (Hadley online 
book)](http://r-pkgs.had.co.nz/) - How to write your own R packages. - [Beautiful ggplot2 Cheatsheet](http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/) - [Intro to Graphics](http://bcb.dfci.harvard.edu/~aedin/courses/Bioconductor/2.Plotting.pdf) - [data.table cheat sheet](https://s3.amazonaws.com/assets.datacamp.com/img/blog/data+table+cheat+sheet.pdf) From 03187ea66d90c78cf304049b46a941eae61cba66 Mon Sep 17 00:00:00 2001 From: James Elford Date: Sun, 9 Aug 2015 16:51:02 +0100 Subject: [PATCH 31/82] Remove add from Other Resources page Data Science Appliance is an add website; it redirects through doubleclick.net --- other.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/other.md b/other.md index 76f94205..a36f6c2f 100644 --- a/other.md +++ b/other.md @@ -22,8 +22,6 @@ permalink: /other/ ## Pre-built virtual machines for R development. - [Here's a pre-built lightweight Linux machine with R and RStudio already installed](https://github.com/queirozfcom/r-box). You just need to install [vagrant](https://www.vagrantup.com/downloads.html), download (or clone) the github repository and you'll get a clean ubuntu machine with the tools you'll need for the assignments. -- [Data Science Appliance](http://datascienceappliance.com/) - A perfectly provisioned virtual machine for data scientists. - - [Data Science Toolbox](http://datasciencetoolbox.org/) - A virtual environment that allows you to start doing data science in a matter of minutes. 
-- [Virtual machine with RStudio server and github setup](https://github.com/tboloo/vagrant-rstudio) - A VirtualBox, Vagrant & chef-solo managed virtual machine which provides RStudio server with git & github setup \ No newline at end of file +- [Virtual machine with RStudio server and github setup](https://github.com/tboloo/vagrant-rstudio) - A VirtualBox, Vagrant & chef-solo managed virtual machine which provides RStudio server with git & github setup From 3e1ae39937f8d6efd45e9168617df9a81cccd084 Mon Sep 17 00:00:00 2001 From: Sean Kross Date: Thu, 13 Aug 2015 13:21:27 -0400 Subject: [PATCH 32/82] Update ddp.md --- ddp.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ddp.md b/ddp.md index 45a56d9c..a233f129 100644 --- a/ddp.md +++ b/ddp.md @@ -8,7 +8,7 @@ permalink: /ddp/ - [ggvis and rmarkdown slides with interactive plots](http://qua.st/ggvis-shiny-html5-slides/) - [Shiny app to simulate 401K growth with interactive plots](http://www.mephistosoftware.com/shiny/401k_simulator/) - [Shiny Video Tutorials Playlist on Youtube](http://www.youtube.com/playlist?list=PL6wLL_RojB5xNOhe2OTSd-DPkMLVY9DfB) -- [Tutorial on writing Shiny simulation apps](http://homer.shinyapps.io/sim_tutorial_Rmd) +- [Tutorial on writing Shiny simulation apps](https://github.com/homerhanumat/shinyTutorials) ## Comprehensive Notes From 8bc14a8be11236de4dae2ae5ec242e55a8fc338a Mon Sep 17 00:00:00 2001 From: Flavio Barros Date: Mon, 24 Aug 2015 11:21:48 -0300 Subject: [PATCH 33/82] Update other.md Showing how to dockerize Shiny Apps and share it on your own server or on docker hub. 
--- other.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/other.md b/other.md index a36f6c2f..738a8dd8 100644 --- a/other.md +++ b/other.md @@ -16,6 +16,11 @@ permalink: /other/ - [gitignore template for R](https://github.com/github/gitignore/blob/master/R.gitignore) (source:[gitignore](https://github.com/github/gitignore)) - [Github Help - Using Git / Ignoring files](https://help.github.com/articles/ignoring-files/) +### Deploying and sharing Shiny Apps with Docker +- [Dockerize a Shiny App](http://www.flaviobarros.net/2015/04/30/dockerizing-a-shiny-app/) +- [Git pushing Shiny Apps with Docker/Dokku](http://www.flaviobarros.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) +- [Share your Shiny Apps with Docker and Kitematic](http://www.flaviobarros.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) + ## Troubleshooting - [Windows batch file to work around RStudio startup issues](https://github.com/stepds/contrib-DataScienceSpecialization/blob/master/README.md) From 1243a0a544e275fa46e1c7741327e866cc2a78e5 Mon Sep 17 00:00:00 2001 From: Flavio Barros Date: Mon, 24 Aug 2015 11:24:14 -0300 Subject: [PATCH 34/82] Update other.md Add content about ways of sharing and deploying Shiny Apps. 
--- other.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/other.md b/other.md index 738a8dd8..2bdd10ee 100644 --- a/other.md +++ b/other.md @@ -16,11 +16,6 @@ permalink: /other/ - [gitignore template for R](https://github.com/github/gitignore/blob/master/R.gitignore) (source:[gitignore](https://github.com/github/gitignore)) - [Github Help - Using Git / Ignoring files](https://help.github.com/articles/ignoring-files/) -### Deploying and sharing Shiny Apps with Docker -- [Dockerize a Shiny App](http://www.flaviobarros.net/2015/04/30/dockerizing-a-shiny-app/) -- [Git pushing Shiny Apps with Docker/Dokku](http://www.flaviobarros.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) -- [Share your Shiny Apps with Docker and Kitematic](http://www.flaviobarros.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) - ## Troubleshooting - [Windows batch file to work around RStudio startup issues](https://github.com/stepds/contrib-DataScienceSpecialization/blob/master/README.md) @@ -30,3 +25,8 @@ permalink: /other/ - [Data Science Toolbox](http://datasciencetoolbox.org/) - A virtual environment that allows you to start doing data science in a matter of minutes. 
- [Virtual machine with RStudio server and github setup](https://github.com/tboloo/vagrant-rstudio) - A VirtualBox, Vagrant & chef-solo managed virtual machine which provides RStudio server with git & github setup + +## Deploying and sharing Shiny Apps with Docker +- [Dockerize a Shiny App](http://www.flaviobarros.net/2015/04/30/dockerizing-a-shiny-app/) +- [Git pushing Shiny Apps with Docker/Dokku](http://www.flaviobarros.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) +- [Share your Shiny Apps with Docker and Kitematic](http://www.flaviobarros.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) From 2c99cc60ec996509a56bb4f71bf5640cf4b03c9b Mon Sep 17 00:00:00 2001 From: seankross Date: Tue, 25 Aug 2015 10:11:42 -0400 Subject: [PATCH 35/82] moved from other the ddp --- ddp.md | 5 +++++ other.md | 5 ----- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/ddp.md b/ddp.md index a233f129..bf9419f0 100644 --- a/ddp.md +++ b/ddp.md @@ -6,9 +6,14 @@ permalink: /ddp/ - [Slidify to Github walkthrough](http://rpubs.com/thoughtfulbloke/25103) - [ggvis and rmarkdown slides with interactive plots](http://qua.st/ggvis-shiny-html5-slides/) + +## Shiny - [Shiny app to simulate 401K growth with interactive plots](http://www.mephistosoftware.com/shiny/401k_simulator/) - [Shiny Video Tutorials Playlist on Youtube](http://www.youtube.com/playlist?list=PL6wLL_RojB5xNOhe2OTSd-DPkMLVY9DfB) - [Tutorial on writing Shiny simulation apps](https://github.com/homerhanumat/shinyTutorials) +- [Dockerize a Shiny App](http://www.flaviobarros.net/2015/04/30/dockerizing-a-shiny-app/) +- [Git pushing Shiny Apps with Docker/Dokku](http://www.flaviobarros.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) +- [Share your Shiny Apps with Docker and Kitematic](http://www.flaviobarros.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) ## Comprehensive Notes diff --git a/other.md b/other.md index 2bdd10ee..a36f6c2f 100644 --- a/other.md +++ b/other.md 
@@ -25,8 +25,3 @@ permalink: /other/ - [Data Science Toolbox](http://datasciencetoolbox.org/) - A virtual environment that allows you to start doing data science in a matter of minutes. - [Virtual machine with RStudio server and github setup](https://github.com/tboloo/vagrant-rstudio) - A VirtualBox, Vagrant & chef-solo managed virtual machine which provides RStudio server with git & github setup - -## Deploying and sharing Shiny Apps with Docker -- [Dockerize a Shiny App](http://www.flaviobarros.net/2015/04/30/dockerizing-a-shiny-app/) -- [Git pushing Shiny Apps with Docker/Dokku](http://www.flaviobarros.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) -- [Share your Shiny Apps with Docker and Kitematic](http://www.flaviobarros.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) From 89e112d3ac15b093db86434d5b1f4e12f3aa554f Mon Sep 17 00:00:00 2001 From: lgreski Date: Sun, 6 Sep 2015 23:04:04 -0700 Subject: [PATCH 36/82] Added link to 'real world example' reading ACS 2000 PUMS data. --- getclean.md | 1 + 1 file changed, 1 insertion(+) diff --git a/getclean.md b/getclean.md index 2cf61f5b..72efa8c8 100644 --- a/getclean.md +++ b/getclean.md @@ -14,6 +14,7 @@ permalink: /getclean/ - [Query string (and other fields-within-fields) unrolling](http://rpubs.com/schnee/32988) - [Pre-processing Excel files before loading them into R](https://github.com/alkashef/cleaningexceldata) - [Codebook template that can be used in the Getting and Cleaning Data project](https://gist.github.com/JorisSchut/dbc1fc0402f28cad9b41) +- ["Real world" example - reading American Community Survey 2000 PUMS Data:](https://github.com/lgreski/acsexample) Demonstrates how to extract records of a given type from a data file containing multiple record types, and how to use an Excel-based code book to specify arguments for reading a fixed-width file. 
## Comprehensive Notes From 7a38b36d24bfae3b1483508f7f22b341058c34c6 Mon Sep 17 00:00:00 2001 From: thoughtfulbloke Date: Wed, 9 Sep 2015 14:13:49 +1200 Subject: [PATCH 37/82] Added David Hood advice for the course --- getclean.md | 1 + 1 file changed, 1 insertion(+) diff --git a/getclean.md b/getclean.md index 72efa8c8..e2b8ce28 100644 --- a/getclean.md +++ b/getclean.md @@ -15,6 +15,7 @@ permalink: /getclean/ - [Pre-processing Excel files before loading them into R](https://github.com/alkashef/cleaningexceldata) - [Codebook template that can be used in the Getting and Cleaning Data project](https://gist.github.com/JorisSchut/dbc1fc0402f28cad9b41) - ["Real world" example - reading American Community Survey 2000 PUMS Data:](https://github.com/lgreski/acsexample) Demonstrates how to extract records of a given type from a data file containing multiple record types, and how to use an Excel-based code book to specify arguments for reading a fixed-width file. +- [18 Months of CTA advice](https://thoughtfulbloke.wordpress.com/2015/08/31/hello-world) ## Comprehensive Notes From 8465fb8edc2ae558886ff4232ead341c14aa5fb1 Mon Sep 17 00:00:00 2001 From: Sean Kross Date: Mon, 26 Oct 2015 21:08:23 -0400 Subject: [PATCH 38/82] Update ddp.md --- ddp.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ddp.md b/ddp.md index bf9419f0..951724af 100644 --- a/ddp.md +++ b/ddp.md @@ -11,7 +11,7 @@ permalink: /ddp/ - [Shiny app to simulate 401K growth with interactive plots](http://www.mephistosoftware.com/shiny/401k_simulator/) - [Shiny Video Tutorials Playlist on Youtube](http://www.youtube.com/playlist?list=PL6wLL_RojB5xNOhe2OTSd-DPkMLVY9DfB) - [Tutorial on writing Shiny simulation apps](https://github.com/homerhanumat/shinyTutorials) -- [Dockerize a Shiny App](http://www.flaviobarros.net/2015/04/30/dockerizing-a-shiny-app/) +- [Dockerize a Shiny App](http://www.rmining.net/2015/04/30/dockerizing-a-shiny-app/) - [Git pushing Shiny Apps with 
Docker/Dokku](http://www.flaviobarros.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) - [Share your Shiny Apps with Docker and Kitematic](http://www.flaviobarros.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) From 816edbe647bee4b3d0346591574cc1ba900afeff Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Carsten=20J=C3=B8rgensen?= Date: Thu, 29 Oct 2015 12:06:02 +0100 Subject: [PATCH 39/82] Added link to http://reproducibleresearch.net/ This website contains lots of useful material about reproducible research. --- curated.md | 1 + 1 file changed, 1 insertion(+) diff --git a/curated.md b/curated.md index 33ff6ca7..613c5f4e 100644 --- a/curated.md +++ b/curated.md @@ -61,6 +61,7 @@ permalink: /curated/ ### Reproducible Research - [Markdown live demo](http://markdown-here.com/livedemo.html) - [Boosting Slides by Ron Meir](https://github.com/Aratinga/Misc/blob/master/BoostingTutorial.pdf) +- [Reproducible Research website](http://reproducibleresearch.net/) ### Machine Learning - [UC Irvine Machine Learning Data Repository](http://archive.ics.uci.edu/ml/) From cf21437e3c8541dd7363ca5f02ff3ac6a435049d Mon Sep 17 00:00:00 2001 From: Len Greski Date: Mon, 30 Nov 2015 18:30:23 -0800 Subject: [PATCH 40/82] Added an article containing step by step instructions for using Github Pages with RStudio for the PML project. --- pml.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/pml.md b/pml.md index be4defc3..f2cac02d 100644 --- a/pml.md +++ b/pml.md @@ -25,3 +25,7 @@ permalink: /pml/ ## Comprehensive Notes - Complete notes for [Practical Machine Learning](http://sux13.github.io/DataScienceSpCourseNotes/) + +## Configuring Github Pages with RStudio for PML Project + +- Step by step instructions to [Configure Github Pages with RStudio](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/pml-ghPagesSetup.md) to support the PML course project. 
From 21b4c2dd22c15474a49392dfae3d555cf086fb29 Mon Sep 17 00:00:00 2001 From: Len Greski Date: Tue, 1 Dec 2015 16:48:27 -0800 Subject: [PATCH 41/82] Added links for Configuring RStudio to work with Git / Github, Mac and Windows versions, plus Using Editor Modes in Coursera Discussion Forums. --- toolbox.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/toolbox.md b/toolbox.md index ed0755ac..3c2dfc68 100644 --- a/toolbox.md +++ b/toolbox.md @@ -15,7 +15,12 @@ permalink: /toolbox/ - [Understanding the Relationship Between Git and GitHub](http://www.dataschool.io/github-is-just-dropbox-for-git/) - [Simple Guide to GitHub Forks](http://www.dataschool.io/simple-guide-to-forks-in-github-and-git/) - [Github Repo Tutorial How to fork a repo, download it to your local drive and commit changes ](https://www.youtube.com/watch?v=MY94AIplcaU) +- [Configuring RStudio to work with Git / Github - Mac OSX](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/configureRStudioGitOSXVersion.md) +- [Configuring RStudio to work with Git / Github - Windows](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/configureRStudioGitWindowsVersion.md) ## Comprehensive Notes - Complete notes for [The Data Scientist's Toolbox](http://sux13.github.io/DataScienceSpCourseNotes/) + +## Miscellaneous +- [Using Editor Modes in Coursera Discussion Forum Posts](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/usingMarkdownInForumPosts.md) From 37ae85a7b3092bcd9d417d1ce82dd69accbcf9ce Mon Sep 17 00:00:00 2001 From: Len Greski Date: Wed, 2 Dec 2015 09:51:37 -0800 Subject: [PATCH 42/82] Added articles written to support students in R Programming -- strategy for coding the assignments, grading the SHA-1 has code, Data frame as a list, and 3 articles discussing R and commercial stats packages. 
--- rprog.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/rprog.md b/rprog.md index 42e19df4..85162862 100644 --- a/rprog.md +++ b/rprog.md @@ -6,22 +6,31 @@ permalink: /rprog/ ## Programming Assignments +- [Strategy for Coding the Programming Assignments](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/makeItRun.md) - [Tutorial for those struggling with Programming Assignment 1](https://github.com/derekfranks/practice_assignment) - [Tutorial for those struggling with Programming Assignment 2](https://github.com/DanieleP/PA2-clarifying_instructions) - [Tutorial for those struggling with Programming Assignment 3](https://github.com/DanieleP/PA3-tutorial) - [PA1-test: `testthat`, Unit Tests for Programming Assignment 1](https://github.com/cbryant1000/pa1test) - [PA3-test: `testthat`, Unit Tests for Programming Assignment 3](https://github.com/cbryant1000/pa3test) - [Alternative submit script for Programming Assignment 1 that makes submitting more convenient by allowing selection of multiple parts plus prompting if user wants to submit another part before exiting](https://github.com/rchampoux/coursera/blob/master/rprog-scripts-submitscript1.R) +- [Grading the SHA-1 Hash Code](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-gradeSHA1hash.md) ## R Language - [Some notes on the R Language](http://lopezrj.github.io) +- [A Data Frame is Also a List](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/dataFrameAsList.md) ## R language cheatsheet - [R cheatsheet covering all lectures](https://github.com/startupjing/Tech_Notes/blob/master/R/R_language.md) +## R and Commercial Statistics Packages + +- [Commercial Statistics Packages: An Historical Perspective](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statsPackagesHistory.md) +- [Why is R More Difficult than SAS?](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/whyIsRHarderThanSAS.md) +- [SAS 
Experience: impediment to learning R?](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/exampleSortRvsSAS.md) + ## Comprehensive Notes - Complete notes for [R Programming](http://sux13.github.io/DataScienceSpCourseNotes/) From 5d1cf2b335f14cfda25cd40429d18f9af8d4f127 Mon Sep 17 00:00:00 2001 From: Sean Kross Date: Sat, 12 Dec 2015 02:11:50 -0500 Subject: [PATCH 43/82] Update index.md --- index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/index.md b/index.md index 6c035e70..5cc7be2f 100644 --- a/index.md +++ b/index.md @@ -4,7 +4,7 @@ layout: page ## Table of Contents -This is site is meant to serve as a directory for the amazing content the +This site is meant to serve as a directory for the amazing content the community has created around the Data Science Specialization. If you are interested in contributing [click here](https://github.com/DataScienceSpecialization/DataScienceSpecialization.github.io#contributing). From be0a48ab6b0aadb403eaff3ee52308868d170280 Mon Sep 17 00:00:00 2001 From: Leonard Greski Date: Thu, 24 Dec 2015 21:02:10 -0800 Subject: [PATCH 44/82] Added two links: makeCacheMatrix as an Object, and S Objects, R Objects, and Lexical Scoping. 
--- rprog.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/rprog.md b/rprog.md index 85162862..f94ac309 100644 --- a/rprog.md +++ b/rprog.md @@ -14,12 +14,15 @@ permalink: /rprog/ - [PA3-test: `testthat`, Unit Tests for Programming Assignment 3](https://github.com/cbryant1000/pa3test) - [Alternative submit script for Programming Assignment 1 that makes submitting more convenient by allowing selection of multiple parts plus prompting if user wants to submit another part before exiting](https://github.com/rchampoux/coursera/blob/master/rprog-scripts-submitscript1.R) - [Grading the SHA-1 Hash Code](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-gradeSHA1hash.md) +- [Assignment 2: makeCacheMatrix as an Object](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprogAssignment2Prototype.md) ## R Language - [Some notes on the R Language](http://lopezrj.github.io) - [A Data Frame is Also a List](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/dataFrameAsList.md) +- [S Objects, R Objects, and Lexical Scoping](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-lexicalScoping.md) + ## R language cheatsheet From 0f621d08873493244d46654e19a2dc4f4566bdc6 Mon Sep 17 00:00:00 2001 From: Flavio Barros Date: Sun, 27 Dec 2015 20:26:25 -0700 Subject: [PATCH 45/82] Update ddp.md As I moved some post from www.flaviobarros.net to www.rmining.net I'm updating the file. 
--- ddp.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ddp.md b/ddp.md index 951724af..6a467ab5 100644 --- a/ddp.md +++ b/ddp.md @@ -12,8 +12,8 @@ permalink: /ddp/ - [Shiny Video Tutorials Playlist on Youtube](http://www.youtube.com/playlist?list=PL6wLL_RojB5xNOhe2OTSd-DPkMLVY9DfB) - [Tutorial on writing Shiny simulation apps](https://github.com/homerhanumat/shinyTutorials) - [Dockerize a Shiny App](http://www.rmining.net/2015/04/30/dockerizing-a-shiny-app/) -- [Git pushing Shiny Apps with Docker/Dokku](http://www.flaviobarros.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) -- [Share your Shiny Apps with Docker and Kitematic](http://www.flaviobarros.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) +- [Git pushing Shiny Apps with Docker/Dokku](http://www.rmining.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) +- [Share your Shiny Apps with Docker and Kitematic](http://www.rmining.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) ## Comprehensive Notes From 608e561d3934fde6645c746701c8b4e72b247552 Mon Sep 17 00:00:00 2001 From: Flavio Barros Date: Sun, 27 Dec 2015 20:29:01 -0700 Subject: [PATCH 46/82] Update other.md As I moved my domain from www.flaviobarros.net to www.rmining.net I'm updating the links. 
--- other.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/other.md b/other.md index 2bdd10ee..d01d490f 100644 --- a/other.md +++ b/other.md @@ -27,6 +27,6 @@ permalink: /other/ - [Virtual machine with RStudio server and github setup](https://github.com/tboloo/vagrant-rstudio) - A VirtualBox, Vagrant & chef-solo managed virtual machine which provides RStudio server with git & github setup ## Deploying and sharing Shiny Apps with Docker -- [Dockerize a Shiny App](http://www.flaviobarros.net/2015/04/30/dockerizing-a-shiny-app/) -- [Git pushing Shiny Apps with Docker/Dokku](http://www.flaviobarros.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) -- [Share your Shiny Apps with Docker and Kitematic](http://www.flaviobarros.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) +- [Dockerize a Shiny App](http://www.rmining.net/2015/04/30/dockerizing-a-shiny-app/) +- [Git pushing Shiny Apps with Docker/Dokku](http://www.rmining.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) +- [Share your Shiny Apps with Docker and Kitematic](http://www.rmining.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) From 2035298f25186b4f0f5285a453f0cc016c0f3e37 Mon Sep 17 00:00:00 2001 From: Leonard Greski Date: Sun, 3 Jan 2016 15:49:59 -0600 Subject: [PATCH 47/82] Add link for Improving Runtime Performance of caret::train() article. --- pml.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/pml.md b/pml.md index f2cac02d..0628ebb8 100644 --- a/pml.md +++ b/pml.md @@ -29,3 +29,7 @@ permalink: /pml/ ## Configuring Github Pages with RStudio for PML Project - Step by step instructions to [Configure Github Pages with RStudio](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/pml-ghPagesSetup.md) to support the PML course project. 
+ +## Improving Runtime Performance of Caret + +- Step by step instructions to [implement parallel processing in caret::train()](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/pml-randomForestPerformance.md) on a random forest model, along with runtime performance analysis for a variety of laptops, ranging from an Intel Atom-based tablet to a quad-core i7 processor. From 55d85379df353ece4855ca669caecab6582cbea8 Mon Sep 17 00:00:00 2001 From: Len Greski Date: Sat, 9 Jan 2016 06:51:04 -0800 Subject: [PATCH 48/82] Add two articles: Common Mistakes / overwriting R functions with data objects, and Thinking in R versus Thinking in SAS. --- rprog.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/rprog.md b/rprog.md index f94ac309..0f3554f8 100644 --- a/rprog.md +++ b/rprog.md @@ -22,6 +22,7 @@ permalink: /rprog/ - [Some notes on the R Language](http://lopezrj.github.io) - [A Data Frame is Also a List](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/dataFrameAsList.md) - [S Objects, R Objects, and Lexical Scoping](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-lexicalScoping.md) +- [Common R Mistakes: Overwriting Functions with Data Objects](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-overwritingRFunctions.md) ## R language cheatsheet @@ -33,6 +34,7 @@ permalink: /rprog/ - [Commercial Statistics Packages: An Historical Perspective](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statsPackagesHistory.md) - [Why is R More Difficult than SAS?](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/whyIsRHarderThanSAS.md) - [SAS Experience: impediment to learning R?](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/exampleSortRvsSAS.md) +- [Thinking in R versus Thinking in SAS](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/exampleSortRvsSAS.md) ## Comprehensive Notes From 
814f6ad0d8d98d6b024b3ecf7779d28ef4a90296 Mon Sep 17 00:00:00 2001 From: Len Greski Date: Sat, 9 Jan 2016 07:02:59 -0800 Subject: [PATCH 49/82] Add five articles related to statinf class. --- statinf.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/statinf.md b/statinf.md index c5a85435..ca1cb346 100644 --- a/statinf.md +++ b/statinf.md @@ -6,6 +6,11 @@ permalink: /statinf/ - [Why degrees of freedom decrease for sample variance](https://github.com/Manu58/bias/blob/master/bias.pdf) - [Analysis of exponential distribution of births data set from the CDC](https://gist.github.com/ProgramErgoSum/5316008387746fcd84de) +- [Exponential Distribution / Central Limit Theorem - Assignment Checklist](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-expDistChecklist.md) +- [ToothGrowth Analysis - Assignment Checklist](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/ToothGrowthChecklist.md) +- [Exploratory Data Analysis in ToothGrowth Assignment](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/edaInToothGrowthAnalysis.md), explaining the exploratory data analysis requirement for students who have not taken the *Exploratory Data Analysis* course prior to taking *Statistical Inference*. +- [Using MathJax with Discussion Forums, R Markdown, and Github Pages](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/mathjaxWithGithubMarkdown.md) +- [Kable Tables with Data Frames](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/kableDataFrameTable.md) illustrates how to display a custom table in a `knitr()` document by creating a data frame to contain the information to be rendered with `kable()`. ## Comprehensive Notes From 3417573bec53a9b872573ec63bb2463677a877ad Mon Sep 17 00:00:00 2001 From: Len Greski Date: Sat, 9 Jan 2016 07:05:21 -0800 Subject: [PATCH 50/82] Add article on configuring shinyapps.io application timeout. 
--- ddp.md | 1 + 1 file changed, 1 insertion(+) diff --git a/ddp.md b/ddp.md index 6a467ab5..3240b3b3 100644 --- a/ddp.md +++ b/ddp.md @@ -14,6 +14,7 @@ permalink: /ddp/ - [Dockerize a Shiny App](http://www.rmining.net/2015/04/30/dockerizing-a-shiny-app/) - [Git pushing Shiny Apps with Docker/Dokku](http://www.rmining.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) - [Share your Shiny Apps with Docker and Kitematic](http://www.rmining.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) +- [Shinyapps.io: Configuring Application Timeout](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/dataProd-shinyTimeoutConfig.md) ## Comprehensive Notes From 1a9c230c7346005f4e8e8d21ed95355d3c5b14a8 Mon Sep 17 00:00:00 2001 From: Paul Adamson Date: Mon, 18 Jan 2016 00:03:22 -0500 Subject: [PATCH 51/82] add link to ProjectTemplate blog post --- repres.md | 1 + 1 file changed, 1 insertion(+) diff --git a/repres.md b/repres.md index 5fc1ac8e..cba776f9 100644 --- a/repres.md +++ b/repres.md @@ -9,6 +9,7 @@ permalink: /repres/ - [Trends and severity of Data Breaches](http://rpubs.com/ww44ss/29389) - [Benefit-cost analysis of a park user fee](https://rstudio-pubs-static.s3.amazonaws.com/72135_dc45211d976842c2a9a8c8b5f2472ff0.html) - [Data Lake Integrity](http://rpubs.com/rshane/81297) +- [ProjectTemplate in RStudio with Git](http://padamson.github.io/r/rstudio/projecttemplate/git/2016/01/17/projecttemplate-in-rstudio-with-git.html) ## Comprehensive Notes From f5c7e090c97645c681071b950997bddc7aa89f17 Mon Sep 17 00:00:00 2001 From: Aaron McAdie Date: Fri, 26 Feb 2016 13:11:10 -0800 Subject: [PATCH 52/82] added link to interactive CI repo --- statinf.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/statinf.md b/statinf.md index c5a85435..1cc82800 100644 --- a/statinf.md +++ b/statinf.md @@ -6,7 +6,7 @@ permalink: /statinf/ - [Why degrees of freedom decrease for sample 
variance](https://github.com/Manu58/bias/blob/master/bias.pdf) - [Analysis of exponential distribution of births data set from the CDC](https://gist.github.com/ProgramErgoSum/5316008387746fcd84de) - +-[Interactive Confidence Interval Visualization](https://github.com/amcadie/interactive_CI) ## Comprehensive Notes - Complete notes for [Statistical Inference](http://sux13.github.io/DataScienceSpCourseNotes/) From 1eb1be9adece106f162b3816161a2b83158090af Mon Sep 17 00:00:00 2001 From: Aaron McAdie Date: Fri, 26 Feb 2016 13:15:04 -0800 Subject: [PATCH 53/82] fixed line break --- statinf.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/statinf.md b/statinf.md index 1cc82800..bee6381e 100644 --- a/statinf.md +++ b/statinf.md @@ -6,7 +6,8 @@ permalink: /statinf/ - [Why degrees of freedom decrease for sample variance](https://github.com/Manu58/bias/blob/master/bias.pdf) - [Analysis of exponential distribution of births data set from the CDC](https://gist.github.com/ProgramErgoSum/5316008387746fcd84de) --[Interactive Confidence Interval Visualization](https://github.com/amcadie/interactive_CI) +- [Interactive Confidence Interval Visualization](https://github.com/amcadie/interactive_CI) + ## Comprehensive Notes - Complete notes for [Statistical Inference](http://sux13.github.io/DataScienceSpCourseNotes/) From 6f08d7b5a2b2535dd7831e03fd3d0de2b841e952 Mon Sep 17 00:00:00 2001 From: Leonard Greski Date: Sun, 24 Apr 2016 17:43:20 -0400 Subject: [PATCH 54/82] Added Len Greski to list of community contributors. 
--- about.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/about.md b/about.md index 27bf2391..aa0af257 100644 --- a/about.md +++ b/about.md @@ -19,8 +19,9 @@ The [Data Science Specialization](https://www.coursera.org/specialization/jhudat - [Kevin Markham](http://www.dataschool.io/) - Derek Franks - David Hood +- [Leonard Greski](https://github.com/lgreski) - Michael Sachs - Allan Inocêncio de Souza Costa - [stepds](https://github.com/stepds) - Bastiaan Quast -- [Xing Su](http://sux13.github.io/DataScienceSpCourseNotes/) \ No newline at end of file +- [Xing Su](http://sux13.github.io/DataScienceSpCourseNotes/) From e1da253e0a82b8a507129deadfb23e28eb007e36 Mon Sep 17 00:00:00 2001 From: Leonard Greski Date: Sun, 24 Apr 2016 22:37:01 -0400 Subject: [PATCH 55/82] Added link to MiKTeX install walkthrough on Windows 10. --- statinf.md | 1 + 1 file changed, 1 insertion(+) diff --git a/statinf.md b/statinf.md index ca1cb346..051fb7ed 100644 --- a/statinf.md +++ b/statinf.md @@ -11,6 +11,7 @@ permalink: /statinf/ - [Exploratory Data Analysis in ToothGrowth Assignment](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/edaInToothGrowthAnalysis.md), explaining the exploratory data analysis requirement for students who have not taken the *Exploratory Data Analysis* course prior to taking *Statistical Inference*. - [Using MathJax with Discussion Forums, R Markdown, and Github Pages](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/mathjaxWithGithubMarkdown.md) - [Kable Tables with Data Frames](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/kableDataFrameTable.md) illustrates how to display a custom table in a `knitr()` document by creating a data frame to contain the information to be rendered with `kable()`. 
+- [Installing MiKTeK on Windows 10 / Generate a PDF from knitr](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-generatePDF.md) ## Comprehensive Notes From d6132aba9f57d175e29a8c032cab97775410566e Mon Sep 17 00:00:00 2001 From: Leonard Greski Date: Sat, 7 May 2016 11:58:12 -0400 Subject: [PATCH 56/82] Add article describing "optimal" sample size relative to power calculations. --- statinf.md | 1 + 1 file changed, 1 insertion(+) diff --git a/statinf.md b/statinf.md index ac5c9e9f..1157c59f 100644 --- a/statinf.md +++ b/statinf.md @@ -13,6 +13,7 @@ permalink: /statinf/ - [Kable Tables with Data Frames](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/kableDataFrameTable.md) illustrates how to display a custom table in a `knitr()` document by creating a data frame to contain the information to be rendered with `kable()`. - [Interactive Confidence Interval Visualization](https://github.com/amcadie/interactive_CI) - [Installing MiKTeK on Windows 10 / Generate a PDF from knitr](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-generatePDF.md) +- [Power calculations: optimal szmple size](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-optimalSampleSize.md) ## Comprehensive Notes From 925de9276c17799264d54cd3a97b975fe0e4f068 Mon Sep 17 00:00:00 2001 From: Leonard Greski Date: Sun, 15 May 2016 19:02:27 -0400 Subject: [PATCH 57/82] Add R Onboarding for SAS Users --- rprog.md | 1 + 1 file changed, 1 insertion(+) diff --git a/rprog.md b/rprog.md index 0f3554f8..524517cd 100644 --- a/rprog.md +++ b/rprog.md @@ -31,6 +31,7 @@ permalink: /rprog/ ## R and Commercial Statistics Packages +- [R Onboarding for SAS Users](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-onboardingForSASUsers.md) Provides an overview and links to a variety of resources to help people with SAS experience make the transition to R - [Commercial Statistics Packages: An 
Historical Perspective](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statsPackagesHistory.md) - [Why is R More Difficult than SAS?](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/whyIsRHarderThanSAS.md) - [SAS Experience: impediment to learning R?](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/exampleSortRvsSAS.md) From bf50b365586f7a5ffd02527fb97fe0096f0ded76 Mon Sep 17 00:00:00 2001 From: lgreski Date: Sun, 29 May 2016 22:47:21 -0400 Subject: [PATCH 58/82] Add article on forms of the Extract Operator --- rprog.md | 1 + 1 file changed, 1 insertion(+) diff --git a/rprog.md b/rprog.md index 524517cd..a5ee9afb 100644 --- a/rprog.md +++ b/rprog.md @@ -23,6 +23,7 @@ permalink: /rprog/ - [A Data Frame is Also a List](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/dataFrameAsList.md) - [S Objects, R Objects, and Lexical Scoping](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-lexicalScoping.md) - [Common R Mistakes: Overwriting Functions with Data Objects](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-overwritingRFunctions.md) +- [Forms of the Extract Operator](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-extractOperator.md) ## R language cheatsheet From 0223ab1b46adac26e2ddb1a75171efd9dfb17a4a Mon Sep 17 00:00:00 2001 From: lgreski Date: Fri, 17 Jun 2016 20:21:14 -0400 Subject: [PATCH 59/82] Added article explaining use of binomial theorem in Combining Predictors lecture. --- pml.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/pml.md b/pml.md index 0628ebb8..1054002d 100644 --- a/pml.md +++ b/pml.md @@ -22,9 +22,10 @@ permalink: /pml/ - [Comparing Supervised Learning Algorithms](http://www.dataschool.io/comparing-supervised-learning-algorithms/): Comparing 8 common supervised learning algorithms (for regression and classification) on 13 different dimensions. 
-## Comprehensive Notes +## Content Related to the Lectures - Complete notes for [Practical Machine Learning](http://sux13.github.io/DataScienceSpCourseNotes/) +- [Week 4: Combining Predictors -- Math Explained](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/pml-combiningPredictorsBinomial.md) ## Configuring Github Pages with RStudio for PML Project @@ -32,4 +33,5 @@ permalink: /pml/ ## Improving Runtime Performance of Caret -- Step by step instructions to [implement parallel processing in caret::train()](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/pml-randomForestPerformance.md) on a random forest model, along with runtime performance analysis for a variety of laptops, ranging from an Intel Atom-based tablet to a quad-core i7 processor. +- Step by step instructions to [implement parallel processing in caret::train()](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/pml-randomForestPerformance.md) on a random forest model, along with runtime performance analysis for a variety of laptops, ranging from an Intel Atom-based tablet to a quad-core i7 processor. + From 318b3c1217755fc2ecfde5aeec8fb78f24486432 Mon Sep 17 00:00:00 2001 From: lgreski Date: Fri, 17 Jun 2016 21:14:16 -0400 Subject: [PATCH 60/82] add 2 articles: breaking down pollutantmean, and a SAS version of pollutantmean? 
--- rprog.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/rprog.md b/rprog.md index a5ee9afb..80e56ac4 100644 --- a/rprog.md +++ b/rprog.md @@ -8,6 +8,8 @@ permalink: /rprog/ - [Strategy for Coding the Programming Assignments](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/makeItRun.md) - [Tutorial for those struggling with Programming Assignment 1](https://github.com/derekfranks/practice_assignment) +- [Breaking Down pollutantmean](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-discussPollutantmean.md) +- [A SAS Version of pollutantmean?](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-pollutantmeanSASVersion.md) - [Tutorial for those struggling with Programming Assignment 2](https://github.com/DanieleP/PA2-clarifying_instructions) - [Tutorial for those struggling with Programming Assignment 3](https://github.com/DanieleP/PA3-tutorial) - [PA1-test: `testthat`, Unit Tests for Programming Assignment 1](https://github.com/cbryant1000/pa1test) From ea32fb68c6c13fb67e43fbf17f5ade4ec7d30d26 Mon Sep 17 00:00:00 2001 From: Andrew Voshchevoz Date: Mon, 20 Jun 2016 18:02:45 +0300 Subject: [PATCH 61/82] Updated other.md Updated other.md to include link on HTTP/HTTPS proxy configuration guide --- other.md | 1 + 1 file changed, 1 insertion(+) diff --git a/other.md b/other.md index d01d490f..ddb49135 100644 --- a/other.md +++ b/other.md @@ -11,6 +11,7 @@ permalink: /other/ - [Installing Some Basic R Packages in Ubuntu; Ibrahim El Merehbi](http://elmerehbi.wordpress.com/2014/09/09/installing-some-basic-r-packages-in-ubuntu) - [Using Projects in RStudio](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects) - [Using Version Control with RStudio](https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN) +- [Using R behind HTTP/HTTPS Proxy](https://support.rstudio.com/hc/en-us/articles/200488488-Configuring-R-to-Use-an-HTTP-or-HTTPS-Proxy) ### 
Ignoring R & RStudio files - [gitignore template for R](https://github.com/github/gitignore/blob/master/R.gitignore) (source:[gitignore](https://github.com/github/gitignore)) From 4a18de5fe20a16fefe10bbd0eeb8bb11d051de85 Mon Sep 17 00:00:00 2001 From: lgreski Date: Mon, 4 Jul 2016 08:22:49 -0400 Subject: [PATCH 62/82] Add article on permutation tests. --- statinf.md | 1 + 1 file changed, 1 insertion(+) diff --git a/statinf.md b/statinf.md index 1157c59f..5ea0baa6 100644 --- a/statinf.md +++ b/statinf.md @@ -14,6 +14,7 @@ permalink: /statinf/ - [Interactive Confidence Interval Visualization](https://github.com/amcadie/interactive_CI) - [Installing MiKTeK on Windows 10 / Generate a PDF from knitr](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-generatePDF.md) - [Power calculations: optimal szmple size](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-optimalSampleSize.md) +- [Permutation Tests Explained](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-permutationTests.md) ## Comprehensive Notes From 53c1f9e8d26f96da4e4013cfb41fe314a538457a Mon Sep 17 00:00:00 2001 From: lgreski Date: Tue, 9 Aug 2016 22:53:58 -0400 Subject: [PATCH 63/82] Add url for Demystifying makeVector() article. 
--- rprog.md | 1 + 1 file changed, 1 insertion(+) diff --git a/rprog.md b/rprog.md index 80e56ac4..35e8e0d8 100644 --- a/rprog.md +++ b/rprog.md @@ -16,6 +16,7 @@ permalink: /rprog/ - [PA3-test: `testthat`, Unit Tests for Programming Assignment 3](https://github.com/cbryant1000/pa3test) - [Alternative submit script for Programming Assignment 1 that makes submitting more convenient by allowing selection of multiple parts plus prompting if user wants to submit another part before exiting](https://github.com/rchampoux/coursera/blob/master/rprog-scripts-submitscript1.R) - [Grading the SHA-1 Hash Code](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-gradeSHA1hash.md) +- [Assignment 2: Demystifying makeVector](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-breakingDownMakeVector.md) - [Assignment 2: makeCacheMatrix as an Object](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprogAssignment2Prototype.md) From 3c73cb4bc7762bf84e99826c7ebc333b425e9ca1 Mon Sep 17 00:00:00 2001 From: lgreski Date: Sun, 21 Aug 2016 08:54:23 -0400 Subject: [PATCH 64/82] Add article illustrating how to use R to download lecture videos. 
--- rprog.md | 1 + 1 file changed, 1 insertion(+) diff --git a/rprog.md b/rprog.md index 35e8e0d8..2f6e17fe 100644 --- a/rprog.md +++ b/rprog.md @@ -27,6 +27,7 @@ permalink: /rprog/ - [S Objects, R Objects, and Lexical Scoping](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-lexicalScoping.md) - [Common R Mistakes: Overwriting Functions with Data Objects](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-overwritingRFunctions.md) - [Forms of the Extract Operator](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-extractOperator.md) +- [Creative Use of R: Downloading Course Lectures](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-downloadingLectures.md) Article illustrating how to use R to automate the download of lectures from *Data Science Specialization* courses, such as *R Programming*. Techniques used in this article are helpful to make research reproducible, as required for courses like *Getting and Cleaning Data* and *Reproducible Research*. 
## R language cheatsheet From 9308f293a2707f726afc86b288513ba3f5b5f95f Mon Sep 17 00:00:00 2001 From: Devinsuit Date: Tue, 29 Nov 2016 14:13:58 +0300 Subject: [PATCH 65/82] Update broken link #126 --- eda.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/eda.md b/eda.md index 8e179acb..a6c52b13 100644 --- a/eda.md +++ b/eda.md @@ -7,7 +7,7 @@ permalink: /eda/ - [Creating a Kite Graph](http://rpubs.com/thoughtfulbloke/kitegraph) - [Analyzing Top/Green500 Supercomputer Technology Trends](http://github.com/ww44ss/Exascalar-Analysis-) - [Emissions Choropleth Maps](https://github.com/BillSeliger/ExData_Plotting2) -- [Data Analysis using Twitter API and Python](http://blog.impiyush.me/2015/03/data-analysis-using-twitter-api-and.html) +- [Data Analysis using Twitter API and Python](http://blog.impiyush.com/2015/03/data-analysis-using-twitter-api-and.html) ## Comprehensive Notes From 9cbaa239486ac3ca856a6019da246278d0515284 Mon Sep 17 00:00:00 2001 From: Len Greski Date: Sat, 7 Jan 2017 20:53:29 -0500 Subject: [PATCH 66/82] Added a "getting started" section, added DSS value proposition article, and converted URLs to bit.ly versions. 
--- rprog.md | 41 +++++++++++++++++++++++------------------ 1 file changed, 23 insertions(+), 18 deletions(-) diff --git a/rprog.md b/rprog.md index 2f6e17fe..219011cb 100644 --- a/rprog.md +++ b/rprog.md @@ -1,33 +1,39 @@ --- -layout: page -title: R Programming +title: "R Programming" permalink: /rprog/ +layout: page --- +## Getting Started +- [Resources for R Programming](http://bit.ly/2dhZ8Dy) +- [References for R Programming](http://bit.ly/2b8AxhF) +- [Data Science Specialization Value Proposition](http://bit.ly/2j3EcCn) +- [R Onboarding for SAS Users](http://bit.ly/2dr7yum) + ## Programming Assignments -- [Strategy for Coding the Programming Assignments](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/makeItRun.md) +- [Strategy for Coding the Programming Assignments](http://bit.ly/2ddFh9A) - [Tutorial for those struggling with Programming Assignment 1](https://github.com/derekfranks/practice_assignment) -- [Breaking Down pollutantmean](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-discussPollutantmean.md) -- [A SAS Version of pollutantmean?](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-pollutantmeanSASVersion.md) +- [Breaking Down pollutantmean](http://bit.ly/2cHyiCl) +- [A SAS Version of pollutantmean?](http://bit.ly/2d3DR4e) - [Tutorial for those struggling with Programming Assignment 2](https://github.com/DanieleP/PA2-clarifying_instructions) - [Tutorial for those struggling with Programming Assignment 3](https://github.com/DanieleP/PA3-tutorial) - [PA1-test: `testthat`, Unit Tests for Programming Assignment 1](https://github.com/cbryant1000/pa1test) - [PA3-test: `testthat`, Unit Tests for Programming Assignment 3](https://github.com/cbryant1000/pa3test) - [Alternative submit script for Programming Assignment 1 that makes submitting more convenient by allowing selection of multiple parts plus prompting if user wants to submit another part before 
exiting](https://github.com/rchampoux/coursera/blob/master/rprog-scripts-submitscript1.R) -- [Grading the SHA-1 Hash Code](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-gradeSHA1hash.md) -- [Assignment 2: Demystifying makeVector](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-breakingDownMakeVector.md) -- [Assignment 2: makeCacheMatrix as an Object](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprogAssignment2Prototype.md) +- [Grading the SHA-1 Hash Code](http://bit.ly/2iUWoB6) +- [Assignment 2: Demystifying makeVector](http://bit.ly/2bTXXfq) +- [Assignment 2: makeCacheMatrix as an Object](http://bit.ly/2byUe4e) ## R Language - [Some notes on the R Language](http://lopezrj.github.io) -- [A Data Frame is Also a List](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/dataFrameAsList.md) -- [S Objects, R Objects, and Lexical Scoping](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-lexicalScoping.md) -- [Common R Mistakes: Overwriting Functions with Data Objects](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-overwritingRFunctions.md) -- [Forms of the Extract Operator](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-extractOperator.md) -- [Creative Use of R: Downloading Course Lectures](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-downloadingLectures.md) Article illustrating how to use R to automate the download of lectures from *Data Science Specialization* courses, such as *R Programming*. Techniques used in this article are helpful to make research reproducible, as required for courses like *Getting and Cleaning Data* and *Reproducible Research*. 
+- [A Data Frame is Also a List](http://bit.ly/2fmMRAp) +- [S Objects, R Objects, and Lexical Scoping](http://bit.ly/2dtOSXi) +- [Common R Mistakes: Overwriting Functions with Data Objects](http://bit.ly/2i3gmoA) +- [Forms of the Extract Operator](http://bit.ly/2bzLYTL) +- [Creative Use of R: Downloading Course Lectures](http://bit.ly/2bGlI7R) Article illustrating how to use R to automate the download of lectures from *Data Science Specialization* courses, such as *R Programming*. Techniques used in this article are helpful to make research reproducible, as required for courses like *Getting and Cleaning Data* and *Reproducible Research*. ## R language cheatsheet @@ -36,11 +42,10 @@ permalink: /rprog/ ## R and Commercial Statistics Packages -- [R Onboarding for SAS Users](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-onboardingForSASUsers.md) Provides an overview and links to a variety of resources to help people with SAS experience make the transition to R -- [Commercial Statistics Packages: An Historical Perspective](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statsPackagesHistory.md) -- [Why is R More Difficult than SAS?](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/whyIsRHarderThanSAS.md) -- [SAS Experience: impediment to learning R?](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/exampleSortRvsSAS.md) -- [Thinking in R versus Thinking in SAS](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/exampleSortRvsSAS.md) +- [R Onboarding for SAS Users](http://bit.ly/2dr7yum) Provides an overview and links to a variety of resources to help people with SAS experience make the transition to R +- [Commercial Statistics Packages: An Historical Perspective](http://bit.ly/2fPj2qN) +- [Why is R More Difficult than SAS?](http://bit.ly/2erxk3A) +- [Thinking in R versus Thinking in SAS](http://bit.ly/2cH3u8x) ## Comprehensive Notes From 
39a027e6006717b7574b749b7e14c606f34c3e8f Mon Sep 17 00:00:00 2001 From: MMohey Date: Tue, 18 Apr 2017 16:27:10 +0200 Subject: [PATCH 67/82] Fixed broken link --- ddp.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ddp.md b/ddp.md index 3240b3b3..895309ef 100644 --- a/ddp.md +++ b/ddp.md @@ -5,7 +5,7 @@ permalink: /ddp/ --- - [Slidify to Github walkthrough](http://rpubs.com/thoughtfulbloke/25103) -- [ggvis and rmarkdown slides with interactive plots](http://qua.st/ggvis-shiny-html5-slides/) +- [ggvis and rmarkdown slides with interactive plots](http://qua.st/ggvis-shiny-html5-slides) ## Shiny - [Shiny app to simulate 401K growth with interactive plots](http://www.mephistosoftware.com/shiny/401k_simulator/) From 5db3cfda2d08498b482817dc3ec1dcbffbb9f5db Mon Sep 17 00:00:00 2001 From: Len Greski Date: Sat, 20 May 2017 04:52:49 -0400 Subject: [PATCH 68/82] Add articles related to R programming course --- rprog.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/rprog.md b/rprog.md index 219011cb..47df54d1 100644 --- a/rprog.md +++ b/rprog.md @@ -15,6 +15,7 @@ layout: page - [Strategy for Coding the Programming Assignments](http://bit.ly/2ddFh9A) - [Tutorial for those struggling with Programming Assignment 1](https://github.com/derekfranks/practice_assignment) - [Breaking Down pollutantmean](http://bit.ly/2cHyiCl) +- [Assignment 1: A More Elegant Solution](http://bit.ly/2kwBBlK) - [A SAS Version of pollutantmean?](http://bit.ly/2d3DR4e) - [Tutorial for those struggling with Programming Assignment 2](https://github.com/DanieleP/PA2-clarifying_instructions) - [Tutorial for those struggling with Programming Assignment 3](https://github.com/DanieleP/PA3-tutorial) @@ -23,7 +24,7 @@ layout: page - [Alternative submit script for Programming Assignment 1 that makes submitting more convenient by allowing selection of multiple parts plus prompting if user wants to submit another part before 
exiting](https://github.com/rchampoux/coursera/blob/master/rprog-scripts-submitscript1.R) - [Grading the SHA-1 Hash Code](http://bit.ly/2iUWoB6) - [Assignment 2: Demystifying makeVector](http://bit.ly/2bTXXfq) -- [Assignment 2: makeCacheMatrix as an Object](http://bit.ly/2byUe4e) +- [Assignment 2: makeCacheMatrix as an Object](http://bit.ly/2byUe4e) ## R Language @@ -33,7 +34,11 @@ layout: page - [S Objects, R Objects, and Lexical Scoping](http://bit.ly/2dtOSXi) - [Common R Mistakes: Overwriting Functions with Data Objects](http://bit.ly/2i3gmoA) - [Forms of the Extract Operator](http://bit.ly/2bzLYTL) +- [Functions to Sort Data Frames](http://bit.ly/2dxItzw) - [Creative Use of R: Downloading Course Lectures](http://bit.ly/2bGlI7R) Article illustrating how to use R to automate the download of lectures from *Data Science Specialization* courses, such as *R Programming*. Techniques used in this article are helpful to make research reproducible, as required for courses like *Getting and Cleaning Data* and *Reproducible Research*. +- [Lexical Scoping and Statistical Computing](http://bit.ly/2cmqAPy) Article by Robert Gentleman and Ross Ihaka at the University of Auckland describing how lexical scoping works, and why it is valuable in statistical computing. +- [Data Science Job Report 2017: R Passes SAS, But Python Leaves Them Both Behind](http://bit.ly/2oCHulX) Bob Muenchen's take on the job market for various data science languages. 
+ ## R language cheatsheet From 497330b024ebfe0bb89c9df644bd5f64a6ff7505 Mon Sep 17 00:00:00 2001 From: Len Greski Date: Sat, 20 May 2017 04:58:05 -0400 Subject: [PATCH 69/82] add articles --- getclean.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/getclean.md b/getclean.md index e2b8ce28..6076632d 100644 --- a/getclean.md +++ b/getclean.md @@ -16,6 +16,10 @@ permalink: /getclean/ - [Codebook template that can be used in the Getting and Cleaning Data project](https://gist.github.com/JorisSchut/dbc1fc0402f28cad9b41) - ["Real world" example - reading American Community Survey 2000 PUMS Data:](https://github.com/lgreski/acsexample) Demonstrates how to extract records of a given type from a data file containing multiple record types, and how to use an Excel-based code book to specify arguments for reading a fixed-width file. - [18 Months of CTA advice](https://thoughtfulbloke.wordpress.com/2015/08/31/hello-world) +- [Common Problems: Quiz 1 - Missing Java Runtime](http://bit.ly/2jjtyXM) Explains how to solve the problem of a missing Java Runtime for the question that requires students to process a Microsoft Excel spreadsheet. +- [Strategy for Reading Files & APIs / Quiz 2](http://bit.ly/2e4L5oF) +- [Common Problems: Quiz 2 - sqldf() driver fails to connect](http://bit.ly/2kD2KTY) +- [Tutorial: Downloading Files](http://bit.ly/2iP2suj) Illustrates various ways of downloading files, including binary and text files. 
## Comprehensive Notes From 51c9e7672e32aa689e9ca29412699bc8ca857e0f Mon Sep 17 00:00:00 2001 From: Len Greski Date: Sat, 20 May 2017 05:06:20 -0400 Subject: [PATCH 70/82] Add capstone page to index, and content for capstone page --- .gitignore | 1 + index.md | 1 + 2 files changed, 2 insertions(+) diff --git a/.gitignore b/.gitignore index 058dd6c2..d17e5544 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,4 @@ _site .DS_Store .Rhistory +.Rproj.user diff --git a/index.md b/index.md index 5cc7be2f..761f3e41 100644 --- a/index.md +++ b/index.md @@ -17,6 +17,7 @@ interested in contributing [click here](https://github.com/DataScienceSpecializa 7. [Regression Models](/regmod/) 8. [Practical Machine Learning](/pml/) 9. [Developing Data Products](/ddp/) +10. [Capstone](/capstone/) - [Other Resources](/other/) - [Curated Pages](/curated/) From 4dc95e6f4d52284a801b37f2e2eda1fddb73c163 Mon Sep 17 00:00:00 2001 From: Aaron Date: Tue, 23 May 2017 10:38:39 -0400 Subject: [PATCH 71/82] Added shiny choropleth app code available at https://github.com/amsilvr/shiny_choropleth --- ddp.md | 1 + 1 file changed, 1 insertion(+) diff --git a/ddp.md b/ddp.md index 895309ef..0d39e861 100644 --- a/ddp.md +++ b/ddp.md @@ -8,6 +8,7 @@ permalink: /ddp/ - [ggvis and rmarkdown slides with interactive plots](http://qua.st/ggvis-shiny-html5-slides) ## Shiny +- [Shiny app using leaflet to create a choropleth of all Wireless Emergency Alerts sent through PBS WARN](https://silverman.shinyapps.io/warn_wea/) - [Shiny app to simulate 401K growth with interactive plots](http://www.mephistosoftware.com/shiny/401k_simulator/) - [Shiny Video Tutorials Playlist on Youtube](http://www.youtube.com/playlist?list=PL6wLL_RojB5xNOhe2OTSd-DPkMLVY9DfB) - [Tutorial on writing Shiny simulation apps](https://github.com/homerhanumat/shinyTutorials) From 452a58760df645fc2898a47e75107c67d79ad2c5 Mon Sep 17 00:00:00 2001 From: Aaron Date: Tue, 23 May 2017 13:25:26 -0400 Subject: [PATCH 72/82] Update ddp.md --- ddp.md | 
4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/ddp.md b/ddp.md index 0d39e861..c1a411fc 100644 --- a/ddp.md +++ b/ddp.md @@ -8,7 +8,9 @@ permalink: /ddp/ - [ggvis and rmarkdown slides with interactive plots](http://qua.st/ggvis-shiny-html5-slides) ## Shiny -- [Shiny app using leaflet to create a choropleth of all Wireless Emergency Alerts sent through PBS WARN](https://silverman.shinyapps.io/warn_wea/) +- Choropleth of PBS WARN Distribution of Wireless Emergency Alerts + - [Code for Shiny App](https://github.com/amsilvr/shiny_choropleth) + - [App running on shinyapps.io](https://silverman.shinyapps.io/warn_wea/) - [Shiny app to simulate 401K growth with interactive plots](http://www.mephistosoftware.com/shiny/401k_simulator/) - [Shiny Video Tutorials Playlist on Youtube](http://www.youtube.com/playlist?list=PL6wLL_RojB5xNOhe2OTSd-DPkMLVY9DfB) - [Tutorial on writing Shiny simulation apps](https://github.com/homerhanumat/shinyTutorials) From 4cd7fc39364e92437e6833b0c97ffe3885ac283f Mon Sep 17 00:00:00 2001 From: Len Greski Date: Tue, 23 May 2017 19:18:31 -0400 Subject: [PATCH 73/82] Add capstone page. --- capstone.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) create mode 100644 capstone.md diff --git a/capstone.md b/capstone.md new file mode 100644 index 00000000..57792306 --- /dev/null +++ b/capstone.md @@ -0,0 +1,12 @@ +--- +title: "Capstone" +permalink: /capstone/ +layout: page +--- +## Reference Material + +- [Speech and Language Processing, 3rd Edition](https://web.stanford.edu/~jurafsky/slp3/) Working version of the Jurafsky et al. book on natural language processing, whose content on n-grams is helpful for the capstone. + +## Course Project + +- [n-gram Computations and Computer Capacity](http://bit.ly/2couvxh) Explains the amount of memory required to convert the text files for the course project into n-grams, using the quanteda package. 
From cf0e2fe24800c5e32e3944b891b4ee1449aca4dd Mon Sep 17 00:00:00 2001 From: Len Greski Date: Sat, 27 May 2017 07:44:44 -0400 Subject: [PATCH 74/82] Add articles to capstone page --- capstone.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/capstone.md b/capstone.md index 57792306..6285e422 100644 --- a/capstone.md +++ b/capstone.md @@ -10,3 +10,5 @@ layout: page ## Course Project - [n-gram Computations and Computer Capacity](http://bit.ly/2couvxh) Explains the amount of memory required to convert the text files for the course project into n-grams, using the quanteda package. +- [Capstone Strategy](http://bit.ly/2rGcgc6) Describes a general strategy to get through the Capstone: use the simplest approaches possible. +- [Choosing a Text Analysis Package](http://bit.ly/2qagsPa) Reviews pros and cons of various R packages used for natural language processing, in the context of requirements for the Capstone project. From ff0aaf40ed3fc52b005ad5ed8379038eff6df746 Mon Sep 17 00:00:00 2001 From: Len Greski Date: Sat, 5 Aug 2017 08:53:26 -0400 Subject: [PATCH 75/82] Add article explaining why one cannot calculate the area under a specific point on the normal curve. --- statinf.md | 1 + 1 file changed, 1 insertion(+) diff --git a/statinf.md b/statinf.md index 5ea0baa6..a96df6cb 100644 --- a/statinf.md +++ b/statinf.md @@ -5,6 +5,7 @@ permalink: /statinf/ --- - [Why degrees of freedom decrease for sample variance](https://github.com/Manu58/bias/blob/master/bias.pdf) +[CONCEPTS: Calculating Area for a Point on the Normal Curve](http://bit.ly/2hw5AMF) Reviews the mathematics that explain why one cannot calculate the exact probability for a specific value within a distribution for a continuous variable, and illustrates how to calculate a quantile for a point on the curve. 
- [Analysis of exponential distribution of births data set from the CDC](https://gist.github.com/ProgramErgoSum/5316008387746fcd84de) - [Exponential Distribution / Central Limit Theorem - Assignment Checklist](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-expDistChecklist.md) - [ToothGrowth Analysis - Assignment Checklist](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/ToothGrowthChecklist.md) From eb0e0249711dad0980532d357524f5759da26da7 Mon Sep 17 00:00:00 2001 From: Len Greski Date: Sat, 19 Aug 2017 16:20:29 -0400 Subject: [PATCH 76/82] Add missing dash in bullet list --- statinf.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/statinf.md b/statinf.md index a96df6cb..19592a27 100644 --- a/statinf.md +++ b/statinf.md @@ -5,7 +5,7 @@ permalink: /statinf/ --- - [Why degrees of freedom decrease for sample variance](https://github.com/Manu58/bias/blob/master/bias.pdf) -[CONCEPTS: Calculating Area for a Point on the Normal Curve](http://bit.ly/2hw5AMF) Reviews the mathematics that explain why one cannot calculate the exact probability for a specific value within a distribution for a continuous variable, and illustrates how to calculate a quantile for a point on the curve. +- [CONCEPTS: Calculating Area for a Point on the Normal Curve](http://bit.ly/2hw5AMF) Reviews the mathematics that explain why one cannot calculate the exact probability for a specific value within a distribution for a continuous variable, and illustrates how to calculate a quantile for a point on the curve. 
- [Analysis of exponential distribution of births data set from the CDC](https://gist.github.com/ProgramErgoSum/5316008387746fcd84de) - [Exponential Distribution / Central Limit Theorem - Assignment Checklist](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-expDistChecklist.md) - [ToothGrowth Analysis - Assignment Checklist](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/ToothGrowthChecklist.md) From 253d6157bcc02966a0c533cb442c3acff33486e3 Mon Sep 17 00:00:00 2001 From: DocOfi Date: Sun, 21 Jan 2018 13:18:50 +0800 Subject: [PATCH 77/82] added link to my pdf file --- getclean.md | 1 + 1 file changed, 1 insertion(+) diff --git a/getclean.md b/getclean.md index 6076632d..deeccc56 100644 --- a/getclean.md +++ b/getclean.md @@ -20,6 +20,7 @@ permalink: /getclean/ - [Strategy for Reading Files & APIs / Quiz 2](http://bit.ly/2e4L5oF) - [Common Problems: Quiz 2 - sqldf() driver fails to connect](http://bit.ly/2kD2KTY) - [Tutorial: Downloading Files](http://bit.ly/2iP2suj) Illustrates various ways of downloading files, including binary and text files. 
+- [Creating dataframes from xml data](https://www.dropbox.com/s/7bbzzp4bwsmfl5y/CreatingDataframesfrom%20XmlFiles.odt?dl=0) ## Comprehensive Notes From 5549c1459adcd6bad1d7d81456e09956c0ca2095 Mon Sep 17 00:00:00 2001 From: DocOfi Date: Sun, 21 Jan 2018 13:31:09 +0800 Subject: [PATCH 78/82] Added a link to my presentation in Rpubs --- eda.md | 1 + 1 file changed, 1 insertion(+) diff --git a/eda.md b/eda.md index a6c52b13..8c14e435 100644 --- a/eda.md +++ b/eda.md @@ -8,6 +8,7 @@ permalink: /eda/ - [Analyzing Top/Green500 Supercomputer Technology Trends](http://github.com/ww44ss/Exascalar-Analysis-) - [Emissions Choropleth Maps](https://github.com/BillSeliger/ExData_Plotting2) - [Data Analysis using Twitter API and Python](http://blog.impiyush.com/2015/03/data-analysis-using-twitter-api-and.html) +- [Exploratory Data Analysis using Flexdashboard](http://rpubs.com/DocOfi/350830) ## Comprehensive Notes From 5a351e34a3068ca320b5759217ff45a9adacb5bf Mon Sep 17 00:00:00 2001 From: DocOfi Date: Sat, 27 Jan 2018 12:47:56 +0800 Subject: [PATCH 79/82] added name and link in about.md --- about.md | 1 + 1 file changed, 1 insertion(+) diff --git a/about.md b/about.md index aa0af257..37ecc9da 100644 --- a/about.md +++ b/about.md @@ -25,3 +25,4 @@ The [Data Science Specialization](https://www.coursera.org/specialization/jhudat - [stepds](https://github.com/stepds) - Bastiaan Quast - [Xing Su](http://sux13.github.io/DataScienceSpCourseNotes/) +- [Edmund julian Ofilada](https://github.com/DocOfi) From 64aba82b1ebef0cd07d241c83d8cede63f41f617 Mon Sep 17 00:00:00 2001 From: DocOfi Date: Sat, 27 Jan 2018 12:49:15 +0800 Subject: [PATCH 80/82] added link to metricsgraphics tutorial --- eda.md | 1 + 1 file changed, 1 insertion(+) diff --git a/eda.md b/eda.md index 8c14e435..1f56ac70 100644 --- a/eda.md +++ b/eda.md @@ -9,6 +9,7 @@ permalink: /eda/ - [Emissions Choropleth Maps](https://github.com/BillSeliger/ExData_Plotting2) - [Data Analysis using Twitter API and 
Python](http://blog.impiyush.com/2015/03/data-analysis-using-twitter-api-and.html) - [Exploratory Data Analysis using Flexdashboard](http://rpubs.com/DocOfi/350830) +- [Plotting using Metricsgraphics](http://www.rpubs.com/DocOfi/352947) ## Comprehensive Notes From 3a05c60dd2a70f1466b48ec6f0ee07655416fcb9 Mon Sep 17 00:00:00 2001 From: DocOfi Date: Fri, 23 Mar 2018 15:03:40 +0800 Subject: [PATCH 81/82] adding a leaflet plot example --- ddp.md | 1 + 1 file changed, 1 insertion(+) diff --git a/ddp.md b/ddp.md index c1a411fc..0af67104 100644 --- a/ddp.md +++ b/ddp.md @@ -18,6 +18,7 @@ permalink: /ddp/ - [Git pushing Shiny Apps with Docker/Dokku](http://www.rmining.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) - [Share your Shiny Apps with Docker and Kitematic](http://www.rmining.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) - [Shinyapps.io: Configuring Application Timeout](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/dataProd-shinyTimeoutConfig.md) +- [Plotting Natural Disasters](http://www.rpubs.com/DocOfi/367052) ## Comprehensive Notes From c5413d29c8932f488923632477dc76ec40c400d2 Mon Sep 17 00:00:00 2001 From: Adhira <37569680+Adhira-Deogade@users.noreply.github.com> Date: Mon, 25 Mar 2019 01:56:01 -0400 Subject: [PATCH 82/82] Update curated.md Added medium website for ipython notebook --- curated.md | 1 + 1 file changed, 1 insertion(+) diff --git a/curated.md b/curated.md index 613c5f4e..8c806fd8 100644 --- a/curated.md +++ b/curated.md @@ -16,6 +16,7 @@ permalink: /curated/ - [Matrix rotation for image and contour plots in R](http://blog.snap.uaf.edu/2012/06/08/matrix-rotation-for-image-and-contour-plots-in-r/) - [Fig Data: 11 Tips on How to Handle Big Data in R (and 1 Bad Pun)](http://theodi.org/blog/fig-data-11-tips-how-handle-big-data-r-and-1-bad-pun) - [Data from 538](https://github.com/fivethirtyeight/data) +- [Getting started with python 
notebook](https://medium.com/@adhira_deo/the-environment-for-building-machine-learning-models-a1552116b355) ### Command Line