diff --git a/.gitignore b/.gitignore index 058dd6c2..d17e5544 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,4 @@ _site .DS_Store .Rhistory +.Rproj.user diff --git a/about.md b/about.md index 27bf2391..37ecc9da 100644 --- a/about.md +++ b/about.md @@ -19,8 +19,10 @@ The [Data Science Specialization](https://www.coursera.org/specialization/jhudat - [Kevin Markham](http://www.dataschool.io/) - Derek Franks - David Hood +- [Leonard Greski](https://github.com/lgreski) - Michael Sachs - Allan Inocêncio de Souza Costa - [stepds](https://github.com/stepds) - Bastiaan Quast -- [Xing Su](http://sux13.github.io/DataScienceSpCourseNotes/) \ No newline at end of file +- [Xing Su](http://sux13.github.io/DataScienceSpCourseNotes/) +- [Edmund julian Ofilada](https://github.com/DocOfi) diff --git a/capstone.md b/capstone.md new file mode 100644 index 00000000..6285e422 --- /dev/null +++ b/capstone.md @@ -0,0 +1,14 @@ +--- +title: "Capstone" +permalink: /capstone/ +layout: page +--- +## Reference Material + +- [Speech and Language Processing, 3rd Edition](https://web.stanford.edu/~jurafsky/slp3/) Working draft of the Jurafsky and Martin book on natural language processing, whose content on n-grams is helpful for the Capstone. + +## Course Project + +- [n-gram Computations and Computer Capacity](http://bit.ly/2couvxh) Explains the amount of memory required to convert the text files for the course project into n-grams, using the quanteda package. +- [Capstone Strategy](http://bit.ly/2rGcgc6) Describes a general strategy to get through the Capstone: use the simplest approaches possible. +- [Choosing a Text Analysis Package](http://bit.ly/2qagsPa) Reviews pros and cons of various R packages used for natural language processing, in the context of requirements for the Capstone project. 
diff --git a/curated.md b/curated.md index 33ff6ca7..8c806fd8 100644 --- a/curated.md +++ b/curated.md @@ -16,6 +16,7 @@ permalink: /curated/ - [Matrix rotation for image and contour plots in R](http://blog.snap.uaf.edu/2012/06/08/matrix-rotation-for-image-and-contour-plots-in-r/) - [Fig Data: 11 Tips on How to Handle Big Data in R (and 1 Bad Pun)](http://theodi.org/blog/fig-data-11-tips-how-handle-big-data-r-and-1-bad-pun) - [Data from 538](https://github.com/fivethirtyeight/data) +- [Getting started with Python notebooks](https://medium.com/@adhira_deo/the-environment-for-building-machine-learning-models-a1552116b355) ### Command Line @@ -61,6 +62,7 @@ permalink: /curated/ ### Reproducible Research - [Markdown live demo](http://markdown-here.com/livedemo.html) - [Boosting Slides by Ron Meir](https://github.com/Aratinga/Misc/blob/master/BoostingTutorial.pdf) +- [Reproducible Research website](http://reproducibleresearch.net/) ### Machine Learning - [UC Irvine Machine Learning Data Repository](http://archive.ics.uci.edu/ml/) diff --git a/ddp.md b/ddp.md index bf9419f0..0af67104 100644 --- a/ddp.md +++ b/ddp.md @@ -5,15 +5,20 @@ permalink: /ddp/ --- - [Slidify to Github walkthrough](http://rpubs.com/thoughtfulbloke/25103) -- [ggvis and rmarkdown slides with interactive plots](http://qua.st/ggvis-shiny-html5-slides/) +- [ggvis and rmarkdown slides with interactive plots](http://qua.st/ggvis-shiny-html5-slides) ## Shiny +- Choropleth of PBS WARN Distribution of Wireless Emergency Alerts + - [Code for Shiny App](https://github.com/amsilvr/shiny_choropleth) + - [App running on shinyapps.io](https://silverman.shinyapps.io/warn_wea/) - [Shiny app to simulate 401K growth with interactive plots](http://www.mephistosoftware.com/shiny/401k_simulator/) - [Shiny Video Tutorials Playlist on Youtube](http://www.youtube.com/playlist?list=PL6wLL_RojB5xNOhe2OTSd-DPkMLVY9DfB) - [Tutorial on writing Shiny simulation apps](https://github.com/homerhanumat/shinyTutorials) -- [Dockerize a 
Shiny App](http://www.flaviobarros.net/2015/04/30/dockerizing-a-shiny-app/) -- [Git pushing Shiny Apps with Docker/Dokku](http://www.flaviobarros.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) -- [Share your Shiny Apps with Docker and Kitematic](http://www.flaviobarros.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) +- [Dockerize a Shiny App](http://www.rmining.net/2015/04/30/dockerizing-a-shiny-app/) +- [Git pushing Shiny Apps with Docker/Dokku](http://www.rmining.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) +- [Share your Shiny Apps with Docker and Kitematic](http://www.rmining.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) +- [Shinyapps.io: Configuring Application Timeout](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/dataProd-shinyTimeoutConfig.md) +- [Plotting Natural Disasters](http://www.rpubs.com/DocOfi/367052) ## Comprehensive Notes diff --git a/eda.md b/eda.md index 8e179acb..1f56ac70 100644 --- a/eda.md +++ b/eda.md @@ -7,7 +7,9 @@ permalink: /eda/ - [Creating a Kite Graph](http://rpubs.com/thoughtfulbloke/kitegraph) - [Analyzing Top/Green500 Supercomputer Technology Trends](http://github.com/ww44ss/Exascalar-Analysis-) - [Emissions Choropleth Maps](https://github.com/BillSeliger/ExData_Plotting2) -- [Data Analysis using Twitter API and Python](http://blog.impiyush.me/2015/03/data-analysis-using-twitter-api-and.html) +- [Data Analysis using Twitter API and Python](http://blog.impiyush.com/2015/03/data-analysis-using-twitter-api-and.html) +- [Exploratory Data Analysis using Flexdashboard](http://rpubs.com/DocOfi/350830) +- [Plotting using Metricsgraphics](http://www.rpubs.com/DocOfi/352947) ## Comprehensive Notes diff --git a/getclean.md b/getclean.md index e2b8ce28..deeccc56 100644 --- a/getclean.md +++ b/getclean.md @@ -16,6 +16,11 @@ permalink: /getclean/ - [Codebook template that can be used in the Getting and Cleaning Data 
project](https://gist.github.com/JorisSchut/dbc1fc0402f28cad9b41) - ["Real world" example - reading American Community Survey 2000 PUMS Data:](https://github.com/lgreski/acsexample) Demonstrates how to extract records of a given type from a data file containing multiple record types, and how to use an Excel-based code book to specify arguments for reading a fixed-width file. - [18 Months of CTA advice](https://thoughtfulbloke.wordpress.com/2015/08/31/hello-world) +- [Common Problems: Quiz 1 - Missing Java Runtime](http://bit.ly/2jjtyXM) Explains how to solve the problem of a missing Java Runtime for the question that requires students to process a Microsoft Excel spreadsheet. +- [Strategy for Reading Files & APIs / Quiz 2](http://bit.ly/2e4L5oF) +- [Common Problems: Quiz 2 - sqldf() driver fails to connect](http://bit.ly/2kD2KTY) +- [Tutorial: Downloading Files](http://bit.ly/2iP2suj) Illustrates various ways of downloading files, including binary and text files. +- [Creating dataframes from xml data](https://www.dropbox.com/s/7bbzzp4bwsmfl5y/CreatingDataframesfrom%20XmlFiles.odt?dl=0) ## Comprehensive Notes diff --git a/index.md b/index.md index 6c035e70..761f3e41 100644 --- a/index.md +++ b/index.md @@ -4,7 +4,7 @@ layout: page ## Table of Contents -This is site is meant to serve as a directory for the amazing content the +This site is meant to serve as a directory for the amazing content the community has created around the Data Science Specialization. If you are interested in contributing [click here](https://github.com/DataScienceSpecialization/DataScienceSpecialization.github.io#contributing). @@ -17,6 +17,7 @@ interested in contributing [click here](https://github.com/DataScienceSpecializa 7. [Regression Models](/regmod/) 8. [Practical Machine Learning](/pml/) 9. [Developing Data Products](/ddp/) +10. 
[Capstone](/capstone/) - [Other Resources](/other/) - [Curated Pages](/curated/) diff --git a/other.md b/other.md index a36f6c2f..ddb49135 100644 --- a/other.md +++ b/other.md @@ -11,6 +11,7 @@ permalink: /other/ - [Installing Some Basic R Packages in Ubuntu; Ibrahim El Merehbi](http://elmerehbi.wordpress.com/2014/09/09/installing-some-basic-r-packages-in-ubuntu) - [Using Projects in RStudio](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects) - [Using Version Control with RStudio](https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN) +- [Using R behind HTTP/HTTPS Proxy](https://support.rstudio.com/hc/en-us/articles/200488488-Configuring-R-to-Use-an-HTTP-or-HTTPS-Proxy) ### Ignoring R & RStudio files - [gitignore template for R](https://github.com/github/gitignore/blob/master/R.gitignore) (source:[gitignore](https://github.com/github/gitignore)) @@ -25,3 +26,8 @@ permalink: /other/ - [Data Science Toolbox](http://datasciencetoolbox.org/) - A virtual environment that allows you to start doing data science in a matter of minutes. 
- [Virtual machine with RStudio server and github setup](https://github.com/tboloo/vagrant-rstudio) - A VirtualBox, Vagrant & chef-solo managed virtual machine which provides RStudio server with git & github setup + +## Deploying and sharing Shiny Apps with Docker +- [Dockerize a Shiny App](http://www.rmining.net/2015/04/30/dockerizing-a-shiny-app/) +- [Git pushing Shiny Apps with Docker/Dokku](http://www.rmining.net/2015/05/11/git-pushing-shiny-apps-with-docker-dokku/) +- [Share your Shiny Apps with Docker and Kitematic](http://www.rmining.net/2015/08/10/share-your-shiny-apps-with-docker-and-kitematic/) diff --git a/pml.md b/pml.md index be4defc3..1054002d 100644 --- a/pml.md +++ b/pml.md @@ -22,6 +22,16 @@ permalink: /pml/ - [Comparing Supervised Learning Algorithms](http://www.dataschool.io/comparing-supervised-learning-algorithms/): Comparing 8 common supervised learning algorithms (for regression and classification) on 13 different dimensions. -## Comprehensive Notes +## Content Related to the Lectures - Complete notes for [Practical Machine Learning](http://sux13.github.io/DataScienceSpCourseNotes/) +- [Week 4: Combining Predictors -- Math Explained](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/pml-combiningPredictorsBinomial.md) + +## Configuring Github Pages with RStudio for PML Project + +- Step by step instructions to [Configure Github Pages with RStudio](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/pml-ghPagesSetup.md) to support the PML course project. + +## Improving Runtime Performance of Caret + +- Step by step instructions to [implement parallel processing in caret::train()](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/pml-randomForestPerformance.md) on a random forest model, along with runtime performance analysis for a variety of laptops, ranging from an Intel Atom-based tablet to a quad-core i7 processor. 
+ diff --git a/repres.md b/repres.md index 5fc1ac8e..cba776f9 100644 --- a/repres.md +++ b/repres.md @@ -9,6 +9,7 @@ permalink: /repres/ - [Trends and severity of Data Breaches](http://rpubs.com/ww44ss/29389) - [Benefit-cost analysis of a park user fee](https://rstudio-pubs-static.s3.amazonaws.com/72135_dc45211d976842c2a9a8c8b5f2472ff0.html) - [Data Lake Integrity](http://rpubs.com/rshane/81297) +- [ProjectTemplate in RStudio with Git](http://padamson.github.io/r/rstudio/projecttemplate/git/2016/01/17/projecttemplate-in-rstudio-with-git.html) ## Comprehensive Notes diff --git a/rprog.md b/rprog.md index 42e19df4..47df54d1 100644 --- a/rprog.md +++ b/rprog.md @@ -1,27 +1,57 @@ --- -layout: page -title: R Programming +title: "R Programming" permalink: /rprog/ +layout: page --- +## Getting Started +- [Resources for R Programming](http://bit.ly/2dhZ8Dy) +- [References for R Programming](http://bit.ly/2b8AxhF) +- [Data Science Specialization Value Proposition](http://bit.ly/2j3EcCn) +- [R Onboarding for SAS Users](http://bit.ly/2dr7yum) + ## Programming Assignments +- [Strategy for Coding the Programming Assignments](http://bit.ly/2ddFh9A) - [Tutorial for those struggling with Programming Assignment 1](https://github.com/derekfranks/practice_assignment) +- [Breaking Down pollutantmean](http://bit.ly/2cHyiCl) +- [Assignment 1: A More Elegant Solution](http://bit.ly/2kwBBlK) +- [A SAS Version of pollutantmean?](http://bit.ly/2d3DR4e) - [Tutorial for those struggling with Programming Assignment 2](https://github.com/DanieleP/PA2-clarifying_instructions) - [Tutorial for those struggling with Programming Assignment 3](https://github.com/DanieleP/PA3-tutorial) - [PA1-test: `testthat`, Unit Tests for Programming Assignment 1](https://github.com/cbryant1000/pa1test) - [PA3-test: `testthat`, Unit Tests for Programming Assignment 3](https://github.com/cbryant1000/pa3test) - [Alternative submit script for Programming Assignment 1 that makes submitting more convenient by allowing 
selection of multiple parts plus prompting if user wants to submit another part before exiting](https://github.com/rchampoux/coursera/blob/master/rprog-scripts-submitscript1.R) +- [Grading the SHA-1 Hash Code](http://bit.ly/2iUWoB6) +- [Assignment 2: Demystifying makeVector](http://bit.ly/2bTXXfq) +- [Assignment 2: makeCacheMatrix as an Object](http://bit.ly/2byUe4e) ## R Language - [Some notes on the R Language](http://lopezrj.github.io) +- [A Data Frame is Also a List](http://bit.ly/2fmMRAp) +- [S Objects, R Objects, and Lexical Scoping](http://bit.ly/2dtOSXi) +- [Common R Mistakes: Overwriting Functions with Data Objects](http://bit.ly/2i3gmoA) +- [Forms of the Extract Operator](http://bit.ly/2bzLYTL) +- [Functions to Sort Data Frames](http://bit.ly/2dxItzw) +- [Creative Use of R: Downloading Course Lectures](http://bit.ly/2bGlI7R) Article illustrating how to use R to automate the download of lectures from *Data Science Specialization* courses, such as *R Programming*. Techniques used in this article help make research reproducible, as required for courses like *Getting and Cleaning Data* and *Reproducible Research*. +- [Lexical Scoping and Statistical Computing](http://bit.ly/2cmqAPy) Article by Robert Gentleman and Ross Ihaka at the University of Auckland describing how lexical scoping works, and why it is valuable in statistical computing. +- [Data Science Job Report 2017: R Passes SAS, But Python Leaves Them Both Behind](http://bit.ly/2oCHulX) Bob Muenchen's take on the job market for various data science languages. 
+ + ## R language cheatsheet - [R cheatsheet covering all lectures](https://github.com/startupjing/Tech_Notes/blob/master/R/R_language.md) +## R and Commercial Statistics Packages + +- [R Onboarding for SAS Users](http://bit.ly/2dr7yum) Provides an overview and links to a variety of resources to help people with SAS experience make the transition to R. +- [Commercial Statistics Packages: An Historical Perspective](http://bit.ly/2fPj2qN) +- [Why is R More Difficult than SAS?](http://bit.ly/2erxk3A) +- [Thinking in R versus Thinking in SAS](http://bit.ly/2cH3u8x) ## Comprehensive Notes - Complete notes for [R Programming](http://sux13.github.io/DataScienceSpCourseNotes/) diff --git a/statinf.md b/statinf.md index c5a85435..19592a27 100644 --- a/statinf.md +++ b/statinf.md @@ -5,7 +5,17 @@ permalink: /statinf/ --- - [Why degrees of freedom decrease for sample variance](https://github.com/Manu58/bias/blob/master/bias.pdf) +- [CONCEPTS: Calculating Area for a Point on the Normal Curve](http://bit.ly/2hw5AMF) Reviews the mathematics that explain why one cannot calculate the exact probability for a specific value within a distribution for a continuous variable, and illustrates how to calculate a quantile for a point on the curve. 
- [Analysis of exponential distribution of births data set from the CDC](https://gist.github.com/ProgramErgoSum/5316008387746fcd84de) +- [Exponential Distribution / Central Limit Theorem - Assignment Checklist](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-expDistChecklist.md) +- [ToothGrowth Analysis - Assignment Checklist](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/ToothGrowthChecklist.md) +- [Exploratory Data Analysis in ToothGrowth Assignment](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/edaInToothGrowthAnalysis.md), explaining the exploratory data analysis requirement for students who have not taken the *Exploratory Data Analysis* course prior to taking *Statistical Inference*. +- [Using MathJax with Discussion Forums, R Markdown, and Github Pages](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/mathjaxWithGithubMarkdown.md) +- [Kable Tables with Data Frames](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/kableDataFrameTable.md) illustrates how to display a custom table in a `knitr` document by creating a data frame to contain the information to be rendered with `kable()`. 
+- [Interactive Confidence Interval Visualization](https://github.com/amcadie/interactive_CI) +- [Installing MiKTeX on Windows 10 / Generate a PDF from knitr](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-generatePDF.md) +- [Power calculations: optimal sample size](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-optimalSampleSize.md) +- [Permutation Tests Explained](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/statinf-permutationTests.md) ## Comprehensive Notes diff --git a/toolbox.md b/toolbox.md index ed0755ac..3c2dfc68 100644 --- a/toolbox.md +++ b/toolbox.md @@ -15,7 +15,12 @@ permalink: /toolbox/ - [Understanding the Relationship Between Git and GitHub](http://www.dataschool.io/github-is-just-dropbox-for-git/) - [Simple Guide to GitHub Forks](http://www.dataschool.io/simple-guide-to-forks-in-github-and-git/) - [Github Repo Tutorial How to fork a repo, download it to your local drive and commit changes ](https://www.youtube.com/watch?v=MY94AIplcaU) +- [Configuring RStudio to work with Git / Github - Mac OSX](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/configureRStudioGitOSXVersion.md) +- [Configuring RStudio to work with Git / Github - Windows](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/configureRStudioGitWindowsVersion.md) ## Comprehensive Notes - Complete notes for [The Data Scientist's Toolbox](http://sux13.github.io/DataScienceSpCourseNotes/) + +## Miscellaneous +- [Using Editor Modes in Coursera Discussion Forum Posts](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/usingMarkdownInForumPosts.md)