Commit 7d857b6

🔨 update README
1 parent b224f20 commit 7d857b6

7 files changed: +74 -31 lines changed

README.md

Lines changed: 6 additions & 24 deletions
@@ -50,6 +50,12 @@ Many kudos to [Dr Chuanxin Liu](https://github.com/codetrainee), my former PhD s
 + [Introduction to hypergeometric, geometric, negative binomial and multinomial distributions](https://github.com/erikaduan/R_tips/blob/master/tutorials/2020-09-22_hypergeometric-and-other-discrete-distributions/2020-09-22_hypergeometric-and-other-discrete-distributions.md)


+# Other resources
+These resources also cover a comprehensive range of practical R usage tutorials.
+
++ [Statistical Computing](https://36-750.github.io/) by Alex Reinhart and Christopher Genovese
++ [Data Science Toolkit](https://benkeser.github.io/info550/lectures/) by David Benkeser
+
 # Tutorial style guide

 A painful form of technical debt is inconsistent code style. This repository now contains the following file naming and code style rules.
@@ -81,31 +87,7 @@ A painful form of technical debt is inconsistent code style. This repository now
 version 1.4.0.
 https://CRAN.R-project.org/package=stringr

-+ Max Kuhn. (2019). `caret`: Classification and Regression
-Training. R package version 6.0-84. https://CRAN.R-project.org/package=caret
-+ Contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony
-Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew Ziem,
-Luca Scrucca, Yuan Tang, Can Candan and Tyler Hunt.
-
-+ Jacob Kaplan (2020). `fastDummies`: Fast Creation of Dummy (Binary) Columns and Rows from Categorical
-Variables. R package version 1.6.1. https://CRAN.R-project.org/package=fastDummies
-
 + Kirill Müller (2017). `here`: A Simpler Way to Find Your Files. R package version 0.1.
 https://CRAN.R-project.org/package=here

-+ Paul Murrell (2015). `compare`: Comparing Objects for Differences. R package version 0.2-6.
-https://CRAN.R-project.org/package=compare
-
-+ A. Liaw and M. Wiener (2002). Classification and Regression by `randomForest`. R News 2(3), 18--22.
-
-+ Tianqi Chen, Tong He, Michael Benesty, Vadim Khotilovich, Yuan Tang, Hyunsu Cho, Kailong Chen, Rory
-Mitchell, Ignacio Cano, Tianyi Zhou, Mu Li, Junyuan Xie, Min Lin, Yifeng Geng and Yutian Li (2020).
-`xgboost`: Extreme Gradient Boosting. R package version 1.0.0.2. https://CRAN.R-project.org/package=xgboost
-
-+ Alexandros Karatzoglou, Alex Smola, Kurt Hornik, Achim Zeileis (2004). `kernlab` - An S4 Package for Kernel
-Methods in R. Journal of Statistical Software 11(9), 1-20. URL http://www.jstatsoft.org/v11/i09/
-
-+ Microsoft Corporation and Steve Weston (2019). `doParallel`: Foreach Parallel Adaptor for the `parallel`
-Package. R package version 1.0.15. https://CRAN.R-project.org/package=doParallel
-
 + Richard Iannone (2020). `DiagrammeR`: Graph/Network Visualization. R package version 1.0.6.1. https://CRAN.R-project.org/package=DiagrammeR

tutorials/dc-data_table_vs_dplyr/dc-data_table_vs_dplyr.Rmd

Lines changed: 15 additions & 7 deletions
@@ -7,25 +7,27 @@ output:
 toc: true
 ---

-```{r setup, include = FALSE}
-knitr::opts_chunk$set(echo = TRUE, results = 'hide', message = FALSE)
+```{r setup, include=FALSE}
+# Set up global environment ----------------------------------------------------
+knitr::opts_chunk$set(echo=TRUE, results="hide", message=FALSE)
 ```

-```{r, message = FALSE}
-#-----load required packages-----
+```{r, message=FALSE}
+# Load required packages -------------------------------------------------------
 if (!require("pacman")) install.packages("pacman")
 pacman::p_load(here,
-ids, # for generating random ids
+ids, # Generate random ids
 tidyverse,
 data.table,
-compare, # compare between data frames
+compare, # Compare between data frames
 microbenchmark)
 ```


 # Introduction

-One of the great benefits of following Rstats conversations on Twitter is its access to user insights. I became curious about `data.table` after reading conversations about its superior performance yet decreased visibility compared to `tidyverse`.
+I became curious about `data.table` after reading Twitter conversations about its superior performance yet decreased visibility compared to `tidyverse`. Because
+

 Fast forward a few years and the [data processing efficiency](https://h2oai.github.io/db-benchmark/) of `data.table` has become extremely handy:

@@ -960,6 +962,8 @@ In contrast, `data.table` is efficient because it contains a very fast ordering

 # Other resources

++ https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
+
 + The definitive [stack overflow discussion](https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly/27840349#27840349) about the best use cases for data.table versus dplyr (from tidyverse).

 + A great side by side comparison of data.table versus dplyr operations by [Atrebas](https://atrebas.github.io/post/2019-03-03-datatable-dplyr/).
@@ -974,3 +978,7 @@ In contrast, `data.table` is efficient because it contains a very fast ordering
 Robin Lovelace](https://csgillespie.github.io/efficientR/data-processing-with-data-table.html).

 + A more detailed explanation of the usage of binary search based subset in `data.table` by [Arun Srinivasan](https://gist.github.com/arunsrinivasan/dacb9d1cac301de8d9ff).
+
++ https://bookdown.org/rdpeng/rprogdatascience/parallel-computation.html
+
++ http://www.john-ros.com/Rcourse/parallel.html

tutorials/dc-data_table_vs_dplyr/dc-dataset_generation_script.R

Whitespace-only changes.

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+---
+title: "Using arrow with tidyverse and data.table"
+author: Erika Duan
+date: "`r Sys.Date()`"
+output:
+  github_document:
+    toc: true
+---
+
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+Using arrow with tidyverse and data.table
+================
+Erika Duan
+2022-03-05
+
+- [R Markdown](#r-markdown)
+- [Including Plots](#including-plots)
+
+## R Markdown
+
+This is an R Markdown document. Markdown is a simple formatting syntax
+for authoring HTML, PDF, and MS Word documents. For more details on
+using R Markdown see <http://rmarkdown.rstudio.com>.
+
+When you click the **Knit** button a document will be generated that
+includes both content as well as the output of any embedded R code
+chunks within the document. You can embed an R code chunk like this:
+
+``` r
+summary(cars)
+```
+
+    ##      speed           dist
+    ##  Min.   : 4.0   Min.   :  2.00
+    ##  1st Qu.:12.0   1st Qu.: 26.00
+    ##  Median :15.0   Median : 36.00
+    ##  Mean   :15.4   Mean   : 42.98
+    ##  3rd Qu.:19.0   3rd Qu.: 56.00
+    ##  Max.   :25.0   Max.   :120.00
+
+## Including Plots
+
+You can also embed plots, for example:
+
+![](dc-using_arrow_files/figure-gfm/pressure-1.png)<!-- -->
+
+Note that the `echo = FALSE` parameter was added to the code chunk to
+prevent printing of the R code that generated the plot.

tutorials/p-automating_rmd_reports/p-automating_rmd_reports_part_2.Rmd

Lines changed: 2 additions & 0 deletions
@@ -314,3 +314,5 @@ jobs:
 + A [YouTube tutorial](https://www.youtube.com/watch?v=NwUijrm2U2w) by DVC on using GitHub Actions with R to automate data visualisation tasks.
 + A useful (online resource](https://explainshell.com/) for explaining shell commands required to create components of the GitHub Actions YAML workflow.
 + https://amitlevinson.com/blog/automated-plot-with-github-actions/
++ https://rstats.wtf/index.html
++ https://goodresearch.dev/pipelines.html
