Creating an automated reporting workflow requires the following setup:
1. A consistently named and reproducible file structure to store code, library dependencies, data and analytical outputs.
2. A data ingestion and data cleaning script that can be automatically refreshed.
3. An R Markdown template report that uses YAML parameters instead of hard-coded variables.
4. A report automation script for all parameters of interest.
5. (Optional) A CI/CD pipeline using GitHub Actions, which is packaged inside a separate GitHub repository.
# Step 1: Create a project environment
## Use consistent names
There is no single best way to organise your project structure. I recommend starting with a simple naming structure that everyone can easily understand. For this tutorial, I have created a separate GitHub repository named [`abs_labour_force_report`](https://github.com/erikaduan/abs_labour_force_report), which contains the folders `code` to store my R scripts and Rmd documents, `data` to store my data, and `output` to store my analytical outputs.
**Note:** The `data` folder contains the subfolders `raw_data` and `clean_data` to maintain separation between the raw (read-only) and cleaned datasets used for further analysis.
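The folder structure described above can be sketched in a few lines of R. The repository and folder names below follow this tutorial's `abs_labour_force_report` example:

```r
# Create the project folder structure described above, including the
# raw_data and clean_data subfolders inside data.
folders <- c(
  "abs_labour_force_report/code",
  "abs_labour_force_report/data/raw_data",
  "abs_labour_force_report/data/clean_data",
  "abs_labour_force_report/output"
)

for (folder in folders) {
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
}
```

Creating folders programmatically keeps the structure identical every time you start a new reporting project.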
## Environment reproducibility
Besides your code, data inputs and outputs, you also need to create a reproducible virtual environment to support your project workflow. In R, a simple way of managing project-specific package dependencies is to use the `renv` package.
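A minimal `renv` workflow looks like the sketch below. These are standard `renv` functions run from the project root; how you integrate them into CI is up to you:

```r
# Initialise a project-specific package library and create an
# renv.lock lockfile recording the project's dependencies.
renv::init()

# After installing or updating packages, snapshot their exact
# versions into renv.lock.
renv::snapshot()

# On another machine (or in a CI/CD pipeline), rebuild the same
# package library from renv.lock.
renv::restore()
```

The `renv.lock` file should be committed to your repository so that collaborators and automated pipelines can restore the same package versions.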
# Step 2: Create a data ingestion and data cleaning R script