Commit 2feedee

2 parents 97565c5 + 89d0b4e commit 2feedee

61 files changed: 6520 additions, 3 deletions

README.md

Lines changed: 93 additions & 0 deletions
@@ -13,6 +13,7 @@ Beginning of semester prep:

 - [pa1](#assignment-1)
 - [pa2](#assignment-2)
+- [pa3](#assignment-3)

 ------------------------------------------------------------------------

@@ -223,3 +224,95 @@ variety of different methods. That being said, you should only review
 the work of your classmates **after** the assignment has been turned in.

 ------------------------------------------------------------------------
+
+## Assignment 3
+
+**Topics**: Project management, Tidying data, GitHub Pages
+
+### Overview
+
+In this assignment you will create your own RStudio project in which you
+get, tidy, transform and plot data from a publicly available dataset.
+You will host your project in a GitHub repo and create a project
+website.
+
+**Assigned**: Week 5, 02/24
+**Due**: Monday, 03/03 before 10pm
+
+### Instructions
+
+Choose any data set you want from the `languageR`,
+[untidydata](https://www.jvcasillas.com/untidydata/), or
+[worldlanguages](https://www.jvcasillas.com/worldlanguages/) packages
+(it can be the same one you used last week, but if you prefer something
+different get permission first). To see all the options, run the
+following code in RStudio:
+
+    data(package = "languageR")
+    data(package = "untidydata")
+    data(package = "worldlanguages")
+
+or check the documentation on the package website (note: you may need to
+install the package first).
+
+#### Setup
+
+1. Create a new repo from GitHub.com called `pa3` and clone it to your
+   desktop.
+2. Create a new project for your repo using RStudio.
+3. Inside your new project, create an RMarkdown document called
+   `index.Rmd` (the default output format should be html).
+
+#### EDA
+
+4. Load the data set of your choice and get information about its
+   structure (remember all code needs to be inside a knitr code chunk).
+5. Tidy the data set (every variable gets a column, every observation
+   occupies a single row), if necessary.
+6. Calculate descriptive statistics of your choice.
+7. Select two continuous variables and fit a model to the data
+   (bivariate regression).
+8. Generate a plot that includes a regression line.
+9. Write up some general *observations* (1-2 paragraphs max).
+
+#### Share
+
+10. Commit your changes and push them to GitHub.
+11. Publish your repo using GitHub Pages.
+12. Update your fork of the `programming_assignments` repo. Next, create
+    a new folder inside **your** dropbox in `programming_assignments`
+    called `pa3`. Include a README.md file with a link to your published
+    pa3 website. Submit a pull request to the master
+    `programming_assignments` repo.
+
+### Evaluation
+
+This is programming assignment 3 of 4. It is worth 10 of the 40 possible
+points. In order to receive full credit you must complete all steps in
+**Setup**, **EDA**, and **Share** detailed above, and follow **all** the
+instructions. Moreover, steps 1-5 in *EDA* **must** be completed in
+separate code chunks, you must comment every step in your code, and you
+**MUST** knit your project before submitting.
+
+| Task                             | Points |
+|:---------------------------------|-------:|
+| Tidy data                        |      2 |
+| Descriptive stats                |    0.5 |
+| Plot data                        |      1 |
+| Fit bivariate regression         |      1 |
+| Publish to GitHub Pages          |      5 |
+| Successfully submit pull request |    0.5 |
+| **Total**                        |     10 |
+
+### Tips
+
+- Review the RStudio Projects tutorial to refresh your memory.
+- Review the recommended readings for tips on tidying your data.
+- Only submit a pull request to `programming_assignments` once
+  everything is working properly in your repo.
+- Always include a README in your repos.
+- Make sure you **look** at the output after knitting. Is it clean? Make
+  it look good (i.e., don’t type everything in bold!).
+- **Use Slack to ask questions**
+
+------------------------------------------------------------------------
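
The EDA steps listed for Assignment 3 (inspect, describe, fit, plot) might look roughly like the sketch below. It is only an illustration: it uses the built-in `cars` data as a stand-in (not one of the assignment's packages), assumes `ggplot2` is installed, and skips the tidying step because `cars` is already tidy. The object names `dat`, `desc`, `fit`, and `p` are placeholders.

```r
library(ggplot2)

# Stand-in data: `cars` ships with base R (speed in mph, stopping distance in ft).
# For the assignment, swap in a data set from languageR, untidydata, or worldlanguages.
dat <- cars

# Step 4: inspect the structure of the data
str(dat)

# Step 6: descriptive statistics (mean and SD of each variable)
desc <- data.frame(
  variable = names(dat),
  mean = sapply(dat, mean),
  sd = sapply(dat, sd),
  row.names = NULL
)
print(desc)

# Step 7: bivariate regression of stopping distance on speed
fit <- lm(dist ~ speed, data = dat)
print(summary(fit))

# Step 8: scatterplot with the fitted regression line
p <- ggplot(dat, aes(x = speed, y = dist)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
print(p)
```

In an RMarkdown document each commented step would go in its own knitr chunk, as the Evaluation section asks.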

README.qmd

Lines changed: 6 additions & 3 deletions
@@ -36,7 +36,7 @@ pas <- dir(path = "./staging", pattern = "*.md")
 list_elements <- substr(pas, start = 1, stop = max(nchar(pas)) - 3)

 # Use pa file names for link references
-link_names <- paste0('[', list_elements[1:2], ']')
+link_names <- paste0('[', list_elements[1:3], ']')

 #
 # Get section ref for link
@@ -53,7 +53,7 @@ for (element in 1:length(link_names)) {
 }

 # Combine everything into an unordered list
-cat(paste0('- ', link_names[1:2], link_ref[1:2], '\n'))
+cat(paste0('- ', link_names[1:3], link_ref[1:3], '\n'))
 ```


@@ -72,7 +72,10 @@ cat(paste0('- ', link_names[1:2], link_ref[1:2], '\n'))
 #| eval: true
 ```

-```{r, child='./staging/pa3.md', eval=F}
+```{r}
+#| label: pa3
+#| child: './staging/pa3.md'
+#| eval: true
 ```

 ```{r, child='./staging/pa4.md', eval=F}
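
The change above bumps a hard-coded `1:2` to `1:3` in two places. A minimal sketch of the same link-list idea with the count held in one variable follows. Note the assumptions: the file names are a hypothetical stand-in for the `dir()` call, `n_published` is an invented name, and the `#assignment-N` target pattern is inferred from the README list in this commit rather than from the loop that actually builds `link_ref`, which is not shown in the diff.

```r
# Hypothetical stand-in for dir(path = "./staging", pattern = "*.md")
pas <- c("pa1.md", "pa2.md", "pa3.md", "pa4.md")

# Drop the ".md" extension to get the bare assignment names
list_elements <- tools::file_path_sans_ext(pas)

# Link text, e.g. "[pa1]"
link_names <- paste0("[", list_elements, "]")

# Link targets, e.g. "(#assignment-1)"; pattern assumed from the README list
link_ref <- paste0("(#assignment-", seq_along(list_elements), ")")

# Number of assignments released so far (assumed name); change it in one place
n_published <- 3
idx <- seq_len(n_published)

# Combine everything into an unordered markdown list
toc <- paste0("- ", link_names[idx], link_ref[idx])
cat(toc, sep = "\n")
```

Keeping the count in a single variable means releasing the next assignment touches one line instead of two.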
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+---
+title: "Programming assignment 2"
+author: "Merlin Balihaxi"
+date: "Last update: `r Sys.time()`"
+output:
+  html_document:
+    highlight: kate
+    keep_md: yes
+    theme: united
+---
+
+```{r}
+#| label: Plot 1
+#| message: false
+#| warning: false
+
+# LogFrequency: a numeric vector with log-transformed frequency in Vermeer's frequency dictionary of Dutch children's texts
+# ProportionOfErrors: a numeric vector for the proportion of error responses for the word
+# "lab" were asked from GPT
+library(languageR)
+library(tidyverse)
+beginningReaders |>
+  ggplot() +
+  aes(x = LogFrequency, y = ProportionOfErrors) +
+  geom_point() +
+  labs(
+    title = "Scatterplot for the relation between log(frequency) and proportion of errors",
+    subtitle = "data: beginningReaders",
+    x = "log(frequency)",
+    y = "proportion of errors"
+  )
+```
+
+```{r}
+#| label: Plot 2
+#| message: false
+#| warning: false
+
+# PrevError: factor with levels CORRECT and ERROR coding whether the preceding trial elicited a correct lexical decision
+# LogRT: the dependent variable, log response latency
+# Sex: factor coding the sex of the participant, with levels F (female) and M (male)
+danish |>
+  ggplot() +
+  aes(x = PrevError, y = LogRT, fill = Sex) +
+  geom_boxplot(position = "dodge2") +
+  labs(
+    title = "Boxplot for the relation between PrevError and response latency in Danish",
+    subtitle = "data: danish; grouped by: Sex",
+    x = "PrevError (don't know how to shorten this)",
+    y = "log(response latency)"
+  ) +
+  coord_flip()
+```
+
+```{r}
+#| label: Plot 3
+#| message: false
+#| warning: false
+
+# WrittenFrequency: numeric vector with log frequency in the CELEX lexical database
+# Familiarity: numeric vector of subjective familiarity ratings
+# LengthInLetters: numeric vector with length of the word in letters.
+# AgeSubject: a factor with as levels the age group of the subject: young versus old
+# WordCategory: a factor with as levels the word categories N (noun) and V (verb)
+english |>
+  select(wf = WrittenFrequency, fm = Familiarity, age = AgeSubject, len = LengthInLetters, cat = WordCategory) |>
+  filter(len > 2 & len < 7, age == "young") |>
+  ggplot() +
+  aes(x = wf, y = fm, colour = len) +
+  geom_point(alpha = 0.75, position = "jitter") +
+  labs(
+    title = "Scatterplot for the relation between word frequency and familiarity",
+    subtitle = "data=english; grouped by: word category (N, V) & word length",
+    x = "log(word frequency)",
+    y = "familiarity",
+    color = "word length"
+  ) +
+  facet_grid(len ~ cat) +
+  stat_summary(
+    fun.data = mean_sdl,
+    alpha = 0.1, colour = "tomato")
+```
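
In ggplot2, `position` is an argument of a layer such as `geom_point()`, not an aesthetic, so a jitter request placed inside `aes()` has no effect on where the points are drawn. The sketch below shows the layer-level form on a faceted scatterplot. It is an illustration only: it uses the built-in `mtcars` data as a stand-in for `english`, and the column choices and labels are arbitrary.

```r
library(ggplot2)

# Stand-in data: `mtcars` ships with base R; the column choices are illustrative only.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  # `position` belongs to the layer, so it goes in geom_point(), not aes()
  geom_point(alpha = 0.75, position = "jitter") +
  # One panel per combination of cylinder count (rows) and transmission (columns)
  facet_grid(cyl ~ am) +
  labs(x = "weight (1000 lbs)", y = "miles per gallon", colour = "cylinders")
print(p)
```

A related caveat for the chunk above: `mean_sdl` is a ggplot2 wrapper around an Hmisc function, so the Hmisc package most likely needs to be installed for that `stat_summary()` call to run.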

misc/students/Balihaxi_Merlin/pa2/README.html

Lines changed: 541 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+---
+title: "Programming assignment 2"
+author: "Merlin Balihaxi"
+date: "Last update: 2025-02-12 23:59:17.55597"
+output:
+  html_document:
+    highlight: kate
+    keep_md: yes
+    theme: united
+---
+
+
+``` r
+# LogFrequency: a numeric vector with log-transformed frequency in Vermeer's frequency dictionary of Dutch children's texts
+# ProportionOfErrors: a numeric vector for the proportion of error responses for the word
+# "lab" were asked from GPT
+library(languageR)
+library(tidyverse)
+beginningReaders |>
+  ggplot() +
+  aes(x = LogFrequency, y = ProportionOfErrors) +
+  geom_point() +
+  labs(
+    title = "Scatterplot for the relation between log(frequency) and proportion of errors",
+    subtitle = "data: beginningReaders",
+    x = "log(frequency)",
+    y = "proportion of errors"
+  )
+```
+
+![](README_files/figure-html/Plot 1-1.png)<!-- -->
+
+
+``` r
+# PrevError: factor with levels CORRECT and ERROR coding whether the preceding trial elicited a correct lexical decision
+# LogRT: the dependent variable, log response latency
+# Sex: factor coding the sex of the participant, with levels F (female) and M (male)
+danish |>
+  ggplot() +
+  aes(x = PrevError, y = LogRT, fill = Sex) +
+  geom_boxplot(position = "dodge2") +
+  labs(
+    title = "Boxplot for the relation between PrevError and response latency in Danish",
+    subtitle = "data: danish; grouped by: Sex",
+    x = "PrevError (don't know how to shorten this)",
+    y = "log(response latency)"
+  ) +
+  coord_flip()
+```
+
+![](README_files/figure-html/Plot 2-1.png)<!-- -->
+
+
+``` r
+# WrittenFrequency: numeric vector with log frequency in the CELEX lexical database
+# Familiarity: numeric vector of subjective familiarity ratings
+# LengthInLetters: numeric vector with length of the word in letters.
+# AgeSubject: a factor with as levels the age group of the subject: young versus old
+# WordCategory: a factor with as levels the word categories N (noun) and V (verb)
+english |>
+  select(wf = WrittenFrequency, fm = Familiarity, age = AgeSubject, len = LengthInLetters, cat = WordCategory) |>
+  filter(len > 2 & len < 7, age == "young") |>
+  ggplot() +
+  aes(x = wf, y = fm, colour = len) +
+  geom_point(alpha = 0.75, position = "jitter") +
+  labs(
+    title = "Scatterplot for the relation between word frequency and familiarity",
+    subtitle = "data=english; grouped by: word category (N, V) & word length",
+    x = "log(word frequency)",
+    y = "familiarity",
+    color = "word length"
+  ) +
+  facet_grid(len ~ cat) +
+  stat_summary(
+    fun.data = mean_sdl,
+    alpha = 0.1, colour = "tomato")
+```
+
+![](README_files/figure-html/Plot 3-1.png)<!-- -->
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+---
+title: "Programming assignment 2"
+author: "ChunChien Hsueh"
+date: "Last update: `r Sys.time()`"
+output:
+  html_document:
+    highlight: kate
+    keep_md: yes
+    theme: united
+---
+
+```{r}
+library('languageR')
+library(ggplot2)
+```
+```{r}
+#beginningReaders
+# 1. Bivariate scatterplot (using beginningReaders)
+ggplot(beginningReaders, aes(x = Word, y = LogRT)) +
+  geom_point(color = "blue", alpha = 0.6) +
+  labs(title = "Bivariate Scatterplot", x = "Word", y = "LogRT") +
+  theme_minimal()
+```
+```{r}
+#danish
+# 2. Boxplot with different fill colors (using danish)
+ggplot(danish, aes(x = Affix, y = LogRT, fill = Affix)) +
+  geom_boxplot() +
+  labs(title = "Boxplot with Different Fill Colors", x = "Affix", y = "LogRT") +
+  theme_minimal()
+```
+```{r}
+#dativeSimplified
+# 3. Plot with stat_summary and facet (using dativeSimplified)
+ggplot(dativeSimplified, aes(x = Verb, y = LengthOfTheme)) +
+  stat_summary(fun = mean, geom = "point", color = "red", size = 3) +
+  facet_wrap(~ AnimacyOfRec) +
+  labs(title = "Plot with stat_summary and Facet", x = "Verb", y = "LengthOfTheme") +
+  theme_minimal()
+
+```
+
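
With `stat_summary(fun = mean, geom = "point")` plus a facet, each plotted point is the mean of `y` for one `x` value within one panel. The same numbers can be computed directly, which is a handy way to check the plot. The sketch below is an illustration in base R only: it uses the built-in `mtcars` data as a stand-in for `dativeSimplified`, with `cyl` playing the role of the x variable and `am` the facetting variable.

```r
# Stand-in data: `mtcars` ships with base R.
# Mean of mpg for every combination of cyl (x axis) and am (facet)
group_means <- aggregate(mpg ~ cyl + am, data = mtcars, FUN = mean)
print(group_means)

# Spot check one cell by hand: 4-cylinder cars with automatic transmission (am == 0)
manual <- mean(mtcars$mpg[mtcars$cyl == 4 & mtcars$am == 0])
print(manual)
```

Each row of `group_means` corresponds to one point that a faceted `stat_summary()` plot of these variables would draw.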

misc/students/Hsueh_Chun-Chien/pa2/ReadMe.html

Lines changed: 513 additions & 0 deletions
Large diffs are not rendered by default.
