diff --git a/README.md b/README.md
Lines changed: 93 additions & 0 deletions
diff --git a/README.qmd b/README.qmd
Lines changed: 6 additions & 3 deletions
diff --git a/misc/students/Balihaxi_Merlin/pa2/README.Rmd b/misc/students/Balihaxi_Merlin/pa2/README.Rmd
Lines changed: 82 additions & 0 deletions
diff --git a/misc/students/Balihaxi_Merlin/pa2/README.html b/misc/students/Balihaxi_Merlin/pa2/README.html
Lines changed: 541 additions & 0 deletions
diff --git a/misc/students/Balihaxi_Merlin/pa2/README.md b/misc/students/Balihaxi_Merlin/pa2/README.md
Lines changed: 79 additions & 0 deletions
diff --git a/misc/students/Balihaxi_Merlin/pa2/README_files/figure-html/Plot 1-1.png b/misc/students/Balihaxi_Merlin/pa2/README_files/figure-html/Plot 1-1.png
40.6 KB
diff --git a/misc/students/Balihaxi_Merlin/pa2/README_files/figure-html/Plot 2-1.png b/misc/students/Balihaxi_Merlin/pa2/README_files/figure-html/Plot 2-1.png
37.8 KB
diff --git a/misc/students/Balihaxi_Merlin/pa2/README_files/figure-html/Plot 3-1.png b/misc/students/Balihaxi_Merlin/pa2/README_files/figure-html/Plot 3-1.png
159 KB
diff --git a/misc/students/Hsueh_Chun-Chien/pa2/ReadMe.Rmd b/misc/students/Hsueh_Chun-Chien/pa2/ReadMe.Rmd
Lines changed: 42 additions & 0 deletions
diff --git a/misc/students/Hsueh_Chun-Chien/pa2/ReadMe.html b/misc/students/Hsueh_Chun-Chien/pa2/ReadMe.html
Lines changed: 513 additions & 0 deletions
@@ -13,6 +13,7 @@ Beginning of semester prep:
 
 - [pa1](#assignment-1)
 - [pa2](#assignment-2)
+- [pa3](#assignment-3)
 
 ------------------------------------------------------------------------
 
@@ -223,3 +224,95 @@ variety of different methods. That being said, you should only review
 the work of your classmates **after** the assignment has been turned in.
 
 ------------------------------------------------------------------------
+
+## Assignment 3
+
+**Topics**: Project management, Tidying data, GitHub Pages
+
+### Overview
+
+In this assignment you will create your own RStudio project in which you
+get, tidy, transform and plot data from a publicly available dataset.
+You will host your project in a GitHub repo and create a project
+website.
+
+**Assigned**: Week 5, 02/24  
+**Due**: Monday, 03/03 before 10pm
+
+### Instructions
+
+Choose any data set you want from the `languageR`,
+[untidydata](https://www.jvcasillas.com/untidydata/), or
+[worldlanguages](https://www.jvcasillas.com/worldlanguages/) packages
+(it can be the same one you used last week, but if you prefer something
+different get permission first). To see all the options, run the
+following code in RStudio:
+
+    data(package = "languageR") 
+    data(package = "untidydata")
+    data(package = "worldlanguages")
+
+or check the documentation on the package website (note: you may need to
+install the package first).
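
One way to act on the note about installation is to check whether each package is present before listing its data sets. The sketch below is only an illustration: the helper `list_datasets()` is a hypothetical name, not part of the course materials, and it relies on base R alone.

```r
# Package names are taken from the assignment text.
pkgs <- c("languageR", "untidydata", "worldlanguages")

# Hypothetical helper: return the data set names of one package,
# or an empty character vector if the package is not installed.
list_datasets <- function(pkg) {
  # requireNamespace() returns FALSE instead of erroring when a package is missing
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; skipping.")
    return(character(0))
  }
  # data(package = ...) returns an object whose $results matrix has an "Item" column
  unname(data(package = pkg)$results[, "Item"])
}

# Named list with one entry per package
available <- lapply(setNames(pkgs, pkgs), list_datasets)
```

After running this, `available$languageR` holds the data set names for `languageR`, or an empty vector if that package is missing.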
+
+#### Setup
+
+1.  Create a new repo on GitHub.com called `pa3` and clone it to your
+    desktop.
+2.  Create a new project for your repo using RStudio.
+3.  Inside your new project, create an RMarkdown document called
+    `index.Rmd` (the default output format should be html).
+
+#### EDA
+
+4.  Load the data set of your choice and get information about its
+    structure (remember all code needs to be inside a knitr code chunk).
+5.  Tidy the data set (every variable gets a column, every observation
+    occupies a single row), if necessary.
+6.  Calculate descriptive statistics of your choice.
+7.  Select two continuous variables and fit a model to the data
+    (bivariate regression).
+8.  Generate a plot that includes a regression line.
+9.  Write up some general *observations* (1-2 paragraphs max).
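
Steps 4-8 could be sketched as follows. This is a minimal base R illustration, not the expected solution: it uses the built-in `anscombe` data as a stand-in for a data set from the packages above (an assumption made so the code runs without extra packages), and the object names `tidy`, `set1`, and `fit` are hypothetical. In the assignment itself each step would go in its own knitr chunk, and tidyverse functions such as `tidyr::pivot_longer()` and `ggplot2::geom_smooth(method = "lm")` would do the same jobs.

```r
# Step 4: load a data set and inspect its structure
# (anscombe ships with base R; it is a stand-in, not one of the course data sets)
data(anscombe)
str(anscombe)

# Step 5: tidy it. anscombe stores four x/y pairs side by side (x1..x4, y1..y4),
# so stack them into one row per observation with columns set, x, y
tidy <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(
    set = i,
    x = anscombe[[paste0("x", i)]],
    y = anscombe[[paste0("y", i)]]
  )
}))

# Step 6: descriptive statistics (mean of x and y within each set)
aggregate(cbind(x, y) ~ set, data = tidy, FUN = mean)

# Step 7: bivariate regression on two continuous variables (first set only)
set1 <- subset(tidy, set == 1)
fit <- lm(y ~ x, data = set1)
summary(fit)

# Step 8: scatterplot with the fitted regression line
plot(y ~ x, data = set1, main = "y as a function of x (set 1)")
abline(fit)
```

Step 9, the written observations, would then follow as ordinary prose below the last chunk.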
+
+#### Share
+
+10. Commit your changes and push them to GitHub.
+11. Publish your repo using GitHub Pages.
+12. Update your fork of the `programming_assignments` repo. Next, create
+    a new folder inside **your** dropbox in `programming_assignments`
+    called `pa3`. Include a README.md file with a link to your published
+    pa3 website. Submit a pull request to the master
+    `programming_assignments` repo.
+
+### Evaluation
+
+This is programming assignment 3 of 4. It is worth 10 of the 40 possible
+points. In order to receive full credit you must complete all steps in
+**Setup**, **EDA**, and **Share** detailed above, and follow **all** the
+instructions. Moreover, steps 4-8 in **EDA** **must** be completed in
+separate code chunks, you must comment every step in your code, and you
+**MUST** knit your project before submitting.
+
+| Task                             | Points |
+|:---------------------------------|-------:|
+| Tidy data                        |      2 |
+| Descriptive stats                |    0.5 |
+| Plot data                        |      1 |
+| Fit bivariate regression         |      1 |
+| Publish to GitHub Pages          |      5 |
+| Successfully submit pull request |    0.5 |
+| **Total**                        |     10 |
+
+### Tips
+
+- Review the RStudio Projects tutorial to refresh your memory.
+- Review the recommended readings for tips on tidying your data.
+- Only submit a pull request to `programming_assignments` once
+  everything is working properly in your repo.
+- Always include a README in your repos.
+- Make sure you **look** at the output after knitting. Is it clean? Make
+  it look good (i.e., don’t type everything in bold!).
+- **Use Slack to ask questions.**
+
+------------------------------------------------------------------------
@@ -36,7 +36,7 @@ pas <- dir(path = "./staging", pattern = "*.md")
 list_elements <- substr(pas, start = 1, stop = max(nchar(pas)) - 3)
 
 # Use pa file names for link references
-link_names <- paste0('[', list_elements[1:2], ']')
+link_names <- paste0('[', list_elements[1:3], ']')
 
 #
 # Get section ref for link
@@ -53,7 +53,7 @@ for (element in 1:length(link_names)) {
 }
 
 # Combine everything into an unordered list
-cat(paste0('- ', link_names[1:2], link_ref[1:2], '\n'))
+cat(paste0('- ', link_names[1:3], link_ref[1:3], '\n'))
 ```
 
 
@@ -72,7 +72,10 @@ cat(paste0('- ', link_names[1:2], link_ref[1:2], '\n'))
 #| eval: true
 ```
 
-```{r, child='./staging/pa3.md', eval=F}
+```{r}
+#| label: pa3
+#| child: './staging/pa3.md' 
+#| eval: true
 ```
 
 ```{r, child='./staging/pa4.md', eval=F}
 
@@ -0,0 +1,82 @@
+---
+title: "Programming assignment 2"
+author: "Merlin Balihaxi"
+date: "Last update: `r Sys.time()`" 
+output:
+  html_document:  
+    highlight: kate  
+    keep_md: yes  
+    theme: united
+---
+
+```{r}
+#| label: Plot 1
+#| message: false
+#| warning: false
+
+# LogFrequency: a numeric vector with log-transformed frequency in Vermeer's frequency dictionary of Dutch children's texts
+# ProportionOfErrors: a numeric vector for the proportion of error responses for the word
+# "lab" were asked from GPT
+library(languageR)
+library(tidyverse)
+beginningReaders |>
+  ggplot() +
+  aes(x = LogFrequency, y = ProportionOfErrors) +
+  geom_point() +
+  labs(
+    title = "Scatterplot for the relation between log(frequency) and proportion of errors",
+    subtitle = "data: beginningReaders",
+    x = "log(frequency)",
+    y = "proportion of errors"
+  )
+```
+
+```{r}
+#| label: Plot 2
+#| message: false
+#| warning: false
+
+# PrevError: factor with levels CORRECT and ERROR coding whether the preceding trial elicited a correct lexical decision
+# LogRT: the dependent variable, log response latency
+# Sex: factor coding the sex of the participant, with levels F (female) and M (male)
+danish |>
+  ggplot()+
+  aes(x = PrevError, y = LogRT, fill = Sex) +
+  geom_boxplot(position = "dodge2") +
+    labs(
+    title = "Boxplot for the relation between PrevError and response latency in Danish",
+    subtitle = "data: danish; grouped by: Sex",
+    x = "PrevError (don't know how to shorten this)",
+    y = "log(response latency)"
+  )+
+  coord_flip()
+```
+
+```{r}
+#| label: Plot 3
+#| message: false
+#| warning: false
+
+# WrittenFrequency: numeric vector with log frequency in the CELEX lexical database
+# Familiarity: numeric vector of subjective familiarity ratings
+# LengthInLetters: numeric vector with length of the word in letters.
+# AgeSubject: a factor with as levels the age group of the subject: young versus old
+# WordCategory: a factor with as levels the word categories N (noun) and V (verb)
+english |>
+  select(wf = WrittenFrequency, fm = Familiarity, age = AgeSubject, len = LengthInLetters, cat = WordCategory) |>
+  filter(len > 2 & len < 7, age == "young") |>
+  ggplot() +
+  aes(x = wf, y = fm, colour = len) +
+  geom_point(alpha = 0.75, position = "jitter") +
+  labs(
+    title = "Scatterplot for the relation between word frequency and familiarity",
+    subtitle = "data=english; grouped by: word category (N, V) & word length",
+    x = "log(word frequency)",
+    y = "familiarity",
+    color = "word length"
+  ) +
+   facet_grid(len ~ cat) +
+  stat_summary(
+    fun.data = mean_sdl, 
+    alpha=0.1, colour = "tomato")
+```
@@ -0,0 +1,79 @@
+---
+title: "Programming assignment 2"
+author: "Merlin Balihaxi"
+date: "Last update: 2025-02-12 23:59:17.55597" 
+output:
+  html_document:  
+    highlight: kate  
+    keep_md: yes  
+    theme: united
+---
+
+
+``` r
+# LogFrequency: a numeric vector with log-transformed frequency in Vermeer's frequency dictionary of Dutch children's texts
+# ProportionOfErrors: a numeric vector for the proportion of error responses for the word
+# "lab" were asked from GPT
+library(languageR)
+library(tidyverse)
+beginningReaders |>
+  ggplot() +
+  aes(x = LogFrequency, y = ProportionOfErrors) +
+  geom_point() +
+  labs(
+    title = "Scatterplot for the relation between log(frequency) and proportion of errors",
+    subtitle = "data: beginningReaders",
+    x = "log(frequency)",
+    y = "proportion of errors"
+  )
+```
+
+![](README_files/figure-html/Plot 1-1.png)<!-- -->
+
+
+``` r
+# PrevError: factor with levels CORRECT and ERROR coding whether the preceding trial elicited a correct lexical decision
+# LogRT: the dependent variable, log response latency
+# Sex: factor coding the sex of the participant, with levels F (female) and M (male)
+danish |>
+  ggplot()+
+  aes(x = PrevError, y = LogRT, fill = Sex) +
+  geom_boxplot(position = "dodge2") +
+    labs(
+    title = "Boxplot for the relation between PrevError and response latency in Danish",
+    subtitle = "data: danish; grouped by: Sex",
+    x = "PrevError (don't know how to shorten this)",
+    y = "log(response latency)"
+  )+
+  coord_flip()
+```
+
+![](README_files/figure-html/Plot 2-1.png)<!-- -->
+
+
+``` r
+# WrittenFrequency: numeric vector with log frequency in the CELEX lexical database
+# Familiarity: numeric vector of subjective familiarity ratings
+# LengthInLetters: numeric vector with length of the word in letters.
+# AgeSubject: a factor with as levels the age group of the subject: young versus old
+# WordCategory: a factor with as levels the word categories N (noun) and V (verb)
+english |>
+  select(wf = WrittenFrequency, fm = Familiarity, age = AgeSubject, len = LengthInLetters, cat = WordCategory) |>
+  filter(len > 2 & len < 7, age == "young") |>
+  ggplot() +
+  aes(x = wf, y = fm, colour = len) +
+  geom_point(alpha = 0.75, position = "jitter") +
+  labs(
+    title = "Scatterplot for the relation between word frequency and familiarity",
+    subtitle = "data=english; grouped by: word category (N, V) & word length",
+    x = "log(word frequency)",
+    y = "familiarity",
+    color = "word length"
+  ) +
+   facet_grid(len ~ cat) +
+  stat_summary(
+    fun.data = mean_sdl, 
+    alpha=0.1, colour = "tomato")
+```
+
+![](README_files/figure-html/Plot 3-1.png)<!-- -->
@@ -0,0 +1,42 @@
+---
+title: "Programming assignment 2"  
+author: "ChunChien Hsueh"  
+date: "Last update: `r Sys.time()`"  
+output:  
+  html_document:  
+    highlight: kate  
+    keep_md: yes  
+    theme: united
+---
+
+```{r}
+library('languageR')
+library(ggplot2)
+```
+```{r}
+#beginningReaders
+# 1. Bivariate scatterplot (using beginningReaders)
+ggplot(beginningReaders, aes(x = Word, y = LogRT)) +
+  geom_point(color = "blue", alpha = 0.6) +
+  labs(title = "Bivariate Scatterplot", x = "Word", y = "LogRT") +
+  theme_minimal()
+```
+```{r}
+#danish
+# 2. Boxplot with different fill colors (using danish)
+ggplot(danish, aes(x = Affix, y = LogRT, fill = Affix)) +
+  geom_boxplot() +
+  labs(title = "Boxplot with Different Fill Colors", x = "Affix", y = "LogRT") +
+  theme_minimal()
+```
+```{r}
+#dativeSimplified
+# 3. Plot with stat_summary and facet (using dativeSimplified)
+ggplot(dativeSimplified, aes(x = Verb, y = LengthOfTheme)) +
+  stat_summary(fun = mean, geom = "point", color = "red", size = 3) +
+  facet_wrap(~ AnimacyOfRec) +
+  labs(title = "Plot with stat_summary and Facet", x = "Verb", y = "LengthOfTheme") +
+  theme_minimal()
+
+```
+