Loops: Exercises

Note

These exercises are optional.

# install.packages("tidyverse")
# install.packages("here")

library(tidyverse)
library(here)

## Load the data
characters <- readRDS(file = here::here("raw_data", "characters.rds"))
psych_stats <- read.csv(
  file = here::here("raw_data", "psych_stats.csv"),
  sep = ";"
)

## Reshape into long format:
psych_stats <- psych_stats %>%
  pivot_longer(
    cols = messy_neat:innocent_jaded,
    names_to = "question",
    values_to = "rating"
  )

## Take a look at the data sets
str(characters)
'data.frame':   889 obs. of  7 variables:
 $ id        : chr  "F2" "F1" "F5" "F4" ...
 $ name      : chr  "Monica Geller" "Rachel Green" "Chandler Bing" "Joey Tribbiani" ...
 $ uni_id    : chr  "F" "F" "F" "F" ...
 $ uni_name  : chr  "Friends" "Friends" "Friends" "Friends" ...
 $ notability: num  79.7 76.7 74.4 74.3 72.6 51.6 86.5 84.2 82.6 65.6 ...
 $ link      : chr  "https://openpsychometrics.org/tests/characters/stats/F/2" "https://openpsychometrics.org/tests/characters/stats/F/1" "https://openpsychometrics.org/tests/characters/stats/F/5" "https://openpsychometrics.org/tests/characters/stats/F/4" ...
 $ image_link: chr  "https://openpsychometrics.org/tests/characters/test-resources/pics/F/2.jpg" "https://openpsychometrics.org/tests/characters/test-resources/pics/F/1.jpg" "https://openpsychometrics.org/tests/characters/test-resources/pics/F/5.jpg" "https://openpsychometrics.org/tests/characters/test-resources/pics/F/4.jpg" ...
str(psych_stats)
tibble [323,596 × 3] (S3: tbl_df/tbl/data.frame)
 $ char_id : chr [1:323596] "F2" "F2" "F2" "F2" ...
 $ question: chr [1:323596] "messy_neat" "disorganized_self.disciplined" "diligent_lazy" "on.time_tardy" ...
 $ rating  : num [1:323596] 95.7 95.2 6.1 6.2 6.4 ...

## Merge the two data sets by character ID:
characters_stats <- merge(
  x = characters,
  y = psych_stats,
  by.x = "id",
  by.y = "char_id"
)
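merge() matches rows whose value in the by.x column of x equals the value in the by.y column of y, which is how id and char_id are linked above. By default it keeps only IDs that occur in both data frames (an inner join), so a character without any ratings would be dropped. Below is a minimal sketch on two small made-up data frames. The names toy_characters, toy_stats, toy_merged and toy_merged_all are hypothetical and not part of the course data.

```r
# Two small made-up data frames standing in for characters and psych_stats
toy_characters <- data.frame(
  id = c("F1", "F2", "X9"),
  name = c("Rachel Green", "Monica Geller", "Nobody")
)
toy_stats <- data.frame(
  char_id = c("F1", "F1", "F2"),
  question = c("messy_neat", "diligent_lazy", "messy_neat"),
  rating = c(40, 60, 95.7)
)

# Inner join (default): every rating row gets the matching character's columns.
# "X9" has no ratings, so it does not appear in the result.
toy_merged <- merge(x = toy_characters, y = toy_stats, by.x = "id", by.y = "char_id")
print(toy_merged)

# all.x = TRUE keeps unmatched characters and fills the missing columns with NA
toy_merged_all <- merge(
  x = toy_characters, y = toy_stats,
  by.x = "id", by.y = "char_id", all.x = TRUE
)
print(toy_merged_all)
```

Note that the key column in the result keeps the name given in by.x (here id).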

Exercise 1

Print each fictional universe (column: uni_name) in the characters_stats data frame to your console exactly once, as a sentence like this: "The fictional universe 'fictional universe' is part of the characters data set."

for (universe in unique(characters_stats$uni_name)) {
  print(
    paste0(
      "The fictional universe '",
      universe,
      "' is part of the characters data set."
    )
  )
}
[1] "The fictional universe 'Arrested Development' is part of the characters data set."
[1] "The fictional universe 'Avatar: The Last Airbender' is part of the characters data set."
[1] "The fictional universe 'Arcane' is part of the characters data set."
[1] "The fictional universe 'Archer' is part of the characters data set."
[1] "The fictional universe 'It's Always Sunny in Philadelphia' is part of the characters data set."
[1] "The fictional universe 'Bones' is part of the characters data set."
[1] "The fictional universe 'Brooklyn Nine-Nine' is part of the characters data set."
[1] "The fictional universe 'Beauty and the Beast' is part of the characters data set."
[1] "The fictional universe 'Breaking Bad' is part of the characters data set."
[1] "The fictional universe 'The Big Bang Theory' is part of the characters data set."
[1] "The fictional universe 'The Breakfast Club' is part of the characters data set."
[1] "The fictional universe 'Broad City' is part of the characters data set."
[1] "The fictional universe 'Bob's Burgers' is part of the characters data set."
[1] "The fictional universe 'Battlestar Galactica' is part of the characters data set."
[1] "The fictional universe 'Buffy the Vampire Slayer' is part of the characters data set."
[1] "The fictional universe 'Community' is part of the characters data set."
[1] "The fictional universe 'Calvin and Hobbes' is part of the characters data set."
[1] "The fictional universe 'Criminal Minds' is part of the characters data set."
[1] "The fictional universe 'Craze Ex-Girlfriend' is part of the characters data set."
[1] "The fictional universe 'Dexter' is part of the characters data set."
...

Note that the loop variable does not have to be called i (even though that is the convention). Here it is called universe and takes each unique value of uni_name in turn.
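Because the loop variable simply takes each element of the vector in turn, its name is up to you. Below is a minimal sketch on a made-up vector (toy_universes is hypothetical) that collects the sentences instead of printing them. It also shows that paste0() is vectorized, so the same sentences can be built without a loop.

```r
# A made-up stand-in for characters_stats$uni_name, with duplicates
toy_universes <- c("Friends", "Friends", "Arcane", "Bones", "Arcane")

# Loop version: 'universe' takes each unique value once, in order of first appearance
sentences_loop <- character(0)
for (universe in unique(toy_universes)) {
  sentences_loop <- c(
    sentences_loop,
    paste0("The fictional universe '", universe, "' is part of the characters data set.")
  )
}

# Vectorized version: paste0() recycles the fixed text over the whole vector
sentences_vec <- paste0(
  "The fictional universe '", unique(toy_universes), "' is part of the characters data set."
)

print(sentences_loop)
```

Both versions produce the same three sentences. The loop is the point of this exercise, but for simple string building the vectorized form is usually shorter.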

Exercise 2

Remember how we used group_by() to calculate the number of gold medals for each country? Now you know enough to do something similar without group_by(), by using a for loop instead.

  1. Create a subset of characters_stats that only contains the characters of one fictional universe (your favorite).
characters_friends <- characters_stats %>%
  filter(uni_name == "Friends")
  2. Now calculate, for each question, the mean rating over all characters in this fictional universe, and print the result as a statement of the form: "The mean rating for the fictional universe 'your_universe' on the question 'question' is: 'mean_rating'."

Build a for loop that goes over all unique questions (use unique()) in your subsetted data frame. Inside this for loop, subset again, this time keeping only the rows for the question the loop is currently at, and calculate their mean rating. Then use paste() or paste0() to build and print the statement.
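paste() and paste0() differ only in the separator they put between their arguments: paste() uses sep = " " by default, paste0() uses nothing. This matters when a quote character is passed as its own argument, as in the statement above. A small sketch:

```r
question <- "messy_neat"

# paste() inserts a space between arguments,
# so the quotes end up with a space on each side of the question name
with_paste <- paste("on the question '", question, "' is:")

# paste0() inserts nothing, so the quotes sit directly around the question name
with_paste0 <- paste0("on the question '", question, "' is:")

print(with_paste)   # "on the question ' messy_neat ' is:"
print(with_paste0)  # "on the question 'messy_neat' is:"
```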

for (i in unique(characters_friends$question)) { # goes over all unique questions

  ## Build a subset that only consists of ratings about the current question:
  question_dat <- characters_friends %>%
    filter(question == i)

  ## Calculate the mean for that subset:
  question_mean <- mean(question_dat$rating)

  ## Build and print the final statement:
  statement <- paste0(
    "The mean rating for the fictional universe 'Friends' on the question '",
    i, "' is: ", question_mean, "."
  )
  print(statement)
}
[1] "The mean rating for the fictional universe 'Friends' on the question 'messy_neat' is: 47.6833333333333."
[1] "The mean rating for the fictional universe 'Friends' on the question 'disorganized_self.disciplined' is: 45.1666666666667."
[1] "The mean rating for the fictional universe 'Friends' on the question 'diligent_lazy' is: 42.2333333333333."
[1] "The mean rating for the fictional universe 'Friends' on the question 'on.time_tardy' is: 53.3166666666667."
[1] "The mean rating for the fictional universe 'Friends' on the question 'competitive_cooperative' is: 33.6."
[1] "The mean rating for the fictional universe 'Friends' on the question 'scheduled_spontaneous' is: 55.5333333333333."
[1] "The mean rating for the fictional universe 'Friends' on the question 'ADHD_OCD' is: 40.7833333333333."
[1] "The mean rating for the fictional universe 'Friends' on the question 'chaotic_orderly' is: 41.6666666666667."
[1] "The mean rating for the fictional universe 'Friends' on the question 'motivated_unmotivated' is: 32.2833333333333."
[1] "The mean rating for the fictional universe 'Friends' on the question 'bossy_meek' is: 41.4666666666667."
[1] "The mean rating for the fictional universe 'Friends' on the question 'persistent_quitter' is: 29.05."
[1] "The mean rating for the fictional universe 'Friends' on the question 'overachiever_underachiever' is: 41.85."
[1] "The mean rating for the fictional universe 'Friends' on the question 'muddy_washed' is: 64.2166666666667."
[1] "The mean rating for the fictional universe 'Friends' on the question 'beautiful_ugly' is: 19.5."
[1] "The mean rating for the fictional universe 'Friends' on the question 'slacker_workaholic' is: 53.5333333333333."
[1] "The mean rating for the fictional universe 'Friends' on the question 'driven_unambitious' is: 34.1833333333333."
[1] "The mean rating for the fictional universe 'Friends' on the question 'outlaw_sheriff' is: 50.2666666666667."
[1] "The mean rating for the fictional universe 'Friends' on the question 'precise_vague' is: 51.55."
[1] "The mean rating for the fictional universe 'Friends' on the question 'bad.cook_good.cook' is: 37.6333333333333."
[1] "The mean rating for the fictional universe 'Friends' on the question 'manicured_scruffy' is: 32.4166666666667."
...
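Strictly speaking, filter() and %>% still come from the tidyverse (dplyr). Below is a minimal sketch of the same loop using only base R, with square-bracket subsetting in place of filter(). It runs on a small made-up data frame (toy_friends and toy_means are hypothetical; with the real data you would loop over characters_friends instead) and additionally collects the means in a named vector.

```r
# Made-up long-format ratings: two characters, two questions
toy_friends <- data.frame(
  char_id = c("F1", "F1", "F2", "F2"),
  question = c("messy_neat", "diligent_lazy", "messy_neat", "diligent_lazy"),
  rating = c(40, 60, 80, 20)
)

toy_means <- numeric(0)  # named vector that collects one mean per question

for (q in unique(toy_friends$question)) {
  # Logical indexing keeps only the rows for the current question
  question_dat <- toy_friends[toy_friends$question == q, ]
  question_mean <- mean(question_dat$rating)
  toy_means[q] <- question_mean  # assigning by name adds a new named element

  print(paste0(
    "The mean rating for the fictional universe 'Friends' on the question '",
    q, "' is: ", question_mean, "."
  ))
}
```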
  3. Modify your for loop so that the mean values are saved in a new data frame containing each question and its mean rating.
      1. Build an empty data frame in which to store your results.
      2. This time you can't simply loop over the values of the question column, because you need the position of each element to store its result in the corresponding row of the new data frame. Loop over the positions instead: for (i in 1:length(unique(characters_friends$question))) {
      3. Save the result of your calculation in row i and column mean of the new data frame.
## Build an empty data frame for storing the results:
mean_ratings <- data.frame()

for (i in 1:length(unique(characters_friends$question))) {
  ## Extract the question on position i:
  question_i <- unique(characters_friends$question)[i]

  ## Extract all rows that contain values for this question:
  question_dat <- characters_friends %>%
    filter(question == question_i)

  ## Calculate the mean for that question
  question_mean <- mean(question_dat$rating)

  ## Save the question in the row corresponding to the position of i:
  mean_ratings[i, "question"] <- question_i

  ## Save the mean in the row corresponding to the position of i:
  mean_ratings[i, "mean"] <- question_mean
}

head(mean_ratings)
                       question     mean
1                    messy_neat 47.68333
2 disorganized_self.disciplined 45.16667
3                 diligent_lazy 42.23333
4                 on.time_tardy 53.31667
5       competitive_cooperative 33.60000
6         scheduled_spontaneous 55.53333
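Growing mean_ratings one cell at a time works fine at this size. An alternative sketch, again on hypothetical toy data (toy_friends and toy_mean_ratings are made up), computes the unique questions once, preallocates a results data frame with one row per question, and loops with seq_along(). Unlike 1:length(), seq_along() yields an empty sequence (instead of 1, 0) when there is nothing to loop over.

```r
toy_friends <- data.frame(
  char_id = c("F1", "F1", "F2", "F2"),
  question = c("messy_neat", "diligent_lazy", "messy_neat", "diligent_lazy"),
  rating = c(40, 60, 80, 20)
)

# Compute the unique questions once instead of in every iteration
questions <- unique(toy_friends$question)

# Preallocate: one row per question, the mean column is filled in by the loop
toy_mean_ratings <- data.frame(question = questions, mean = NA_real_)

for (i in seq_along(questions)) {
  question_dat <- toy_friends[toy_friends$question == questions[i], ]
  toy_mean_ratings[i, "mean"] <- mean(question_dat$rating)
}

print(toy_mean_ratings)

# Why seq_along(): 1:length() misbehaves on an empty vector
print(1:length(character(0)))   # [1] 1 0
print(seq_along(character(0)))  # integer(0)
```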

Let's compare this with the group_by() approach:

characters_friends %>%
  group_by(question) %>%
  summarise(mean_rating = mean(rating)) %>%
  ## Let's look at the rating of this question for comparison:
  filter(question == "messy_neat")
# A tibble: 1 × 2
  question   mean_rating
  <chr>            <dbl>
1 messy_neat        47.7

Great, it's the same! (The tibble print simply rounds 47.68333 to 47.7.)
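Base R also has built-in grouped summaries that do the loop's job in a single call. Below is a minimal sketch with tapply() and aggregate() on the same kind of made-up data (toy_friends is hypothetical).

```r
toy_friends <- data.frame(
  char_id = c("F1", "F1", "F2", "F2"),
  question = c("messy_neat", "diligent_lazy", "messy_neat", "diligent_lazy"),
  rating = c(40, 60, 80, 20)
)

# tapply(): split rating by question and apply mean() to each group -> named array
means_tapply <- tapply(toy_friends$rating, toy_friends$question, mean)

# aggregate(): formula interface, returns a data frame much like summarise() does
means_aggregate <- aggregate(rating ~ question, data = toy_friends, FUN = mean)

print(means_tapply)
print(means_aggregate)
```

Unlike the loop, which keeps the questions in order of first appearance, both functions (like group_by() with summarise()) return the groups sorted by question.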