Functions

1

Note

This chapter is optional.

library(tidyverse)
world_coordinates <- readRDS(file = here::here("raw_data", "world_coordinates.rds"))
athletes <- readRDS(file = here::here("raw_data", "athletes.rds"))

Motivation

Suppose we want to know how many gold medals a specific athlete has won, along with some additional information, all printed to the console. We could do something like this:

medal_counts_athlete <- athletes %>%
  # Extract all rows containing gold medal winners:
  filter(Medal %in% c("Gold")) %>%
  # Group them by name:
  group_by(Name) %>%
  # Count the number of medals for each name:
  count(Medal) 

head(medal_counts_athlete)
# A tibble: 6 × 3
# Groups:   Name [6]
  Name                                   Medal     n
  <chr>                                  <chr> <int>
1 "A. Albert"                            Gold      1
2 "Aage Jrgen Christian Andersen"        Gold      1
3 "Aage Valdemar Harald Frandsen"        Gold      1
4 "Aagje \"Ada\" Kok (-van der Linden)"  Gold      1
5 "Aale Maria Tynni (-Pirinen, -Haavio)" Gold      1
6 "Aaron Nguimbat"                       Gold      1
# Extract all rows of Usain Bolt
medals_bolt <- medal_counts_athlete %>% 
  filter(Name == "Usain St. Leo Bolt")

head(medals_bolt)
# A tibble: 1 × 3
# Groups:   Name [1]
  Name               Medal     n
  <chr>              <chr> <int>
1 Usain St. Leo Bolt Gold      8
# Extract all rows of Usain Bolt from the athletes data set:
stats_bolt <- athletes %>%
  filter(Name == "Usain St. Leo Bolt") %>%
  ## sort the data frame by year:
  arrange(Year)

head(stats_bolt)
  NOC    ID               Name Sex Age Height Weight    Team       Games Year
1 JAM 13029 Usain St. Leo Bolt   M  17    196     95 Jamaica 2004 Summer 2004
2 JAM 13029 Usain St. Leo Bolt   M  21    196     95 Jamaica 2008 Summer 2008
3 JAM 13029 Usain St. Leo Bolt   M  21    196     95 Jamaica 2008 Summer 2008
4 JAM 13029 Usain St. Leo Bolt   M  21    196     95 Jamaica 2008 Summer 2008
5 JAM 13029 Usain St. Leo Bolt   M  25    196     95 Jamaica 2012 Summer 2012
6 JAM 13029 Usain St. Leo Bolt   M  25    196     95 Jamaica 2012 Summer 2012
  Season    City     Sport                                Event Medal  Region
1 Summer  Athina Athletics           Athletics Men's 200 metres  <NA> Jamaica
2 Summer Beijing Athletics Athletics Men's 4 x 100 metres Relay  <NA> Jamaica
3 Summer Beijing Athletics           Athletics Men's 200 metres  Gold Jamaica
4 Summer Beijing Athletics           Athletics Men's 100 metres  Gold Jamaica
5 Summer  London Athletics Athletics Men's 4 x 100 metres Relay  Gold Jamaica
6 Summer  London Athletics           Athletics Men's 100 metres  Gold Jamaica
# Print a statement using the data we have just extracted:
print(
  paste("Usain St. Leo Bolt participated in Olympic games in the year(s)",
        paste0(unique(stats_bolt$Year), collapse = ", "), 
        "and won", 
        medals_bolt$n, 
        "gold medal(s) in total. The athlete's sport was:", 
        unique(stats_bolt$Sport), 
        ".")
  )
[1] "Usain St. Leo Bolt participated in Olympic games in the year(s) 2004, 2008, 2012, 2016 and won 8 gold medal(s) in total. The athlete's sport was: Athletics ."

Phew, that is already quite a bit of work, especially if this is meant as an easy way for users to look up the gold medal count of multiple athletes. They would have to change the name in both filter steps and build their print statement from scratch. Luckily, we can write a function, which is a way to bundle multiple operations so they can easily be repeated. Let’s do that quickly, and then take a step back and look at the components of a function:

count_goldmedals <- function(athlete_name) {
  medal_counts_athlete <- athletes %>%
    ## Extract all rows with gold medal winners:
    filter(Medal == "Gold") %>%
    ## Group them by name
    group_by(Name) %>%
    ## count the number of medals for each name:
    count(Medal)

  ## Extract the medal count row for the athlete name provided by the user using the athlete_name argument:
  medals_name <- medal_counts_athlete %>%
    filter(Name == athlete_name)

  ## Extract the rows in the athletes data frame for the athlete name provided by the user using the athlete_name argument:
  stats_name <- athletes %>%
    filter(Name == athlete_name) %>%
    ## Sort by year:
    arrange(Year)

  ## Build the statement:
  statement <- paste(
    athlete_name,
    "participated in Olympic games in the year(s)",
    paste0(unique(stats_name$Year), collapse = ", "),
    "and won",
    medals_name$n,
    "gold medal(s) in total. The athlete's sport was:",
    unique(stats_name$Sport),
    "."
  )

  print(statement)

  return(medals_name)
}

count_goldmedals(athlete_name = "Usain St. Leo Bolt")
[1] "Usain St. Leo Bolt participated in Olympic games in the year(s) 2004, 2008, 2012, 2016 and won 8 gold medal(s) in total. The athlete's sport was: Athletics ."
# A tibble: 1 × 3
# Groups:   Name [1]
  Name               Medal     n
  <chr>              <chr> <int>
1 Usain St. Leo Bolt Gold      8
count_goldmedals(athlete_name = "Simone Arianne Biles")
[1] "Simone Arianne Biles participated in Olympic games in the year(s) 2016 and won 4 gold medal(s) in total. The athlete's sport was: Gymnastics ."
# A tibble: 1 × 3
# Groups:   Name [1]
  Name                 Medal     n
  <chr>                <chr> <int>
1 Simone Arianne Biles Gold      4

Pretty cool, right? We just write our code once, and can reuse it as often as we want to. So, let’s take a closer look at how to actually do that.

How to write a function?

Everything that does something in R is a function. We have already used a lot of them, like print(), filter(), and merge(). The great thing is that we can define our own functions pretty easily:

function_name <- function(argument_1, argument_2, ...){
  # do some operations
  
  return(result)
}
  1. We always have to give the function a concise name (often not that easy).
  2. Then we specify some arguments (which should also have concise names). In our introductory example, that was just the athlete name. We can also provide default values for arguments, which the function falls back on if the user doesn’t specify anything.
  3. Inside the { } we define the operations. They can use the function arguments, so the user can control some aspects of the function’s behavior.
  4. At the end, it is good practice to return the result explicitly with return(), so it is always clear what the function gives back to the user.
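To illustrate the last point, here is a small sketch in base R. The two function names are made up for this illustration and do not appear elsewhere in the chapter. R also returns the value of the last evaluated expression when return() is left out, so both versions give the same result; the explicit version just makes the intent easier to see.

```r
# Hypothetical helper names, used only for this illustration.

# Explicit: the returned value is stated with return().
square_explicit <- function(x) {
  result <- x^2
  return(result)
}

# Implicit: R returns the value of the last evaluated expression.
square_implicit <- function(x) {
  x^2
}

# Both calls give the same value:
square_explicit(x = 3)
square_implicit(x = 3)
```

In longer functions with several branches, the explicit form is usually easier to read, which is why it is used throughout this chapter.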

One minimal example with three arguments would be to sum three numbers:

sum_num <- function(x, y, z = 0){
  result <- x + y + z
  return(result)
}

sum_num(x = 1, y = 1, z = 2)
[1] 4
## We don't have to use the arguments in order, IF we name them:
sum_num(y = 2, z = 4, x = 1)
[1] 7
## We don't have to specify z, because the function can use a default:
sum_num(x = 3, y = 1)
[1] 4
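The other side of default values is that arguments without a default have to be supplied. The following sketch repeats sum_num() so it can run on its own and calls it without y. The variable name missing_y is only chosen for this illustration; tryCatch() is used so the error message can be stored and looked at instead of stopping the script.

```r
# Same function as above, repeated so this block runs on its own:
sum_num <- function(x, y, z = 0){
  result <- x + y + z
  return(result)
}

# y has no default, so the call fails as soon as y is needed.
# tryCatch() catches the error and hands back its message as a string:
missing_y <- tryCatch(
  sum_num(x = 3),
  error = function(e) conditionMessage(e)
)

missing_y
```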
Tip

It often makes sense to write the argument names explicitly in your function call. This makes your code clearer and avoids mix-ups.
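A mix-up is easiest to see in a function where the order of the arguments matters. The function divide_num() below is a made-up example for this purpose and not part of the chapter's code.

```r
# Hypothetical example function; unlike a sum, the argument order matters here:
divide_num <- function(numerator, denominator){
  result <- numerator / denominator
  return(result)
}

# Without names, arguments are matched by position:
positional_ok      <- divide_num(10, 2)   # 10 / 2 = 5
positional_swapped <- divide_num(2, 10)   # 2 / 10 = 0.2, probably not intended

# With names, the order in the call no longer matters:
named <- divide_num(denominator = 2, numerator = 10)  # 10 / 2 = 5
```

The named call reads almost like a sentence, and it keeps working if someone later reorders the arguments in the function definition.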

Footnotes

  1. Image by Laura Ockel on Unsplash.