13 Functional programming: making and using your own functions
Why on earth would you create your own function?
It can be useful to make your own function
13.1 Set up
We will use the tidyverse
and purrr
in this chapter.
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.5 ✔ purrr 0.3.4
## ✔ tibble 3.1.6 ✔ dplyr 1.0.8
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
13.2 Defining simple functions
Like your data, a function is an object
that is defined.
Let’s say you wanted to take whatever value you had and add one to it.
We could define a function called add_one
to do this:
add_one <- function(x) {
x + 1
}
There are four elements to what we just did:
- Created a new function using the
function()
function. - Defined one argument (input) to the new function:
x
- Gave the function instructions to ‘take
x
and add one’:x + 1
. - Assigned this function to the object
add_one
.
Now that you’ve done all that, you can call the function like you would any other:
add_one(1)
## [1] 2
add_one(2)
## [1] 3
add_one(100)
## [1] 101
You can define functions can take more than one argument:
add_together_plus_one <- function(x, y, z) {
x + y + z + 1
}
add_together_plus_one(1, 2, 3)
## [1] 7
As the above takes three arguments, and there are no defaults provided, we’ll get an error if we fotget one:
add_together_plus_one(1, 2)
## Error in add_together_plus_one(1, 2): argument "z" is missing, with no default
So you can also provide default values to your function when you define it:
make_power <- function(x, n = 2) {
x^n
}
If you don’t provide a value for n
when you call the make_power
function,
it will default to 2
:
make_power(10)
## [1] 100
And you can override that by providing your own value:
make_power(10, 4)
## [1] 10000
We can also assign the result of this function (10000
above) to
an object of its own:
my_power_number <- make_power(10, 4)
For this reason, and a few others described in a section to follow, a function can only return one thing (a number, or a vector, or dataset, or list, and so on).
13.3 Using conditional statements and categorical arguments
Sometimes you will want your function to behave differently under different circumstances. For example, you might want to do one thing if your input is a REALLY BIG number, and another if it’s very small.
Conditional if
statements – which evaluate an expression and proceed if TRUE
– can be useful for these occasions.
The function below takes one argument – x
– and transforms it depending on how large it is:
make_smaller <- function(x) {
if (x > 10) {
return_number <- x - 10
}
if (x <= 10) {
return_number <- x - 5
}
return_number
}
If the input number is greater than 10, the make_smaller
function will take
10
off it; if it’s 10 or less, it will just take 5
off:
make_smaller(7)
## [1] 2
make_smaller(13)
## [1] 3
Making use of the return()
function could make this clearer.
If x
is greater than 10
, the make_smaller
function will now subtract 10
and immediately return the value, ignoring everything else below it:
make_smaller <- function(x) {
if (x > 10) {
return(x - 10)
}
if (x <= 10) {
return(x - 5)
}
}
make_smaller(7)
## [1] 2
make_smaller(13)
## [1] 3
These conditional statements can be used for input options to your functions.
Let’s say you had a vector of ages of people in your office, office_ages
:
office_ages <- c(-6, 12, 21, 36, 56, 67, 200)
office_ages
## [1] -6 12 21 36 56 67 200
To summarise your office by age, you might want a function that would round each age to the nearest 10
. You could make a function that rounds a number to the nearest 10
,
using the round()
function with digits
set to -1
(i.e. round to the nearest 10
):
make_age10 <- function(age) {
round(age, digits = -1)
}
make_age10(office_ages)
## [1] -10 10 20 40 60 70 200
Perfect. But some of those ages look implausible, and you might also want your function to validate them, by, say, capping ages to be between zero and 100
. You could let validate_ages
be an argument, defaulting to TRUE
, and if it is TRUE
, then you could perform the validation:
make_age10 <- function(age,
validate_ages = TRUE) {
# First, validate ages IF ASKED FOR
if (validate_ages) {
age <- if_else(age > 100, 100, age)
age <- if_else(age < 0, 0, age)
}
# Then round ages to the nearest 10:
round(age, digits = -1)
}
Now, if validate_ages == TRUE
(the default), the numbers over 100
will be replaced with 100
, and those less than 0
with 0
:
make_age10(office_ages)
## [1] 0 10 20 40 60 70 100
And you can turn that behaviour off by setting validate_ages
to FALSE
:
make_age10(office_ages, validate_ages = FALSE)
## [1] -10 10 20 40 60 70 200
However, we might not trust an age entry of -6
or 200
at all, and might want to give the user the option to remove them rather than assume they are 0
or 100
. We could provide that option in one of two ways.
The first is to add another argument, shown below.
make_age10 <- function(age,
validate_ages = TRUE,
remove_implausible = FALSE) {
# First, validate ages
if (validate_ages) {
if (remove_implausible) {
# Replace implausible ages with NAs
age <- if_else(age > 100 | age < 0, NA_real_, age)
}
if (!remove_implausible) {
# Replace implausible ages with their nearest plausible age
age <- if_else(age > 100, 100, age)
age <- if_else(age < 0, 0, age)
}
}
# Then round ages to the nearest 10:
round(age, digits = -1)
}
This is fine! By default, the function still works like it did previously:
make_age10(office_ages, validate_ages = TRUE)
## [1] 0 10 20 40 60 70 100
And if the user wants to validate ages and they want to remove the
implausible ages, then they will be replaced with NA
:49
make_age10(office_ages, validate_ages = TRUE, remove_implausible = TRUE)
## [1] NA 10 20 40 60 70 NA
But those nested conditional statements – which are often
needed! – can be a bit of a headache.
There is another way we can do this, which might be neater in this instance.
The validate_ages
argument could take a character string that tells it which
validation method to use. We could use "remove"
and "adjust"
to indicate the
validation methods:
make_age10 <- function(age,
validate_ages = "remove") {
# First, validate ages
if (validate_ages == "remove") {
age <- if_else(age > 100 | age < 0, NA_real_, age)
}
if (validate_ages == "adjust") {
# Replace implausible ages with their nearest plausible age
age <- if_else(age > 100, 100, age)
age <- if_else(age < 0, 0, age)
}
# Then round ages to the nearest 10:
round(age, digits = -1)
}
Now we can use the validate_ages
argument by itself to get the results we want:
make_age10(office_ages, validate_ages = "adjust")
## [1] 0 10 20 40 60 70 100
make_age10(office_ages, validate_ages = "remove")
## [1] NA 10 20 40 60 70 NA
13.4 What is ‘returned’ from a function?
A function can do lots of things in the background. For example, you might want to take a vector, square every number, and then add all those numbers up:
sum_squares <- function(x) {
# first, add one to each using the function we defined above
added <- add_one(x)
# then sum all the numbers in the vector
summed <- sum(added)
# then return the summed object
summed
}
Running that function on a vector of numbers \(1, 2, 3, ..., 10\) (created with 1:10
) does what we want:
sum_squares(1:10)
## [1] 65
But look in your Environment window. The two objects that were created in the
function – added
and summed
– aren’t there! They are instead calculated,
stored in the background, and removed when the function is finished.
A function only returns one thing; everything else that is created in it is discarded.50 This behaviour keeps your environment clean and tidy, but it can cause some frustration when you’re getting started.
That one thing that is returned is – by default – the last thing printed in
the function. this is a bad explanation that I need to make better
For sum_squares
above, we defined two objects and then passed the summed
to
the end of the function. If we omitted the last step, the function wouldn’t return anything:
empty_sum_squares <- function(x) {
# first, add one to each using the function we defined above
added <- add_one(x)
# then sum all the numbers in the vector
summed <- sum(added)
}
empty_sum_squares(1:10)
The empty_sum_squares
function took the 1:10
vector, then added one, then
summed the resulting numbers. But it didn’t return anything.
It just assigned values to the added
and summed
objects, then the function
finished and those objects vanished.
The return()
function can help you make this behaviour clear. Using return()
will stop your function in its tracks and pass the object out of the function.
We can use it in the sum_squares
function:
sum_squares <- function(x) {
# first, add one to each using the function we defined above
added <- add_one(x)
# then sum all the numbers in the vector
summed <- sum(added)
# then return the summed object
return(summed) # function stops here!
}
Ensuring your function returns the object you want in the form you want is the second step in writing your own functions.
13.5 Using functions on datasets within mutate()
Recall that when you are adding (or changing) a variable in a dataset with mutate
, the length of the output must be either the length of the dataset or one (which is then repeated). Anything else will throw an error:
office_df <- tibble(office_ages)
office_df
## # A tibble: 7 × 1
## office_ages
## <dbl>
## 1 -6
## 2 12
## 3 21
## 4 36
## 5 56
## 6 67
## 7 200
## # A tibble: 7 × 2
## office_ages fave_colour
## <dbl> <chr>
## 1 -6 #F68B33
## 2 12 #F68B33
## 3 21 #F68B33
## 4 36 #F68B33
## 5 56 #F68B33
## 6 67 #F68B33
## 7 200 #F68B33
# A vector of length of the dataset is fine:
office_df %>%
mutate(fave_number = c(10, 4, 1, 4, 0, 99, 100))
## # A tibble: 7 × 2
## office_ages fave_number
## <dbl> <dbl>
## 1 -6 10
## 2 12 4
## 3 21 1
## 4 36 4
## 5 56 0
## 6 67 99
## 7 200 100
# BUT a vector of a different length will cause an error:
office_df %>%
mutate(fave_lunch = c("pasta salad", "a variety"))
## Error in `mutate()`:
## ! Problem while computing `fave_lunch = c("pasta salad", "a variety")`.
## ✖ `fave_lunch` must be size 7 or 1, not 2.
In the section above, we created the make_age10
function and applied it to our little vector of ages. The function took a vector and returned a vector of the same length:
## [1] 7
## [1] 7
13.6 Using the purrr
family of functions
Section to be finished; see https://github.com/grattan/R_at_Grattan/issues/59