6 Packages commonly used at Grattan

Some packages we use at Grattan - like the tidyverse collection of packages - are very popular among R users. Some - like the grattantheme package - are specific to Grattan Institute. Others - like the readabs package - are made by Grattan people, useful at Grattan, but also used outside of the Institute. To install a core set of packages we use at Grattan, click here and run the code chunk.

6.1 The tidyverse!

The tidyverse is central to our work at Grattan. It is a collection of related R packages for importing, wrangling, exploring and visualising data, and the packages are designed to work well together. The main packages in the tidyverse include:

ggplot2 for making beautiful, customisable graphs
dplyr for manipulating data frames
tidyr for tidying your data
readr for importing data from a broad range of formats
purrr for functional programming
stringr for manipulating strings of text

All these packages (and more!) will automatically be loaded for you when you run the command⁸:

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.3     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
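The Conflicts message above means that, once the tidyverse is attached, a bare call to filter() or lag() uses the dplyr version rather than the one in the stats package. The sketch below shows how the package::function() prefix still reaches the masked function. The data frame df is hypothetical, invented for illustration.

```r
library(dplyr)  # also attached by library(tidyverse)

# Hypothetical toy data, invented for illustration
df <- data.frame(x = 1:5)

# After attaching dplyr, a bare filter() call is dplyr's row filter
big_x <- filter(df, x > 3)

# The masked stats function is still available with an explicit prefix;
# here it computes a centred three-term moving average
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
```

Writing dplyr::filter() explicitly is also fine when you want to leave no doubt about which function a script relies on.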

A range of other packages are installed on your machine as part of the tidyverse. These include:

readxl for importing Excel spreadsheets into R
haven for importing Stata, SAS and SPSS data
lubridate for working with dates
rvest for scraping websites

Although these packages are installed as part of the tidyverse, they aren’t loaded automatically when you run library(tidyverse). You’ll need to load them individually, like:

library(lubridate)
library(readxl)
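As a small illustration of why the extra library() call matters, the sketch below loads lubridate and then uses three of its functions. The date is an arbitrary example value.

```r
library(lubridate)

# Parse a date written as year-month-day (arbitrary example value)
start_date <- ymd("2021-07-01")

# Extract the calendar year
start_year <- year(start_date)

# Add six months to the date
end_date <- start_date + months(6)
```

Without library(lubridate), these calls would need the package prefix, as in lubridate::ymd("2021-07-01").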

6.1.1 Why do we use the tidyverse?

The tidyverse makes life easier!

The core tidyverse packages, like ggplot2, dplyr, and tidyr, are extremely popular. The tidyverse is probably the most popular ‘dialect’ of R. This means that any problem you encounter with the tidyverse will have been encountered many times before by other R users, so a solution will only be a Google search away.

The tidyverse packages are all designed to work well together, with a consistent underlying philosophy and design. This means that coding habits you learn with one tidyverse package, like dplyr, are also applicable to other packages, like tidyr.
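One way to see this consistency is a pipeline that mixes tidyr and dplyr functions. In the sketch below, both functions take a data frame as their first argument and return a data frame, so they chain together with the pipe. The data frame wide and its column names are hypothetical, invented for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per state, one column per year
wide <- data.frame(
  state = c("NSW", "Vic"),
  y2020 = c(90, 110),
  y2021 = c(120, 130)
)

long <- wide %>%
  # tidyr: reshape to one row per state-year combination
  pivot_longer(cols = c(y2020, y2021),
               names_to = "year",
               values_to = "value") %>%
  # dplyr: keep only the rows with a value above 100
  filter(value > 100)
```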

They’re designed to work with data frames⁹, rectangular data objects that will be familiar to spreadsheet users and that are intuitive and convenient for the sort of work we do at Grattan. In particular, the tidyverse is built around the concept of tidy data, which has a specific meaning in this context that we’ll come to later. The fact that tidyverse packages are all built around one type of data object makes them easier to work with.

The creator of the tidyverse, Hadley Wickham, places great value on code that is expressive and comprehensible to humans. This means that code written in the tidyverse idiom can often be understood even if you haven’t encountered the functions before. For example, look at this chunk of code:

my_data %>%
  filter(age >= 30) %>%
  mutate(relative_income = income / mean(income))

Without knowing what my_data looks like, and even if you haven’t encountered these functions before, this should be reasonably intuitive. We’re taking some data, and then¹⁰ only keeping observations that relate to people aged 30 and older, then calculating a new variable, relative_income. The name of a tidyverse function - like filter, group_by, summarise, and so on - generally gives you a pretty good idea what the function is going to do with your data, which isn’t always the case with other approaches.

Here’s one way to do the same thing in base R:

transform(my_data[my_data$age >= 30, ],
          relative_income = income / mean(income))

The base R code gets the job done, but it’s clunkier, less expressive, and harder to read. A core principle of coding at Grattan is that you should strive to make your work human readable.
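To check that the two versions do the same thing, the sketch below runs both on a small made-up data frame and compares the new column. The contents of my_data here are hypothetical, invented for illustration.

```r
library(dplyr)  # also attached by library(tidyverse)

# Hypothetical toy data, invented for illustration
my_data <- data.frame(
  age    = c(25, 30, 45, 62),
  income = c(40000, 60000, 90000, 30000)
)

# Tidyverse version
tidy_result <- my_data %>%
  filter(age >= 30) %>%
  mutate(relative_income = income / mean(income))

# Base R version
base_result <- transform(my_data[my_data$age >= 30, ],
                         relative_income = income / mean(income))

# Compare the calculated column from each approach
same_answer <- all.equal(tidy_result$relative_income,
                         base_result$relative_income)
```

In both versions the mean is taken over the filtered rows only, because the filtering happens before relative_income is calculated.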

Code written with tidyverse functions is often faster than the base R equivalent. But most of our work at Grattan is with small-to-medium-sized datasets (with fewer than a million rows or so), so speed isn’t usually a major concern anyway.¹¹

The most valuable resource we deal with at Grattan is our time. Computers are cheap; people are not. If your code executes quickly, but it takes your colleague many hours to decipher it, the cost of the extra QC time more than outweighs the saving through faster computation. The tidyverse packages strike a balance between expressive, comprehensible code and computational efficiency that suits the nature of our work at Grattan. This balance is the right one for most of our work, most of the time.

Most R scripts at Grattan should start with library(tidyverse). Most of your work will be in data frames, and most of the time the tidyverse contains the core tools you’ll need to do that work.

6.2 Grattan-specific packages

A range of Grattan people have written packages that come in handy at Grattan.

grattantheme The grattantheme package, by Matt Cowgill and Will Mackey, helps to make your ggplot2 charts Grattan-y. We cover the package extensively in the data visualisation chapter.
grattandata The grattandata package, by Matt Cowgill and Jonathan Nolan, is used to load microdata from the Grattan microdata repository. We cover this in the reading data chapter.
grattan The grattan package, created by Hugh Parsonage, contains two broad sets of functions. One set of functions (sometimes known by the nickname “Grattax”) is used for modelling the personal income tax system. Another set of functions (“Grattools”) is useful for a lot of our work. It includes helpers such as grattan::date2fy(), which converts dates to financial years, and grattan::weighted_ntile(), a version of dplyr::ntile() that uses weights.
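As a rough sketch of the two helper functions named above, the code below calls them on made-up inputs. It assumes the grattan package is installed on your machine. The dates, incomes and weights are hypothetical, and exact output may vary between package versions.

```r
# Hypothetical inputs, invented for illustration
dates   <- as.Date(c("2021-06-30", "2021-07-01"))
incomes <- c(20000, 50000, 80000, 120000)
weights <- c(3, 1, 1, 1)

# Financial year containing each date
financial_years <- grattan::date2fy(dates)

# Split incomes into two groups, taking the weights into account
income_groups <- grattan::weighted_ntile(incomes, weights = weights, n = 2)
```

The package prefix (grattan::) means the functions can be used without first running library(grattan).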

6.3 Other commonly-used packages

There are other packages we commonly use at Grattan, including some developed by Grattan staff. These include:

absmapsdata This package, by Will Mackey, is very handy for working with spatial data. You’ll want it if you’re going to be making maps.
readabs The readabs package, by Matt Cowgill, provides an easy way to download, tidy, and import ABS time series data in R.
