6 Packages commonly used at Grattan
Some packages we use at Grattan - like the tidyverse collection of packages - are very popular among R users. Some - like the grattantheme package - are specific to Grattan Institute. Others - like the readabs package - are made by Grattan people, useful at Grattan, but also used outside of the Institute. To install a core set of packages we use at Grattan, click here and run the code chunk.
6.1 The tidyverse!
The tidyverse is central to our work at Grattan. The tidyverse is a collection of related R packages for importing, wrangling, exploring and visualising data in R. The packages are designed to work well together.
The main packages in the tidyverse include:
- ggplot2 for making beautiful, customisable graphs
- dplyr for manipulating data frames
- tidyr for tidying your data
- readr for importing rectangular text data, such as CSV and other delimited files
- purrr for functional programming
- stringr for manipulating strings of text
All these packages (and more!) are loaded automatically when you run library(tidyverse), which prints a startup message like this:
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.3 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
A range of other packages are installed on your machine as part of the tidyverse.
These include:
- readxl for importing Excel spreadsheets into R
- haven for importing Stata, SAS and SPSS data
- lubridate for working with dates
- rvest for scraping websites
Although these packages are installed as part of the tidyverse, they aren’t loaded automatically when you run library(tidyverse). You’ll need to load them individually, like:
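A minimal sketch of loading these packages one at a time, assuming each is already installed (which it will be if the tidyverse is installed). Note that from tidyverse 2.0.0 onwards, lubridate is attached by library(tidyverse) itself, so the explicit call is only needed with earlier versions such as the 1.3.1 shown in the startup message above.

```r
# Load the non-core tidyverse packages you need, one call per package
library(readxl)     # read_excel() for .xls and .xlsx files
library(haven)      # read_dta(), read_sas() and read_sav()
library(lubridate)  # date helpers such as ymd() and year()
library(rvest)      # read_html() and related web-scraping tools
```

In practice, load only the packages a given script actually uses.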
6.1.1 Why do we use the tidyverse?
The tidyverse makes life easier!
The core tidyverse packages, like ggplot2, dplyr, and tidyr, are extremely popular. The tidyverse is probably the most popular ‘dialect’ of R. This means that any problem you encounter with the tidyverse will have been encountered many times before by other R users, so a solution will only be a Google search away.
The tidyverse packages are all designed to work well together, with a consistent underlying philosophy and design. This means that coding habits you learn with one tidyverse package, like dplyr, are also applicable to other packages, like tidyr.
They’re designed to work with data frames, a rectangular data object that will be familiar to spreadsheet users and is intuitive and convenient for the sort of work we do at Grattan. In particular, the tidyverse is built around the concept of tidy data, which has a specific meaning in this context that we’ll come to later. The fact that tidyverse packages are all built around one type of data object makes them easier to work with.
The creator of the tidyverse, Hadley Wickham, places great value on code that is expressive and comprehensible to humans. This means that code written in the tidyverse idiom can often be understood even if you haven’t encountered the functions before. For example, look at this chunk of code:
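The chunk itself is not reproduced in this text, so what follows is a sketch consistent with the description after it, not the original code. The column names age and income, and the small my_data tibble that makes the example runnable, are assumptions for illustration.

```r
library(dplyr)

# Hypothetical data, included only so the example runs on its own
my_data <- tibble(
  age    = c(25, 30, 42, 58),
  income = c(40000, 50000, 70000, 60000)
)

result <- my_data %>%
  # keep only people aged 30 and older
  filter(age >= 30) %>%
  # income relative to the mean income of the remaining rows
  mutate(relative_income = income / mean(income))

result
```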
Without knowing what my_data looks like, and even if you haven’t encountered these functions before, this should be reasonably intuitive. We’re taking some data, then keeping only observations that relate to people aged 30 and older, then calculating a new variable, relative_income. The name of a tidyverse function - like filter, group_by, summarise, and so on - generally gives you a pretty good idea what the function is going to do with your data, which isn’t always the case with other approaches.
Here’s one way to do the same thing in base R:
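A sketch of a base R equivalent of the filter-then-calculate steps described above, not the original code. The age and income columns, and the small my_data data frame, are assumptions for illustration.

```r
# Hypothetical data, included only so the example runs on its own
my_data <- data.frame(
  age    = c(25, 30, 42, 58),
  income = c(40000, 50000, 70000, 60000)
)

# Keep rows for people aged 30 and older
older <- my_data[my_data$age >= 30, ]

# Add income relative to the mean income of the remaining rows
older$relative_income <- older$income / mean(older$income)

older
```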
The base R code gets the job done, but it’s clunkier, less expressive, and harder to read. A core principle of coding at Grattan is that you should strive to make your work human readable.
Code written with tidyverse functions is often faster than its base R equivalents. But most of our work at Grattan is with small-to-medium sized datasets (with fewer than a million rows or so), so speed isn’t usually a major concern anyway.
The most valuable resource we deal with at Grattan is our time. Computers are cheap, people are not. If your code executes quickly, but it takes your colleague many hours to decipher it, the cost of the extra QC time more than outweighs the saving through faster computation. The tidyverse packages strike a balance between expressive, comprehensible code and computational efficiency that suits the nature of our work at Grattan. This balance is the right one for most of our work, most of the time.
Most R scripts at Grattan should start with library(tidyverse). Most of your work will be in data frames, and most of the time the tidyverse contains the core tools you’ll need to do that work.
6.2 Grattan-specific packages
A range of Grattan people have written packages that come in handy at Grattan.
- grattantheme: The grattantheme package, by Matt Cowgill and Will Mackey, helps to make your ggplot2 charts Grattan-y. We cover the package extensively in the data visualisation chapter.
- grattandata: The grattandata package, by Matt Cowgill and Jonathan Nolan, is used to load microdata from the Grattan microdata repository. We cover this in the reading data chapter.
- grattan: The grattan package, created by Hugh Parsonage, contains two broad sets of functions. One set of functions (sometimes known by the nickname “Grattax”) is used for modelling the personal income tax system. Another set of functions (“Grattools”) is useful for a lot of our work, like converting dates to financial years (grattan::date2fy()) or a version of dplyr::ntile() that uses weights (grattan::weighted_ntile()).
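A brief sketch of the two grattan helpers named above, assuming the grattan package is installed. The dates, incomes and weights are made-up values for illustration only.

```r
library(grattan)

# Convert dates to Australian financial years (1 July to 30 June)
dates <- as.Date(c("2019-06-30", "2019-07-01"))
fy <- date2fy(dates)
fy

# Split observations into two weighted groups (n = 2), using
# hypothetical incomes and survey weights
incomes <- c(20000, 35000, 50000, 80000, 120000)
weights <- c(100, 300, 200, 250, 150)
tile <- weighted_ntile(incomes, weights = weights, n = 2)
tile
```

Unlike dplyr::ntile(), which gives each row equal weight, weighted_ntile() takes the weights into account when allocating rows to groups.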
6.3 Other commonly-used packages
There are other packages we commonly use at Grattan, including some developed by Grattan staff. These include:
- absmapsdata: This package, by Will Mackey, is very handy for working with spatial data. You’ll want it if you’re going to be making maps.
- readabs: The readabs package, by Matt Cowgill, provides an easy way to download, tidy, and import ABS time series data in R.