Dependency Management in R

This page covers dependency management in R in more detail. It discusses legacy projects like {packrat} and community-driven alternatives like {groundhog}, and it introduces the {renv} package.

The goal of package managers in R is to ensure that the packages you need are downloaded from a defined repository and installed in the libraries you want to use. There are several package managers available for R, each with its own strengths and weaknesses. The three best-known are {packrat}, {groundhog}, and {renv}.

Common Managers

{packrat}

The {packrat} package was one of the first package managers developed for R, and it was designed to address some of the shortcomings of the base R package management system. {packrat} allows you to create a “snapshot” of your package dependencies, which can be shared with collaborators or used to recreate your working environment later. This is useful when several people work on the same project or when an analysis needs to be reproduced in the future.

For many years it was the most popular environment and dependency manager for R, and it was developed and maintained by RStudio/Posit. However, Posit has soft-deprecated it in favor of the {renv} package since at least April 2020. You should not start new projects with {packrat}, and you should, if possible, migrate existing {packrat} projects to {renv}. That said, it is worth knowing that this package exists because old tutorials, projects, and repos may still be using it.
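For an existing {packrat} project, the migration can be sketched as follows. This is a minimal sketch rather than a complete procedure: it assumes {renv} can be installed from your configured repository and that the code is run from the root directory of the {packrat} project. It is worth committing or backing up the project first and reviewing the result afterwards.

```r
# Run from the root directory of a project that currently uses {packrat}.
install.packages("renv")

# Convert the project's {packrat} lockfile and library to their {renv} equivalents.
renv::migrate()

# Report whether the project library and the new lockfile agree.
renv::status()
```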

{groundhog}

{groundhog} is a community-driven package manager for R that was developed as an alternative to {packrat} and {renv}. It is designed to be lightweight and easy to use, and it has already gained a following among R users who are looking for an alternative to {packrat}, notably within psychological research and related fields. The example from their website is included below:

install.packages("groundhog")
library("groundhog")
pkgs <- c("rio","metafor")
groundhog.library(pkgs, "2023-09-01")

This example demonstrates a scenario in which you would like to use the rio and metafor packages as they were on September 1st, 2023. The groundhog.library() function installs the packages if needed and then loads them, with the date given as the second argument. One of the main benefits of this approach is that swapping library() for groundhog.library() is a simple change to make in your scripts, and it helps ensure that your collaborators use the same package versions as you do.

The other main benefit of using {groundhog} is that it is community-driven, meaning that it is developed and maintained by a group of R users who are interested in improving the package management experience in R outside of the Posit ecosystem. However, there are also some potential drawbacks to using {groundhog}.

The package is still in a relatively early stage of development, so it may not be as stable or reliable as other package managers like {renv}. Likewise, it does not have as many features as other package managers, so it will only be suitable for a smaller number of use cases. One of the largest benefits of learning {renv} instead of {groundhog} comes from a network effect: {renv} is a transferable skill between academia and industry, and it is likely to be more widely used than {groundhog}, especially because of the professional support and software development the package receives from Posit.

{renv}

{renv} is another one of Posit’s inventions, and is designed to be lightweight, easy to use, and to work well with other Posit packages and CI/CD development pipelines. A separate tutorial will discuss CI/CD pipelines in detail, but, in short, using {renv} with your R projects allows you to easily automate advanced tasks like publishing Shiny apps, deploying website updates through GitHub Pages, checking the development status of R packages you create, and more. These benefits may seem a bit less tangible if you are new to R, but they are very important both for professional software development and for tasks that can make your work significantly easier in the long run.

More to the focus of this tutorial, however, {renv} greatly assists in making your project reproducible by creating a “snapshot” of your package dependencies in as few as two commands:

renv::init()
renv::snapshot()

The results of this code will be discussed in the next chapter, but, in short, the “snapshot” it creates can be shared with collaborators or used to recreate your working environment at a later date. This is referred to as restoring a project, and can be done with the aptly-named renv::restore() function.
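From a collaborator's point of view, the round trip might look like the sketch below. It assumes that the renv.lock file and the renv/ activation files created by renv::init() were shared together with the project code (for example through version control), and that each command is run from the project's root directory.

```r
# Author, after installing or updating packages in the project:
renv::snapshot()   # record the current package versions in renv.lock

# Collaborator (or you, at a later date), after obtaining the project:
renv::restore()    # install the package versions recorded in renv.lock
renv::status()     # report any differences between the library and the lockfile
```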

Pros and Cons

As mentioned, {renv} is likely the most popular option for package management in R, and is a highly transferable skill between both academia and industry settings [1]. Thus, it is the one that I recommend using for new projects, and will be the focus of the rest of this tutorial on dependency management in R.

Moreover, this package can be considered relatively stable, as it has moved beyond its initial development stages into a version 1 release (currently at version 1.0.11). It is maintained by professional software engineers at Posit, and also receives open-source contributions and bug reports via the public GitHub repository.

In comparison to {groundhog}, one of the supposed main drawbacks of {renv} is that it is project-based, meaning that you need to create an R Project (.Rproj file) to use {renv}, whereas {groundhog} allows you to simply embed the function groundhog.library(pkg, date) into any R file. This is not a drawback in my opinion, as using an R Project structure helps manage your working environment, but it is worth noting that the default workflow for {renv} is not as flexible as {groundhog} in this regard. However, {renv} also offers a similar workflow to {groundhog} via the renv::embed() and renv::use() functions, discussed in the supplementary material of this tutorial.
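As a rough illustration of that script-level workflow, a standalone script could declare its own dependencies with renv::use(). This is a sketch under assumptions: the version numbers are illustrative only and should be replaced with the versions your analysis actually relies on, and analysis.R is a hypothetical file name.

```r
# Declare the packages this script depends on, pinned to specific versions.
# The version numbers are illustrative; substitute the ones you need.
renv::use(
  "rio@1.0.1",
  "metafor@4.4-0"
)

# renv::use() does not attach packages by default, so load them as usual.
library(rio)
library(metafor)

# Alternative, run from the console of an existing {renv} project:
# write a renv::use() call based on the project's lockfile into a script.
# renv::embed("analysis.R")   # "analysis.R" is a hypothetical file name
```

The difference from groundhog.library() is that packages are pinned by version number rather than by date.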

Limitations of Package Managers

The main limitation of package managers is that they only manage the packages you use in your project. They do not manage the software you use to run your project, such as R itself, or the operating system you use to run the software. This means that if you need to reproduce your analysis at a later date, you will also need to ensure that you have the correct version of R and any other software you need to run your project.

In most cases, however, the main problem will be matching the installed version of R to the required version. The potential solutions are too expansive to cover in this tutorial, but the most common are to use a tool such as rig to switch between R versions on your own computer, to use a containerization solution such as Binder, which builds Docker images, or to use a managed environment in RStudio Cloud (now Posit Cloud). Container-based solutions allow you to create a “container” with instructions to download all the software you need to run your project, which can be shared with collaborators or used to recreate your working environment at a later date.

Important

R package managers only manage the packages you use in your project. The package managers might record the version of R you are using, but they will not directly manage or change your R versions.

If you need to change R versions for your projects, we recommend downloading rig.
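Although a package manager will not switch R versions for you, a script can at least check which version it is running under. The sketch below uses only base R; the helper name check_r_version() and the version string "4.3.0" are hypothetical and chosen purely for illustration.

```r
# Hypothetical helper: TRUE if the running R is at least version `required`.
check_r_version <- function(required) {
  getRversion() >= required
}

# "4.3.0" is a hypothetical minimum version for this project.
if (!check_r_version("4.3.0")) {
  warning(
    "This project was developed with R >= 4.3.0; you are running ",
    R.version.string
  )
}
```

A check like this only reports a mismatch. Installing or switching to the required version still has to be done with a tool such as rig.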

A less common issue, particularly if you are using Windows or macOS, occurs when R packages have a system dependency that is neither R itself nor another R package. These are also not managed by package managers, but they can easily be identified either by checking a package’s DESCRIPTION file or by checking the Posit Package Manager (PPM) page for the package. This site is particularly convenient as it lists all the system dependencies for a package, and the commands needed to install them on a variety of operating systems.

For a more advanced understanding of system dependencies in R packages, see this R-hub blogpost on the topic. Subsequently, you can explore the rstudio/r-system-requirements repo to see how RStudio manages system dependencies for their own packages (and you can too if you need to use e.g. a Docker container for your project).

For example, the {curl} R package requires the curl system software which comes bundled with Windows and macOS, but needs to be installed explicitly by the user on Linux systems. The PPM page for {curl} lists the commands needed to install the curl software on a variety of Linux distributions, and can be found here.
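The DESCRIPTION check mentioned above can also be done from within R. The following is a minimal sketch using base R only; the helper name system_requirements() is hypothetical, and the call on {curl} assumes that package is already installed in your library.

```r
# Hypothetical helper: return the SystemRequirements field of an installed
# package's DESCRIPTION file, or NA if the package declares none.
system_requirements <- function(pkg) {
  utils::packageDescription(pkg, fields = "SystemRequirements")
}

# Assumes {curl} is installed; prints the system libraries it declares.
system_requirements("curl")
```

The field is free text written by the package author, so the PPM page remains the more convenient source for the exact install commands on each operating system.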


Footnotes

  1. Niche: {renv} also has integrations allowing it to be used with Python, which can be useful when you have a multilanguage project. This would likely only be encountered in complex, multiteam analysis projects or possibly in industry settings.