Technical Definitions

Prior to discussing the various package managers in R, it is important to understand some R terminology relevant to dependency management, namely the concepts of and differences between packages, libraries, and repositories.

Packages

Packages are collections of functions, data, and documentation bundled together in a single unit. They are the primary way for R users to extend the functionality of the base R language. For example, {dplyr} is a popular data-manipulation package that adds functions like filter() and mutate(). The code below filters a data set in two ways: first using only R’s built-in capabilities, then using {dplyr}.

# Base R: subset rows with a logical index
iris[iris$Species == "setosa" & iris$Sepal.Length > 4.5, ]
# {dplyr}: the same filter expressed with filter()
dplyr::filter(iris, Species == "setosa", Sepal.Length > 4.5)

Base R already has a large number of built-in functions and data structures for working with data, so what is the benefit of creating a whole new package like {dplyr} to accomplish the same tasks?

Subjectively, base R can be difficult to use for complex data manipulation tasks: its syntax is verbose, and it is hard for beginners and intermediate users to master. Packages like {dplyr}, however, provide a more user-friendly interface for working with data, and they make complex data manipulation much easier for users who care more about accomplishing a task than about learning a programming language in depth.

This example is not meant to highlight the benefits of {dplyr} specifically, but rather to illustrate the general utility of packages in R. Packages are, above all, a way to make life easier for users: they provide alternatives to functions in the base R packages, and they extend the base R language for specific tasks.

Pre-Installed Packages

Upon installing R, you will have access to a number of pre-installed packages. These packages are part of the “base” R distribution and are considered essential for working with R. You can list them by filtering your installed packages to those with “base” or “recommended” priority (together also called “high” priority). The code below generates this list of pre-installed packages, along with some information about each package.

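library(dplyr)  # provides %>%, as_tibble(), and select()

# List the packages that ship with R, with selected metadata for each
installed.packages(priority = c("base", "recommended")) %>%
  as_tibble() %>%
  select(Package, Version, Priority, Depends, Imports) %>%
  DT::datatable(options = list(paging = TRUE, scrollY = "300px", pageLength = 11), style = "bootstrap5")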

You will likely recognize some of these packages, such as the {stats} package, which contains many of the commonly used statistical functions that are built into R, like glm(). Other packages, like {utils} and {survival}, contain functions for other common activities, such as reading in files and performing survival analysis, respectively.

Somewhat confusingly, the base R distribution also includes a package called {base} containing many functions and data structures that are considered essential for working with R. Some references to “base R” are therefore referring to the base package, while others refer to the base distribution of the R language. This distinction is not particularly important for the purposes of this tutorial, but it is worth keeping in mind when reading other R documentation.
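As a small illustration of the distinction above, the sketch below asks R which package a function comes from and then lists the packages with “base” priority. It assumes a standard session in which {stats} and {utils} are attached, as they are by default.

```r
# utils::find() reports where an object is found on the search path.
utils::find("paste")  # paste() lives in the {base} package
#> [1] "package:base"
utils::find("glm")    # glm() lives in {stats}, which ships with base R
#> [1] "package:stats"

# Packages that ship with every R installation carry the priority "base".
# Note that {base} is just one of them.
base_pkgs <- rownames(installed.packages(priority = "base"))
c("base", "stats", "utils") %in% base_pkgs
#> [1] TRUE TRUE TRUE
```

So glm() is “base R” in the sense of the distribution, even though it does not live in the {base} package.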

User-Installed Packages

In addition to the pre-installed packages, you can also install additional packages from online sources like CRAN, Bioconductor, and GitHub. You have likely already done this in the past by using install.packages() to download from CRAN, BiocManager::install() to download from Bioconductor, or devtools::install_github() to get a package from GitHub. Beyond that, there are in principle no differences between pre-installed and user-installed packages; both are just collections of functions, data, and documentation in a defined format.
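A common pattern is to install a package only when it is not already available. The helper below is a minimal sketch of that idea; install_if_missing() is a hypothetical name used for illustration, and the CRAN mirror URL is an assumption that you can swap for your preferred mirror.

```r
# Hypothetical helper: install a CRAN package only if it is not yet installed.
# Returns (invisibly) TRUE if an installation was attempted, FALSE otherwise.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() checks availability (loading the namespace if found)
  # without attaching the package to the search path.
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))
  }
  install.packages(pkg, repos = repos)
  invisible(TRUE)
}

# {stats} ships with R, so nothing is downloaded here.
install_if_missing("stats")

# install_if_missing("dplyr")  # would download {dplyr} from CRAN if absent
```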

Packages, of course, need to be stored somewhere on your computer, and this is where the concept of libraries comes in.

(Side note: you may hear people refer to R packages as libraries, likely stemming from the fact that packages are attached to your R session using the library() function. While this generally will not lead to much confusion, it is important to keep in mind the distinction between packages and libraries for the purposes of this tutorial.)
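To make the side note concrete, the sketch below attaches a package with library() and checks the search path before and after. It uses {tools}, which ships with R but is typically not attached in a fresh session.

```r
# Is {tools} attached yet? In a fresh session this is usually FALSE.
"package:tools" %in% search()

# library() attaches a *package*; the name is historical.
library(tools)

# After attaching, the package appears on the search path.
"package:tools" %in% search()
#> [1] TRUE
```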

Libraries

Libraries are the location(s) on your computer[1] where R packages are stored. When you install a package, the contents are downloaded from a repository and stored in a library on your computer, which is simply a directory/folder. Note that we said “a library,” not “the library!” This is because it is possible to have multiple libraries on your computer.
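The relationship between a package and its library can be inspected directly. The sketch below, using the pre-installed {stats} package as an example, shows that an installed package is a folder and that the library is simply the folder containing it.

```r
# find.package() returns the directory of an installed package.
pkg_dir <- find.package("stats")
basename(pkg_dir)
#> [1] "stats"

# The library is the parent directory that holds the package folders.
lib_dir <- dirname(pkg_dir)
dir.exists(lib_dir)
#> [1] TRUE

# Other packages installed in the same library sit alongside it.
head(list.files(lib_dir))
```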

R will look for packages in a set of default library locations, typically one in the user’s home directory and one in the system directory. These paths vary based on operating system, but the .libPaths() function can be used to see the libraries that are currently detected by default on your computer.

# Example of running .libPaths() on a Mac. Results will vary across systems
.libPaths()
#> [1] "/Users/<USER>/Library/R/arm64/4.4/library"                      
#> [2] "/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library"

There are typically 1-2 default libraries on your system, and these system libraries are shared across all of your R sessions and projects. However, you can also create project libraries that are specific to a single project or session. This is useful if you want to ensure that a project uses a specific version of a package.
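To show what a project library amounts to, here is a minimal sketch that creates one by hand. The folder name project-library is arbitrary, and a temporary directory is used only so the example leaves nothing behind; a real project library would live inside the project folder.

```r
# Create an empty directory to act as a project-specific library.
proj_lib <- file.path(tempdir(), "project-library")
dir.create(proj_lib, showWarnings = FALSE, recursive = TRUE)

# Put it first in the library search order for this session.
old_paths <- .libPaths()
.libPaths(c(proj_lib, old_paths))
active_paths <- .libPaths()
active_paths[1]  # the project library is now searched first

# install.packages("<pkg>", lib = proj_lib) would install into it.

# Restore the original library paths.
.libPaths(old_paths)
```

Package managers automate these steps, so you rarely need to call .libPaths() with arguments yourself.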

For example, different versions of the same package {pkg} might have important differences in how the function pkg::foo() works. By storing packages in separate libraries, R can be set up to use the version of foo() you actually want to use. These differences between versions happen more often than you’d expect, and should be logged in the package’s NEWS file or in the package’s documentation. Check out this example of “breaking changes” from the {ggplot2} 3.5.0 package release to get a sense of how behavior of functions can change across versions.
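Which version R sees depends on which library it looks in. The sketch below checks the version of a package in a specific library using the lib.loc argument of packageVersion(). The {stats} package and the default library .Library stand in for the hypothetical {pkg} and a project library, and the version threshold is an arbitrary example.

```r
# Version of {stats} as installed in R's default library.
v <- utils::packageVersion("stats", lib.loc = .Library)
v

# Version objects can be compared against strings, which is handy for
# guarding code that relies on behaviour introduced in a given release.
if (v >= "3.5.0") {
  message("{stats} is recent enough for this code path")
}
```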

Of course, managing multiple libraries would be incredibly cumbersome, and it requires one to actively be aware of which libraries hold the correct versions of packages. Doing this manually would likely be a worse solution than simply dealing with broken code, but this is where package managers come in; they can help you manage your libraries and package versions effectively, automatically, and without having to be an expert on navigating the file system of your computer. More specifically, package managers will create and manage project libraries in order to ensure that the correct version of a package is used in a given project.

Repositories

Figure: Popular R repositories: Comprehensive R Archive Network (CRAN) and Bioconductor

A repository is a collection of packages that are available for download from a specific location. The most common and general-purpose repository for R packages is the Comprehensive R Archive Network (CRAN), which hosted 21,797 freely available packages when this page was last generated on December 15, 2024. Another popular repository, Bioconductor, also hosts several thousand packages, but these packages are focused on bioinformatics, genomics, and computational biology.

Although not strictly “repositories,” GitHub and, to a lesser extent, GitLab are popular locations for hosting R packages, and many developers use these services to develop their packages and distribute them to the public. Another repository option of note is R-universe, a growing repository designed to make it easier for R users to share and discover packages. The benefit (but also the downside) of R-universe is that it is not as curated as CRAN, so you may find packages that are not as well-tested or maintained as those on CRAN.

Commonly Used Repositories

CRAN and Bioconductor

As mentioned, CRAN is the most popular repository for R packages, and it is the default repository that R will use when you use the install.packages() function. CRAN has a strict set of guidelines for package submission, and packages are reviewed by a team of volunteers before they are accepted into the repository. This ensures that packages on CRAN are of high quality and are well-documented, which is why CRAN is often considered the gold standard for R package repositories.
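The repository that install.packages() uses is controlled by the repos option. The sketch below sets it to a CRAN mirror and reads it back; the mirror URL is an assumption, and any CRAN mirror would work.

```r
# Point R at a CRAN mirror for this session, keeping the old setting.
old_opts <- options(repos = c(CRAN = "https://cloud.r-project.org"))

# install.packages() consults this option when no `repos` argument is given.
repos_now <- getOption("repos")
repos_now[["CRAN"]]
#> [1] "https://cloud.r-project.org"

# With network access, nrow(available.packages()) would count the packages
# offered by the configured repositories.

# Restore the previous setting.
options(old_opts)
```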

Bioconductor, on the other hand, is a repository specifically for packages related to bioinformatics, genomics, and computational biology. Bioconductor has its own set of guidelines for package submission, and packages are reviewed by a team of experts in the field before they are accepted into the repository. This ensures that packages on Bioconductor are also of high quality and are well-suited for use in bioinformatics and genomics research.

GitHub and GitLab

GitHub and GitLab are popular version control platforms that many R package developers use to host their packages. These platforms are not strictly “repositories” in the same sense as CRAN and Bioconductor, which are dedicated to hosting R packages, but they are often used as such. GitHub in particular hosts many R packages, which can be installed using the devtools::install_github() function.

R-universe

R-universe is a newer repository for R packages that is designed to make it easier for R users to share and discover packages. R-universe is not as curated as CRAN, so you may find packages that are not as well-tested or maintained as those on CRAN. However, R-universe can be a great place to find packages that are not available on CRAN or Bioconductor, and it is a good place to discover new and interesting packages that are being developed by the R community.
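R-universe serves each owner's packages from its own repository URL, which can be passed to install.packages() alongside CRAN. The sketch below only builds such a repos vector; r_universe_repos() is a hypothetical helper, and rOpenSci is used purely as an example owner.

```r
# Hypothetical helper: build a `repos` vector that lists one R-universe
# repository first, with CRAN as a fallback for dependencies.
r_universe_repos <- function(owner, cran = "https://cloud.r-project.org") {
  c(
    universe = sprintf("https://%s.r-universe.dev", owner),
    CRAN = cran
  )
}

repos <- r_universe_repos("ropensci")
repos[["universe"]]
#> [1] "https://ropensci.r-universe.dev"

# install.packages("<pkg>", repos = repos) would then look in that universe
# first and fall back to CRAN.
```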

It is, however, worth noting that almost any package available on R-universe is also available on CRAN, Bioconductor, or GitHub/GitLab, so you will rarely need R-universe itself to obtain a package.

Repositories and Package Managers

Repositories are an essential part of the R package ecosystem, as they are the locations from which R packages are downloaded and installed. Package managers like {renv}, {packrat}, and {checkpoint} use repositories to manage package versions and dependencies, and they can help you ensure that your projects are reproducible and that the correct versions of packages are used.

When you install a package using a package manager, the package manager will download the package from a repository and store it in a library on your computer. The package manager will also keep track of the version of the package that was installed and any dependencies that the package has. This makes it easy to reproduce your projects and ensure that the correct versions of packages are used.
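To give a feel for what “keeping track of versions” means, the sketch below records the installed version of a few packages in a small table. This is only an illustration of the idea, not how any particular package manager stores its records; record_versions() is a hypothetical helper.

```r
# Hypothetical helper: record the installed version of each named package.
record_versions <- function(pkgs) {
  versions <- vapply(
    pkgs,
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
  data.frame(
    Package = pkgs,
    Version = unname(versions),
    stringsAsFactors = FALSE
  )
}

# Pre-installed packages are used here so the example runs anywhere.
rec <- record_versions(c("stats", "utils"))
rec
```

A package manager saves a record like this with the project and later reinstalls exactly those versions into the project library.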

In the next section, we will discuss the various package managers that are available in R and how they can help you manage your packages and dependencies.

See also the Posit article on packages, libraries, and repositories.


Footnotes

  1. Strictly speaking, the libraries do not have to be on your computer, but they do need to be accessible to your R session. For example, you could store your libraries on a network drive or on a cloud storage service like Dropbox, but this is not at all recommended because it can lead to performance issues and other problems. A more likely scenario is that you use RStudio through Posit Cloud (i.e. RStudio on a web interface), in which case the available libraries are stored and managed either by Posit or by an IT group for your institution on a separate server.