Caching

This page will discuss the concept of caching and how it is used in {renv}.

What is Caching?

In the context of software development, caching is used to store data that is frequently accessed, such as packages, to speed up the execution of a program. When a program needs to access a package, it first checks the cache to see if the package is already stored there. If the package is found in the cache, the program can retrieve it quickly without having to download it again. This can significantly reduce the time it takes to run a program, especially if the package is large or if the program is run frequently.
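The check-the-cache-first logic can be sketched in a few lines of R. This is only an illustration of the general idea, not how {renv} is implemented: `slow_fetch()` is a made-up stand-in for a download, and the cache is just an in-memory environment.

```r
# A tiny in-memory cache. `slow_fetch()` is a hypothetical stand-in for a download.
cache <- new.env()
stats <- new.env()
stats$fetches <- 0

slow_fetch <- function(name) {
  stats$fetches <- stats$fetches + 1   # count how often we "download"
  paste("contents of", name)
}

get_cached <- function(name) {
  # Only fetch when the item is not already in the cache.
  if (!exists(name, envir = cache, inherits = FALSE)) {
    assign(name, slow_fetch(name), envir = cache)
  }
  get(name, envir = cache, inherits = FALSE)
}

get_cached("dplyr")   # first call: fetched, then stored
get_cached("dplyr")   # second call: served from the cache
stats$fetches
#> [1] 1
```

The second call returns the same value without triggering another fetch, which is where the time saving comes from.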

Caching in {renv}

In the context of {renv}, the package cache is a shared library that contains the packages downloaded for and then used in your projects. The cache can hold several versions of the same package at once. Each project links to the version recorded in its renv.lock, and {renv} downloads that version only if you don’t already have it somewhere in the cache. This shared library is a huge space saver, especially if you have many projects using the same packages.

A separate cache is built for each minor version of R you use. For example, if you have used {renv} with R versions 4.3 and 4.4 on your computer, then you will end up with one cache for each of these R versions. This is an important detail to note for the Restoring Projects section of this tutorial, as you may need to rebuild the cache if you upgrade your R version.
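To see which minor version your current R session falls under, you can build the folder-style label yourself. The helper below, `cache_folder_for()`, is a hypothetical name of my own; it only mirrors the `R-<major>.<minor>` pattern and is not necessarily how {renv} computes it internally.

```r
# Hypothetical helper: turn an R version string into an "R-<major>.<minor>" label.
cache_folder_for <- function(version = as.character(getRversion())) {
  parts <- unclass(numeric_version(version))[[1]]  # e.g. c(4, 3, 2)
  paste0("R-", parts[1], ".", parts[2])
}

cache_folder_for("4.3.2")
#> [1] "R-4.3"
cache_folder_for("4.4.0")
#> [1] "R-4.4"

# With no argument, the label is based on the running R session.
cache_folder_for()
```

Note that patch releases (for example 4.4.0 and 4.4.1) map to the same label, so they would share a cache.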

Cache Locations

Although you will likely never need to interact with the cache directly, it can be useful to know where it is located on your system in case you need to troubleshoot any issues with {renv}. Moreover, seeing the actual cache locations can help make the concept of caching more tangible.

The {renv} caches for R packages will be nested in folders below one of the following locations, based on your operating system (Posit 2024):

  • Linux: ~/.cache/R/renv/cache

  • macOS: ~/Library/Caches/org.R-project.R/R/renv/cache

  • Windows: %LOCALAPPDATA%/renv/cache
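As a rough sketch, the list above can be written as a small lookup on the operating system name. `default_cache_root()` is a hypothetical helper, and the paths are copied from the list rather than queried from {renv}, so the real location on your machine may differ (for example, if the `RENV_PATHS_CACHE` environment variable is set or you use a different {renv} version).

```r
# Hypothetical helper: map the OS name to the default locations listed above.
default_cache_root <- function(sysname = Sys.info()[["sysname"]]) {
  switch(
    sysname,
    Linux   = "~/.cache/R/renv/cache",
    Darwin  = "~/Library/Caches/org.R-project.R/R/renv/cache",  # macOS
    Windows = file.path(Sys.getenv("LOCALAPPDATA"), "renv", "cache"),
    stop("Unrecognised operating system: ", sysname)
  )
}

default_cache_root("Linux")
#> [1] "~/.cache/R/renv/cache"

# Expand the "~" to a full path for the machine you are on.
path.expand(default_cache_root())
```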

Within each of these cache folders, under a cache-version sub-folder such as v5, you should see sub-folders for each version of R that you have used with {renv}. For example, on a macOS system, you might see the following folders:

list.files("~/Library/Caches/org.R-project.R/R/renv/cache/v5", full.names = TRUE)

#> [1] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/macos"
#> [2] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.2"
#> [3] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.3"
#> [4] "~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.4"

The {renv} package also provides a function to access the exact path to the cache used in your current project. This cache location will be slightly more specific than the paths listed above because it is a reference to the one cache specific to the version of R used in your project. You can access the path to the cache with the following code:

# Run on macOS, with the username replaced by <USER>.
renv::paths$cache()
#> [1] "/Users/<USER>/Library/Caches/org.R-project.R/R/renv/cache/v5/macos/R-4.4/aarch64-apple-darwin20"

Packages in the Cache

Each cache contains the packages that you downloaded while using that version of R. For example, if you downloaded version 1.0.9 of {dplyr} while using R 4.3, you would find it in the R-4.3 folder of the cache. If you then started a project using R 4.4 and downloaded version 1.1.2 of {dplyr}, you would find that version in the R-4.4 folder of the cache. And if another project using R 4.4 needed version 1.1.4 of {dplyr}, that version would sit alongside version 1.1.2 in the same R-4.4 folder.

That may be easier to see in practice, so the code chunk below lists the versions of {dplyr} in my caches.

list.files("~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.3/aarch64-apple-darwin20/dplyr")
#> [1] "1.1.2" "1.1.4"
list.files("~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.4/aarch64-apple-darwin20/dplyr")
#> [1] "1.1.2" "1.1.4"

So it seems that I have versions 1.1.2 and 1.1.4 of {dplyr} in the caches for both R 4.3 and R 4.4. However, this is merely a coincidence; it won’t always be the case that the caches have the exact same versions of packages. Let’s look at an example where the versions differ.

list.files("~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.3/aarch64-apple-darwin20/ggplot2")
#> [1] "3.4.3" "3.4.4" "3.5.0" "3.5.1"
list.files("~/Library/Caches/org.R-project.R/R/renv/cache/v5/R-4.4/aarch64-apple-darwin20/ggplot2")
#> [1] "3.5.0" "3.5.1"

In this case, I have versions 3.4.3, 3.4.4, 3.5.0, and 3.5.1 of {ggplot2} in the cache for R 4.3, but only versions 3.5.0 and 3.5.1 in the cache for R 4.4. This is the expected outcome: I used several older versions of {ggplot2} with an older version of R, then upgraded to a newer version of R and only used the newer versions of {ggplot2}.
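If you want to compare two caches without reading the listings by eye, base R's set functions do the job. The sketch below hard-codes the {ggplot2} versions from my output above; on your own machine you would pass in the results of `list.files()` instead.

```r
# Versions of {ggplot2} found in each cache (copied from the listings above).
r43 <- c("3.4.3", "3.4.4", "3.5.0", "3.5.1")
r44 <- c("3.5.0", "3.5.1")

# Versions cached for R 4.3 only.
setdiff(r43, r44)
#> [1] "3.4.3" "3.4.4"

# Versions cached for both R versions.
intersect(r43, r44)
#> [1] "3.5.0" "3.5.1"
```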

Project Library

The discussion of caching so far has covered just the shared libraries that {renv} uses to store packages. But how does {renv} use these caches, how does this relate back to your project libraries, and what is the role of the renv/library folder in your projects?

Each {renv} project has its own library, located in the renv/library folder of the project. When you install a package in a project with {renv}, however, the package is not technically installed in renv/library. In fact, none of the packages used in your project are actually stored in this folder. Instead, the contents of this folder are “symlinks” (symbolic links; on Windows, {renv} uses junction points instead) that point to the packages in the shared cache.
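A small, self-contained sketch can make the symlink idea more concrete. It does not touch a real {renv} project: the "cache" and "project library" here are temporary folders I made up for the demonstration, and `file.symlink()` may fail on Windows without the right permissions.

```r
# Stand-ins for a cache entry and a project library (hypothetical temp folders).
cache_pkg   <- file.path(tempfile("cache"), "dplyr", "1.1.4", "dplyr")
project_lib <- tempfile("library")
dir.create(cache_pkg, recursive = TRUE)
dir.create(project_lib)

# Create the library entry as a symlink that points at the cache entry.
link   <- file.path(project_lib, "dplyr")
linked <- file.symlink(cache_pkg, link)

# Sys.readlink() returns the link target, or "" when the path is not a symlink.
Sys.readlink(link)          # the path of the cache entry
Sys.readlink(project_lib)   # "" because this is an ordinary folder
```

In a real {renv} project you could apply the same check with `Sys.readlink()` to the package folders inside renv/library; packages linked from the cache should report a target inside the cache path.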

Summary and Key Points

This chapter contained quite a bit of technical detail and may have been a bit overwhelming. Here are the key points to remember:

  1. Caching is used to store data that is frequently accessed, such as packages, to speed up the execution of a program.
  2. In the context of {renv}, the package cache is a shared library that contains the packages downloaded for and then used in your projects.
  3. A separate cache is built for each minor version of R you use, and the cache locations are specific to your operating system. You will almost never need to directly access these cache locations.
  4. Each project has its own library, located in the renv/library folder of the project, but the packages in this folder are actually symlinks to the packages in the cache.

References

Posit. 2024. “Installing Packages.” https://rstudio.github.io/renv/articles/package-install.html.