Reproducibility
In this tutorial, we focus on the practical aspects of making sure that the code we write can be run on other machines, by other people, and in the future. In other words, we want to make sure that our code is portable and future-proof by ensuring the software originally used in creating our code is the same software used by others. Let’s look at this in more detail.
What is Reproducibility?
A Brief Definition
I propose to consider the question “What is reproducibility?” As reproducibility is such a central concept in science, one would think that it would be clearly defined. However, this is not the case. Reproducibility is an elusive concept. It has no single commonly agreed upon definition. Rather, it has many different, and each of these captures central, but different properties of it…
– Odd Erik Gundersen1
Gundersen explores many definitions put forward for “reproducibility”, but, for the purposes of this tutorial, his exploration of reproducibility is best summarized as “the ability of independent investigators to draw the same conclusions from an experiment by following the documentation shared by the original investigators.”2
In principle, this sounds simple enough–provide others with enough information, and they should be able to reproduce your work! Achieving this in practice is, of course, a vastly complex endeavor that requires careful planning and execution. Above all, opinions on what constitutes sufficient “documentation” or how much documentation is necessary to achieve reproducibility can vary widely.
One might assume that academic publications provide all of the information required for an independent researcher to reproduce a publication; the requisite background on the topic, the data cleaning choices, and the exact statistical methods employed to arrive at a conclusion are theoretically included in a paper.3 In practice, a written publication generally cannot capture all of the nuances of data cleaning, exploration, and analysis employed by the original researchers. As a result, reproducing an analysis result only from a written description is, at best, incredibly time-consuming, and, at worst, an impossible endeavor.
To address the shortcomings of written descriptions, researchers must also strive to publish their code, data, and ideally both code and data together such that the complete documentation of the project is available to others. This tutorial specifically covers how to document the R version and R packages used in a project to ensure that the code can be run on other machines.
Reproducible Code
Why should we concern ourselves with making reproducible code specifically? From a scientific perspective, reproducible code is essential in allowing other researchers to verify findings, to build upon existing work, and to ensure that the scientific process is transparent and trustworthy. From a practical perspective, reproducible code can save time and effort in the long run by making it easier to revisit and understand one’s own work, to make debugging and troubleshooting easier, and to collaborate with others.
In the context of this tutorial, we will focus on the practical aspects of making code reproducible. This mainly means taking steps to ensure that the code we write can be run on other machines, by other people, and in the future. In other words, we want to make sure that our code is portable and future-proof by ensuring the software originally used in creating our code is the same software used by others. Let’s look at this in more detail.
The ~Machines~
Your Computer
Getting code to work on your own machine is, in principle, not too difficult. You can install the necessary software specific to your hardware and software, set up your environment, and run your code! Simple, right? (Writing code that actually works is another story, of course 😉.)
However, this is only half the battle. When you run your code on your machine, you are running it in an environment that you have set up and configured to your liking. You have installed the software you need, perhaps have set up specific paths, and configured various settings to your preferences.
Someone Else’s Computer
When you share your code with others, you are effectively asking them to run your code on their machine, and it is unlikely that their computer is set up exactly like yours given how many degrees of freedom there are in operating systems, programming languages, software, and the different versions of these. A non-exhaustive list of examples where there might be software discrepancies are detailed below.
Major Machine & Software Differences
It’s already well-known that different operating systems can have different software requirements. For example, some software might only be available on Windows, while others might only be available on MacOS or Linux.
However, it is also important to consider that different versions of the same operating system can have different software requirements. For example, some software might only be compatible with Windows 10 and not Windows 11.
Another obvious difference is the programming language used in a project. For example, a project written in R will not run (correctly) in a Python, MATLAB, or Julia environment.
Less obvious, however, and more common as a pain-point, are the differences in versions of programming languages. For example, a project written in Python 3.8 might not run in Python 3.9 due to changes in the language, and most R packages are only tested for compatibility back to the 5 most recent minor releases of R (e.g. 4.4, 4.3, 4.2, 4.1, 4.0).
Fortunately, most modern programming languages are cross-compatible across recent, major operating systems without issue.
The most likely pain point for reproducibility is the software add-ons, or packages/libraries, that are used in a project. For example, in R, there are over 20,000 packages available on CRAN, and in Python, there are over 200,000 packages available on PyPI.
Keeping track of which packages are used and the specific versions of those packages is a major challenge in reproducibility.
In the context of the R programming language, most packages are likewise compatible across operating systems.
Some real-world examples:
- Operating systems: Windows 95 gives different results,4 and neuroimaging analysis results can vary based on OS5
- Programming languages: R switched its default random number generating process at version 3.6.0,6 and R versions from 4.4.0 give different (albeit more consistent) values for the
density()
function7 - Packages/libraries: JASP changed the default Bayesian ANOVA8
All of the Machines
- Don’t expect others to have the software you rely on.
- Even if others have the software, don’t expect them to have the same version.
In summary, the software environment of a project can be incredibly complex, with many degrees of freedom. If you have ever tried to run someone else’s code and it didn’t work, it was likely due to one of these reasons. Moreover, it’s not practical to manage most of these differences manually. For example, requesting that someone manually install Python v3.8.2, R v4.0.3, and specific versions of dozens of packages is technically possible, but exceptionally tedious and a poor use of time. So this brings us to our core question: how do we set up a project to work on everybody’s machines?9 By managing our software dependencies, described in the next chapter.
References
Footnotes
“The fundamental principles of reproducibility,” Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 379, no. 2197 (May 17, 2021): 20200210, https://doi.org/10.1098/rsta.2020.0210.↩︎
Assuming, of course, that the data for the analysis can also be accessed either because it is publicly available or by way of contacting the original researcher.↩︎
B. D. McCullough and H. D. Vinod, “Verifying the Solution from a Nonlinear Solver: A Case Study,” The American Economic Review 93, no. 3 (2003): 873–92, https://www.jstor.org/stable/3132121.↩︎
Tristan Glatard et al., “Reproducibility of Neuroimaging Analyses Across Operating Systems,” Frontiers in Neuroinformatics 9 (April 24, 2015), https://doi.org/10.3389/fninf.2015.00012.↩︎
Abel Brodeur et al., “Mass Reproducibility and Replicability: A New Hope,” 2024, https://www.econstor.eu/handle/10419/289437.↩︎
Robert Schlicht, Bug 18337 - Issue in Stats::density.default Affecting Accuracy (bugs.r-project.org, 2022), https://bugs.r-project.org/show_bug.cgi?id=18337.↩︎
Don van den Bergh, Eric-Jan Wagenmakers, and Frederik Aust, “Bayesian Repeated-Measures Analysis of Variance: An Updated Methodology Implemented in JASP,” Advances in Methods and Practices in Psychological Science 6, no. 2 (April 1, 2023): 25152459231168024, https://doi.org/10.1177/25152459231168024.↩︎
Within reason; many software and hardware configurations just simply were not meant to be, but most modern programming languages are cross-compatible across recent, major operating systems without issue.↩︎