Make a README

What Does a README Contain?

Having settled on a license, it is time to add a final touch. Imagine returning to your project in five years, having forgotten most of the details of what you did. What would you need to know to quickly understand what is going on in the project? This is what the README needs to describe. While you could simply start writing, it is helpful to provide at least the following information, each in its own section.

Name and Description

What is the project called? What is it about? Which files does the project folder contain? How are they organized?

Involved Data

Are any (empirical) data involved (e.g., being analyzed or used as input)? From which sources can they be obtained? Are they already included in the project folder? Where is their data dictionary located? Which terms, usage restrictions, or licenses apply? If they are not publicly available, is an alternative, synthesized version provided?

Computational Requirements

What software needs to be installed to run the analysis – in other words, what are its dependencies? This also includes software that you have used for any manual steps. For every dependency, describe where it can be obtained. If the code has particular hardware requirements (e.g., in terms of processor or memory), these should also be noted. Finally, for steps that take more than a couple of seconds, the approximate runtime should be indicated.

Note 1 provides more information on determining the dependencies of an R project.
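
One way to estimate the runtime of a step is to time it from the terminal. The helper below is only a minimal sketch: `time_step` is a hypothetical name introduced for illustration and is not part of R or Quarto.

```bash
# Hypothetical helper: run a command and report how long it took.
time_step() {
  local start end status
  status=0
  start=$(date +%s)              # seconds since the epoch before the step
  "$@" || status=$?              # run the step, remembering its exit code
  end=$(date +%s)
  echo "Step '$*' took $((end - start)) s" >&2
  return "$status"
}
```

For example, `time_step quarto render Manuscript.qmd` would render the manuscript and then print the elapsed time in whole seconds, which is precise enough for the approximate runtime a README should state.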

Usage

How can one run the project – is there a master script or a particular order in which any scripts need to be executed? Provide detailed instructions for running the full project.
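
If a project consists of more than one step, a small master script can encode the order of execution. The following is a sketch under stated assumptions: the script name `run_all.sh` is made up, the file names are those of this tutorial's example project, and R, renv, and Quarto are assumed to be installed already.

```bash
# Create a master script; the file name run_all.sh is a hypothetical choice.
cat > run_all.sh <<'EOF'
#!/bin/sh
# Runs every step of the project in order and stops at the first error.
set -eu

# Install the R packages at the versions recorded by renv
Rscript -e 'renv::restore(prompt = FALSE)'

# Rebuild the data dictionary, then render the manuscript
quarto render data_dictionary.qmd
quarto render Manuscript.qmd
EOF
chmod +x run_all.sh
```

With such a script in place, the usage instructions in the README can be reduced to running `./run_all.sh` from the project's root folder.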

List of Results

For every result (i.e., number, figure, or table) that is computed in the project and displayed in the manuscript, indicate where exactly it is computed.

Citation

Is there a recommended way to cite this project? Is there a published article associated with it that you would like to have cited?

License

Under which licenses are the works in this project folder available?

Create It!

Create your README now as the file README.md.

Note 1: Identifying R Dependencies

R itself and the R packages are already documented because this project uses renv. You can therefore focus on all other dependencies, such as the system dependencies of R packages and the version of Quarto.¹

An overview of the system dependencies of R packages can be created using the function pak::pkg_sysreqs(). In combination with renv, we can obtain the system dependencies of all R packages the current project depends on, directly or indirectly:

Console
# Find all R package dependencies
deps <- renv::dependencies()$Package |>
  unique() |>
  pak::pkg_deps(dependencies = NA) |>
  getElement("package")

# Identify their system dependencies
pak::pkg_sysreqs(deps)

The output may look like the following:

── Install scripts ────────────────── Fedora 40 ── 
dnf install -y make pandoc git

── Packages and their system dependencies ────────
fs        – make
knitr     – pandoc
remotes   – git
rmarkdown – pandoc
sass      – make

We can see that the programs make, pandoc, and git were identified as system dependencies. Often, one can obtain their version by running them with the --version argument:

Terminal
make --version
pandoc --version
git --version

However, this does not work for all system dependencies. Specifically, it does not work for libraries – software that is not supposed to be run on its own. Identifying their version is beyond the scope of this tutorial.
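
For the programs that do support `--version`, the individual calls can be collected in a small helper that also copes with tools that are not installed. This is a minimal sketch: `report_versions` is a hypothetical name, and it assumes that the first line of each tool's `--version` output contains the version number.

```bash
# Hypothetical helper: print the first line of `<tool> --version`
# for every tool given, or a note if the tool is not on the PATH.
report_versions() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
    else
      printf '%s: not found\n' "$tool"
    fi
  done
}
```

Calling `report_versions make pandoc git` prints one line per tool, which can be copied into the Computational Requirements section of the README.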

We also know that we need Quarto to create the PDF, so let’s find out its version as well:

Terminal
quarto --version

If you installed apaquarto or any other Quarto extension, you can query their versions as follows:²

Terminal
quarto list extensions

Finally, we know that we installed a TeX distribution to create the PDF, so let’s find out its version by running:

Terminal
quarto check

The output is quite long and it might look slightly different for you, but the relevant sections are the following:

[✓] Checking tools....................OK
      TinyTeX: v2024.09
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Path: /home/r155953/.TinyTeX/bin/x86_64-linux
      Version: 2024

Of course, all the system dependencies identified so far may have dependencies of their own. Use your own judgement to decide when to stop digging.

If you feel stuck, you can have a look at the following examples:

README.md
# Penguin Paper

This project contains the Quarto manuscript of our study on penguins ("Manuscript.qmd"). It is written in R and uses `renv` to track its dependencies.

The most important file in this project folder is `Manuscript.qmd`, which contains the text of the article as well as the code for its computations. It is accompanied by the following files:

- `Bibliography.bib`: bibliographic references used in the manuscript
- `data.csv`: a data set containing the simplified `palmerpenguins` data
- `data_dictionary.html`: a dictionary to the data file,
  created using `data_dictionary.qmd`

The folder `_extensions` contains the `apaquarto` extension which is used to typeset the PDF according to APA guidelines.

README.md
## Involved Data

The manuscript analyzes the "palmerpenguins" data set available from <https://cran.r-project.org/package=palmerpenguins>. The data is stored as "data.csv" and documented in the file "data_dictionary.html". It is made available under CC0 1.0.

README.md
## Computational Requirements

This manuscript requires the following system software to be installed. The version numbers listed are those the manuscript was last run with:

- [Quarto](https://quarto.org/docs/download/) 1.6.9
- GNU Make 4.4.1
- Pandoc 3.3
- TinyTeX 2024.09
- [R](https://cloud.r-project.org/) 4.4.1

On Fedora Linux, Make and Pandoc can be installed as follows:

```bash
dnf install -y make pandoc
```

Quarto and R can be installed using the links provided. TinyTeX can be installed using Quarto by entering the following into the terminal:

```bash
quarto install tinytex
```

All R packages that this project requires are managed using [`renv`](https://cran.r-project.org/package=renv). Therefore, `renv` needs to be installed first, by entering the following in the R console:

```r
install.packages("renv")
```

Next, one can open a new R session in the root folder of this project and run the following, which should install all required R packages at their recorded versions:

```r
renv::restore()
```

README.md
## Usage

The manuscript can be rendered to PDF using the following command:

```bash
quarto render Manuscript.qmd
```

README.md
## List of Results

- In-text numbers in the section "results": Calculated in the chunk "t-test" within "Manuscript.qmd"
- Table 1: Calculated in the chunk "tbl-descriptive-statistics" within "Manuscript.qmd"
- Figure 2: Calculated in the chunk "fig-bill-length-comparison" within "Manuscript.qmd"

You need to add your name in the following example:

README.md
## Citation

Please cite this draft as follows:

> Zerna, Scheffel, & <YOUR NAME> (2024): "A Study on Penguins: A Minimal Reproducible Example". Unpublished manuscript.

Of course, you would use the same license for the manuscript that you chose in the previous step. You need to add your name in the following example:

README.md
## License

The manuscript files `Manuscript.qmd`, `Manuscript.tex`, and `Manuscript.pdf` by Josephine Zerna, Christoph Scheffel, and <YOUR NAME> are available under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) or (at your option) under the [AGPLv3](https://www.gnu.org/licenses/agpl-3.0.html) (or later). For further copyright information, see `LICENSE.txt`.

Footnotes

  1. As of August 2024, a proposal for renv to record the version of Quarto has not been implemented; see rstudio/renv#1143.

  2. Luckily, the extensions are included in the project folder, so technically their version is already recorded in the project’s files.