Project Setup
This section guides you through setting up your R environment for data documentation and validation. You’ll create a reproducible project structure using Quarto (a tool for creating documents that combine code and text) and renv (a tool for managing R packages) that supports open science practices.
You might wonder: “I just want to learn about data dictionaries and validation. Why do I need Quarto, Git, and renv?”
These tools help you create reproducible, shareable documentation that follows open science best practices.
What each tool does:
- Quarto: Lets you write documents that mix explanatory text with R code, then automatically generate professional HTML/PDF reports. Your data dictionary and validation results will live in one cohesive document.
- Git: Tracks changes to your files over time. If something breaks, you can go back. When collaborating, everyone can see what changed.
- renv: Records which package versions you used. This ensures your code still works months later (and works for collaborators).
The payoff: After this one-time setup, you’ll be able to generate beautiful, reproducible data documentation with a single click. Your future self and collaborators will thank you.
Create Quarto Project
First, we will need to create a new Quarto project.
If you haven’t already, open RStudio – see Note 1 for how to use the terminal instead. Then, click on File > New Project… to open the New Project Wizard.
Here, select New Directory.
Then choose the project type Quarto Project.
Finally, enter the name of the directory in which our report will be created, for example data-documentation-validation-exercise.
As we will use Git to track the version history of files, be sure to check Create a git repository. If you don’t know what Git is, have a look at the tutorial “Introduction to version control with git and GitHub within RStudio”.
Also, we will use the renv package to track the R packages our project depends on. Using it makes it easier for others to see which packages are needed and to install them at exactly the same versions later. Therefore, make sure that the box Use renv with this project is checked. Again, if this is the first time you are hearing about renv, have a look at the tutorial “Introduction to {renv}”.
If you are already familiar with Markdown and Quarto, you can uncheck the box Use visual markdown editor.
Click on Create Project. Your RStudio window should now look similar to this:
If, like in the image, a Quarto file with some demo content was opened automatically, you can close and delete it, for example, using RStudio’s file manager.
Throughout this tutorial, you will need to run both R code and system commands (primarily git and quarto). Within RStudio, R code can be run by going to the tab Console, while system commands are executed in the tab Terminal. We also indicate where to run your code directly above each code snippet. If no indication is given, the code is only for demonstration purposes and does not need to be run.
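To confirm that the Terminal tab can actually reach these system commands, a small sketch like the following may help. The helper name check_tool is hypothetical and not part of the tutorial. It relies only on the shell built-in command -v, so it assumes a POSIX-like shell (on Windows, for example, Git Bash).

Terminal
```shell
# Hypothetical helper (not part of the tutorial): report whether a command
# is available on the PATH of the current terminal session.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
    return 1
  fi
}

# The two system commands this tutorial relies on most.
for tool in git quarto; do
  check_tool "$tool" || true
done
```

If either tool is reported as not found, it probably needs to be installed or added to the PATH before continuing.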
The renv package tracks which R packages your project uses. As you install packages throughout this tutorial (in the next section), renv will install them into the project's own library, and their versions are written to the lockfile (renv.lock) when renv::snapshot() is run. You don’t need to do anything special right now.
Later in your work, before committing code to Git, you can run renv::status() to check if everything is synchronized. For this tutorial, we’ll remind you when it’s time to use renv commands.
Without RStudio, one can create a Quarto project with version control and renv enabled by typing the following into a terminal:
Terminal
quarto create project default data-documentation-validation-exercise
cd data-documentation-validation-exercise/
rm data-documentation-validation-exercise.qmd
git init
git checkout -b main
Then, one can open an R session by simply typing R into the terminal. Next, make sure that getwd() indicates that the working directory is data-documentation-validation-exercise. Now, initialize renv:
Console
renv::init()
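After these steps, the project directory should contain a few tell-tale files. The sketch below checks for them from the terminal. The helper name check_project is hypothetical, and the list of paths is an assumption based on what these tools create by default: _quarto.yml from quarto create project, .git from git init, and renv.lock plus the renv folder from renv::init().

Terminal
```shell
# Hypothetical helper (not part of the tutorial): check that a directory
# contains the files that Quarto, Git, and renv are expected to create.
check_project() {
  dir="${1:-.}"
  missing=0
  for item in _quarto.yml .git renv.lock renv; do
    if [ -e "$dir/$item" ]; then
      echo "ok:      $item"
    else
      echo "missing: $item"
      missing=1
    fi
  done
  return "$missing"
}

# Run from inside the project directory.
check_project . || echo "Some setup steps may not have completed."
```

A missing entry usually points to the step that created it, for example a missing renv.lock suggests that renv has not been initialized yet.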
Next Steps
Your project is now set up with Quarto, Git, and renv! In the next section, you’ll install the R packages needed for data documentation and get familiar with the Palmer Penguins dataset.