Welcome to the Data Anonymization Course!

This website is a work in progress. Please come back later in 2026 :)

Tutorial Overview

This self-paced tutorial on anonymization of quantitative research data is intended to take about three hours to complete.

The tutorial is split into the following sections:

  1. FOUNDATIONS OF DATA PROTECTION covers the ethical and legal basics of data protection, mechanisms of data protection in research, and basic terms.
  2. DATA ANONYMIZATION PROCESS walks you through the process of anonymizing your research data based on example data.
  3. BALANCING DATA PROTECTION AND OPENNESS presents methods for aligning your data protection and open science interests.
  4. ANONYMIZATION WORKFLOW closes the tutorial by summarizing the workflow you have learned.

What You’ll Learn

By the end of this tutorial, you will be able to:

  • Understand key concepts in the world of privacy (e.g., anonymization, k-anonymity)

  • Classify data into categories relevant to data protection (e.g., personal data, sensitive data)

  • Apply anonymization techniques using R in a coherent workflow

  • Make informed decisions when balancing the risks and utility of the anonymized data

What You Will NOT Learn

This tutorial covers only the anonymization of quantitative data. Other data types are out of scope.

Here are a few helpful links for other data types:

Prerequisites

  • You need basic R skills (e.g., loading data and packages). Experience with data wrangling in the tidyverse is beneficial.
  • Necessary software: R and RStudio