Personal Data
According to Article 4(1) of the EU General Data Protection Regulation (GDPR), personal data are any information relating to an identified or identifiable natural person (the data subject). This definition is a cornerstone of European privacy law, and it extends far beyond obvious identifiers.
An individual is considered identifiable if they can be recognized, either directly or indirectly, through various identifiers or factors. These identifiers can include:
Direct identifiers. These are data that identify an individual on their own, such as a name, postal address, social security number, bank account number, or email address.
Indirect identifiers (quasi-identifiers). These are data that can identify an individual when combined with other data. Examples include date of birth, age, gender, geographic location (such as a ZIP or postal code), marital status, or details about events (e.g., admission dates, procedure codes).
Whether a person is identifiable is assessed by taking into account all means reasonably likely to be used, by the controller or by another person, to identify the individual directly or indirectly, including “singling out” (Recital 26 GDPR). This “reasonably likely” test considers objective factors such as the costs, time, and effort required for identification, the technology available at the time of processing, and technological developments. For instance, the Court of Justice of the EU held in Breyer v Germany (C-582/14) that a dynamic IP address can be personal data for a website operator, even though only the Internet Service Provider can link it to a subscriber, if the operator has legal means to obtain that additional information, for example through the competent authorities.
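The Breyer example turns on linkage: a dynamic IP address alone says little, but joined with the provider's assignment records for the same point in time it points to one subscriber. The Python sketch below illustrates this under stated assumptions: the server log, the ISP assignment table, and the `link` helper are all hypothetical, and the addresses come from the 203.0.113.0/24 range reserved for documentation. In practice the website operator would not hold the ISP table, which is exactly why the legal question is whether obtaining it is reasonably likely.

```python
from datetime import datetime

# Hypothetical web server log: the site operator only sees an IP address and a time.
server_log = [
    {"ip": "203.0.113.7", "time": datetime(2024, 3, 1, 10, 15)},
    {"ip": "203.0.113.7", "time": datetime(2024, 3, 2, 9, 40)},
]

# Hypothetical ISP records: which subscriber held which dynamic IP address, and when.
isp_assignments = [
    {"ip": "203.0.113.7", "start": datetime(2024, 3, 1, 0, 0),
     "end": datetime(2024, 3, 1, 23, 59), "subscriber": "Subscriber A"},
    {"ip": "203.0.113.7", "start": datetime(2024, 3, 2, 0, 0),
     "end": datetime(2024, 3, 2, 23, 59), "subscriber": "Subscriber B"},
]


def link(log, assignments):
    """Attach the subscriber who held each logged IP address at the logged time."""
    linked = []
    for entry in log:
        for a in assignments:
            # Match on the address AND on the time window of the assignment.
            if entry["ip"] == a["ip"] and a["start"] <= entry["time"] <= a["end"]:
                linked.append((entry["time"], entry["ip"], a["subscriber"]))
    return linked


for when, ip, subscriber in link(server_log, isp_assignments):
    print(when.isoformat(), ip, subscriber)
```

Both log entries carry the same IP address, yet they resolve to different subscribers on different days. Only the combination of the two record sets identifies anyone, which is the situation the “reasonably likely” test addresses.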
Special Types of Personal Data
Article 9 GDPR defines special categories of personal data that receive heightened protection:
- racial or ethnic origin,
- political opinions,
- religious or philosophical beliefs,
- trade union membership,
- genetic data,
- biometric data processed to uniquely identify a natural person,
- health data,
- and data concerning a natural person’s sex life or sexual orientation.
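Other data that are often treated as sensitive, although they are not special categories under Article 9, include:
- personal data relating to criminal convictions and offenses, which Article 10 GDPR regulates separately,
- financial data.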
Exercise
| CustomerID | Name | Age | ZIPCode | Email | Diagnosis | PoliticalOpinion | PurchaseAmount |
|---|---|---|---|---|---|---|---|
| 001 | Maria Schmidt | 34 | 53111 | maria.schmidt@example.com | Migraine | None | 120.50 |
| 002 | Max Müller | 29 | 80331 | max.mueller@web.de | None | Green Party supporter | 75.00 |
| 003 | Anna Fischer | 45 | 10115 | anna.fischer@gmail.com | Diabetes | None | 220.00 |
| 004 | — | 54 | 20095 | — | None | None | 15.00 |
| 005 | Lukas Weber | 38 | 40210 | lukas.weber@mail.de | Hypertension | None | 330.00 |
| 006 | Sophie Klein | 27 | 70173 | sophie.klein@outlook.com | None | Conservative voter | 60.00 |
| 007 | — | 62 | 04109 | — | Heart disease | None | 500.00 |
| 008 | Peter Braun | 31 | 50667 | peter.braun@yahoo.com | None | None | 42.00 |
| 009 | Julia Meyer | 22 | 68159 | julia.meyer@uni-heidelberg.de | Depression | None | 250.00 |
| 010 | David Wolf | 40 | 23552 | david.wolf@gmail.com | None | Social Democrat | 99.90 |
| 011 | Elena Schwarz | 36 | 90402 | elena.schwarz@posteo.de | Asthma | None | 185.00 |
| 012 | Thomas Becker | 52 | 28195 | thomas.becker@gmx.de | None | None | 305.00 |
| 013 | Clara Hofmann | 28 | 01067 | clara.hofmann@stud.uni-dresden.de | None | Liberal voter | 75.00 |
| 014 | Jonas Lehmann | 47 | 99084 | jonas.lehmann@arcor.de | Cancer | None | 800.00 |
| 015 | Paula Richter | 33 | 17489 | paula.richter@uni-greifswald.de | Anxiety | None | 220.00 |
Questions
- Which columns contain direct identifiers?
- Which columns contain indirect identifiers?
- Which columns contain special categories of personal data?
- Is there any column that does not contain personal data?
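If the exercise table is analyzed in code, two details matter: ZIP codes such as 04109 and 01067 must be kept as text rather than numbers, and rows 004 and 007, whose name and email are withheld, can still be singled out by other columns. A minimal Python sketch, assuming the table is pasted in as text with the fifth column named `Email`; the helper `parse_pipe_table` is hypothetical and written only for this example.

```python
from collections import Counter

# Exercise table as pipe-delimited text, with the email column named "Email".
TABLE = """\
| CustomerID | Name | Age | ZIPCode | Email | Diagnosis | PoliticalOpinion | PurchaseAmount |
|---|---|---|---|---|---|---|---|
| 001 | Maria Schmidt | 34 | 53111 | maria.schmidt@example.com | Migraine | None | 120.50 |
| 002 | Max Müller | 29 | 80331 | max.mueller@web.de | None | Green Party supporter | 75.00 |
| 003 | Anna Fischer | 45 | 10115 | anna.fischer@gmail.com | Diabetes | None | 220.00 |
| 004 | — | 54 | 20095 | — | None | None | 15.00 |
| 005 | Lukas Weber | 38 | 40210 | lukas.weber@mail.de | Hypertension | None | 330.00 |
| 006 | Sophie Klein | 27 | 70173 | sophie.klein@outlook.com | None | Conservative voter | 60.00 |
| 007 | — | 62 | 04109 | — | Heart disease | None | 500.00 |
| 008 | Peter Braun | 31 | 50667 | peter.braun@yahoo.com | None | None | 42.00 |
| 009 | Julia Meyer | 22 | 68159 | julia.meyer@uni-heidelberg.de | Depression | None | 250.00 |
| 010 | David Wolf | 40 | 23552 | david.wolf@gmail.com | None | Social Democrat | 99.90 |
| 011 | Elena Schwarz | 36 | 90402 | elena.schwarz@posteo.de | Asthma | None | 185.00 |
| 012 | Thomas Becker | 52 | 28195 | thomas.becker@gmx.de | None | None | 305.00 |
| 013 | Clara Hofmann | 28 | 01067 | clara.hofmann@stud.uni-dresden.de | None | Liberal voter | 75.00 |
| 014 | Jonas Lehmann | 47 | 99084 | jonas.lehmann@arcor.de | Cancer | None | 800.00 |
| 015 | Paula Richter | 33 | 17489 | paula.richter@uni-greifswald.de | Anxiety | None | 220.00 |
"""


def parse_pipe_table(text):
    """Turn a pipe-delimited table into a list of dicts; every cell stays a string."""
    lines = [line.strip() for line in text.strip().splitlines()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the header row and the |---| separator row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


rows = parse_pipe_table(TABLE)

# ZIP codes stay strings: int("04109") would silently drop the leading zero.
print(rows[6]["CustomerID"], rows[6]["ZIPCode"])

# Count how many rows share each (Age, ZIPCode) combination.
combo_counts = Counter((r["Age"], r["ZIPCode"]) for r in rows)
unique_ids = [r["CustomerID"] for r in rows if combo_counts[(r["Age"], r["ZIPCode"])] == 1]
print(len(rows), "rows,", len(unique_ids), "unique on Age + ZIPCode")
```

In this table every row, including 004 and 007, is unique on Age and ZIPCode alone. If you load the data with pandas instead, the same caution applies: read ZIPCode as a string (for example with `dtype={"ZIPCode": str}`).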
Learning Objective
- After completing this part of the tutorial, you will be able to distinguish personal from non-personal data and sensitive from non-sensitive data, and to identify direct and indirect identifiers.
Exercises
- Identify variables that contain direct identifiers, indirect identifiers, and sensitive data
Resources, Links, Examples
- Examples of how to categorize data: Van Ravenzwaaij et al. (2025)