Personal Data
What are Personal Data?
According to Article 4(1) GDPR, personal data are any information relating to an identified or identifiable natural person (the “data subject”). This definition is a cornerstone of privacy regulation, in particular the EU General Data Protection Regulation (GDPR), and extends far beyond obvious identifiers.
An individual is considered identifiable if they can be recognized, either directly or indirectly, through various identifiers or factors. These identifiers can include:
Direct identifiers. These are pieces of information that identify an individual on their own, such as a home address, social security number, bank account number, or email address.
Indirect or quasi-identifiers. These are data that can lead to the identification of an individual when combined with other pieces of data. Examples include date of birth, age, gender, geographic location (like a ZIP or postal code), marital status, or details about events (e.g., admission dates, procedure codes).
Whether a person is identifiable depends on all means reasonably likely to be used by the data controller or another person to identify them, including “singling out”. This “reasonably likely” criterion takes into account objective factors such as cost, time, and effort, as well as the technology available at the time of processing and foreseeable technological developments. For instance, a dynamic IP address can qualify as personal data if it can be linked to a specific person, even if the information needed for that link is held by an Internet Service Provider and can only be obtained through legal channels (see CJEU, Case C-582/14, Breyer v Bundesrepublik Deutschland).
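To make “singling out” more concrete, the following minimal Python sketch counts how many records share the same combination of quasi-identifier values. The records, the column names (`age`, `zip`, `gender`), and the helper `singled_out` are hypothetical and chosen only for illustration. Uniqueness within one dataset is a rough technical proxy, not the legal test of identifiability.

```python
from collections import Counter

# Hypothetical records without any direct identifiers (illustration only).
records = [
    {"age": 34, "zip": "53111", "gender": "f"},
    {"age": 34, "zip": "53111", "gender": "f"},
    {"age": 29, "zip": "80331", "gender": "m"},
    {"age": 45, "zip": "10115", "gender": "f"},
    {"age": 45, "zip": "10115", "gender": "m"},
]


def singled_out(rows, quasi_identifiers):
    """Return the rows whose combination of quasi-identifier values is unique."""
    def key(row):
        return tuple(row[col] for col in quasi_identifiers)

    # Count how often each combination of values occurs in the dataset.
    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


# A single attribute isolates few people; the combination isolates more.
unique_by_zip = singled_out(records, ["zip"])
unique_by_all = singled_out(records, ["age", "zip", "gender"])
print(len(unique_by_zip), len(unique_by_all))
```

In this toy dataset, adding attributes to the combination increases the number of records that stand alone, which is why quasi-identifiers need to be assessed together and not one by one.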
Special Types of Personal Data
Note: The terms PII and PHI are not commonly used in German or European legislation and practice.
Personally Identifiable Information (PII): This includes data unique or nearly unique to an individual, as specified in policy and regulations. Examples are full name, email address, physical address, phone numbers, date of birth, age, gender, marital status, national identification numbers (e.g., the German Sozialversicherungsnummer), credit card numbers, vehicle registration, driving license details, employment details (e.g., salary), and educational qualifications (see also “direct identifiers”).
Protected Health Information (PHI): In the healthcare context, PHI refers to past, present, or future physical or mental health information that directly or indirectly identifies an individual. This is considered highly sensitive and often includes diagnoses, treatment records, or even genetic information.
Special categories of personal data (Art. 9 GDPR): The GDPR furthermore defines several categories of sensitive data that receive heightened protection:
- racial or ethnic origin,
- political opinions,
- religious or philosophical beliefs,
- trade union membership,
- genetic data,
- biometric data (processed to uniquely identify a person),
- health data,
- data concerning a natural person’s sex life or sexual orientation.
Otherwise sensitive data:
- information about criminal convictions and offenses (Art. 10 GDPR)
- financial data
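One way to work with the categories listed above is to record them in a small data dictionary that tags each column of a dataset. The sketch below is only an illustration: the column names, the `Category` labels, and the helper `columns_with` are hypothetical, and the assignment of a column to a category always depends on context.

```python
from enum import Enum


class Category(Enum):
    DIRECT = "direct identifier"
    INDIRECT = "indirect identifier"
    SPECIAL = "special category (Art. 9 GDPR)"
    OTHER_SENSITIVE = "otherwise sensitive"


# Hypothetical columns; a column may carry more than one tag, hence sets.
data_dictionary = {
    "passport_number": {Category.DIRECT},
    "birth_year": {Category.INDIRECT},
    "postal_code": {Category.INDIRECT},
    "trade_union_member": {Category.SPECIAL},
    "religion": {Category.SPECIAL},
    "account_balance": {Category.OTHER_SENSITIVE},
}


def columns_with(dictionary, category):
    """Return the sorted names of all columns tagged with the given category."""
    return sorted(col for col, tags in dictionary.items() if category in tags)


special_columns = columns_with(data_dictionary, Category.SPECIAL)
print(special_columns)
```

Keeping the tags in one place makes it easier to decide later which columns to drop, coarsen, or protect more strictly.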
Exercise
| CustomerID | Name | Age | ZIPCode | Email | Diagnosis | PoliticalOpinion | PurchaseAmount |
|---|---|---|---|---|---|---|---|
| 001 | Maria Schmidt | 34 | 53111 | maria.schmidt@example.com | Migraine | None | 120.50 |
| 002 | Max Müller | 29 | 80331 | max.mueller@web.de | None | Green Party supporter | 75.00 |
| 003 | Anna Fischer | 45 | 10115 | anna.fischer@gmail.com | Diabetes | None | 220.00 |
| 004 | — | 54 | 20095 | — | None | None | 15.00 |
| 005 | Lukas Weber | 38 | 40210 | lukas.weber@mail.de | Hypertension | None | 330.00 |
| 006 | Sophie Klein | 27 | 70173 | sophie.klein@outlook.com | None | Conservative voter | 60.00 |
| 007 | — | 62 | 04109 | — | Heart disease | None | 500.00 |
| 008 | Peter Braun | 31 | 50667 | peter.braun@yahoo.com | None | None | 42.00 |
| 009 | Julia Meyer | 22 | 68159 | julia.meyer@uni-heidelberg.de | Depression | None | 250.00 |
| 010 | David Wolf | 40 | 23552 | david.wolf@gmail.com | None | Social Democrat | 99.90 |
| 011 | Elena Schwarz | 36 | 90402 | elena.schwarz@posteo.de | Asthma | None | 185.00 |
| 012 | Thomas Becker | 52 | 28195 | thomas.becker@gmx.de | None | None | 305.00 |
| 013 | Clara Hofmann | 28 | 01067 | clara.hofmann@stud.uni-dresden.de | None | Liberal voter | 75.00 |
| 014 | Jonas Lehmann | 47 | 99084 | jonas.lehmann@arcor.de | Cancer | None | 800.00 |
| 015 | Paula Richter | 33 | 17489 | paula.richter@uni-greifswald.de | Anxiety | None | 220.00 |
Questions
- Which columns contain direct identifiers?
- Which contain indirect identifiers?
- Which contain special categories of personal data?
- Is there any column that does not contain personal data?
Learning Objective
- After completing this part of the tutorial, you will be able to distinguish between personal data and non-personal data, as well as sensitive and non-sensitive data, and be able to identify direct and indirect identifiers.
Exercises
- Identify variables that contain direct identifiers, indirect identifiers, and sensitive data
Resources, Links, Examples
- Examples of how to categorize data: Van Ravenzwaaij et al. (2025)