After completing this part of the tutorial, you will be able to make informed decisions when balancing the risks and utility of the anonymized data.
As we have seen, anonymization techniques are all about hiding or obscuring the data or links within the data. This, of course, comes at a cost: The data may lose some of its utility.
We therefore need to find a well-calibrated balance between utility and anonymization, the famous sweet spot where we share data “as open as possible, as closed as necessary.” To achieve this, we can utilize various measures - many of which we have already seen in the chapter on assessing privacy risks and in various sdcMicro outputs.
Measuring Privacy
The most straightforward way to measure privacy after anonymization is to compare k-anonymity before and after. In the chapter on assessing privacy risks, we found that nearly every participant in the original dataset had a unique combination of key variables. After applying non-perturbative techniques in the chapter on non-perturbative techniques, we achieved 3-anonymity - every record shares its key variable combination with at least two others. Printing the sdcObject gives you the current k-anonymity level directly:
sdc_nonpert
The input dataset consists of 200 rows and 12 variables.
--> Categorical key variables: gender, age, education, plz
--> Numerical key variables: income, years_in_job
----------------------------------------------------------------------
Information on categorical key variables:
Reported is the number, mean size and size of the smallest category >0 for recoded variables.
In parenthesis, the same statistics are shown for the unmodified data.
Note: NA (missings) are counted as separate categories!
Key Variable   Number of categories   Mean size         Size of smallest (>0)
gender         4 (3)                  64.667 (66.667)   4 (10)
age            5 (52)                 40.000 (3.846)    23 (1)
education      5 (5)                  32.000 (40.000)   5 (9)
plz            3 (152)                99.000 (1.316)    90 (1)
Infos on 2/3-Anonymity:
Number of observations violating
- 2-anonymity: 0 (0.000%) | in original data: 198 (99.000%)
- 3-anonymity: 0 (0.000%) | in original data: 200 (100.000%)
- 5-anonymity: 29 (14.500%) | in original data: 200 (100.000%)
----------------------------------------------------------------------
Numerical key variables: income, years_in_job
Disclosure risk (~100.00% in original data):
modified data: [0.00%; 92.00%]
Current Information Loss in modified data (0.00% in original data):
IL1: 495.71
Difference of Eigenvalues: 6.200%
----------------------------------------------------------------------
The k-anonymity and disclosure risks are objective values, but always need to be considered in context: What variables did we define within the sdcObject? What external data exist? What is the sampling strategy and what is known about that?
In practice, it can be challenging to correctly assess these risks given the complexity of research projects and the external information that may exist. In such cases, red teaming the data anonymization can help (Jansen et al. 2026): another person (e.g., another member of your lab) is instructed to play the role of an attacker, for example by searching for external information or by singling out the least protected individuals.
In general, it is worth checking the assumptions of the anonymization process every few years after the data has been published, especially if new re-identification techniques have emerged (see the callout box below).
Note: Deanonymization Using AI
AI tools are increasingly being used to re-identify individuals from datasets that were considered anonymous. Recent research shows that large language models (LLMs) can match anonymous text profiles to named individuals across platforms - for example, linking an anonymous forum post to a person’s identified social media account (Lermen et al., n.d.). Separately, research has shown that individuals can be re-identified from behavioral traces alone, including web browsing history, geolocation data, and even face scans, by combining datasets that each seem innocuous on their own (Rocher et al. 2025).
For quantitative survey or experimental data of the kind covered in this tutorial, I do not expect AI tools to fundamentally change the re-identification risk - the attack surface is just not the same as for text or behavioral data. That said, AI can make an attacker more efficient: tasks like searching for external information, generating plausible attack combinations, or automating record linkage are all easier with modern AI tools.
Measuring Utility
How do we know whether our anonymized data is still useful? There are several approaches to measuring utility, ranging from simple comparisons to formal indices.
sdcMicro provides two built-in information loss measures for numeric variables. IL1 is the average absolute difference between original and anonymized values, scaled by each variable's spread. A value of 0 means no change; higher values mean more distortion. The eigenvalue ratio compares the eigenvalues of the covariance matrices of the original and anonymized data; values close to 0 indicate little change to the multivariate structure. Both indicators are primarily useful for comparing different anonymization methods with each other.
Both values are part of the standard sdcMicro output after applying any anonymization technique.
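If you want a feel for what these indices react to, you can recompute comparable quantities from the raw numeric matrices yourself. The sketch below follows the IL1s definition given in the sdcMicro documentation (absolute deviations scaled by sqrt(2) times each variable's standard deviation) and a simple eigenvalue comparison; the helper names are ours, and this is an illustration rather than the package's internal code, so the absolute numbers may differ from the printed IL1 depending on normalization:
# Illustrative IL1-style measure: mean absolute deviation between the
# original (x) and anonymized (x_anon) values, scaled per variable by
# sqrt(2) * sd of the original variable
il1_sketch <- function(x, x_anon) {
  x <- as.matrix(x)
  x_anon <- as.matrix(x_anon)
  s <- apply(x, 2, sd, na.rm = TRUE)
  d <- sweep(abs(x - x_anon), 2, sqrt(2) * s, "/")
  mean(d, na.rm = TRUE)
}

# Illustrative eigenvalue comparison: relative change in the eigenvalues
# of the covariance matrix (values near 0 = structure mostly unchanged)
eigen_sketch <- function(x, x_anon) {
  e_orig <- eigen(cov(as.matrix(x), use = "pairwise.complete.obs"))$values
  e_anon <- eigen(cov(as.matrix(x_anon), use = "pairwise.complete.obs"))$values
  sum(abs(e_anon - e_orig)) / sum(e_orig)
}
Applied to the numeric key variables of the original and anonymized data, these sketches let you compare different methods on a common scale.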
Exercise: Evaluating Data Utility
Using sdcMicro’s Parameters
Let’s go back to the two sdcObjects we created in the chapter on perturbative techniques: sdc_micro (where we applied microaggregation) and sdc_noise (where we applied noise).
Compare the utility with IL1 and the difference of eigenvalues. Which method do you prefer when looking at these utility indices?
Make a decision for one of the methods and export the data for further use.
Important: Solution
Simply inspect the sdcObjects to see the values of IL1 and the eigenvalue difference:
sdc_micro
The input dataset consists of 200 rows and 12 variables.
--> Categorical key variables: gender, age, education, plz
--> Numerical key variables: income, years_in_job
----------------------------------------------------------------------
Information on categorical key variables:
Reported is the number, mean size and size of the smallest category >0 for recoded variables.
In parenthesis, the same statistics are shown for the unmodified data.
Note: NA (missings) are counted as separate categories!
Key Variable   Number of categories   Mean size         Size of smallest (>0)
gender         4 (3)                  64.667 (66.667)   4 (10)
age            5 (52)                 40.000 (3.846)    23 (1)
education      5 (5)                  32.000 (40.000)   5 (9)
plz            3 (152)                99.000 (1.316)    90 (1)
Infos on 2/3-Anonymity:
Number of observations violating
- 2-anonymity: 0 (0.000%) | in original data: 198 (99.000%)
- 3-anonymity: 0 (0.000%) | in original data: 200 (100.000%)
- 5-anonymity: 29 (14.500%) | in original data: 200 (100.000%)
----------------------------------------------------------------------
Numerical key variables: income, years_in_job
Disclosure risk (~100.00% in original data):
modified data: [0.00%; 77.50%]
Current Information Loss in modified data (0.00% in original data):
IL1: 1029.80
Difference of Eigenvalues: 6.100%
----------------------------------------------------------------------
sdc_noise
The input dataset consists of 200 rows and 12 variables.
--> Categorical key variables: gender, age, education, plz
--> Numerical key variables: income, years_in_job
----------------------------------------------------------------------
Information on categorical key variables:
Reported is the number, mean size and size of the smallest category >0 for recoded variables.
In parenthesis, the same statistics are shown for the unmodified data.
Note: NA (missings) are counted as separate categories!
Key Variable   Number of categories   Mean size         Size of smallest (>0)
gender         4 (3)                  64.667 (66.667)   4 (10)
age            5 (52)                 40.000 (3.846)    23 (1)
education      5 (5)                  32.000 (40.000)   5 (9)
plz            3 (152)                99.000 (1.316)    90 (1)
Infos on 2/3-Anonymity:
Number of observations violating
- 2-anonymity: 0 (0.000%) | in original data: 198 (99.000%)
- 3-anonymity: 0 (0.000%) | in original data: 200 (100.000%)
- 5-anonymity: 29 (14.500%) | in original data: 200 (100.000%)
----------------------------------------------------------------------
Numerical key variables: income, years_in_job
Disclosure risk (~100.00% in original data):
modified data: [0.00%; 76.50%]
Current Information Loss in modified data (0.00% in original data):
IL1: 893.79
Difference of Eigenvalues: 6.460%
----------------------------------------------------------------------
The two indices disagree: IL1 is smaller for the noise approach (893.79 vs. 1029.80), indicating that the perturbed values stay closer to the original data, while the difference of eigenvalues is slightly smaller for microaggregation (6.100% vs. 6.460%), indicating that microaggregation better preserves the multivariate structure. I opt for microaggregation here, since preserving the relationships between variables matters most for our analyses.
Now I can extract the dataset from the sdc_micro object. Because sdcMicro stores recoded key variables as integer codes internally, I need to restore the labels manually using mutate:
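A minimal sketch of that extraction, using sdcMicro's extractManipData() and assuming the level orders from the recoding steps in the earlier chapters (verify with levels() on your own object before relabeling):
library(dplyr)

# Pull the anonymized data out of the sdcObject
data_anonymized <- extractManipData(sdc_micro)

# Restore readable labels on the recoded key variables
# (assumed level order - check against your own recoding!)
data_anonymized <- data_anonymized %>%
  mutate(
    age = factor(age, labels = c("18-29", "30-39", "40-49", "50-59", "60+")),
    plz = factor(plz, labels = c("8xxxx", "other"))
  )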
As a last anonymization step, I will now generalize rare religion categories using fct_lump_min() from forcats. Since religion is not a variable in the sdcObject, I apply this directly to the dataset as an additional layer, protecting individuals with rare religions.
library(forcats)

data_anonymized <- data_anonymized %>%
  mutate(religion = fct_lump_min(as.factor(religion), min = 10, other_level = "Other"))

table(data_anonymized$religion)
Catholicism Islam None Protestantism Other
56 13 75 50 6
fct_lump_min(religion, min = 10) merges any category with fewer than 10 observations into "Other". Groups like "Judaism" or "Buddhism", which have very few members in this dataset, are natural candidates for collapsing, since a person could potentially be identified through the combination of a rare religion and other attributes.
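If you want to see in advance which categories fall below the threshold, sorting the frequency table of the unlumped variable is a quick check:
# Categories sorted by frequency - anything under 10 will be lumped
sort(table(data_withoutdirectidentifiers$religion))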
Using Statistics
For non-numeric variables that were recoded or suppressed, there is no single formal measure - but you can compare frequency tables before and after. For example, comparing the distribution of age bands or religion categories tells you whether the recoding preserved the overall composition of the sample.
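For example, comparing category shares of the religion variable directly:
# Share of each religion category before and after recoding
prop.table(table(data_withoutdirectidentifiers$religion))
prop.table(table(data_anonymized$religion))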
Beyond these built-in measures, the most practically relevant utility check is simply to re-run the analysis that motivated data collection on both the original and anonymized datasets and compare the results. For our dataset, that means checking whether the association between religion and political opinion is similar to before anonymization. If the key result holds, the anonymized data is fit for purpose.
Compare descriptive statistics on the indirect identifiers age, gender, education, income, plz, and years in job before and after anonymization.
Compare the relation between religion and the political opinion variables pol_immigration, pol_environment, pol_redistribution, and pol_eu_integration before and after anonymization.
Reflect on these comparisons: Is utility well-enough preserved?
Important: Solution
Descriptive statistics on indirect identifiers
Compare distributions before and after using summary() and frequency tables:
# Before anonymization
summary(data_withoutdirectidentifiers[, c("age", "income", "years_in_job")])
age income years_in_job
Min. :18.00 Min. : 1508 Min. : 0.00
1st Qu.:30.00 1st Qu.: 36964 1st Qu.: 2.00
Median :41.00 Median : 52219 Median : 4.00
Mean :40.99 Mean : 60789 Mean : 5.34
3rd Qu.:50.00 3rd Qu.: 64582 3rd Qu.: 8.00
Max. :70.00 Max. :902234 Max. :32.00
table(data_withoutdirectidentifiers$gender, useNA = "always") # Make sure to show NAs as well
# After anonymization
summary(data_anonymized[, c("income", "years_in_job")])
income years_in_job
Min. : 3576 Min. : 0.00
1st Qu.:37246 1st Qu.: 2.00
Median :52156 Median : 4.00
Mean :51130 Mean : 5.16
3rd Qu.:64451 3rd Qu.: 8.00
Max. :89186 Max. :15.00
table(data_anonymized$age, useNA = "always")
18-29 30-39 40-49 50-59 60+ <NA>
47 43 58 29 23 0
table(data_anonymized$gender, useNA = "always")
female male non-binary <NA>
104 86 4 6
table(data_anonymized$education, useNA = "always")
doctoral title high school trade school university <NA>
5 106 22 27 40
table(data_anonymized$plz, useNA = "always")
8xxxx other <NA>
108 90 2
Age and postal code are now categorical bands, so the numeric summary is no longer meaningful - use frequency tables instead. Income and years in job are still numeric after microaggregation; their medians are very close to the originals, but income's mean is noticeably lower - a consequence of the extreme outliers that were top-coded.
For gender, the data of 6 non-binary individuals was suppressed to achieve the necessary k-anonymity.
For education, all individuals without a degree were recoded to NA; in each of the other categories, a few individuals' values were suppressed. This might be a problem for data utility, especially if our goal was to show that we reached individuals from all educational levels. If that were the case, I'd recommend going back to the local suppression step and prioritizing that education remains unsuppressed.
We lost, as intended, a lot of information on plz and age. For our use case, this is okay.
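Turning to the second comparison - religion and the political opinion variables - we can compare group-level means in both datasets. A minimal sketch, assuming the opinion items are numeric scales (the helper name opinion_means is ours):
library(dplyr)

# Mean political opinion per religion group
opinion_means <- function(df) {
  df %>%
    group_by(religion) %>%
    summarise(across(c(pol_immigration, pol_environment,
                       pol_redistribution, pol_eu_integration),
                     ~ mean(.x, na.rm = TRUE)))
}

opinion_means(data_withoutdirectidentifiers)  # original categories
opinion_means(data_anonymized)                # rare groups merged into "Other"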
The group-level means should be virtually identical between the original and anonymized datasets. The key research question - whether religion predicts political opinion - is still answerable. Note that some small religion categories have been merged into "Other", which changes the granularity of the analysis but preserves the main patterns. Other than that, we did not change anything about these variables.
Reflection
For the purpose of this dataset - studying the relationship between religion and political opinion - utility is well-preserved. The main variable of interest (religion) is still present, just with fewer fine-grained categories. The political opinion variables are unchanged. Income and years in job show very similar distributions after microaggregation.
Striking the Right Balance
Balancing privacy and utility is not a one-shot decision - it is an iterative process. After measuring both, you may find that your current anonymization is too aggressive (data is well-protected but the key analysis no longer works) or too lenient (utility is preserved but too many records are still unique). In either case, you go back to the anonymization steps and adjust: changing recoding thresholds, relaxing or tightening the k-anonymity target, or choosing a different perturbative technique.
For our dataset, a reasonable workflow looks like this: start by checking the k-anonymity level and disclosure risk on sdc_nonpert (from the chapter on non-perturbative techniques), then check the information loss after applying perturbative techniques on sdc_micro or sdc_noise (from the chapter on perturbative techniques), and finally compare the religion-political opinion analysis on original and anonymized data. If the analysis still holds and k-anonymity is at least 3, the anonymization is likely sufficient for sharing.
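In code, this checklist boils down to re-inspecting the objects we already created:
print(sdc_nonpert)  # k-anonymity and categorical disclosure risk
print(sdc_micro)    # information loss after microaggregation
print(sdc_noise)    # information loss after noise addition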
There are no universal thresholds for “good enough” privacy or utility. A commonly used rule of thumb is that k-anonymity should be at least 3-5 for most research datasets, meaning every combination of indirect identifiers appears at least 3-5 times (see discussion above). For utility, IL1 values and eigenvalue differences close to 0 indicate low distortion. But ultimately, the right balance depends on your specific context: how sensitive the data is, who will have access, and which analyses need to be supported. When in doubt, consult your institution’s data protection officer (here at LMU), data steward or local research data management team (here at LMU), or open science team (here at LMU).
Let’s keep in mind that while there is a tension between data protection and openness, in practice these principles enable each other: Sharing data builds trust in research findings, while protecting participants builds trust in researchers. When done well, anonymization is the bridge between these goals - it lets you share data openly while honoring the privacy expectations of the people who provided it (Jansen et al. 2025).
Resources, Links, Examples
Examples of decisions for balancing utility and privacy from the UK are documented here.
Jansen, Luisa, Nele Borgert, and Malte Elson. 2025. “On the Tension Between Open Data and Data Protection in Research.” Pre-published April 7. https://doi.org/10.31234/osf.io/5jt3s_v2.
Jansen, Luisa, Tim Ulmann, Robine Jordi, and Malte Elson. 2026. “Putting Privacy to the Test: Introducing Red Teaming for Research Data Anonymization.” arXiv:2601.19575. https://doi.org/10.48550/arXiv.2601.19575.
Lermen, Simon, Daniel Paleka, Joshua Swanson, Michael Aerni, Nicholas Carlini, and Florian Tramèr. n.d. “Large-Scale Online Deanonymization with LLMs.” https://doi.org/10.48550/ARXIV.2602.16800.
Rocher, Luc, Julien M. Hendrickx, and Yves-Alexandre De Montjoye. 2025. “A Scaling Law to Model the Effectiveness of Identification Techniques.” Nature Communications 16 (1): 347. https://doi.org/10.1038/s41467-024-55296-6.