Documentation
Why Document?
Thorough documentation of your anonymization process serves two purposes:
- It provides an internal record for auditing - if questions arise later about how the data was handled, you can retrace every step.
- It provides transparency for data users, who need to understand what was changed in order to interpret the data correctly.
Internal Documentation
Your internal documentation should describe the full anonymization workflow: which variables were modified, which techniques were applied (and with what parameters), what risk levels were measured before and after, and who made the decisions. Think of it as a lab notebook for your data processing. This document should be stored securely alongside the original data - it is not meant for publication, since it may contain information that could aid re-identification (e.g., the exact thresholds used for top-coding).
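A structured log is one possible way to keep such a record. The following is a minimal Python sketch, not a prescribed format: the class AnonymizationStep, its field names, and all example values are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AnonymizationStep:
    """One entry in a hypothetical internal anonymization log."""
    variable: str        # variable that was modified
    technique: str       # e.g. "top-coding", "recoding", "suppression"
    parameters: dict     # exact parameters; sensitive, not for publication
    risk_before: float   # risk measure before the step
    risk_after: float    # risk measure after the step
    decided_by: str      # person responsible for the decision
    decided_on: str      # date of the decision (ISO 8601)


def log_to_json(steps):
    """Serialize a list of steps so the log can be stored with the original data."""
    return json.dumps([asdict(step) for step in steps], indent=2)


# Illustrative placeholder values only, not taken from any real dataset.
example_log = [
    AnonymizationStep(
        variable="age",
        technique="recoding to age bands",
        parameters={"band_width_years": 10},
        risk_before=0.12,
        risk_after=0.03,
        decided_by="data steward (placeholder)",
        decided_on="2024-01-15",
    )
]
print(log_to_json(example_log))
```

Because the parameters field holds exactly the kind of detail that could aid re-identification, a file produced this way belongs with the secured original data, not with the released dataset.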
External Documentation: The Data Dictionary
The external documentation is what you share alongside the anonymized dataset. It should include a data dictionary (codebook) describing every variable in the released dataset, including any changes from the original. For example, if age was recoded from exact years to age bands, the codebook should state this clearly. Describe what kind of change was made, but do not reveal details that could help reverse the anonymization, such as the exact thresholds used for top-coding.
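As an illustration of the age example, a codebook entry could be generated along the following lines. This is only a sketch in Python: the function codebook_entry, the variable name age_band, the entry layout, and the band boundaries are all assumptions made for the example, not the format of any particular codebook.

```python
def codebook_entry(name, label, values, anonymization_note):
    """Render one codebook entry as plain text (hypothetical layout)."""
    lines = [f"Variable: {name}", f"Label: {label}", "Values:"]
    lines += [f"  {code} = {meaning}" for code, meaning in values.items()]
    lines.append(f"Anonymization: {anonymization_note}")
    return "\n".join(lines)


# Illustrative bands only; the note names the kind of change made
# without exposing parameters that are not visible in the released data.
entry = codebook_entry(
    name="age_band",
    label="Age of respondent (banded)",
    values={1: "under 30", 2: "30 to 59", 3: "60 or older"},
    anonymization_note="Recoded from exact age in years to age bands.",
)
print(entry)
```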
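If you publish your anonymization scripts (which is good practice for reproducibility), review them carefully. Scripts can accidentally reveal information about suppressed or recoded values - for example, a line like filter(country == "Liechtenstein") tells an attacker that someone from Liechtenstein was in the original data. Where possible, write rules that refer to properties of the data rather than to specific values, for example by recoding every category that falls below a frequency threshold instead of naming the categories to be recoded. If value-specific parameters cannot be avoided, consider keeping them in a separate file that is not published.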
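To make the idea of a generic rule concrete, here is a minimal Python sketch that recodes rare categories by frequency, so that no specific value has to be named in the script. The function recode_rare_categories, its parameters, and the placeholder data are hypothetical, and a frequency rule is only one possible approach, not necessarily the one used elsewhere in this tutorial.

```python
from collections import Counter


def recode_rare_categories(values, min_count, other_label="Other"):
    """Replace every category seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Illustrative data with placeholder category names.
countries = ["A", "A", "A", "B", "B", "C"]
print(recode_rare_categories(countries, min_count=2))
# prints ['A', 'A', 'A', 'B', 'B', 'Other']
```

Passing min_count as an argument means the real threshold can be supplied from an unpublished source, so the published script does not disclose it.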
Exercise
Update the data dictionary to reflect the anonymization steps you applied in this tutorial.
Insert solution of exercise
Learning Objective
- After completing this part of the tutorial, you will know best practices for documenting your anonymization process.
Exercises
- Make changes to this codebook to document anonymization steps (show old codebook of exercise data)
- Include an exercise on writing an internal report of risk management?