De-Associative Techniques
idea: break the association between indirect identifiers (quasi-identifiers, QI) and the remaining (sensitive) data
simplest idea: separating datasets
- disadvantage: no connection possible for data analysis purposes
- rather: a last resort
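A minimal sketch of the separation idea, written in Python for illustration only (the tutorial itself targets R). The toy records, the field names, and the helper `separate` are assumptions, not part of the original material. It shows why analysis suffers: after the split, no key remains to join the two tables.

```python
import random

# Hypothetical toy records; field names are illustrative assumptions.
records = [
    {"age": 34, "postal_code": "10115", "income": 42000},
    {"age": 35, "postal_code": "10117", "income": 58000},
    {"age": 51, "postal_code": "80331", "income": 39000},
    {"age": 52, "postal_code": "80333", "income": 61000},
]

QI_FIELDS = ("age", "postal_code")  # indirect identifiers
SENSITIVE_FIELDS = ("income",)      # sensitive attributes


def separate(rows, qi_fields, sensitive_fields, seed=0):
    """Split rows into a QI table and a sensitive table with no shared key."""
    qi_table = [{f: r[f] for f in qi_fields} for r in rows]
    sensitive_table = [{f: r[f] for f in sensitive_fields} for r in rows]
    # Shuffle one table so that row order cannot be used to re-join them.
    random.Random(seed).shuffle(sensitive_table)
    return qi_table, sensitive_table


qi_table, sensitive_table = separate(records, QI_FIELDS, SENSITIVE_FIELDS)
```

Each table can still be analysed on its own (e.g., the income distribution), but any question that relates income to age or postal code can no longer be answered.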
Example Technique: Bucketization
Bucketization works by first grouping records into “buckets” based on their indirect identifiers (like age, gender, postal code). Each bucket must contain at least k records to satisfy k-anonymity. Within each bucket, the sensitive values (like income or political opinions) are then randomly shuffled, so the link between a specific person’s indirect identifiers and their sensitive attributes is broken. An attacker might narrow someone down to a bucket, but they cannot tell which sensitive value belongs to whom within that bucket.
idea: create QI groups with at least k records
the name stems from the created buckets (partitions)
1. step: generalize the QI to create buckets (e.g., countries to continents)
2. step: de-generalize the QI in the created buckets (i.e., continents back to countries)
3. step: permute sensitive values within each bucket
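The three steps can be sketched as follows. This is a minimal illustration in Python (the tutorial itself targets R), not a reference implementation: the toy records, the `CONTINENT` lookup used as generalization hierarchy, and the function `bucketize` are all assumptions introduced for this example.

```python
import random
from collections import defaultdict

# Hypothetical generalization hierarchy (assumption, for illustration only).
CONTINENT = {
    "Germany": "Europe", "France": "Europe", "Italy": "Europe",
    "Japan": "Asia", "India": "Asia", "Vietnam": "Asia",
}

# Hypothetical toy records: one QI (country), one sensitive value (income).
records = [
    {"country": "Germany", "income": 42000},
    {"country": "France", "income": 58000},
    {"country": "Italy", "income": 39000},
    {"country": "Japan", "income": 61000},
    {"country": "India", "income": 27000},
    {"country": "Vietnam", "income": 33000},
]


def bucketize(rows, qi, sensitive, generalize, k, seed=0):
    """Keep original QI values, permute sensitive values within each bucket."""
    # Step 1: generalize the QI to decide which bucket each record falls into.
    buckets = defaultdict(list)
    for row in rows:
        buckets[generalize[row[qi]]].append(row)

    rng = random.Random(seed)
    released = []
    for bucket_id, members in buckets.items():
        if len(members) < k:
            raise ValueError(f"bucket {bucket_id!r} has fewer than {k} records")
        # Step 3: permute the sensitive values within the bucket.
        values = [m[sensitive] for m in members]
        rng.shuffle(values)
        # Step 2: release the original (de-generalized) QI value, not the bucket label.
        for member, value in zip(members, values):
            released.append({qi: member[qi], "bucket": bucket_id, sensitive: value})
    return released


released = bucketize(records, "country", "income", CONTINENT, k=3)
```

The set of incomes per bucket is unchanged, so per-bucket statistics stay exact; only the assignment of an income to a specific country within the bucket is lost. Note that a random shuffle can leave individual values in their original position, which is acceptable because an attacker cannot tell which ones.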
note: potentially swap bucketization for another technique (anatomization) that is less easily confused with the usual perturbative techniques
Learning Objective
- After completing this part of the tutorial, you will be able to apply selected de-associative techniques in R.
Exercises
- apply one selected technique to the dataset?