Perturbative Techniques

Explain the idea more

Select most relevant techniques (swapping, noise?); potentially shorten or delete others

Add examples to each

Techniques following Carvalho et al. (2023)

Examples of Perturbative Techniques

Swapping

  • exchanging the values of certain variables between participants

  • types of swapping

    • record swapping (also known as data swapping): for categorical variables; the values of selected variables (e.g., gender, country of residence) are exchanged between records; t-order equivalence means that the frequency tables involving t variables are not changed (e.g., 1-order equivalence: the same number of males and females as before; 2-order equivalence: the same number of males and females from Switzerland and from Germany as before)

    • rank swapping: for continuous variables; values are ranked and swapped only with values within a limited range of ranks, which limits the distortion of the data

  • advantages

    • removes the relationship between a record and an individual

    • can be applied to one or more sensitive variables without disturbing the non-sensitive variables

    • provides protection for rare and unique values

    • not limited to a particular type of variable

  • disadvantages

    • may produce a number of records with unusual combinations of values

    • non-random (targeted) swapping requires additional effort

    • can severely distort statistics for subgroups

    • does not protect against attribute disclosure
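The two types of swapping described above can be sketched in base R. This is a minimal illustration, not a reference implementation: the function names `record_swap()` and `rank_swap()`, their parameters (`share`, `p`), and the toy data are assumptions made for this example.

```r
set.seed(1)

# Record swapping (hypothetical helper): exchange the values of one
# categorical variable between randomly chosen pairs of records.
# The frequency table of the variable stays unchanged.
record_swap <- function(x, share = 0.2) {
  n_pairs <- floor(length(x) * share / 2)
  idx <- sample(length(x), 2 * n_pairs)   # records taking part in a swap
  a <- idx[seq_len(n_pairs)]
  b <- idx[n_pairs + seq_len(n_pairs)]
  x[c(a, b)] <- x[c(b, a)]                # exchange values pairwise
  x
}

# Rank swapping (hypothetical helper): each value is swapped with a randomly
# chosen, not yet swapped value that is at most p percent of n ranks away.
rank_swap <- function(x, p = 5) {
  n <- length(x)
  ord <- order(x)                         # record indices in ascending order of x
  window <- max(1, floor(n * p / 100))
  swapped <- logical(n)
  out <- x
  for (r in seq_len(n)) {
    if (swapped[r]) next
    cand <- r + seq_len(window)
    cand <- cand[cand <= n]
    cand <- cand[!swapped[cand]]
    if (length(cand) == 0) next
    partner <- cand[sample.int(length(cand), 1)]
    i <- ord[r]
    j <- ord[partner]
    out[i] <- x[j]
    out[j] <- x[i]
    swapped[c(r, partner)] <- TRUE
  }
  out
}

# Toy data (made up for illustration)
df <- data.frame(
  gender = sample(c("female", "male"), 100, replace = TRUE),
  income = rlnorm(100, meanlog = 10, sdlog = 0.5)
)
df_swapped <- transform(df,
                        gender = record_swap(gender),
                        income = rank_swap(income, p = 5))

table(df$gender)
table(df_swapped$gender)   # same frequencies as before
```

Both helpers only permute existing values, so the univariate distributions are preserved, while relationships with other variables can change.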

Re-sampling

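  • idea: replace each value with an average taken across independent samples

  • draw several independent (e.g., bootstrap) samples of the values of a variable and sort each sample

  • average the smallest values of all samples, then the second-smallest values, and so on; the record with the smallest original value receives the first average, the record with the second-smallest original value the second average…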
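A minimal base R sketch of re-sampling, assuming the rank-wise variant that is commonly described in the disclosure control literature: each bootstrap sample is sorted, the values are averaged rank by rank across samples, and each record receives the average that matches its own rank. The function name `resample_mask()` and the number of samples `t` are assumptions for illustration.

```r
set.seed(2)

# Hypothetical helper: mask a numeric variable by rank-wise averaging
# of t sorted bootstrap samples.
resample_mask <- function(x, t = 50) {
  n <- length(x)
  # n x t matrix: one sorted bootstrap sample per column
  samples <- replicate(t, sort(sample(x, n, replace = TRUE)))
  # mean of the j-th smallest values across all samples
  rank_means <- rowMeans(samples)
  # the record with rank j in the original data receives the j-th mean
  rank_means[rank(x, ties.method = "first")]
}

x <- rnorm(200, mean = 50, sd = 10)   # toy data
z <- resample_mask(x)

c(original = mean(x), masked = mean(z))
c(original = sd(x), masked = sd(z))
```

In this sketch the order of the records is kept, so the masking mainly smooths the values; extreme values are pulled towards the centre.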

check for correct understanding

Noise

  • also known as randomization

  • idea: add a random value to each original value (additive noise) or multiply each original value by a random factor (multiplicative noise)

  • the noise can be correlated or uncorrelated with the original values

  • the noisy values can be transformed afterwards

  • differential privacy is usually implemented by adding noise
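Additive and multiplicative noise can be sketched in a few lines of base R. The helper names, the noise levels (`share`, `rel_sd`), and the choice of normally distributed, uncorrelated noise are assumptions for this example.

```r
set.seed(3)

# Additive noise: add a normal draw with mean 0 whose standard deviation
# is a fixed share of the standard deviation of the variable.
add_noise <- function(x, share = 0.1) {
  x + rnorm(length(x), mean = 0, sd = share * sd(x))
}

# Multiplicative noise: multiply by a random factor centred on 1,
# so larger values receive larger absolute changes.
mult_noise <- function(x, rel_sd = 0.05) {
  x * rnorm(length(x), mean = 1, sd = rel_sd)
}

income <- rlnorm(1000, meanlog = 10, sdlog = 0.5)   # toy data
income_add  <- add_noise(income)
income_mult <- mult_noise(income)

# Compare a few statistics before and after
round(c(original = mean(income),
        additive = mean(income_add),
        multiplicative = mean(income_mult)))
c(additive = cor(income, income_add),
  multiplicative = cor(income, income_mult))
```

More noise gives more protection but distorts the statistics more, so the noise level is a trade-off that has to be chosen per variable.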

Tip: Differential Privacy
  • adds calibrated noise, leading to plausible deniability for any individual

  • aggregate results of an analysis stay approximately the same despite the noise

  • results of an analysis stay almost the same regardless of whether any one person is included in the data or not

  • diffpriv: An R Package for Easy Differential Privacy

  • “Even if the attacker already suspects X is the only possible HIV case in the dataset, the data release should not confirm or deny that suspicion.”
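To make the idea concrete, here is a minimal sketch of the Laplace mechanism for a counting query in base R. It does not use the diffpriv package; `rlaplace()` and `dp_count()` are hypothetical helpers written for this example, and the data are simulated.

```r
set.seed(4)

# Draw from a Laplace(0, scale) distribution: the difference of two
# independent exponential draws with rate 1 / scale is Laplace distributed.
rlaplace <- function(n, scale) {
  rexp(n, rate = 1 / scale) - rexp(n, rate = 1 / scale)
}

# Laplace mechanism for a count. Adding or removing one person changes a
# count by at most 1 (sensitivity 1), so the noise scale is 1 / epsilon.
dp_count <- function(condition, epsilon) {
  sum(condition) + rlaplace(1, scale = 1 / epsilon)
}

hiv_status <- rbinom(500, size = 1, prob = 0.02)   # simulated toy data

sum(hiv_status == 1)                       # true count, not released
dp_count(hiv_status == 1, epsilon = 0.5)   # noisy count that is released
```

A smaller `epsilon` means more noise and a stronger privacy guarantee; a larger `epsilon` means a more accurate but less protected result.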

Microaggregation

  • idea: create groups of similar values and change these to an aggregate value (e.g., mean, median)

  • works better when groups are more homogeneous
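A minimal univariate sketch in base R: the values are sorted, split into consecutive groups of at least `k` records, and replaced by the group aggregate. The function name `microaggregate()` and the rule that leftover records join the last group are assumptions for illustration.

```r
# Hypothetical helper: fixed-size univariate microaggregation.
microaggregate <- function(x, k = 3, aggregate = mean) {
  n <- length(x)
  ord <- order(x)
  # consecutive groups of k along the sorted values;
  # leftover records are added to the last group
  n_groups <- max(1, n %/% k)
  group <- pmin((seq_len(n) - 1) %/% k + 1, n_groups)
  out <- numeric(n)
  out[ord] <- ave(x[ord], group, FUN = aggregate)
  out
}

age <- c(23, 25, 24, 41, 39, 40, 67, 70, 66, 68)   # toy data
microaggregate(age, k = 3)
# group means: 24 (three records), 40 (three records), 67.75 (four records)

microaggregate(age, k = 3, aggregate = median)
```

With the mean as aggregate, the overall mean of the variable is preserved, while the variance shrinks because the variation within groups is removed.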

Rounding

  • replace values with rounded values (e.g., the nearest multiple of a chosen base)
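Rounding to a chosen base can be sketched as a one-line helper in base R. The name `round_to_base()` and the base of 5 are assumptions for this example.

```r
# Hypothetical helper: round each value to the nearest multiple of `base`.
# Values exactly halfway between two multiples follow the behaviour of round().
round_to_base <- function(x, base = 5) {
  base * round(x / base)
}

age <- c(23, 25, 24, 41, 39, 40, 67)   # toy data
round_to_base(age, base = 5)
# 25 25 25 40 40 40 65
```

A larger base gives more protection but coarser data.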

PRAM

  • Post RAndomisation Method
  • values of a categorical variable are recoded to another category with a certain probability, given by a transition matrix
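A minimal PRAM sketch in base R, assuming a transition matrix whose rows are the original categories and whose columns are the released categories. The helper `pram()`, the matrix `P`, and the toy variable are made up for illustration.

```r
set.seed(5)

# Hypothetical helper: recode each value according to the probabilities
# in the row of the transition matrix that belongs to its category.
pram <- function(x, transition) {
  x <- as.character(x)
  stopifnot(all(x %in% rownames(transition)),
            all(abs(rowSums(transition) - 1) < 1e-8))
  vapply(x, function(value) {
    sample(colnames(transition), size = 1, prob = transition[value, ])
  }, character(1), USE.NAMES = FALSE)
}

levels_edu <- c("low", "medium", "high")
# Each category is kept with probability 0.90 and changed to each
# of the other two categories with probability 0.05.
P <- matrix(c(0.90, 0.05, 0.05,
              0.05, 0.90, 0.05,
              0.05, 0.05, 0.90),
            nrow = 3, byrow = TRUE,
            dimnames = list(levels_edu, levels_edu))

education <- sample(levels_edu, 1000, replace = TRUE)   # toy data
education_pram <- pram(education, P)

mean(education == education_pram)   # share of unchanged values
table(original = education, released = education_pram)
```

Because the transition matrix is known, analyses of the released data can in principle correct for the added misclassification.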

Shuffling

  • variation of swapping
  • generate new values for the sensitive variable that have similar distributional properties to the original values
  • reorder the original sensitive values according to the ranks of the newly generated values
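The two steps can be sketched in base R. This is a strongly simplified stand-in, not the published data shuffling procedure: the new values are generated here from a linear regression on one non-sensitive variable plus resampled residuals, which is an assumption made for this example, as are the function name `shuffle()` and the toy data.

```r
set.seed(6)

# Hypothetical helper: simplified shuffling of one sensitive variable.
shuffle <- function(sensitive, nonsensitive) {
  fit <- lm(sensitive ~ nonsensitive)
  # Step 1: generate new values with a similar relationship to the
  # non-sensitive variable (fitted values plus resampled residuals)
  synthetic <- fitted(fit) + sample(residuals(fit), replace = TRUE)
  # Step 2: the record with the j-th smallest synthetic value
  # receives the j-th smallest original value
  sort(sensitive)[rank(synthetic, ties.method = "first")]
}

age    <- rnorm(500, mean = 45, sd = 12)            # non-sensitive toy variable
income <- 1000 + 40 * age + rnorm(500, sd = 300)    # sensitive toy variable
income_shuffled <- shuffle(income, age)

c(original = cor(age, income), shuffled = cor(age, income_shuffled))
```

The released values are exactly the original values in a different order, so the univariate distribution is unchanged, while the relationship with the non-sensitive variable is approximately retained.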

Keeping Utility

Explain how to ensure that the statistics are the same (or reference utility section)

Pro and Contra of Using Perturbative Techniques

Add pro and con list

  • danger of reverse-engineering the perturbation technique applied

Exercise

  • Apply one or two techniques for certain variables in R

  • Check statistics before and after

Learning Objective

  • After completing this part of the tutorial, you will be able to apply selected perturbative techniques in R.



References

Carvalho, Tânia, Nuno Moniz, Pedro Faria, and Luís Antunes. 2023. “Survey on Privacy-Preserving Techniques for Microdata Publication.” ACM Computing Surveys 55 (14s): 1–42. https://doi.org/10.1145/3588765.