Perturbative Techniques

Perturbative techniques do not merely mask values; they distort them

Explain the idea and advantages more

Add a pro and con list?

Select most relevant techniques (swapping, noise?); potentially shorten or delete others

Add examples to each

Techniques following Carvalho et al. (2023)

Swapping

  • exchanging the values of certain variables between participants

  • types of swapping

    • record swapping (also known as data swapping): for categorical variables; the values of selected variables (e.g., gender, country of residence) are swapped between records; t-order equivalence means that the frequency tables for t variables are not changed (e.g., 1-order equivalence: same number of males and females as before; 2-order equivalence: same number of males and females from Switzerland and from Germany, respectively)

    • rank swapping: for continuous variables; values are sorted and each value is swapped only with a value whose rank lies within a restricted range, which limits the distortion of the data

  • advantages

    • removes the link between a record and an individual

    • can be applied to one or more sensitive variables without disturbing the non-sensitive variables

    • provides protection to rare and unique values

    • not limited to a particular type of variable

  • disadvantages

    • may produce a number of records with unusual combinations of values

    • non-random swapping requires additional effort to set up

    • can severely distort statistics for subgroups

    • does not protect against attribute disclosure
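
Rank swapping can be illustrated with a short sketch. The Python code below is a minimal illustration under stated assumptions, not a reference implementation: the function name `rank_swap`, the window parameter `p` (maximum rank distance as a percentage of the number of records), and the example incomes are all hypothetical.

```python
import random

def rank_swap(values, p=10, seed=0):
    """Swap each value with a partner whose rank is at most p percent of n away."""
    rng = random.Random(seed)
    n = len(values)
    window = max(1, n * p // 100)  # maximum allowed rank distance
    # order[r] is the index of the record that holds rank r (ascending)
    order = sorted(range(n), key=lambda i: values[i])
    swapped = list(values)
    used = [False] * n  # ranks that already took part in a swap
    for r in range(n):
        if used[r]:
            continue
        # unused partners with a higher rank inside the window
        candidates = [q for q in range(r + 1, min(n, r + window + 1)) if not used[q]]
        if not candidates:
            continue  # no partner left: this value stays where it is
        q = rng.choice(candidates)
        i, j = order[r], order[q]
        swapped[i], swapped[j] = values[j], values[i]
        used[r] = used[q] = True
    return swapped

incomes = [21, 35, 28, 90, 41, 39, 55, 62, 30, 47]
masked = rank_swap(incomes, p=20)
```

Because values are only exchanged, the marginal distribution of the variable (mean, variance, quantiles) is unchanged; only its relationship to the other variables is disturbed, and a smaller `p` disturbs it less.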

Re-sampling

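  • idea: replace each value with an average taken across several independent samples

  • draw several independent bootstrap samples (with replacement) of the variable, each as large as the original data, and sort each sample

  • assign the average of the first-ranked values of all samples to the record with the smallest original value, then the average of the second-ranked values to the record with the second-smallest value…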
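
The following Python sketch shows one common reading of re-sampling; this reading is an assumption and should be checked against Carvalho et al. (2023). It draws `t` bootstrap samples, sorts each, averages the values of equal rank across samples, and gives the j-th average to the record with the j-th smallest original value. The function name `resample_mask` and the parameter `t` are hypothetical.

```python
import random

def resample_mask(values, t=5, seed=0):
    """Mask a numeric variable by averaging rank-matched bootstrap samples."""
    rng = random.Random(seed)
    n = len(values)
    # t bootstrap samples of size n (drawn with replacement), each sorted ascending
    samples = [sorted(rng.choices(values, k=n)) for _ in range(t)]
    # average of the j-th smallest value across the t samples
    rank_means = [sum(s[j] for s in samples) / t for j in range(n)]
    # the record with the j-th smallest original value receives the j-th average
    order = sorted(range(n), key=lambda i: values[i])
    masked = [0.0] * n
    for j, i in enumerate(order):
        masked[i] = rank_means[j]
    return masked

incomes = [21, 35, 28, 90, 41, 39, 55, 62, 30, 47]
masked_incomes = resample_mask(incomes, t=5)
```

Under this reading the rank order of the records is preserved, while the exact values are replaced by averages that stay within the original range.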

check for correct understanding

Noise

  • also known as randomization

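  • idea: add a random value to each original value (additive noise) or multiply each original value by a random factor (multiplicative noise)

  • noise can be uncorrelated (drawn independently for each variable) or correlated (drawn so that its covariance structure mirrors that of the original variables)

  • transformations after adding the noise are possible

  • differential privacy methods usually rely on adding noise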
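
A minimal Python sketch of uncorrelated additive and multiplicative noise follows. The function names, the choice of normally distributed noise, and the default noise levels are assumptions made for illustration; correlated noise is not covered.

```python
import random
import statistics

def add_noise(values, rel_sd=0.1, seed=0):
    """Additive noise: add a normal draw whose SD is rel_sd times the SD of the data."""
    rng = random.Random(seed)
    sd = rel_sd * statistics.stdev(values)
    return [v + rng.gauss(0, sd) for v in values]

def multiply_noise(values, sd=0.05, seed=0):
    """Multiplicative noise: multiply each value by a random factor centred on 1."""
    rng = random.Random(seed)
    return [v * rng.gauss(1, sd) for v in values]

ages = [23, 31, 27, 58, 36, 40, 45, 52, 29, 38]
noisy_additive = add_noise(ages)
noisy_multiplicative = multiply_noise(ages)
```

Additive noise has the same absolute size for small and large values, whereas multiplicative noise scales with the value, which is often preferable for skewed variables such as income.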

Tip: Differential Privacy
  • adds noise to data, giving any individual plausible deniability

  • aggregate results of an analysis remain approximately the same despite the noise

  • results of an analysis are essentially the same whether or not any one person is included in the data

  • diffpriv: An R Package for Easy Differential Privacy

  • “Even if the attacker already suspects X is the only possible HIV case in the dataset, the data release should not confirm or deny that suspicion.”
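
One standard way to obtain this guarantee for a counting query is the Laplace mechanism. The Python sketch below is illustrative only: `dp_count`, `laplace_noise`, and the toy records are hypothetical names and data, and the fixed seed is used purely for reproducibility (a real release would need a secure source of randomness).

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_count(records, predicate, epsilon=1.0, seed=None):
    """Return a noisy count of the records that satisfy `predicate`."""
    rng = random.Random(seed)
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes a count by at most 1 (sensitivity 1),
    # so the Laplace scale is 1 / epsilon.
    return true_count + laplace_noise(1 / epsilon, rng)

people = [
    {"id": 1, "diagnosis": False},
    {"id": 2, "diagnosis": True},
    {"id": 3, "diagnosis": False},
    {"id": 4, "diagnosis": False},
]
noisy = dp_count(people, lambda p: p["diagnosis"], epsilon=1.0, seed=42)
```

A smaller `epsilon` means more noise and stronger protection. The released count is then similarly plausible whether the true count is 0 or 1, so it neither confirms nor denies a suspicion about a single person.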

Microaggregation

  • idea: form groups of similar records and replace their values with a group aggregate (e.g., mean, median)

  • less information is lost when the groups are more homogeneous
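
For a single numeric variable, microaggregation can be sketched as follows. This Python sketch assumes fixed-size groups of `k` neighbouring values and the mean as the aggregate; the function name `microaggregate` and the example data are hypothetical.

```python
def microaggregate(values, k=3):
    """Replace each value with the mean of its group of at least k similar values."""
    n = len(values)
    # record indices ordered by value, so neighbours in this list are similar
    order = sorted(range(n), key=lambda i: values[i])
    masked = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        # fold a too-small remainder into the last group so every group has >= k records
        if n - end < k:
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = mean
        start = end
    return masked

incomes = [21, 35, 28, 90, 41, 39, 55, 62, 30, 47]
aggregated = microaggregate(incomes, k=3)
```

With the mean as aggregate, the overall mean of the variable is preserved, while its variance shrinks; the shrinkage is smaller when the values within each group are close together.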

Rounding

  • round values to multiples of a chosen rounding base (e.g., to the nearest 5 or 10)
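
A minimal Python sketch of deterministic rounding to a base, assuming that ties are rounded up; the function name `round_to_base` is hypothetical.

```python
import math

def round_to_base(values, base=5):
    """Round each value to the nearest multiple of `base` (ties round up)."""
    return [base * math.floor(v / base + 0.5) for v in values]

rounded = round_to_base([21, 35, 28, 90, 41], base=5)
# rounded is [20, 35, 30, 90, 40]
```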

PRAM

  • Post RAndomisation Method
  • values of a categorical variable are recoded to another category with a prespecified probability (usually given as a transition matrix)
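
The recoding step can be sketched in Python as below. The transition probabilities, the category labels, and the function name `pram` are illustrative assumptions; each row of the transition matrix gives the probabilities with which an original category is kept or recoded.

```python
import random

def pram(values, transition, seed=0):
    """Recode each categorical value according to its row of the transition matrix."""
    rng = random.Random(seed)
    masked = []
    for v in values:
        targets = list(transition[v])                   # possible new categories
        weights = [transition[v][t] for t in targets]   # their probabilities
        masked.append(rng.choices(targets, weights=weights, k=1)[0])
    return masked

# Hypothetical transition matrix: each row sums to 1
transition = {
    "employed":   {"employed": 0.90, "unemployed": 0.05, "retired": 0.05},
    "unemployed": {"employed": 0.10, "unemployed": 0.85, "retired": 0.05},
    "retired":    {"employed": 0.05, "unemployed": 0.05, "retired": 0.90},
}
status = ["employed", "retired", "employed", "unemployed", "employed", "retired"]
masked_status = pram(status, transition)
```

Because the transition matrix is known, the expected frequencies after recoding can be computed, which allows analysts to correct frequency estimates for the perturbation.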

Shuffling

  • variation of swapping
  • generate new values for the sensitive variable that have distributional properties similar to the original data
  • reorder the original sensitive values so that their ranks match the ranks of the newly generated values
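
The two steps can be sketched in Python as follows. As a simplifying assumption, the new sensitive values are generated from a linear regression of the sensitive variable on one non-sensitive variable plus normal noise; published shuffling procedures use more general models. The function name `shuffle_sensitive` and the example data are hypothetical.

```python
import random
import statistics

def shuffle_sensitive(sensitive, covariate, seed=0):
    """Reorder the original sensitive values by the ranks of model-generated values."""
    rng = random.Random(seed)
    n = len(sensitive)
    mx, ms = statistics.mean(sensitive), statistics.mean(covariate)
    # least-squares fit of the sensitive variable on the non-sensitive covariate
    slope = (sum((s - ms) * (x - mx) for s, x in zip(covariate, sensitive))
             / sum((s - ms) ** 2 for s in covariate))
    intercept = mx - slope * ms
    residuals = [x - (intercept + slope * s) for x, s in zip(sensitive, covariate)]
    resid_sd = statistics.stdev(residuals)
    # Step 1: generate new values with similar distributional properties
    synthetic = [intercept + slope * s + rng.gauss(0, resid_sd) for s in covariate]
    # Step 2: give the record with the r-th smallest generated value
    # the r-th smallest original value
    ranked_original = sorted(sensitive)
    order = sorted(range(n), key=lambda i: synthetic[i])
    shuffled = [None] * n
    for rank, i in enumerate(order):
        shuffled[i] = ranked_original[rank]
    return shuffled

income = [21, 35, 28, 90, 41, 39, 55, 62, 30, 47]
age = [23, 31, 27, 58, 36, 40, 45, 52, 29, 38]
shuffled_income = shuffle_sensitive(income, age)
```

Only original values are released, so the marginal distribution of the sensitive variable is unchanged, while the rank-based reordering aims to keep its relationship to the non-sensitive variable roughly intact.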

Explain how to ensure that the statistics are the same (or reference utility section)

Learning Objective

  • After completing this part of the tutorial, you will be able to apply selected perturbative techniques in R.

Exercise

  • Apply one or two techniques to selected variables in R

  • Check statistics before and after

To Do List

  • Research best practices on ensuring that statistics stay similar

References

Carvalho, Tânia, Nuno Moniz, Pedro Faria, and Luís Antunes. 2023. “Survey on Privacy-Preserving Techniques for Microdata Publication.” ACM Computing Surveys 55 (14s): 1–42. https://doi.org/10.1145/3588765.