Reading Note: "Survey: Leakage and Privacy at Inference Time"
Jegorova, Marija, et al. "Survey: Leakage and privacy at inference time." IEEE Transactions on Pattern Analysis and Machine Intelligence (2022).
Leakage of data from publicly available ML models is an area of growing significance since it can draw on multiple sources of data, potentially including users' sensitive data.
Inference-time leakage: the most likely scenario for publicly available models.
Topics:
What leakage is in the context of different data, tasks, and model architectures;
Taxonomy across involuntary and malicious leakage (i.e., involuntary data leakage which is natural to ML models, and potential malicious leakage caused by privacy attacks);
Current defence mechanisms, assessment metrics, and applications.
Key Words: Data Leakage, Privacy Attacks and Defences, Inference-Time Attacks
What is personal (private) and sensitive data?
Personal data: in loose terms, relates to an identified or identifiable natural person.
Sensitive data: personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs; trade-union membership; genetic data, biometric and health-related data; data concerning a person's sexual orientation.
Leakage for different data types: Types of data leakage are largely data-specific (e.g., text data, images, and tabular data).
Leakage for different tasks: Privacy violations and their mitigations are also task- and model-specific (e.g., classification, regression/prediction, generation/synthesis, and segmentation).
(also see Table 1, p3 of the original paper)
How do user actions affect leakage?
Passive / honest-but-curious user: interacts with the trained model as intended by design and in compliance with protocols (only involuntary/benign leakage, if any).
Malevolent user / an adversary: attempts to take advantage of potential vulnerabilities in the trained model to extract sensitive data (privacy attacks).
Ways in which data leak without malicious user intervention include overfitting and memorization.
Memorization:
Memorization of specific training data samples occurs when the model assigns some sample a significantly higher likelihood than expected by random chance (potential risks: membership inference, sensitive attribute inference, and training dataset reconstruction).
Data augmentation reduces (but does not eliminate) the memorization capacity of a network, whereas increasing the size of the architecture increases its memorization capacity.
Feature Leakage: a special case of memorization, sensitive attributes/features are unintentionally memorized and revealed by the trained model at inference time (enables property inference attacks).
Membership inference attacks (MIAs, also sometimes called "linkage attacks") and reconstruction MIAs can be used to identify the individual records used for training open-access ML models (a minimal sketch follows below).
Attribute inference attack (AIA, or reconstruction attack): given access to the trained ML model and incomplete information about a data point, infer the missing information about that point.
(For more details, also see another reading note here.)
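As a minimal illustration of the membership-inference idea above (a sketch, not the survey's method): an overfit model tends to assign higher confidence to its training samples, so an attacker can simply threshold the maximum confidence score. The dataset, model, and threshold below are illustrative assumptions.

```python
# Minimal sketch of a confidence-threshold membership inference attack.
# Assumption: the attacker can query the target model for per-class confidence scores.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_out, y_train, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

target = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def max_confidence(model, X):
    """Highest probability score the model assigns to each sample."""
    return model.predict_proba(X).max(axis=1)

# Members (training data) tend to receive higher confidence than non-members.
conf_in = max_confidence(target, X_train)
conf_out = max_confidence(target, X_out)
threshold = 0.9  # attacker-chosen; could instead be tuned on shadow models
print(f"flagged as members: train={(conf_in >= threshold).mean():.2f}, "
      f"held-out={(conf_out >= threshold).mean():.2f}")
```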
The end goal of a model extraction attack (MEA) is to steal the trained model's functionality (a minimal query-based sketch follows after this list):
Steal the model parameters, given the model architecture (or at least the type);
Steal the entire model architecture when it is unknown;
Steal the model functionality (not necessarily reverse engineer the target model itself).
(Model extraction for generative models remains unexplored. There is no single reliable way to verify how much of the training data is memorized by a GAN. *2022)
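A minimal sketch of functionality stealing (the third MEA goal above): the attacker queries the target model on self-chosen inputs and fits a surrogate on the returned labels. The target model, surrogate, and query distribution below are illustrative assumptions, not from the survey.

```python
# Sketch of functionality-stealing model extraction:
# query the target on attacker-chosen inputs, fit a surrogate on its answers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=3000, n_features=10, random_state=1)
target = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=1).fit(X, y)

# The attacker only needs query access: draw synthetic queries, record target labels.
rng = np.random.default_rng(0)
queries = rng.normal(size=(5000, 10))
stolen_labels = target.predict(queries)

surrogate = LogisticRegression(max_iter=1000).fit(queries, stolen_labels)

# Agreement between surrogate and target measures how much functionality was stolen.
test = rng.normal(size=(2000, 10))
agreement = (surrogate.predict(test) == target.predict(test)).mean()
print(f"surrogate/target agreement: {agreement:.2f}")
```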
Property inference attack (PIA): a white-box attack to extract a specific sensitive attribute or feature of interest from a given target model. The goal is to build a meta-classifier capable of telling whether a model contains an attribute (principle: similar models trained on similar datasets exhibit similar properties).
To train the meta-classifier, an attacker trains a series of shadow classifiers on datasets of which only some exhibit the target property. The shadow models are not explicitly trained to learn the property, but learn it as a consequence of the bias introduced in the dataset. The weights and biases of the shadow models are often used as features for training the meta-classifier.
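A small sketch of the shadow-model/meta-classifier recipe above, assuming the target property is class imbalance in the shadow training data and using logistic-regression shadow models whose weights serve as meta-features (all of these choices are illustrative, not from the survey):

```python
# Sketch of a PIA meta-classifier: shadow models are trained on datasets with or
# without an assumed property (here, heavy class imbalance), and a meta-classifier
# learns to read that property off the shadow models' parameters.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def train_shadow(has_property, seed):
    # Property of the training data: heavy class imbalance (illustrative choice).
    weights = [0.9, 0.1] if has_property else [0.5, 0.5]
    X, y = make_classification(n_samples=500, n_features=8, weights=weights,
                               random_state=seed)
    shadow = LogisticRegression(max_iter=1000).fit(X, y)
    # Feature vector for the meta-classifier: the shadow model's weights and bias.
    return np.concatenate([shadow.coef_.ravel(), shadow.intercept_])

features, labels = [], []
for seed in range(100):
    has_property = seed % 2 == 0
    features.append(train_shadow(has_property, seed))
    labels.append(int(has_property))

meta = LogisticRegression(max_iter=1000).fit(features[:80], labels[:80])
print("meta-classifier accuracy:", meta.score(features[80:], labels[80:]))
```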
Poisoning Attacks: a special case of PIAs, polluting the data or model during the training, to cause a bias in the output, resulting in a leakage. (PIAs are so far somewhat limited to fully connected neural networks and classification tasks. *2022)
Dataset reconstruction: partial reconstruction of private datasets from aggregated publicly available information, including open-access or query-only trained ML models.
Current defence methods can largely be dichotomized into defences at the data level (applying augmentations to the training data) and defences at the model level (training, tuning, and designing models with inbuilt defence mechanisms).
Simply deleting sensitive features / entries can violate the data integrity and consistency and represent a privacy risk of its own, since the pattern of "missingness" might reveal some data properties. Hence data obfuscation and sanitization are often applied to mask, scramble, or overwrite sensitive information with a realistic fake.
Data Obfuscation: perturb the sensitive information in the data through scrambling or masking.
Data Sanitization: disguise the sensitive information within the data by overwriting it with realistic-looking synthetic data (e.g., flipping labels, adding noise, and randomization). (Data sanitization is often a natural precaution against adversarial attacks [1].)
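A toy sketch of the two data-level operations above on tabular data (column names, generalization levels, noise scales, and the flip rate are all illustrative assumptions):

```python
# Sketch of simple obfuscation/sanitization steps on a synthetic tabular dataset.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=1000),
    "zip_code": rng.integers(10000, 99999, size=1000),
    "income": rng.normal(50000, 15000, size=1000),
    "diagnosis": rng.integers(0, 2, size=1000),  # sensitive label
})

sanitized = df.copy()
# Obfuscation: mask/scramble quasi-identifiers (generalize age, truncate zip code).
sanitized["age"] = (sanitized["age"] // 10) * 10
sanitized["zip_code"] = (sanitized["zip_code"] // 100) * 100
# Sanitization: overwrite sensitive values with realistic-looking noise and label flips.
sanitized["income"] = sanitized["income"] + rng.normal(0, 5000, size=len(df))
flip = rng.random(len(df)) < 0.05  # randomly flip 5% of sensitive labels
sanitized["diagnosis"] = np.where(flip, 1 - sanitized["diagnosis"], sanitized["diagnosis"])
```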
Learning with Synthetic Data: can be viewed as a natural extension to both data obfuscation and sanitization. (So far synthetic data generation with privacy guarantees remains elusive (largely because generative models capture the underlying training data distribution, and might leak properties of the dataset into its generated data, enabling PIAs). *2022)
Two ideas: 1) protecting an existing trained model from leaking learnt sensitive information, and 2) preventing models from learning such information in the first place.
As many inference-time attacks rely on the confidence scores of the target model, a simple defence is confidence masking (hiding confidence scores, e.g., providing only the final label or the top-K confidences, or perturbing the confidence scores directly) or regularization. Ensembles have also been used against privacy attacks (e.g., switching ensembles (PASE) have been shown to work against MIAs).
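A minimal sketch of confidence masking (label-only or top-K release); the wrapper interface below is an assumption, not an API from the survey:

```python
# Sketch of confidence masking: release only the top-k scores (or the label alone)
# instead of the full confidence vector.
import numpy as np

def mask_confidences(probs, k=1, label_only=False):
    """probs: (n_samples, n_classes) confidence vectors from the target model."""
    if label_only:
        return probs.argmax(axis=1)                # release the predicted label only
    top_k = np.argsort(probs, axis=1)[:, -k:]      # indices of the k largest scores
    masked = np.zeros_like(probs)
    np.put_along_axis(masked, top_k, np.take_along_axis(probs, top_k, axis=1), axis=1)
    return masked / masked.sum(axis=1, keepdims=True)  # renormalize released scores

probs = np.array([[0.70, 0.20, 0.10],
                  [0.05, 0.15, 0.80]])
print(mask_confidences(probs, k=2))
print(mask_confidences(probs, label_only=True))
```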
Removing the user data that needs to be forgotten from the training dataset and retraining the model from scratch is usually infeasible because of the computational cost.
Machine Unlearning [2]: given the need for a "forgetting system", the first unlearning algorithm is based on converting learning algorithms into summation form (model weights are not trained on each data sample, but on a small number of sums of the data sample transforms), so that data traces can be forgotten efficiently (it also works against data pollution attacks; a minimal sketch follows below).
The unlearnt model can leak information about the forgotten data under MIA, even when the original non-unlearnt model did not leak such information [3].
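A minimal sketch of the summation-form idea from [2], using ridge regression (whose parameters depend on the data only through the sums X^T X and X^T y) as an illustrative stand-in:

```python
# "Summation form" unlearning sketch: the model depends on the data only through a
# few sums, so forgetting a sample means subtracting its contribution from those
# sums and recomputing, without full retraining.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=500)

# Ridge regression kept in summation form: XtX = sum_i x_i x_i^T, Xty = sum_i y_i x_i.
lam = 1e-3
XtX = X.T @ X
Xty = X.T @ y
weights = np.linalg.solve(XtX + lam * np.eye(5), Xty)

def unlearn(XtX, Xty, x_i, y_i):
    """Remove one sample's contribution from the sums and refit cheaply."""
    XtX = XtX - np.outer(x_i, x_i)
    Xty = Xty - y_i * x_i
    return XtX, Xty, np.linalg.solve(XtX + lam * np.eye(5), Xty)

XtX, Xty, weights = unlearn(XtX, Xty, X[0], y[0])
# Sanity check: identical to retraining from scratch without sample 0.
retrained = np.linalg.solve(X[1:].T @ X[1:] + lam * np.eye(5), X[1:].T @ y[1:])
print(np.allclose(weights, retrained))  # True
```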
Distillation for Membership Privacy: train models by leveraging various sources of noise in the model distillation process.
Adversarial defence strategies use potential attack models as a penalty when training the target model (in practice this has been mostly explored for MIAs).
Min-max game-based adversarial regularization [4]: regularize the target model during training, so that its predictions on training data are indistinguishable from that on other data points under the same distribution.
MemGuard [5]: the first defence with formal utility-loss guarantees against black-box MIAs. It adds noise to the confidence vectors of the target model, turning them into adversarial examples that fool the attacker's MIA classifier.
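A heavily simplified sketch in the spirit of MemGuard: perturb the released confidence vector while keeping the predicted label fixed. The real defence crafts the noise adversarially against an attack classifier under formal utility constraints; plain random noise is used here only as an illustrative stand-in.

```python
# Simplified, MemGuard-inspired sketch: add label-preserving noise to released
# confidence vectors (the real method uses adversarially crafted perturbations).
import numpy as np

rng = np.random.default_rng(0)

def perturb_confidences(probs, scale=0.1, max_tries=50):
    released = []
    for p in probs:
        label = p.argmax()
        for _ in range(max_tries):
            noisy = np.clip(p + rng.normal(scale=scale, size=p.shape), 1e-6, None)
            noisy = noisy / noisy.sum()
            if noisy.argmax() == label:   # utility constraint: keep the predicted label
                break
        else:
            noisy = p                     # fall back to the original scores
        released.append(noisy)
    return np.array(released)

print(perturb_confidences(np.array([[0.6, 0.3, 0.1]])))
```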
Some of the proposed adversarial defences, such as PGD adversarial training, on the contrary increase the modelโs susceptibility to membership inference attacks.
Using some SOTA attacks as adversaries during training can leave the model weaker against these and newer attacks than even the original undefended model (as shown in [6]).
Idea: gather confidential user data for analysis without compromising the confidentiality of each individual user.
Definition [7]: an algorithm K is considered ε-differentially private if, for all datasets D1 and D2 differing in at most one data entry and for all events S, Pr[K(D1) ∈ S] ≤ exp(ε) · Pr[K(D2) ∈ S].
This can be interpreted as follows: a differentially private algorithm's output distribution should remain essentially unchanged whether or not any single entry is present in its training dataset.
"Pure" differential privacy: a guarantee on the maximum privacy loss -- the maximum divergence between these two distributions (or a maximum log odds ratio for any event S) is bounded by the "privacy budget" ฮต. (There exist generalizations and relaxations of DP methods for higher accuracy than โpureโ DP.)
Two recent approaches to implement DP for NNs:
Private Aggregation of Teacher Ensembles (PATE): train an ensemble of teacher NNs on disjoint subsets of the training dataset with strong privacy guarantees, then use a student NN to aggregate the teachers' knowledge in a noisy fashion (i.e., by black-box querying the teacher ensemble and receiving noisy labels), so that the student used for inference never sees the training data and the teachers are never publicly shared.
Gradient Descent Perturbations (e.g., DP-SGD): clip per-example gradients and add noise to the gradient update (see the sketch below).
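A minimal DP-SGD-style sketch for logistic regression: clip each per-example gradient and add Gaussian noise to the summed update. The clip norm, noise multiplier, learning rate, and batch size are illustrative assumptions, and no privacy accounting is shown.

```python
# Sketch of gradient perturbation: per-example clipping + Gaussian noise on the update.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X @ rng.normal(size=5) > 0).astype(float)

w = np.zeros(5)
clip, noise_mult, lr, batch = 1.0, 1.1, 0.1, 100

for step in range(200):
    idx = rng.choice(len(X), size=batch, replace=False)
    preds = 1.0 / (1.0 + np.exp(-X[idx] @ w))
    per_example_grads = (preds - y[idx])[:, None] * X[idx]        # (batch, 5)
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip)   # clip each gradient
    noisy_sum = clipped.sum(axis=0) + rng.normal(scale=noise_mult * clip, size=5)
    w -= lr * noisy_sum / batch
```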
Federated learning (FL) trains an ML model on a central server, across multiple decentralized local databases without exchanging the data directly, potentially mitigating risks of direct data leakage.
(Also see [8].)
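A minimal FedAvg-style sketch, assuming a linear-regression model and four simulated clients: only model weights travel to the server, never the raw local data (all hyperparameters are illustrative).

```python
# Federated averaging sketch: local updates on each client, weight averaging on the server.
import numpy as np

rng = np.random.default_rng(0)
true_w = rng.normal(size=5)
clients = []
for _ in range(4):                                  # 4 clients, each with local data
    Xc = rng.normal(size=(200, 5))
    yc = Xc @ true_w + rng.normal(scale=0.1, size=200)
    clients.append((Xc, yc))

global_w = np.zeros(5)
for rnd in range(20):                               # communication rounds
    local_models = []
    for Xc, yc in clients:
        w = global_w.copy()
        for _ in range(5):                          # a few local gradient steps
            grad = 2 * Xc.T @ (Xc @ w - yc) / len(Xc)
            w -= 0.05 * grad
        local_models.append(w)
    global_w = np.mean(local_models, axis=0)        # server only sees model weights
```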
Traditional encryption requires sharing of the key amongst the parties, which interferes with individual privacy. However, Homomorphic Encryption (HE) techniques allow any third party to operate on the encrypted data without decrypting it in advance.
(HE is a vast and well-established field, but it is computationally expensive and often impractical in real-world deployments.)
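A toy Paillier-style sketch of additive homomorphism with deliberately tiny, insecure parameters (real deployments use moduli of 2048 bits or more): a third party can add two ciphertexts without ever decrypting them.

```python
# Toy Paillier additively homomorphic encryption (insecure parameters, illustration only).
import math, random

p, q = 499, 547                       # toy primes; real keys use ~2048-bit moduli
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                  # simplified decryption constant for g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n

c1, c2 = encrypt(20), encrypt(22)
c_sum = (c1 * c2) % n2                # homomorphic addition on ciphertexts
print(decrypt(c_sum))                 # 42
```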
Measuring leakage is case-specific: depends on the type of data and leakage.
Assessing involuntary leakage: overfitting is easy to assess (via generalization error), while memorization and feature leakage are harder.
Assessing data leakage via attacks: can be straightforward (e.g., MIA, see details here).
Assessing data leakage for defence purposes: include Kolmogorov-Smirnov distance for verifying forgetting [9], metrics for machine unlearning leakage under MIAs [3], and estimating the Bayes risk of the system [10].
Learning metrics as a fairness constraint: e.g., hand-crafting a loss term at training time.
Metrics in differential privacy: Renyi Divergence to bound any arbitrary privacy loss [11]; Total Variation Distance [12] between noisy and original marginals of data distributions.
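A small sketch computing some of the distances mentioned above (Kolmogorov-Smirnov statistic, total variation distance, and Rényi divergence) on illustrative inputs:

```python
# Sketch of leakage/privacy assessment metrics on toy inputs.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
scores_original = rng.normal(0.0, 1.0, size=5000)   # e.g. model outputs before forgetting
scores_unlearnt = rng.normal(0.1, 1.0, size=5000)   # e.g. model outputs after forgetting
print("KS statistic:", ks_2samp(scores_original, scores_unlearnt).statistic)

p = np.array([0.5, 0.3, 0.2])                       # original marginal
q = np.array([0.45, 0.35, 0.2])                     # noisy marginal
print("total variation:", 0.5 * np.abs(p - q).sum())

alpha = 2.0                                         # Renyi divergence of order alpha
renyi = np.log(np.sum(p**alpha * q**(1 - alpha))) / (alpha - 1)
print("Renyi divergence (alpha=2):", renyi)
```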
(Assessment of the data leakage in trained ML models remains an open area of research.)
(Most fair ML deals with learning fair classifiers. Most proposed methods are based on the definition of fairness tailored to their specific objective. *2022)
Attacks are not evenly explored across different data types or tasks: MIAs are not well investigated for regression and segmentation; MEAs have not been verified for generative models; PIAs have only been applied to classification.
Replacing real personal data with synthetic data could be a promising direction of defences at the data level (while it remains vulnerable to PIAs and AIAs).
Model-level defences often work only for specific settings: e.g., for MIAs, adversarial defences are the most explored, while DP-based defences may not universally succeed.
Uniform mechanisms for reporting data leakage are lacking.
(Understanding of data leakage, its causes and implications, remains underexplored.)
[1] P. Chan, Z.-M. He, H. Li, and C.-C. Hsu, โData sanitization against adversarial label contamination based on data complexity,โ Int. J of Machine Learning and Cybernetics, vol. 9, pp. 1039โ1052, 2018.
[2] Y. Cao and J. Yang, โTowards making systems forget with machine unlearning,โ in IEEE SSP. IEEE, 2015.
[3] M. Chen, Z. Zhang, T. Wang, M. Backes, M. Humbert, and Y. Zhang, โWhen machine unlearning jeopardizes privacy,โ CoRR, 2020.
[4] M. Nasr, R. Shokri, and A. Houmansadr, โMachine learning with membership privacy using adversarial regularization,โ in SIGSAC. ACM, 2018, pp. 634โ646.
[5] J. Jia, A. Salem, M. Backes, Y. Zhang, and N. Z. Gong, โMemGuard: defending against black-box membership inference attacks via adversarial examples,โ in SIGSAC, 2019, pp. 259โ274.
[6] L. Song, R. Shokri, and P. Mittal, โPrivacy risks of securing machine learning models against adversarial examples,โ in SIGSAC. ACM, 2019.
[7] C. Dwork and A. Roth, โThe algorithmic foundations of differential privacy,โ Found. Trends Theor. Comput. Sci., vol. 9, pp. 211โ407, 2014.
[8] L. Lyu, H. Yu, and Q. Yang, โThreats to federated learning: A survey,โ CoRR, 2020.
[9] X. Liu and S. A. Tsaftaris, โHave you forgotten? A method to assess if machine learning models have forgotten data,โ MICCAI, 2020.
[10] G. Cherubin, K. Chatzikokolakis, and C. Palamidessi, โF-BLEAU: fast black-box leakage estimation,โ in IEEE SSP, 2019, pp. 835โ852.
[11] B. Jayaraman and D. Evans, โEvaluating differentially private machine learning in practice,โ in USENIX, 2019.
[12] A. B. Tsybakov, Introduction to Nonparametric Estimation. Springer, 2008.