Reading Note: "Survey: Leakage and Privacy at Inference Time"
Jegorova, Marija, et al. "Survey: Leakage and privacy at inference time." IEEE Transactions on Pattern Analysis and Machine Intelligence (2022).
Leakage of data from publicly available ML models is an area of growing significance since it can draw on multiple sources of data, potentially including users' sensitive data.
Inference-time leakage: the most likely scenario for publicly available models.
Topics:
What leakage is in the context of different data, tasks, and model architectures;
Taxonomy across involuntary and malicious leakage (i.e., involuntary data leakage which is natural to ML models, and potential malicious leakage caused by privacy attacks);
Current defence mechanisms, assessment metrics, and applications.
Key Words: Data Leakage, Privacy Attacks and Defences, Inference-Time Attacks
What is personal (private) and sensitive data?
Personal data: in loose terms, relates to an identified or identifiable natural person.
Sensitive data: personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs; trade-union membership; genetic data, biometric and health-related data; data concerning a person's sexual orientation.
Leakage for different data types: Types of data leakage are largely data-specific (e.g., text data, images, and tabular data).
Leakage for different tasks: Privacy violations and their mitigations are also task- and model-specific (e.g., classification, regression/prediction, generation/synthesis, and segmentation).
(also see Table 1, p3 of the original paper)
How do user actions affect leakage?
Passive / honest-but-curious user: interacts with the trained model as intended by design and in compliance with protocols (only involuntary/benign leakage, if any).
Malevolent user / an adversary: attempts to take advantage of potential vulnerabilities in the trained model to extract sensitive data (privacy attacks).
Ways in which data leak without malicious user intervention include overfitting and memorization.
Memorization:
Memorization of specific training data samples occurs when the model assigns some sample a significantly higher likelihood than expected by random chance (potential risks: membership inference, sensitive attribute inference, and training dataset reconstruction).
Data augmentation reduces (but does not eliminate) the memorization capacity of a network, whereas increasing the size of the architecture increases its memorization capacity.
Feature Leakage: a special case of memorization, sensitive attributes/features are unintentionally memorized and revealed by the trained model at inference time (enables property inference attacks).
Membership inference attacks (MIAs, also sometimes called "linkage attacks") and reconstruction MIAs can be used to identify the individual records used for training open-access ML models (a minimal sketch follows below).
Attribute inference attack (AIA, or reconstruction attack): given access to the trained ML model and incomplete information about a data point, infer the missing information about that point.
(For more details, also see another reading note here.)
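As a minimal illustration of the membership-inference idea above (a sketch, not the survey's method): an overfit model tends to assign higher confidence to its training samples, so an attacker can simply threshold the maximum confidence score. The dataset, model, and threshold below are illustrative assumptions.

```python
# Minimal sketch of a confidence-threshold membership inference attack.
# Assumption: the attacker can query the target model for per-class confidence scores.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_out, y_train, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

target = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def max_confidence(model, X):
    """Highest probability score the model assigns to each sample."""
    return model.predict_proba(X).max(axis=1)

# Members (training data) tend to receive higher confidence than non-members.
conf_in = max_confidence(target, X_train)
conf_out = max_confidence(target, X_out)
threshold = 0.9  # attacker-chosen; could instead be tuned on shadow models
print(f"flagged as members: train={(conf_in >= threshold).mean():.2f}, "
      f"held-out={(conf_out >= threshold).mean():.2f}")
```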
The end goal of a model extraction attack (MEA) is to steal the trained model's functionality (a minimal query-based sketch follows after this list):
Steal the model parameters, given the model architecture (or at least the type);
Steal the entire model architecture when it is unknown;
Steal the model functionality (not necessarily reverse engineer the target model itself).
(Model extraction for generative models remains unexplored. There is no single reliable way to verify how much of the training data is memorized by a GAN. *2022)
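A minimal sketch of functionality stealing (the third MEA goal above): the attacker queries the target model on self-chosen inputs and fits a surrogate on the returned labels. The target model, surrogate, and query distribution below are illustrative assumptions, not from the survey.

```python
# Sketch of functionality-stealing model extraction:
# query the target on attacker-chosen inputs, fit a surrogate on its answers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=3000, n_features=10, random_state=1)
target = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=1).fit(X, y)

# The attacker only needs query access: draw synthetic queries, record target labels.
rng = np.random.default_rng(0)
queries = rng.normal(size=(5000, 10))
stolen_labels = target.predict(queries)

surrogate = LogisticRegression(max_iter=1000).fit(queries, stolen_labels)

# Agreement between surrogate and target measures how much functionality was stolen.
test = rng.normal(size=(2000, 10))
agreement = (surrogate.predict(test) == target.predict(test)).mean()
print(f"surrogate/target agreement: {agreement:.2f}")
```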
Property inference attack (PIA): a white-box attack to extract a specific sensitive attribute or feature of interest from a given target model. The goal is to build a meta-classifier capable of telling whether a model contains an attribute (principle: similar models trained on similar datasets exhibit similar properties).
To train the meta-classifier, an attacker trains a series of shadow classifiers on datasets of which only some exhibit the target property. The shadow models are not explicitly trained to learn the property, but learn it as a consequence of the bias introduced in the dataset. The weights and biases of the shadow models are often used as features for training the meta-classifier.
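A small sketch of the shadow-model/meta-classifier recipe above, assuming the target property is class imbalance in the shadow training data and using logistic-regression shadow models whose weights serve as meta-features (all of these choices are illustrative, not from the survey):

```python
# Sketch of a PIA meta-classifier: shadow models are trained on datasets with or
# without an assumed property (here, heavy class imbalance), and a meta-classifier
# learns to read that property off the shadow models' parameters.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def train_shadow(has_property, seed):
    # Property of the training data: heavy class imbalance (illustrative choice).
    weights = [0.9, 0.1] if has_property else [0.5, 0.5]
    X, y = make_classification(n_samples=500, n_features=8, weights=weights,
                               random_state=seed)
    shadow = LogisticRegression(max_iter=1000).fit(X, y)
    # Feature vector for the meta-classifier: the shadow model's weights and bias.
    return np.concatenate([shadow.coef_.ravel(), shadow.intercept_])

features, labels = [], []
for seed in range(100):
    has_property = seed % 2 == 0
    features.append(train_shadow(has_property, seed))
    labels.append(int(has_property))

meta = LogisticRegression(max_iter=1000).fit(features[:80], labels[:80])
print("meta-classifier accuracy:", meta.score(features[80:], labels[80:]))
```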
Poisoning Attacks: a special case of PIAs, polluting the data or model during the training, to cause a bias in the output, resulting in a leakage. (PIAs are so far somewhat limited to fully connected neural networks and classification tasks. *2022)
Dataset reconstruction: partial reconstruction of private datasets from aggregated publicly available information, including open-access or query-only trained ML models.
Current defence methods can largely be dichotomized into defences at the data level (applying augmentations to the training data) and defences at the model level (training, tuning, and designing models with inbuilt defence mechanisms).
Simply deleting sensitive features / entries can violate the data integrity and consistency and represent a privacy risk of its own, since the pattern of "missingness" might reveal some data properties. Hence data obfuscation and sanitization are often applied to mask, scramble, or overwrite sensitive information with a realistic fake.
Data Obfuscation: perturb the sensitive information in the data through scrambling or masking.
Data Sanitization: disguise the sensitive information within the data by overwriting it with realistic-looking synthetic data (e.g., flipping labels, adding noise, and randomization). (Data sanitization is often a natural precaution against adversarial attacks [1].)
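A toy sketch of the two data-level operations above on tabular data (column names, generalization levels, noise scales, and the flip rate are all illustrative assumptions):

```python
# Sketch of simple obfuscation/sanitization steps on a synthetic tabular dataset.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=1000),
    "zip_code": rng.integers(10000, 99999, size=1000),
    "income": rng.normal(50000, 15000, size=1000),
    "diagnosis": rng.integers(0, 2, size=1000),  # sensitive label
})

sanitized = df.copy()
# Obfuscation: mask/scramble quasi-identifiers (generalize age, truncate zip code).
sanitized["age"] = (sanitized["age"] // 10) * 10
sanitized["zip_code"] = (sanitized["zip_code"] // 100) * 100
# Sanitization: overwrite sensitive values with realistic-looking noise and label flips.
sanitized["income"] = sanitized["income"] + rng.normal(0, 5000, size=len(df))
flip = rng.random(len(df)) < 0.05  # randomly flip 5% of sensitive labels
sanitized["diagnosis"] = np.where(flip, 1 - sanitized["diagnosis"], sanitized["diagnosis"])
```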
Learning with Synthetic Data: can be viewed as a natural extension to both data obfuscation and sanitization. (So far synthetic data generation with privacy guarantees remains elusive (largely because generative models capture the underlying training data distribution, and might leak properties of the dataset into its generated data, enabling PIAs). *2022)
Two ideas: 1) protecting an existing trained model from leaking learnt sensitive information, and 2) preventing models from learning such information in the first place.
As many inference-time attacks rely on the confidence scores of the target model, a simple defence is confidence masking (hiding confidence scores, e.g., providing only the final label or the top-K confidences, or perturbing the confidence scores directly) or regularization. Ensembles have also been used against privacy attacks (e.g., switching ensembles (PASE) have been shown to work against MIAs).
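A minimal sketch of confidence masking (label-only or top-K release); the wrapper interface below is an assumption, not an API from the survey:

```python
# Sketch of confidence masking: release only the top-k scores (or the label alone)
# instead of the full confidence vector.
import numpy as np

def mask_confidences(probs, k=1, label_only=False):
    """probs: (n_samples, n_classes) confidence vectors from the target model."""
    if label_only:
        return probs.argmax(axis=1)                # release the predicted label only
    top_k = np.argsort(probs, axis=1)[:, -k:]      # indices of the k largest scores
    masked = np.zeros_like(probs)
    np.put_along_axis(masked, top_k, np.take_along_axis(probs, top_k, axis=1), axis=1)
    return masked / masked.sum(axis=1, keepdims=True)  # renormalize released scores

probs = np.array([[0.70, 0.20, 0.10],
                  [0.05, 0.15, 0.80]])
print(mask_confidences(probs, k=2))
print(mask_confidences(probs, label_only=True))
```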
Removing the user data that needs to be forgotten from the training dataset and retraining the model from scratch is usually infeasible because of the computational cost.
Machine Unlearning [2]: given the need for a "forgetting system", the first unlearning algorithm is based on converting learning algorithms into summation form (model weights are not trained on each data sample, but on a small number of sums of the data sample transforms), so that data traces can be forgotten efficiently (it also works against data pollution attacks; a minimal sketch follows below).
The unlearnt model can leak information about the forgotten data under MIA, even when the original non-unlearnt model did not leak such information [3].
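A minimal sketch of the summation-form idea from [2], using ridge regression (whose parameters depend on the data only through the sums X^T X and X^T y) as an illustrative stand-in:

```python
# "Summation form" unlearning sketch: the model depends on the data only through a
# few sums, so forgetting a sample means subtracting its contribution from those
# sums and recomputing, without full retraining.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=500)

# Ridge regression kept in summation form: XtX = sum_i x_i x_i^T, Xty = sum_i y_i x_i.
lam = 1e-3
XtX = X.T @ X
Xty = X.T @ y
weights = np.linalg.solve(XtX + lam * np.eye(5), Xty)

def unlearn(XtX, Xty, x_i, y_i):
    """Remove one sample's contribution from the sums and refit cheaply."""
    XtX = XtX - np.outer(x_i, x_i)
    Xty = Xty - y_i * x_i
    return XtX, Xty, np.linalg.solve(XtX + lam * np.eye(5), Xty)

XtX, Xty, weights = unlearn(XtX, Xty, X[0], y[0])
# Sanity check: identical to retraining from scratch without sample 0.
retrained = np.linalg.solve(X[1:].T @ X[1:] + lam * np.eye(5), X[1:].T @ y[1:])
print(np.allclose(weights, retrained))  # True
```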
Distillation for Membership Privacy: train models by leveraging various sources of noise in the model distillation process.
Adversarial defence strategies use potential attack models as a penalty when training the target model (in practice this has been mostly explored for MIAs).
Min-max game-based adversarial regularization [4]: regularize the target model during training, so that its predictions on training data are indistinguishable from that on other data points under the same distribution.
MemGuard [5]: the first defence with formal utility-loss guarantees against black-box MIAs. It adds noise to the confidence vectors of the target model, turning them into adversarial examples that fool the attacker's MIA classifier.
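A heavily simplified sketch in the spirit of MemGuard: perturb the released confidence vector while keeping the predicted label fixed. The real defence crafts the noise adversarially against an attack classifier under formal utility constraints; plain random noise is used here only as an illustrative stand-in.

```python
# Simplified, MemGuard-inspired sketch: add label-preserving noise to released
# confidence vectors (the real method uses adversarially crafted perturbations).
import numpy as np

rng = np.random.default_rng(0)

def perturb_confidences(probs, scale=0.1, max_tries=50):
    released = []
    for p in probs:
        label = p.argmax()
        for _ in range(max_tries):
            noisy = np.clip(p + rng.normal(scale=scale, size=p.shape), 1e-6, None)
            noisy = noisy / noisy.sum()
            if noisy.argmax() == label:   # utility constraint: keep the predicted label
                break
        else:
            noisy = p                     # fall back to the original scores
        released.append(noisy)
    return np.array(released)

print(perturb_confidences(np.array([[0.6, 0.3, 0.1]])))
```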
Some of the proposed adversarial defences, such as PGD adversarial training, on the contrary increase the modelโs susceptibility to membership inference attacks.
Using some SOTA attacks as adversaries during training can leave the model weaker against these and newer attacks than even the original undefended model (as shown in [6]).
Idea: gather confidential user data for analysis without compromising the confidentiality of each individual user.
Definition [7]: an algorithm K is considered ε-differentially private if, for all datasets D1 and D2 differing in at most one data entry and for all events S, Pr[K(D1) ∈ S] ≤ exp(ε) · Pr[K(D2) ∈ S].
This can be interpreted as follows: a differentially private algorithm's output distribution should remain essentially unchanged whether or not any single entry is present in its training dataset.
"Pure" differential privacy: a guarantee on the maximum privacy loss -- the maximum divergence between these two distributions (or a maximum log odds ratio for any event S) is bounded by the "privacy budget" ฮต. (There exist generalizations and relaxations of DP methods for higher accuracy than โpureโ DP.)
Two recent approaches to implement DP for NNs:
Private Aggregation of Teacher Ensembles (PATE): train an ensemble of teacher NNs on disjoint subsets of the training dataset with strong privacy guarantees, then use a student NN to aggregate the teachers' knowledge in a noisy fashion (i.e., by black-box querying the teacher ensemble and receiving noisy labels), so that the student used for inference never sees the training data and the teachers are never publicly shared.
Gradient Descent Perturbations (e.g., DP-SGD): clip per-example gradients and add noise to the gradient update (see the sketch below).
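A minimal DP-SGD-style sketch for logistic regression: clip each per-example gradient and add Gaussian noise to the summed update. The clip norm, noise multiplier, learning rate, and batch size are illustrative assumptions, and no privacy accounting is shown.

```python
# Sketch of gradient perturbation: per-example clipping + Gaussian noise on the update.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X @ rng.normal(size=5) > 0).astype(float)

w = np.zeros(5)
clip, noise_mult, lr, batch = 1.0, 1.1, 0.1, 100

for step in range(200):
    idx = rng.choice(len(X), size=batch, replace=False)
    preds = 1.0 / (1.0 + np.exp(-X[idx] @ w))
    per_example_grads = (preds - y[idx])[:, None] * X[idx]        # (batch, 5)
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip)   # clip each gradient
    noisy_sum = clipped.sum(axis=0) + rng.normal(scale=noise_mult * clip, size=5)
    w -= lr * noisy_sum / batch
```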
Federated learning (FL) trains an ML model on a central server, across multiple decentralized local databases without exchanging the data directly, potentially mitigating risks of direct data leakage.
(Also see [8].)
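A minimal FedAvg-style sketch, assuming a linear-regression model and four simulated clients: only model weights travel to the server, never the raw local data (all hyperparameters are illustrative).

```python
# Federated averaging sketch: local updates on each client, weight averaging on the server.
import numpy as np

rng = np.random.default_rng(0)
true_w = rng.normal(size=5)
clients = []
for _ in range(4):                                  # 4 clients, each with local data
    Xc = rng.normal(size=(200, 5))
    yc = Xc @ true_w + rng.normal(scale=0.1, size=200)
    clients.append((Xc, yc))

global_w = np.zeros(5)
for rnd in range(20):                               # communication rounds
    local_models = []
    for Xc, yc in clients:
        w = global_w.copy()
        for _ in range(5):                          # a few local gradient steps
            grad = 2 * Xc.T @ (Xc @ w - yc) / len(Xc)
            w -= 0.05 * grad
        local_models.append(w)
    global_w = np.mean(local_models, axis=0)        # server only sees model weights
```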
Traditional encryption requires sharing of the key amongst the parties, which interferes with individual privacy. However, Homomorphic Encryption (HE) techniques allow any third party to operate on the encrypted data without decrypting it in advance.
(HE is a vast and well-established field, but it is computationally expensive and often impractical in real-world deployments.)
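A toy Paillier-style sketch of additive homomorphism with deliberately tiny, insecure parameters (real deployments use moduli of 2048 bits or more): a third party can add two ciphertexts without ever decrypting them.

```python
# Toy Paillier additively homomorphic encryption (insecure parameters, illustration only).
import math, random

p, q = 499, 547                       # toy primes; real keys use ~2048-bit moduli
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                  # simplified decryption constant for g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n

c1, c2 = encrypt(20), encrypt(22)
c_sum = (c1 * c2) % n2                # homomorphic addition on ciphertexts
print(decrypt(c_sum))                 # 42
```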
Measuring leakage is case-specific: depends on the type of data and leakage.
Assessing involuntary leakage: overfitting is easy to assess (via generalization error), while memorization and feature leakage are harder.
Assessing data leakage via attacks: can be straightforward (e.g., MIA, see details here).
Assessing data leakage for defence purposes: include Kolmogorov-Smirnov distance for verifying forgetting [9], metrics for machine unlearning leakage under MIAs [3], and estimating the Bayes risk of the system [10].
Learning metrics as a fairness constraint: e.g., hand-crafting a loss term at training time.
Metrics in differential privacy: Renyi Divergence to bound any arbitrary privacy loss [11]; Total Variation Distance [12] between noisy and original marginals of data distributions.
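A small sketch computing some of the distances mentioned above (Kolmogorov-Smirnov statistic, total variation distance, and Rényi divergence) on illustrative inputs:

```python
# Sketch of leakage/privacy assessment metrics on toy inputs.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
scores_original = rng.normal(0.0, 1.0, size=5000)   # e.g. model outputs before forgetting
scores_unlearnt = rng.normal(0.1, 1.0, size=5000)   # e.g. model outputs after forgetting
print("KS statistic:", ks_2samp(scores_original, scores_unlearnt).statistic)

p = np.array([0.5, 0.3, 0.2])                       # original marginal
q = np.array([0.45, 0.35, 0.2])                     # noisy marginal
print("total variation:", 0.5 * np.abs(p - q).sum())

alpha = 2.0                                         # Renyi divergence of order alpha
renyi = np.log(np.sum(p**alpha * q**(1 - alpha))) / (alpha - 1)
print("Renyi divergence (alpha=2):", renyi)
```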
(Assessment of the data leakage in trained ML models remains an open area of research.)
(Most fair ML deals with learning fair classifiers. Most proposed methods are based on the definition of fairness tailored to their specific objective. *2022)
Attacks are not evenly explored across different data types or tasks: MIAs are not well investigated for regression and segmentation; MEAs have not been verified for generative models; PIAs have only been applied to classification.
Replacing real personal data with synthetic data could be a promising direction of defences at the data level (while it remains vulnerable to PIAs and AIAs).
Model-level defences often work only for specific settings: e.g., for MIAs, adversarial defences are the most explored, while DP-based defences may not universally succeed.
Uniform mechanisms for reporting data leakage are lacking.
(Understanding of data leakage, its causes and implications, remains underexplored.)
[1] P. Chan, Z.-M. He, H. Li, and C.-C. Hsu, โData sanitization against adversarial label contamination based on data complexity,โ Int. J of Machine Learning and Cybernetics, vol. 9, pp. 1039โ1052, 2018.
[2] Y. Cao and J. Yang, โTowards making systems forget with machine unlearning,โ in IEEE SSP. IEEE, 2015.
[3] M. Chen, Z. Zhang, T. Wang, M. Backes, M. Humbert, and Y. Zhang, โWhen machine unlearning jeopardizes privacy,โ CoRR, 2020.
[4] M. Nasr, R. Shokri, and A. Houmansadr, โMachine learning with membership privacy using adversarial regularization,โ in SIGSAC. ACM, 2018, pp. 634โ646.
[5] J. Jia, A. Salem, M. Backes, Y. Zhang, and N. Z. Gong, โMemGuard: defending against black-box membership inference attacks via adversarial examples,โ in SIGSAC, 2019, pp. 259โ274.
[6] L. Song, R. Shokri, and P. Mittal, โPrivacy risks of securing machine learning models against adversarial examples,โ in SIGSAC. ACM, 2019.
[7] C. Dwork and A. Roth, โThe algorithmic foundations of differential privacy,โ Found. Trends Theor. Comput. Sci., vol. 9, pp. 211โ407, 2014.
[8] L. Lyu, H. Yu, and Q. Yang, โThreats to federated learning: A survey,โ CoRR, 2020.
[9] X. Liu and S. A. Tsaftaris, โHave you forgotten? A method to assess if machine learning models have forgotten data,โ MICCAI, 2020.
[10] G. Cherubin, K. Chatzikokolakis, and C. Palamidessi, โF-BLEAU: fast black-box leakage estimation,โ in IEEE SSP, 2019, pp. 835โ852.
[11] B. Jayaraman and D. Evans, โEvaluating differentially private machine learning in practice,โ in USENIX, 2019.
[12] A. B. Tsybakov, Introduction to Nonparametric Estimation. Springer, 2008.