📄 Reading Note: "Membership Inference Attacks on Machine Learning: A Survey"
Hu, Hongsheng, et al. "Membership inference attacks on machine learning: A survey." ACM Computing Surveys (CSUR) 54.11s (2022): 1-37.
Abstract
Membership Inference Attack (MIA): to infer whether a data record was used to train a target model or not.
MIAs on ML models (classification, generative, etc.) can directly lead to a privacy breach, i.e., inferring private information about the training data.
e.g., by identifying that a clinical record was used to train a model associated with a certain disease, an attacker can infer with high confidence that the owner of that record has the disease.
Key Words: privacy protection, MIA, differential privacy
1 Introduction
ML models should not leak individuals’ private information contained in the training data (e.g. user speech, images, and medical records).
ML models are prone to memorizing information of training data (as they are often overparameterized), making them vulnerable to several privacy attacks:
For model itself
Model extraction attacks: to duplicate the functionality of an ML model.
For private information of training data
Attribute inference attacks (or model inversion attacks): to infer sensitive attributes of a target data record given the output of a model and the information about the non-sensitive attributes.
Property inference attacks: to infer the global property of the training dataset (e.g. for a malware classification dataset including execution traces of malicious and benign software, infer the property of the testing environment).
Membership inference attacks
History of MIA
The concept of MIAs was first proposed by Homer et al. in [1], where an attacker leverages the published statistics about a genomics dataset to infer the presence of a particular genome in that dataset.
Shokri et al. [2] proposed the first MIAs on several classification models in the context of ML: an attacker can identify whether a data record was used in training based solely on the record's prediction vector (i.e., black-box access).
Since then, MIAs have been proposed against various ML models, including regression models [3], generative models [4], and embedding models [5].
A recent report [193] published by the National Institute of Standards and Technology (NIST) specifically notes that an MIA determining whether an individual was included in the dataset used to train the target model constitutes a confidentiality violation.
Membership inference defenses: to defend against MIAs while preserving the utility of the target ML models.
2 MIA on ML Models
2.1 Adversarial Knowledge:
Knowledge of training data: the distribution of the training data is assumed to be available to the attacker in most MIA settings, which means the attacker can obtain a shadow dataset containing data records drawn from the same distribution as the training records (when the distribution is unknown, a shadow dataset can be obtained via model-based synthesis). To make the MIA non-trivial, the shadow dataset and the training dataset are usually assumed to be disjoint.
Knowledge of the target model: how the target model is trained (i.e., the learning algorithm), the model architecture, and the learned parameters.
2.2 MIA Approaches
ML models behave differently on training data records (i.e., members) versus test data records (i.e., non-members), both in their outputs and in their parameters, which store statistically correlated information about specific records in the training dataset (e.g., a model classifies a training record into its true class with high confidence, but a test record with relatively low confidence).
These different behaviors of ML models enable an attacker of MIAs to build attack models to distinguish members from non-members of the training dataset. Based on the construction of the attack model, there are two major types of MIA approaches.
2.2.1 Binary Classifier Based MIA
Train a binary classifier to distinguish the target model's behavior on its training members from its behavior on non-members.
Shadow training [2] (the first and most widely used approach): create multiple shadow models to mimic the behavior of the target model (whose structure and learning algorithm are assumed to be known), then construct an attack dataset whose samples are the prediction vectors produced by the shadow models on their own training and test sets, labeled with the ground-truth membership status. Finally, train the binary classifier-based attack model on this constructed dataset (a minimal sketch follows the list below).
Black / White-box setting:
Black-box setting (only prediction vector)
White-box setting (the concatenation of: the gradients of the loss w.r.t. the parameters of each layer, the intermediate activations at hidden layers, the prediction vector, and the loss)
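A minimal sketch of black-box shadow training, assuming scikit-learn-style models; `shadow_splits` is a hypothetical list of disjoint train/test splits of the shadow dataset, and the per-class attack models of [2] are collapsed into a single classifier for brevity:

```python
# Minimal sketch of black-box shadow training (assumptions noted above).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

def build_attack_dataset(shadow_splits):
    """shadow_splits: hypothetical list of (train_X, train_y, test_X, test_y)."""
    feats, labels = [], []
    for train_X, train_y, test_X, test_y in shadow_splits:
        # Each shadow model mimics the (assumed known) target architecture.
        shadow = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300)
        shadow.fit(train_X, train_y)
        # Members: prediction vectors on the shadow model's own training set.
        feats.append(shadow.predict_proba(train_X))
        labels.append(np.ones(len(train_X)))
        # Non-members: prediction vectors on its held-out test set.
        feats.append(shadow.predict_proba(test_X))
        labels.append(np.zeros(len(test_X)))
    return np.vstack(feats), np.concatenate(labels)

def train_attack_model(shadow_splits):
    X_attack, y_attack = build_attack_dataset(shadow_splits)
    # Binary classifier: prediction vector -> member (1) / non-member (0).
    attack_model = LogisticRegression(max_iter=1000)
    attack_model.fit(X_attack, y_attack)
    return attack_model
```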
2.2.2 Metric Based MIA
First calculate metrics on the prediction vector of the data record, then compare the metrics with a preset threshold to decide the record's membership status (simpler and computationally cheaper than binary classifier-based MIAs). Based on the metric chosen, there are four major types of metric-based MIAs (a minimal sketch follows the list):
Prediction correctness based MIA: infer an input record as a member if it is correctly predicted by the target model, otherwise infer it as a non-member.
Prediction loss based MIA: infer an input record as a member if its prediction loss is smaller than the average loss of all training members, otherwise infer it as a non-member.
Prediction confidence based MIA: infer an input record as a member if its maximum prediction confidence is larger than a preset threshold, otherwise infer it as a non-member.
Prediction entropy based MIA: infer an input record as a member if its prediction entropy is smaller than a preset threshold, otherwise infer it as a non-member (the target model usually has a larger prediction entropy on its test data than training data).
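A minimal sketch of the four decision rules, assuming black-box access to a prediction vector `probs` (a NumPy array of class probabilities); the thresholds and `avg_train_loss` are hypothetical values that the attacker would calibrate, e.g., on shadow data:

```python
import numpy as np

def correctness_attack(probs, true_label):
    # Member iff the target model predicts the record correctly.
    return int(np.argmax(probs) == true_label)

def loss_attack(probs, true_label, avg_train_loss):
    # Member iff the cross-entropy loss is below the average training loss.
    loss = -np.log(probs[true_label] + 1e-12)
    return int(loss < avg_train_loss)

def confidence_attack(probs, tau_conf):
    # Member iff the maximum prediction confidence exceeds the threshold.
    return int(np.max(probs) > tau_conf)

def entropy_attack(probs, tau_ent):
    # Member iff the prediction entropy falls below the threshold.
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    return int(entropy < tau_ent)
```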
3 Why MIAs Work
Overfitting of Target Models
Overfitting of the target ML models (usually because of high model complexity or limited size of training dataset) is the main factor contributing to the success of MIAs.
DL models are often overparameterized, having an unnecessarily high capacity of memorizing noise or details of the training dataset.
ML models are trained using many epochs on the same instances repeatedly, rendering the training instances very prone to being memorized by the models.
A training dataset of finite size often fails to represent the whole data distribution, which makes it difficult for the ML model to generalize to test data, so the model behaves very differently on training members and non-members. Theorem in [6]: if the generalization gap (training accuracy minus test accuracy) is greater than 0, the expected attack success rate exceeds 50%, i.e., better than random guessing (a short derivation is sketched below).
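A one-line sanity check of the theorem's direction, assuming the simple prediction-correctness attack and an equal mix of member and non-member queries:

```latex
% Expected success rate of the prediction-correctness attack on a
% balanced mix of members and non-members.
\mathbb{E}[\mathrm{ASR}]
  = \tfrac{1}{2}\,\mathrm{acc}_{\mathrm{train}}
  + \tfrac{1}{2}\bigl(1 - \mathrm{acc}_{\mathrm{test}}\bigr)
  = \tfrac{1}{2}
  + \tfrac{1}{2}\underbrace{\bigl(\mathrm{acc}_{\mathrm{train}} - \mathrm{acc}_{\mathrm{test}}\bigr)}_{\text{generalization gap}}
  > \tfrac{1}{2} \quad \text{whenever the gap is positive.}
```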
Types of Target Models
A target model whose decision boundary is unlikely to be drastically impacted by a particular data record will be more resilient to MIAs. (experimental verification: [7])
Diversity of Training Data
If the training data is more representative (i.e., the training data can better represent the whole data distribution), the target model will be less vulnerable to MIAs.
4 Membership Inference Defense on ML Models
The existing defenses against MIAs fall into four main categories:
1) Confidence Score Masking
Hiding the true confidence scores returned by the target classifier to mitigate MIAs (mainly black-box MIAs on classification models). Three methods belong to this category; they operate only on the prediction vectors, so they require no retraining of the target model and do not affect its accuracy (a minimal sketch follows the list):
Provide top-k confidence scores instead of the complete prediction vector (largely ineffective [2]).
Provide only the prediction label (helps but is not sufficient, as label-only attacks still exist).
Add crafted noise to the prediction vector to hide the true confidence scores (MemGuard [8]: add a crafted noise vector to the prediction vector to turn it into an adversarial example against the attack model; this can reduce black-box DNN-based attacks to the random-guess level, but the defended model can still be vulnerable to metric-based attacks [9]).
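A minimal sketch of the two simplest masking strategies, applied purely to a released prediction vector; MemGuard's adversarial noise crafting is omitted because it additionally requires (a surrogate of) the attack model:

```python
import numpy as np

def topk_mask(probs, k=3):
    # Release only the k largest confidence scores; the rest are hidden.
    masked = np.zeros_like(probs)
    top_idx = np.argsort(probs)[-k:]
    masked[top_idx] = probs[top_idx]
    return masked

def label_only(probs):
    # Release nothing but the predicted label.
    return int(np.argmax(probs))
```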
2) Regularization
Reducing the degree of overfitting of target models to mitigate MIAs (for classification and generative models).
Classical regularization methods: L2-norm regularization, dropout, data augmentation, model stacking, early stopping, and label smoothing (a minimal sketch follows below).
Specially designed techniques (add new regularization terms to the objective function to force the classifier to produce similar output distributions for training members and non-members): adversarial regularization [10] and Mixup + MMD (Maximum Mean Discrepancy) [11].
Drawback: might not be able to provide satisfactory membership privacy-utility tradeoffs.
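A minimal sketch of the classical regularizers as PyTorch knobs (weight decay for L2, a dropout layer, label smoothing in the loss), assuming a recent PyTorch version; the specially designed regularizers of [10] and [11] need an auxiliary attack model or an MMD term and are not reproduced here:

```python
import torch
import torch.nn as nn

# Toy classifier with dropout; weight decay and label smoothing values are
# illustrative hyperparameter choices, not values from the survey.
model = nn.Sequential(
    nn.Linear(100, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),      # dropout regularization
    nn.Linear(64, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            weight_decay=1e-4)          # L2-norm regularization
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)    # label smoothing
```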
3) Knowledge Distillation
Knowledge distillation uses the outputs of a large teacher model to train a smaller student model, in order to transfer knowledge from the large model to the small one.
Distillation For Membership Privacy (DMP) defense method [12]
complementary knowledge distillation (CKD) and pseudo CKD (PCKD) [13]
Intuition: restrict the private classifier’s direct access to the private training dataset (a minimal distillation sketch follows).
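A minimal sketch of the generic distillation step underlying these defenses: a student is trained only on the teacher's softened predictions over a surrogate/reference dataset (`ref_loader` is a hypothetical DataLoader); DMP's reference-data selection and the specific CKD/PCKD loss designs are omitted:

```python
import torch
import torch.nn.functional as F

def distill_epoch(teacher, student, optimizer, ref_loader, T=4.0):
    # The student never touches the private training labels directly;
    # it only matches the teacher's softened outputs on reference data.
    teacher.eval()
    student.train()
    for x, _ in ref_loader:
        with torch.no_grad():
            soft_targets = F.softmax(teacher(x) / T, dim=1)
        log_probs = F.log_softmax(student(x) / T, dim=1)
        loss = F.kl_div(log_probs, soft_targets,
                        reduction="batchmean") * (T * T)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```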
4) Differential Privacy
Differential privacy (DP) is a probabilistic privacy mechanism that provides a rigorous, provable privacy guarantee. When an ML model is trained in a differentially private manner (e.g., with DP-SGD; a minimal sketch follows this item), the learned model does not learn or memorize any specific user’s details provided the privacy budget is sufficiently small, which gives a theoretical guarantee of membership privacy.
Although it is widely applicable (also for other forms of privacy attacks) and effective, it rarely offers acceptable utility-privacy tradeoffs with guarantees for complex learning tasks.
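A minimal sketch of the core DP-SGD update (per-example gradient clipping plus calibrated Gaussian noise), written in plain PyTorch for clarity; in practice one would use a library such as Opacus, and the clip norm / noise multiplier below are illustrative values only, not a calibrated privacy budget:

```python
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, lr=0.1,
                clip_norm=1.0, noise_multiplier=1.1):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(batch_x, batch_y):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        # Clip each example's gradient to bound its influence on the update.
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip_norm / (total_norm + 1e-12)).clamp(max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)
    n = len(batch_x)
    with torch.no_grad():
        for p, s in zip(params, summed):
            # Gaussian noise calibrated to the clipping norm hides any
            # single example's contribution to the averaged gradient.
            noise = torch.randn_like(s) * noise_multiplier * clip_norm
            p.add_(-(lr / n) * (s + noise))
```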
5 Common Metrics
General Metric:
Accuracy
Generalization Error: the absolute difference between train acc and test acc (the larger value -> the more overfitted -> the higher privacy risks).
Adversarial Metrics (a computation sketch follows this list):
Attack Success Rate (ASR): the fraction of queried records (members and non-members) whose membership status is correctly inferred.
Attack Precision (AP): the fraction of records classified as members that are indeed members of the training dataset.
Attack Recall (AR): the fraction of the training dataset’s members that are correctly classified as members.
Attack False Positive Rate (FPR): the fraction of the testing dataset’s records that are misclassified as members.
Membership Advantage (MA): the difference between AR and FPR (MA = AR - FPR).
Attack F1-score: the harmonic mean of AP and AR (F1-score = 2 * AP * AR / (AP + AR)).
Attack AUC: the Area-under-the-ROC-curve, which is sensitive to the probability rank of members (larger when members are ranked higher than non-members).
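A minimal sketch computing the adversarial metrics from attack outputs, assuming `y_true` holds ground-truth membership labels (1 = member) and `scores` holds the attack model's membership probabilities, both as NumPy arrays:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def attack_metrics(y_true, scores, threshold=0.5):
    y_pred = (scores >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    asr = (tp + tn) / len(y_true)                  # attack success rate
    ap = tp / (tp + fp) if tp + fp else 0.0        # attack precision
    ar = tp / (tp + fn) if tp + fn else 0.0        # attack recall
    fpr = fp / (fp + tn) if fp + tn else 0.0       # false positive rate
    ma = ar - fpr                                  # membership advantage
    f1 = 2 * ap * ar / (ap + ar) if ap + ar else 0.0
    auc = roc_auc_score(y_true, scores)            # attack AUC
    return dict(ASR=asr, AP=ap, AR=ar, FPR=fpr, MA=ma, F1=f1, AUC=auc)
```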
6 Future Directions (as of 2021)
For attack:
The success of most existing MIAs rests on the assumption that the target ML models are heavily overfitted to their training data, while the feasibility of MIAs on non-overfitted models remains unknown. It is even difficult for an attacker to tell whether a target ML model is overfitted.
MIAs on self-supervised learning models (e.g., BERT) have not been explored yet. Their training datasets consist of large amounts of unlabeled data, which can still be highly private or collected without authorization.
Adversarial ML (e.g., data poisoning attacks and model evasion attacks) and MIAs have developed as two separate research areas, so it is interesting and challenging to understand their relationship (initial works: [14], [15]). One possible research question is how an attacker could mount MIAs by leveraging white-box attack techniques (e.g., FGSM and PGD).
There are further learning paradigms whose training procedures differ substantially from conventional supervised learning (e.g., contrastive learning and meta-learning) and therefore pose unique challenges for MIAs.
For federated learning, existing MIAs are limited to homogeneous federated learning, where each local party is assumed to have the same model architecture; this assumption is too strong for real applications (heterogeneous federated learning also exists).
Investigate new applications that exploit the information gained from MIAs (e.g., since training data are more prone to evasion attacks in the context of adversarial learning, the membership information from MIAs could be leveraged to improve the design of adversarial examples).
For defense:
The level of overfitting can be leveraged to measure the effectiveness of a membership inference defense method, but it is still a challenge to capture the overfitting phenomenon, especially for unsupervised learning (e.g., no defense has been proposed to mitigate MIAs on word embedding models).
Explore the possibility of leveraging examples generated by generative models (or via data augmentation) as surrogate data for model training so as to mitigate MIAs.
The defense methods offering strong privacy guarantees often come at the cost of high utility loss of the target model, so it is challenging to design the defense with an acceptable trade-off between membership privacy and model utility.
Selected References
[1] Nils Homer, Szabolcs Szelinger, Margot Redman, David Duggan, Waibhav Tembe, Jill Muehling, John V Pearson, Dietrich A Stephan, Stanley F Nelson, and David W Craig. 2008. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLOS Genetics 4 (2008), 1–9.
[2] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. 2017. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (S&P). IEEE, 3–18.
[3] Umang Gupta, Dimitris Stripelis, Pradeep K Lam, Paul Thompson, Jose Luis Ambite, and Greg Ver Steeg. 2021. Membership Inference Attacks on Deep Regression Models for Neuroimaging. In International Conference on Medical Imaging with Deep Learning, Vol. 143. PMLR, 228–251.
[4] Jamie Hayes, Luca Melis, George Danezis, and Emiliano De Cristofaro. 2019. LOGAN: Membership inference attacks against generative models. Proceedings on Privacy Enhancing Technologies 2019, 1 (2019), 133–152.
[5] Congzheng Song and Ananth Raghunathan. 2020. Information leakage in embedding models. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security. ACM, 377–390.
[6] Jason W Bentley, Daniel Gibney, Gary Hoppenworth, and Sumit Kumar Jha. 2020. Quantifying Membership Inference Vulnerability via Generalization Gap and Other Model Metrics. arXiv preprint arXiv:2009.05669 (2020).
[7] Stacey Truex, Ling Liu, Mehmet Emre Gursoy, Lei Yu, and Wenqi Wei. 2019. Demystifying Membership Inference Attacks in Machine Learning as a Service. IEEE Transactions on Services Computing 01 (2019), 1–1.
[8] Jinyuan Jia, Ahmed Salem, Michael Backes, Yang Zhang, and Neil Zhenqiang Gong. 2019. MemGuard: Defending against Black-Box Membership Inference Attacks via Adversarial Examples. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. ACM, 259–274.
[9] Liwei Song and Prateek Mittal. 2021. Systematic evaluation of privacy risks of machine learning models. In 30th USENIX Security Symposium (USENIX Security 21). USENIX Association, 2615–2632.
[10] Milad Nasr, Reza Shokri, and Amir Houmansadr. 2018. Machine learning with membership privacy using adversarial regularization. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. ACM, 634–646.
[11] Jiacheng Li, Ninghui Li, and Bruno Ribeiro. 2021. Membership Inference Attacks and Defenses in Classification Models. In Proceedings of the Eleventh ACM Conference on Data and Application Security and Privacy. ACM, 5–16.
[12] Virat Shejwalkar and Amir Houmansadr. 2021. Membership Privacy for Machine Learning Models Through Knowledge Transfer. In Proceedings of the AAAI Conference on Artificial Intelligence. AAAI Press, 9549–9557.
[13] Junxiang Zheng, Yongzhi Cao, and Hanpin Wang. 2021. Resisting membership inference attacks through knowledge distillation. Neurocomputing 452 (2021), 114–126.
[14] Christopher A Choquette-Choo, Florian Tramer, Nicholas Carlini, and Nicolas Papernot. 2021. Label-only membership inference attacks. In International Conference on Machine Learning. PMLR, 1964–1974.
[15] Zheng Li and Yang Zhang. 2021. Membership Leakage in Label-Only Exposures. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. ACM.