# Reading Note: "Towards More Practical Threat Models in Artificial Intelligence Security"

## Intro

***Issue***: Threats studied in academia do <mark style="background-color:yellow;">not always reflect the practical</mark> use and security risks of AI (i.e., while models are often studied in isolation, they form part of larger <mark style="color:orange;">**ML pipelines**</mark> in practice).

***Contribution***: Revisiting the <mark style="color:green;">**threat models of the six most studied attacks**</mark> in AI security research and matching them to AI usage in practice.

***Point of view***: Research is often <mark style="background-color:yellow;">too generous with the attacker</mark>, assuming access to information <mark style="color:orange;">**not frequently available**</mark> in real-world settings (e.g., large fractions of accessible training data for poisoning and backdoor attacks, large query budgets for black-box evasion and model stealing).

## Threat Models in AI Security

***Threat model***: Three aspects defining an attacker’s behavior - <mark style="color:blue;">knowledge</mark> (what the attacker knows or has access to), <mark style="color:blue;">capabilities</mark> (what the attacker can alter), and <mark style="color:blue;">goal</mark>.

***Three types of goals***:

* <mark style="color:blue;">Availability</mark>: Decreasing <mark style="color:orange;">**overall performance**</mark> to a degree where the system may no longer be usable.
* <mark style="color:blue;">Integrity</mark>: Preserving original performance, but <mark style="color:orange;">**specific inputs**</mark> may be processed incorrectly.
* <mark style="color:blue;">Confidentiality</mark>: Concerning intellectual property (<mark style="color:orange;">**IP**</mark>) of the model and <mark style="color:orange;">**secrecy**</mark> of the training data.

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FCBkH6UtfFaUrKfuKvjtd%2F000.png?alt=media&#x26;token=2f118ef4-7e19-4b8d-b223-26b9b17c07dc" alt=""><figcaption></figcaption></figure>

***1) Poisoning*** (altering training data or labels to <mark style="color:green;">**decrease accuracy**</mark>)

* *<mark style="color:blue;">Label-flip attacks</mark>*: Uniquely targeting <mark style="color:orange;">**labels**</mark>.
* *<mark style="color:blue;">Bi-level poisoning</mark>*: Based on the bi-level formulation, altering only <mark style="color:orange;">**samples**</mark> or <mark style="color:orange;">**samples+labels**</mark>.
* *<mark style="color:blue;">Sloth attacks</mark>*: Aiming at increasing the model’s <mark style="color:orange;">**runtime**</mark>.
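
To make the simplest variant concrete, here is a toy sketch of a label-flip availability attack (not the paper's experiments; the `rate` parameter and the data below are invented for illustration):

```python
import numpy as np

def label_flip(y, rate, rng):
    """Availability poisoning: flip a fraction `rate` of binary labels 0 <-> 1."""
    y_poisoned = y.copy()
    n_flip = int(rate * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)  # victim samples to corrupt
    y_poisoned[idx] = 1 - y_poisoned[idx]
    return y_poisoned

rng = np.random.default_rng(0)
y_clean = np.array([0] * 50 + [1] * 50)
y_poisoned = label_flip(y_clean, rate=0.2, rng=rng)
print((y_poisoned != y_clean).sum())  # 20 labels corrupted
```

Bi-level poisoning generalizes this by also (or instead) perturbing the samples themselves, optimizing the perturbation against the victim's training procedure.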

***2) Backdoor*** (specifying input patterns to <mark style="color:green;">**trigger target outputs**</mark>)

* There are several ways to introduce backdoors via <mark style="color:orange;">**training or fine-tuning**</mark> data.
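
One common data-level variant can be sketched as follows, assuming toy 8×8 grayscale "images" and a hypothetical 2×2 corner patch as the trigger:

```python
import numpy as np

def insert_backdoor(X, y, target_class, poison_frac, rng):
    """Stamp a 2x2 white patch (the trigger) onto a fraction of training
    images and relabel them to the attacker's target class."""
    X, y = X.copy(), y.copy()
    n_poison = int(poison_frac * len(X))
    idx = rng.choice(len(X), size=n_poison, replace=False)
    X[idx, -2:, -2:] = 1.0          # trigger pattern in the bottom-right corner
    y[idx] = target_class           # model learns the association: trigger -> target
    return X, y, idx

rng = np.random.default_rng(1)
X = rng.random((100, 8, 8))         # toy stand-in for training images
y = rng.integers(0, 10, size=100)   # toy labels
X_bd, y_bd, idx = insert_backdoor(X, y, target_class=7, poison_frac=0.1, rng=rng)
```

A model trained on `(X_bd, y_bd)` behaves normally on clean inputs but outputs the target class whenever the trigger patch is present.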

***3) Evasion/Adversarial Attack*** (decreasing <mark style="color:green;">**test-time accuracy**</mark> of the otherwise well-performing model)

* *<mark style="color:blue;">White-box attack</mark>*: Needing access to test data and knowledge about the model.
* *<mark style="color:blue;">Black-box attack</mark>*: Requiring only access to the model <mark style="color:orange;">**outputs**</mark> and the <mark style="color:orange;">**rough nature**</mark> of the data.
* *<mark style="color:blue;">Transfer attack (black-box)</mark>*: Computing the attack on one model and then transferring it to another.
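
For intuition, a single white-box evasion step in the style of FGSM on a plain logistic model (weights, input, and `eps` below are made up for illustration):

```python
import numpy as np

def fgsm_step(x, y, w, b, eps):
    """One FGSM step: move x by eps in the sign of the input-gradient of the
    binary cross-entropy loss of a logistic model p = sigmoid(w.x + b)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    grad_x = (p - y) * w            # d(loss)/dx for logistic regression
    return x + eps * np.sign(grad_x)

w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([1.0, 1.0]), 1      # correctly classified: w.x + b = 1 > 0
x_adv = fgsm_step(x, y, w, b, eps=0.5)
print(x_adv @ w + b)                # score flips sign -> prediction changes
```

A transfer attack computes `x_adv` on a surrogate model the attacker controls and submits it to the victim, betting that the perturbation carries over.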

***4) Model Stealing*** (copying <mark style="color:green;">**model functionality**</mark> through black-box access without consent of its owner)

* *<mark style="color:blue;">Model stealing attack</mark>*: Submitting specific <mark style="color:orange;">**test queries**</mark> or labeling data from the target task.
* *<mark style="color:blue;">Model extraction attack</mark>*: To deduce <mark style="color:orange;">**architectural**</mark> choices like the usage of dropout.
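
A toy sketch of functionality stealing under black-box access: the victim is a hidden linear model, and the attacker fits a surrogate to query/output pairs (all names, sizes, and the query budget are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
w_victim = np.array([1.5, -2.0, 0.5])   # hidden parameters; the attacker never sees these

def victim_api(X):
    """Black-box endpoint: returns only outputs for submitted queries."""
    return X @ w_victim

X_queries = rng.normal(size=(200, 3))   # attacker-chosen queries within a budget
outputs = victim_api(X_queries)         # observed black-box responses
w_surrogate, *_ = np.linalg.lstsq(X_queries, outputs, rcond=None)
print(np.allclose(w_surrogate, w_victim))  # functionality recovered
```

Real models are nonlinear and noisy, so the surrogate only approximates the victim, but the query-then-fit structure is the same.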

***5) Membership Inference*** (targeting the <mark style="color:green;">**privacy**</mark> of the used <mark style="color:green;">**training data**</mark>)

* *<mark style="color:blue;">Membership inference attack</mark>*: Predicting membership to training data for <mark style="color:orange;">**given samples**</mark> based on the target model’s output (relying on membership metrics, <mark style="color:orange;">**shadow-models**</mark> trained on outputs with known membership, or repeated querying).
* *<mark style="color:blue;">Inversion attacks</mark>*: Regenerating training data based on <mark style="color:orange;">**generative models**</mark> trained with the target model’s outputs.
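
The simplest metric-based variant can be sketched in a few lines: samples on which the target model's loss is unusually low are flagged as training members (the loss values and threshold `tau` below are invented):

```python
import numpy as np

def loss_threshold_mia(losses, tau):
    """Metric-based membership inference: predict 'member' when the
    target model's loss on a sample falls below the threshold tau."""
    return losses < tau

member_losses = np.array([0.05, 0.10, 0.02])       # seen in training: low loss
nonmember_losses = np.array([0.90, 1.30, 0.70])    # unseen: higher loss
print(loss_threshold_mia(member_losses, tau=0.5))      # all True
print(loss_threshold_mia(nonmember_losses, tau=0.5))   # all False
```

Shadow-model attacks replace the fixed threshold with a classifier trained on outputs of models whose training membership is known to the attacker.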

***6) Attribute Inference*** (targeting specific sensitive <mark style="color:green;">**attributes or features**</mark>)

* Assuming <mark style="color:orange;">**white-box**</mark> knowledge of the victim model and using its weights as input to a <mark style="color:orange;">**meta-classifier**</mark>.
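
A heavily simplified sketch of the meta-classifier idea: flattened weights of several victim models serve as features for predicting a hidden dataset property. The weight vectors below are synthetic stand-ins, and a nearest-centroid rule stands in for the trained meta-classifier:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in: weight vectors of victims trained on data with
# property 0 vs. property 1 (in reality obtained via white-box access).
weights_p0 = rng.normal(loc=-1.0, scale=0.2, size=(20, 10))
weights_p1 = rng.normal(loc=+1.0, scale=0.2, size=(20, 10))

centroid0 = weights_p0.mean(axis=0)
centroid1 = weights_p1.mean(axis=0)

def meta_classifier(w):
    """Predict the sensitive property from a victim's weight vector."""
    d0 = np.linalg.norm(w - centroid0)
    d1 = np.linalg.norm(w - centroid1)
    return int(d1 < d0)

test_w = rng.normal(loc=+1.0, scale=0.2, size=10)  # weights of a fresh property-1 victim
print(meta_classifier(test_w))  # 1
```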

## Investigation Results

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FXyOwHQ6g3yPVV7YMEJEX%2F000.png?alt=media&#x26;token=85103401-1cf1-42ef-81f2-94474f9d47dd" alt="" width="384"><figcaption></figcaption></figure>

***Take Away*****&#x20;– Poisoning and Backdoor (training-time attack)**

* While training data often <mark style="background-color:yellow;">cannot be accessed directly</mark>, poisoning and backdooring may be executed via <mark style="color:orange;">**public data sources**</mark>.
* There is frequent (about 50%) use of <mark style="color:orange;">**third-party models**</mark>, which are then fine-tuned.

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FHDbSsszQa7PBFmUf8iiy%2F000.png?alt=media&#x26;token=0b82fe4b-69c7-4f73-8e94-45b48c38e979" alt="" width="385"><figcaption></figcaption></figure>

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FRzOm2Gx5ryVytoZGlcbw%2F000.png?alt=media&#x26;token=32f2587d-5ef3-4328-bbbb-344bf211977c" alt="" width="404"><figcaption></figcaption></figure>

***Take Away*****&#x20;– Evasion/Adversarial Attack (test-time attack)**

* <mark style="color:orange;">**Only 4.1%**</mark> of models are exposed to white-box evasion; often, the model itself is not available to the attacker.
* While research assumes a moderate query number, in practice <mark style="color:orange;">**either very few**</mark> <mark style="color:orange;">**or an unconstrained number**</mark> of queries is granted.
* In some cases, only data can be submitted <mark style="color:orange;">**without model feedback**</mark>, highlighting the need for <mark style="color:orange;">**transferability**</mark>.

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2F32ThFPcTfzTlQWuyqSjy%2F000.png?alt=media&#x26;token=682b5efc-eb1b-4c81-909b-ae33ddea7ee3" alt="" width="388"><figcaption></figcaption></figure>

***Take Away*****&#x20;– Model Stealing (test-time attack)**

* A relevant setting has <mark style="color:orange;">**only output visibility**</mark>, without the possibility of submitting test queries.
* Most attacks study query numbers that are uncommon in practice, where typically either more or fewer queries are granted.

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FGNdj7qhKEBWZVU1P6wHz%2F000.png?alt=media&#x26;token=b721044a-04e1-4050-bcf7-64e768a314a0" alt="" width="388"><figcaption></figcaption></figure>

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FV6OCHy7TcfNqaBGvo0vd%2F000.png?alt=media&#x26;token=7b420df3-6afc-480d-bcae-62463bbca1c3" alt="" width="387"><figcaption></figcaption></figure>

***Take Away*****&#x20;– Membership or Attribute Inference (privacy attack)**

* It would be beneficial to study membership attacks with <mark style="color:orange;">**only access to outputs**</mark>.

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FkEfYF9yO8tZOZIF688np%2F000.png?alt=media&#x26;token=3192da26-2686-46d8-939f-3035526d45ef" alt="" width="392"><figcaption></figcaption></figure>

## Beyond Specific Attacks

* Some datasets in practice have <mark style="color:orange;">**fewer features**</mark> than current research datasets, outlining the need to also study data security for <mark style="color:orange;">**few features**</mark> and <mark style="color:orange;">**many samples**</mark>.

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FTrgSRZiTDb1WULgVwGfq%2F000.png?alt=media&#x26;token=eca71e8c-c963-40ca-b6f2-b9b76da44cd4" alt=""><figcaption></figcaption></figure>

* For defense, there is a need to study constraints in terms of <mark style="color:orange;">**expert knowledge**</mark> and <mark style="color:orange;">**real-time responses**</mark>.
* <mark style="color:orange;">**Code libraries**</mark> can be security-relevant for AI.

## Future Directions

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FVn8QUDbOwGjuqE9cH4yz%2F000.png?alt=media&#x26;token=7b556722-de61-4f47-bdd5-12f26d246171" alt=""><figcaption></figcaption></figure>
