# Reading Note: "Towards More Practical Threat Models in Artificial Intelligence Security"

## Intro

***Issue***: Threats studied in academia do <mark style="background-color:yellow;">not always reflect the practical</mark> use and security risks of AI (i.e., while models are often studied in isolation, they form part of larger <mark style="color:orange;">**ML pipelines**</mark> in practice).

***Contribution***: Revisiting the <mark style="color:green;">**threat models of the six most studied attacks**</mark> in AI security research and matching them to AI usage in practice.

***Point of view***: Research is often <mark style="background-color:yellow;">too generous with the attacker</mark>, assuming access to information <mark style="color:orange;">**not frequently available**</mark> in real-world settings (e.g., large fractions of accessible training data for poisoning and backdoor attacks, large query budgets for black-box evasion and model stealing).

## Threat Models in AI Security

***Threat model***: Three aspects defining an attacker’s behavior - <mark style="color:blue;">knowledge</mark> (what the attacker knows or has access to), <mark style="color:blue;">capabilities</mark> (what the attacker can alter), and <mark style="color:blue;">goal</mark>.

***Three types of goals***:

* <mark style="color:blue;">Availability</mark>: Decreasing <mark style="color:orange;">**overall performance**</mark> to a degree where the system may no longer be usable.
* <mark style="color:blue;">Integrity</mark>: Preserving original performance, but <mark style="color:orange;">**specific inputs**</mark> may be processed incorrectly.
* <mark style="color:blue;">Confidentiality</mark>: Concerning intellectual property (<mark style="color:orange;">**IP**</mark>) of the model and <mark style="color:orange;">**secrecy**</mark> of the training data.

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FCBkH6UtfFaUrKfuKvjtd%2F000.png?alt=media&#x26;token=2f118ef4-7e19-4b8d-b223-26b9b17c07dc" alt=""><figcaption></figcaption></figure>

***1) Poisoning*** (altering training data or labels to <mark style="color:green;">**decrease accuracy**</mark>)

* *<mark style="color:blue;">Label-flip attacks</mark>*: Uniquely targeting <mark style="color:orange;">**labels**</mark>.
* *<mark style="color:blue;">Bi-level poisoning</mark>*: Based on the bi-level formulation, altering only <mark style="color:orange;">**samples**</mark> or <mark style="color:orange;">**samples+labels**</mark>.
* *<mark style="color:blue;">Sloth attacks</mark>*: Aiming at increasing the model’s <mark style="color:orange;">**runtime**</mark>.
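
To make the simplest variant concrete, here is a toy sketch of a label-flip availability attack (not the paper's experiments; the `rate` parameter and the data below are invented for illustration):

```python
import numpy as np

def label_flip(y, rate, rng):
    """Availability poisoning: flip a fraction `rate` of binary labels 0 <-> 1."""
    y_poisoned = y.copy()
    n_flip = int(rate * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)  # victim samples to corrupt
    y_poisoned[idx] = 1 - y_poisoned[idx]
    return y_poisoned

rng = np.random.default_rng(0)
y_clean = np.array([0] * 50 + [1] * 50)
y_poisoned = label_flip(y_clean, rate=0.2, rng=rng)
print((y_poisoned != y_clean).sum())  # 20 labels corrupted
```

Bi-level poisoning generalizes this by also (or instead) perturbing the samples themselves, optimizing the perturbation against the victim's training procedure.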

***2) Backdoor*** (specifying input patterns to <mark style="color:green;">**trigger target outputs**</mark>)

* There are several ways to introduce backdoors via <mark style="color:orange;">**training or fine-tuning**</mark> data.
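
One common data-level variant can be sketched as follows, assuming toy 8×8 grayscale "images" and a hypothetical 2×2 corner patch as the trigger:

```python
import numpy as np

def insert_backdoor(X, y, target_class, poison_frac, rng):
    """Stamp a 2x2 white patch (the trigger) onto a fraction of training
    images and relabel them to the attacker's target class."""
    X, y = X.copy(), y.copy()
    n_poison = int(poison_frac * len(X))
    idx = rng.choice(len(X), size=n_poison, replace=False)
    X[idx, -2:, -2:] = 1.0          # trigger pattern in the bottom-right corner
    y[idx] = target_class           # model learns the association: trigger -> target
    return X, y, idx

rng = np.random.default_rng(1)
X = rng.random((100, 8, 8))         # toy stand-in for training images
y = rng.integers(0, 10, size=100)   # toy labels
X_bd, y_bd, idx = insert_backdoor(X, y, target_class=7, poison_frac=0.1, rng=rng)
```

A model trained on `(X_bd, y_bd)` behaves normally on clean inputs but outputs the target class whenever the trigger patch is present.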

***3) Evasion/Adversarial Attack*** (decreasing <mark style="color:green;">**test-time accuracy**</mark> of the otherwise well-performing model)

* *<mark style="color:blue;">White-box attack</mark>*: Needing access to test data and knowledge about the model.
* *<mark style="color:blue;">Black-box attack</mark>*: Requiring only access to the model <mark style="color:orange;">**outputs**</mark> and the <mark style="color:orange;">**rough nature**</mark> of the data.
* *<mark style="color:blue;">Transfer attack (black-box)</mark>*: Computing the attack on one model and then transferring it to another.
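
For intuition, a single white-box evasion step in the style of FGSM on a plain logistic model (weights, input, and `eps` below are made up for illustration):

```python
import numpy as np

def fgsm_step(x, y, w, b, eps):
    """One FGSM step: move x by eps in the sign of the input-gradient of the
    binary cross-entropy loss of a logistic model p = sigmoid(w.x + b)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    grad_x = (p - y) * w            # d(loss)/dx for logistic regression
    return x + eps * np.sign(grad_x)

w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([1.0, 1.0]), 1      # correctly classified: w.x + b = 1 > 0
x_adv = fgsm_step(x, y, w, b, eps=0.5)
print(x_adv @ w + b)                # score flips sign -> prediction changes
```

A transfer attack computes `x_adv` on a surrogate model the attacker controls and submits it to the victim, betting that the perturbation carries over.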

***4) Model Stealing*** (copying <mark style="color:green;">**model functionality**</mark> through black-box access without consent of its owner)

* *<mark style="color:blue;">Model stealing attack</mark>*: Submitting specific <mark style="color:orange;">**test queries**</mark> or labeling data from the target task.
* *<mark style="color:blue;">Model extraction attack</mark>*: To deduce <mark style="color:orange;">**architectural**</mark> choices like the usage of dropout.
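
A toy sketch of functionality stealing under black-box access: the victim is a hidden linear model, and the attacker fits a surrogate to query/output pairs (all names, sizes, and the query budget are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
w_victim = np.array([1.5, -2.0, 0.5])   # hidden parameters; the attacker never sees these

def victim_api(X):
    """Black-box endpoint: returns only outputs for submitted queries."""
    return X @ w_victim

X_queries = rng.normal(size=(200, 3))   # attacker-chosen queries within a budget
outputs = victim_api(X_queries)         # observed black-box responses
w_surrogate, *_ = np.linalg.lstsq(X_queries, outputs, rcond=None)
print(np.allclose(w_surrogate, w_victim))  # functionality recovered
```

Real models are nonlinear and noisy, so the surrogate only approximates the victim, but the query-then-fit structure is the same.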

***5) Membership Inference*** (targeting the <mark style="color:green;">**privacy**</mark> of the used <mark style="color:green;">**training data**</mark>)

* *<mark style="color:blue;">Membership inference attack</mark>*: Predicting membership to training data for <mark style="color:orange;">**given samples**</mark> based on the target model’s output (relying on membership metrics, <mark style="color:orange;">**shadow-models**</mark> trained on outputs with known membership, or repeated querying).
* *<mark style="color:blue;">Inversion attacks</mark>*: Regenerating training data based on <mark style="color:orange;">**generative models**</mark> trained with the target model’s outputs.
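
The simplest metric-based variant can be sketched in a few lines: samples on which the target model's loss is unusually low are flagged as training members (the loss values and threshold `tau` below are invented):

```python
import numpy as np

def loss_threshold_mia(losses, tau):
    """Metric-based membership inference: predict 'member' when the
    target model's loss on a sample falls below the threshold tau."""
    return losses < tau

member_losses = np.array([0.05, 0.10, 0.02])       # seen in training: low loss
nonmember_losses = np.array([0.90, 1.30, 0.70])    # unseen: higher loss
print(loss_threshold_mia(member_losses, tau=0.5))      # all True
print(loss_threshold_mia(nonmember_losses, tau=0.5))   # all False
```

Shadow-model attacks replace the fixed threshold with a classifier trained on outputs of models whose training membership is known to the attacker.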

***6) Attribute Inference*** (targeting specific sensitive <mark style="color:green;">**attributes or features**</mark>)

* Assuming <mark style="color:orange;">**white-box**</mark> knowledge of the victim model and using its weights as input to a <mark style="color:orange;">**meta-classifier**</mark>.
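
A heavily simplified sketch of the meta-classifier idea: flattened weights of several victim models serve as features for predicting a hidden dataset property. The weight vectors below are synthetic stand-ins, and a nearest-centroid rule stands in for the trained meta-classifier:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in: weight vectors of victims trained on data with
# property 0 vs. property 1 (in reality obtained via white-box access).
weights_p0 = rng.normal(loc=-1.0, scale=0.2, size=(20, 10))
weights_p1 = rng.normal(loc=+1.0, scale=0.2, size=(20, 10))

centroid0 = weights_p0.mean(axis=0)
centroid1 = weights_p1.mean(axis=0)

def meta_classifier(w):
    """Predict the sensitive property from a victim's weight vector."""
    d0 = np.linalg.norm(w - centroid0)
    d1 = np.linalg.norm(w - centroid1)
    return int(d1 < d0)

test_w = rng.normal(loc=+1.0, scale=0.2, size=10)  # weights of a fresh property-1 victim
print(meta_classifier(test_w))  # 1
```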

## Investigation Results

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FXyOwHQ6g3yPVV7YMEJEX%2F000.png?alt=media&#x26;token=85103401-1cf1-42ef-81f2-94474f9d47dd" alt="" width="384"><figcaption></figcaption></figure>

***Take Away*****&#x20;– Poisoning and Backdoor (training-time attack)**

* While training data often <mark style="background-color:yellow;">cannot be accessed directly</mark>, poisoning and backdooring may be executed via <mark style="color:orange;">**public data sources**</mark>.
* There is frequent (about 50%) use of <mark style="color:orange;">**third-party models**</mark>, which are then fine-tuned.

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FHDbSsszQa7PBFmUf8iiy%2F000.png?alt=media&#x26;token=0b82fe4b-69c7-4f73-8e94-45b48c38e979" alt="" width="385"><figcaption></figcaption></figure>

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FRzOm2Gx5ryVytoZGlcbw%2F000.png?alt=media&#x26;token=32f2587d-5ef3-4328-bbbb-344bf211977c" alt="" width="404"><figcaption></figcaption></figure>

***Take Away*****&#x20;– Evasion/Adversarial Attack (test-time attack)**

* <mark style="color:orange;">**Only 4.1%**</mark> of models are exposed to white-box evasion; often, the model itself is not available to the attacker.
* While research assumes a moderate query number, in practice <mark style="color:orange;">**either very few**</mark> <mark style="color:orange;">**or an unconstrained number**</mark> of queries is granted.
* In some cases, only data can be submitted <mark style="color:orange;">**without model feedback**</mark>, highlighting the need for <mark style="color:orange;">**transferability**</mark>.

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2F32ThFPcTfzTlQWuyqSjy%2F000.png?alt=media&#x26;token=682b5efc-eb1b-4c81-909b-ae33ddea7ee3" alt="" width="388"><figcaption></figcaption></figure>

***Take Away*****&#x20;– Model Stealing (test-time attack)**

* A relevant setting has <mark style="color:orange;">**only output visibility**</mark>, without the possibility of submitting test queries.
* Most attacks study query numbers that are uncommon in practice, where typically either more or fewer queries are granted.

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FGNdj7qhKEBWZVU1P6wHz%2F000.png?alt=media&#x26;token=b721044a-04e1-4050-bcf7-64e768a314a0" alt="" width="388"><figcaption></figcaption></figure>

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FV6OCHy7TcfNqaBGvo0vd%2F000.png?alt=media&#x26;token=7b420df3-6afc-480d-bcae-62463bbca1c3" alt="" width="387"><figcaption></figcaption></figure>

***Take Away*****&#x20;– Membership or Attribute Inference (privacy attack)**

* It would be beneficial to study membership attacks with <mark style="color:orange;">**only access to outputs**</mark>.

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FkEfYF9yO8tZOZIF688np%2F000.png?alt=media&#x26;token=3192da26-2686-46d8-939f-3035526d45ef" alt="" width="392"><figcaption></figcaption></figure>

## Beyond Specific Attacks

* Some datasets in practice have <mark style="color:orange;">**fewer features**</mark> than current research datasets, outlining the need to also study data security for <mark style="color:orange;">**few features**</mark> and <mark style="color:orange;">**many samples**</mark>.

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FTrgSRZiTDb1WULgVwGfq%2F000.png?alt=media&#x26;token=eca71e8c-c963-40ca-b6f2-b9b76da44cd4" alt=""><figcaption></figcaption></figure>

* For defense, there is a need to study constraints in terms of <mark style="color:orange;">**expert knowledge**</mark> and <mark style="color:orange;">**real-time responses**</mark>.
* <mark style="color:orange;">**Code libraries**</mark> can be security-relevant for AI.

## Future Directions

<figure><img src="https://725511345-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fo36vbcqVTETuSOVJefc3%2Fuploads%2FVn8QUDbOwGjuqE9cH4yz%2F000.png?alt=media&#x26;token=7b556722-de61-4f47-bdd5-12f26d246171" alt=""><figcaption></figcaption></figure>
