Poisoning Attacks and Defenses on Artificial Intelligence: A Survey
Introduction
Machine learning models are susceptible to a variety of attacks during both training and inference.
This survey focuses on attacks during training, particularly the data poisoning attack, and several defenses against it.
This is an attack that “poisons” data during the training phase and therefore reduces the model’s accuracy during inference.
Poisoning attacks fall into two types, according to the target model:
- Attacks on Non-Neural Networks
- Attacks on Neural Networks
Knowledge Background
Manipulations
Techniques
- Modification of data labels
- Injections of malicious samples
- Manipulation of the training data
The damage done by these techniques shows up during model inference, where the accuracy of the model is significantly reduced; a minimal label-flipping sketch follows the list below.
Some forms of manipulating training data include:
- Altering images without changing the label
- Adding inputs that cause a classifier to learn to make the wrong predictions
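To make these manipulations concrete, here is a minimal sketch of a label-flipping attack on a toy binary-classification task: a fraction of training labels is flipped, and the poisoned model's accuracy on clean test data is compared against a cleanly trained baseline. The dataset, model choice, `flip_labels` helper, and 40% poisoning rate are illustrative assumptions, not the survey's setup.

```python
# Minimal sketch of a label-flipping poisoning attack on a toy dataset.
# The dataset, model, helper name, and 40% poisoning rate are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def flip_labels(y, rate, rng):
    """Return a copy of y with a `rate` fraction of binary labels flipped."""
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]
    return y_poisoned

clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, flip_labels(y_train, 0.4, rng))

# Both models are scored on the same clean test set, so any accuracy gap
# is attributable to the poisoned training labels.
print("accuracy, clean training   :", clean_model.score(X_test, y_test))
print("accuracy, poisoned training:", poisoned_model.score(X_test, y_test))
```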
There are two goals for these adversarial attacks (contrasted in the sketch after this list):
- A targeted attack, where adversarial examples included in the training data cause the model to misclassify inputs into a specific class
- A non-targeted attack, where the adversarial examples cause the model to misclassify in general rather than into a specific class
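The following minimal sketch, assuming a label-flipping attacker on integer class labels, shows how the two goals differ in construction. The function names, class indices, and flip rate are illustrative assumptions.

```python
# Targeted vs. non-targeted label-flipping, assuming integer class labels.
import numpy as np

def targeted_flip(y, source_class, target_class):
    """Targeted: relabel every training sample of `source_class` as `target_class`,
    pushing the model to misclassify that class in a specific way."""
    y_poisoned = y.copy()
    y_poisoned[y == source_class] = target_class
    return y_poisoned

def non_targeted_flip(y, n_classes, rate, rng):
    """Non-targeted: flip a random `rate` fraction of labels to any other class,
    aiming only to degrade overall accuracy."""
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    # Adding an offset in [1, n_classes-1] modulo n_classes guarantees a different class.
    y_poisoned[idx] = (y_poisoned[idx] + rng.integers(1, n_classes, size=len(idx))) % n_classes
    return y_poisoned
```

For a hypothetical 10-class task, `targeted_flip(y_train, source_class=3, target_class=7)` would relabel every class-3 training sample as class 7, while `non_targeted_flip(y_train, n_classes=10, rate=0.2, rng=np.random.default_rng(0))` would corrupt a random 20% of labels.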
Note: Evasion attacks are a related class of attacks that manipulate inputs at test time rather than the training data, and they do not require knowledge of the training data
Assumptions of Attack and Defense
Attacks on machine learning models fall into two categories:
- Data poisoning (DP) attacks, which are applied during training
- Adversarial attacks, which are applied during testing (a minimal test-time perturbation is sketched below)
[^1]
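For contrast with the poisoning sketches above, the following minimal sketch shows the test-time side of this split: the training data is untouched, and a single test input of an already-trained logistic regression is perturbed with an FGSM-style gradient-sign step. The setup and the epsilon value of 0.5 are illustrative assumptions.

```python
# Test-time (adversarial/evasion) perturbation of one input; the training set is clean.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # trained on clean data

w, b = model.coef_[0], model.intercept_[0]
x, label = X_test[0], y_test[0]
p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # predicted probability of class 1
grad = (p - label) * w                    # gradient of the log-loss w.r.t. the input
x_adv = x + 0.5 * np.sign(grad)           # perturb the test input, not the training data

print("prediction on clean input    :", model.predict(x[None, :])[0])
print("prediction on perturbed input:", model.predict(x_adv[None, :])[0])
```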
Citations:
[1] https://arxiv.org/pdf/2202.10276.pdf