Poisoning Attacks and Defenses on Artificial Intelligence: A Survey
Introduction
Machine learning models are susceptible to a variety of attacks during both training and inference.
This survey focuses on attacks during training, particularly the data poisoning attack, and several defenses against it.
This is an attack that “poisons” data during the training phase and therefore reduces the model’s accuracy during inference.
Poisoning attacks fall into two types, according to the target model:
- Attacks on Non-Neural Networks
- Attacks on Neural Networks
Knowledge Background
Manipulations
Techniques
- Modification of data labels
- Injections of malicious samples
- Manipulation of the training data
The damage done by these techniques shows up during model inference, where the accuracy of the model is significantly reduced; a minimal label-flipping sketch follows the list below.
Some forms of manipulating training data include:
- Altering images without changing the label
- Adding inputs that cause a classifier to learn to make the wrong predictions
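To make these manipulations concrete, here is a minimal sketch of a label-flipping attack on a toy binary-classification task: a fraction of training labels is flipped, and the poisoned model's accuracy on clean test data is compared against a cleanly trained baseline. The dataset, model choice, `flip_labels` helper, and 40% poisoning rate are illustrative assumptions, not the survey's setup.

```python
# Minimal sketch of a label-flipping poisoning attack on a toy dataset.
# The dataset, model, helper name, and 40% poisoning rate are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def flip_labels(y, rate, rng):
    """Return a copy of y with a `rate` fraction of binary labels flipped."""
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]
    return y_poisoned

clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, flip_labels(y_train, 0.4, rng))

# Both models are scored on the same clean test set, so any accuracy gap
# is attributable to the poisoned training labels.
print("accuracy, clean training   :", clean_model.score(X_test, y_test))
print("accuracy, poisoned training:", poisoned_model.score(X_test, y_test))
```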
There are two goals for these adversarial attacks (contrasted in the sketch after this list):
- A targeted attack, where adversarial examples included in the training data cause the model to misclassify inputs into a specific class
- A non-targeted attack, where the adversarial examples cause the model to misclassify in general rather than into a specific class
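The following minimal sketch, assuming a label-flipping attacker on integer class labels, shows how the two goals differ in construction. The function names, class indices, and flip rate are illustrative assumptions.

```python
# Targeted vs. non-targeted label-flipping, assuming integer class labels.
import numpy as np

def targeted_flip(y, source_class, target_class):
    """Targeted: relabel every training sample of `source_class` as `target_class`,
    pushing the model to misclassify that class in a specific way."""
    y_poisoned = y.copy()
    y_poisoned[y == source_class] = target_class
    return y_poisoned

def non_targeted_flip(y, n_classes, rate, rng):
    """Non-targeted: flip a random `rate` fraction of labels to any other class,
    aiming only to degrade overall accuracy."""
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    # Adding an offset in [1, n_classes-1] modulo n_classes guarantees a different class.
    y_poisoned[idx] = (y_poisoned[idx] + rng.integers(1, n_classes, size=len(idx))) % n_classes
    return y_poisoned
```

For a hypothetical 10-class task, `targeted_flip(y_train, source_class=3, target_class=7)` would relabel every class-3 training sample as class 7, while `non_targeted_flip(y_train, n_classes=10, rate=0.2, rng=np.random.default_rng(0))` would corrupt a random 20% of labels.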
Note: Evasion attacks are a related class of attacks that manipulate inputs at test time rather than the training data, and they do not require knowledge of the training data
Assumptions of Attack and Defense
Attacks on machine learning models fall into two categories:
- Data poisoning (DP) attacks, which are applied during training
- Adversarial attacks, which are applied during testing (a minimal test-time perturbation is sketched below)
[^1]
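For contrast with the poisoning sketches above, the following minimal sketch shows the test-time side of this split: the training data is untouched, and a single test input of an already-trained logistic regression is perturbed with an FGSM-style gradient-sign step. The setup and the epsilon value of 0.5 are illustrative assumptions.

```python
# Test-time (adversarial/evasion) perturbation of one input; the training set is clean.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # trained on clean data

w, b = model.coef_[0], model.intercept_[0]
x, label = X_test[0], y_test[0]
p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # predicted probability of class 1
grad = (p - label) * w                    # gradient of the log-loss w.r.t. the input
x_adv = x + 0.5 * np.sign(grad)           # perturb the test input, not the training data

print("prediction on clean input    :", model.predict(x[None, :])[0])
print("prediction on perturbed input:", model.predict(x_adv[None, :])[0])
```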
Citations:
[1] https://arxiv.org/pdf/2202.10276.pdf