Adversarial Attacks And Defenses And The Dimpled Manifold Hypothesis

Adversarial examples are inputs that look normal to humans but cause a neural network to classify them as something else.
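To make this concrete, here is a minimal sketch of one well-known attack, the fast gradient sign method (FGSM), applied to a toy logistic-regression classifier. The attack, the model, the weights, the input, and the value of eps are all assumptions chosen for illustration; none of them come from the cited talk. The point is only that a small change to every coordinate can add up to a flipped prediction.

```python
import math

def sign(v):
    # Returns -1, 0, or 1.
    return (v > 0) - (v < 0)

def predict_prob(w, b, x):
    # Probability of class 1 under a logistic-regression model.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(w, b, x, y, eps):
    # For logistic regression with cross-entropy loss, the gradient of the
    # loss with respect to the input is (p - y) * w.
    p = predict_prob(w, b, x)
    grad = [(p - y) * wi for wi in w]
    # Move every coordinate by eps in the direction that increases the loss.
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical 100-dimensional model and input (illustrative values only).
n = 100
w = [1.0 if i % 2 == 0 else -1.0 for i in range(n)]
b = 0.0
x = [0.05 * wi for wi in w]  # confidently classified as class 1
y = 1

x_adv = fgsm(w, b, x, y, eps=0.1)

print("clean prob of class 1:      ", predict_prob(w, b, x))
print("adversarial prob of class 1:", predict_prob(w, b, x_adv))
print("largest per-coordinate change:", max(abs(a - c) for a, c in zip(x, x_adv)))
```

No coordinate moves by more than eps, yet the prediction flips because the small changes all push the score in the same direction. Real attacks on deep networks use the same idea, with the gradient computed by backpropagation.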

Adversarial defenses are techniques that make neural networks robust to these attacks.

It is harder to publish work on adversarial defenses than on attacks, for a couple of reasons:

It’s easier to find a counterexample that breaks a claim than to prove that the claim holds in general.
An attack needs only one counterexample, but a defense needs to be robust against all possible adversarial attacks.

Why do adversarial examples exist?

Hypothesis:

Dimpled Manifold Hypothesis.

Some features correlate with your target.

In computer vision, if the training images always show birds near water, the model can pick up on the water as a cue for "bird" instead of the bird itself.
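The bird-and-water effect can be reproduced with a deliberately tiny toy. The sketch below assumes a hypothetical dataset with two binary features, "bird_shape" and "water", and a one-feature learner (a decision stump) that keeps whichever feature best predicts the label on the training set. The data, feature names, and learner are all made up for illustration.

```python
# Each sample is (features, label); label 1 means "bird".
# In this hypothetical training set, every bird photo has water in it.
train = [
    ({"bird_shape": 1, "water": 1}, 1),
    ({"bird_shape": 1, "water": 1}, 1),
    ({"bird_shape": 0, "water": 1}, 1),  # bird partly hidden, still near water
    ({"bird_shape": 1, "water": 1}, 1),
    ({"bird_shape": 0, "water": 0}, 0),
    ({"bird_shape": 1, "water": 0}, 0),  # bird-shaped kite, no water
    ({"bird_shape": 0, "water": 0}, 0),
    ({"bird_shape": 0, "water": 0}, 0),
]

def accuracy(feature, data):
    # Fraction of samples where the feature value equals the label.
    return sum(x[feature] == y for x, y in data) / len(data)

def fit_stump(data):
    # Keep the single feature that best matches the labels.
    features = list(data[0][0])
    return max(features, key=lambda f: accuracy(f, data))

def predict(feature, x):
    return x[feature]

chosen = fit_stump(train)

# Test cases that break the correlation seen in training.
bird_in_desert = {"bird_shape": 1, "water": 0}
boat_on_lake = {"bird_shape": 0, "water": 1}

print("feature the model relies on:", chosen)
print("bird in desert predicted as bird?", bool(predict(chosen, bird_in_desert)))
print("boat on lake predicted as bird?", bool(predict(chosen, boat_on_lake)))
```

The learner picks "water" because it is a perfect predictor on the training set, while "bird_shape" is only right 6 times out of 8. As soon as the correlation breaks, the model misses the bird in the desert and calls the boat a bird.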

Citations:

[1] https://www.youtube.com/watch?v=9bJcfk3HdLY&list=LL&index=1&t=309s