Adversarial ML: How AI is Enabling Cyber Resilience

Machine learning enables us to correctly classify a file as either benign or malicious over 99% of the time. But the question then becomes, how can this classifier be attacked? Is it possible to alter the file in such a way as to trick the classifier?

File Classification Via Machine Learning

We often make the mistake of assuming the model is judging as we judge, i.e., we assume the machine learning model has baked into it a conceptual understanding of the objects being classified.

For example, let’s look at lie detectors. What is a lie? From a human’s point of view, a lie is a statement believed to be false but offered as true. From the point of view of the lie detector, the triple (heart rate>threshold, perspiration>threshold, body movement>threshold) is a lie.  Consequently, the lie detector can be tricked as long as the subject is able to successfully control their biometrics. 

The upshot is that divergence between the model paradigm and the human paradigm creates opportunities for attackers to trick the machine learning model.

So what is machine learning? By machine learning, we simply mean the design and implementation of learning algorithms, where a learning algorithm is one that, rather than being explicitly programmed, programs itself by way of optimization.

Let’s review the process for constructing a machine learning model:

In the image on the left, the arrow represents the negative loss gradient, which gives us the direction of steepest loss decrease. We train our model via loss minimization by adjusting our weight vector in the direction of the negative loss gradient. The red line in the image on the right represents our decision boundary, which moves into the optimal position for separating red dots from blue dots as our loss is minimized on the left.
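The loss-minimization loop described above can be sketched in a few lines of Python. Everything below is illustrative (the synthetic data, learning rate, and iteration count are arbitrary choices, not values from the article): a logistic model is trained by repeatedly stepping the weights against the loss gradient until its decision boundary separates the two clusters.

```python
import numpy as np

# Synthetic 2D data: one cluster of "red" points, one of "blue" points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w = np.zeros(2)   # weight vector
b = 0.0           # bias
lr = 0.1          # learning rate

# Gradient descent: move the weights in the direction of steepest loss decrease
for _ in range(200):
    p = sigmoid(X @ w + b)            # predicted probability of class 1
    grad_w = X.T @ (p - y) / len(y)   # gradient of logistic loss w.r.t. w
    grad_b = np.mean(p - y)
    w -= lr * grad_w                  # step against the gradient
    b -= lr * grad_b

preds = (sigmoid(X @ w + b) > 0.5).astype(int)
accuracy = np.mean(preds == y)
```

As the loss falls, the line `w @ x + b = 0` plays the role of the red decision boundary in the figure.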

Weaknesses of Machine Learning Classifiers

As is the case for any classification scheme, machine learning classifiers are susceptible to attack. Samples can be judiciously perturbed in such a way that the sample crosses the decision boundary while retaining its fundamental properties. In the case of a file classifier, this means the perturbed malicious sample’s control flow remains identical to that of the original sample, but the classifier will misclassify the given sample as benign. 

In mathematical terms, the perturbation takes the following form:
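For the fast gradient sign method, which this article returns to later, the perturbation can be written as:

```latex
x_{\mathrm{adv}} = x + \varepsilon \cdot \operatorname{sign}\!\left( \nabla_{x} J(\theta, x, y) \right)
```

where x is the original sample, y its true label, J the classifier's loss, θ the model weights, and ε a step size kept small enough that the sample's fundamental properties are preserved.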

How These Weaknesses Can be Exploited to Make a Malicious File Look Benign

Now that we understand a basic methodology for tricking a classifier, let’s look at an analogous example. Consider Ian Goodfellow’s famous panda/gibbon example, in which noise corresponding to the signed loss gradient is added to the original panda image to produce an image that still looks like a panda to the human eye, but which tricks the classifier into thinking it’s an image of a gibbon:

What’s happening here is that the noise image is made up of perturbations of the image pixels given by the signed loss gradient with respect to the pixel values. In simpler terms, the noise is specially designed to increase the loss of the classifier, i.e., flip the label of the image assigned by the model, while not affecting the appearance of the image.
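The signed-gradient noise described above can be reproduced on a toy model. The sketch below is illustrative only (the linear "classifier," its weights, and the input values are all made up): it computes the gradient of the loss with respect to the input, not the weights, then nudges every feature by ε in the direction of that gradient's sign, flipping the predicted label.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# A toy "pre-trained" linear classifier: score = w.x + b
w = np.array([1.0, -2.0, 0.5, 3.0])
b = -0.2

x = np.array([0.4, 0.1, -0.3, 0.2])   # original sample
y = 1                                 # true label

# Gradient of the logistic loss with respect to the INPUT:
# for L = -[y log p + (1-y) log(1-p)], dL/dx = (p - y) * w
p = sigmoid(w @ x + b)
grad_x = (p - y) * w

# FGSM: step each feature by epsilon in the sign of the loss gradient
epsilon = 0.5
x_adv = x + epsilon * np.sign(grad_x)

p_adv = sigmoid(w @ x_adv + b)
```

Before the perturbation the model assigns the sample to class 1; afterward the same model assigns it to class 0, even though every feature moved by only ε.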

A Word on Snow Features

Researchers must also be wary of ‘snow features’. A snow feature is a feature that is not fundamental to the nature of the object being classified, but which aids in classification due to co-occurrence.

For example, if one wishes to train an image classifier that can distinguish between images of dogs and images of wolves, one must take care to note that most, if not all, of the wolf images will contain background snow. The concern is that the classifier may learn to focus on the snow pixels, due to the high co-occurrence of wolves and snow, rather than learning more intrinsic wolf features, such as larger shoulders, thicker coats, etc.
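This failure mode is easy to reproduce on synthetic data. In the illustrative sketch below (all distributions and constants are invented for the demo), a "snow" feature co-occurs almost perfectly with the wolf label, while two "intrinsic" features carry only a weak signal; the trained model leans on the snow feature and misclassifies a wolf photographed without snow.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# 0 = dog, 1 = wolf
label = rng.integers(0, 2, n)

# Two intrinsic features (shoulder width, coat thickness): weak, noisy signal
intrinsic = rng.normal(label[:, None] * 0.5, 2.0, (n, 2))

# One "snow" feature that co-occurs almost perfectly with the wolf label
snow = label + rng.normal(0, 0.05, n)
X = np.column_stack([intrinsic, snow])

# Train a logistic model by gradient descent
w, b, lr = np.zeros(3), 0.0, 0.5
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= lr * X.T @ (p - label) / n
    b -= lr * np.mean(p - label)

# The classifier weights the snow feature far more than the intrinsic ones
snow_weight = abs(w[2])
intrinsic_weight = max(abs(w[0]), abs(w[1]))

# A wolf photographed without snow is confidently misclassified as a dog
wolf_no_snow = np.array([0.5, 0.5, 0.0])
p_wolf = 1 / (1 + np.exp(-(wolf_no_snow @ w + b)))
```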

This same phenomenon exists within the realm of file classification, in the form of features which often co-occur with malicious files but which do not affect control flow.

How to Harden Your Model Against Adversarial Attacks

Once you understand how adversarial attacks happen, it becomes possible to develop strategies for defending your model. Four common ways to do this are as follows:

1)     Adversarial training: include adversarial examples in your training set so the model learns to classify perturbed samples correctly.
2)     Defensive distillation: train a second model on the softened probability outputs of the first, smoothing the decision surface that gradient-based attacks exploit.
3)     Gradient masking: make the model’s gradients difficult for an attacker to compute or approximate, raising the cost of crafting adversarial samples.
4)     Input preprocessing: denoise or otherwise transform incoming samples (e.g., feature squeezing) before classification to strip out adversarial perturbations.

Practical Applications – CAPTCHAs and the Turing Test

When we look at the practical applications of these mathematical concepts, one of the key areas has been image recognition, specifically CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart). CAPTCHAs were originally created for two purposes: to prevent bots from adding URLs to the AltaVista search engine, and to distinguish humans from computers.

In the early 2000s, it was very hard for a computer to read distorted text. An incredible byproduct of this effort was the digitization of millions of books and the archives of the New York Times. Since then, other types of CAPTCHAs have been created to continually make people prove they are actually human.

The Turing Test appears in the CAPTCHA name because of Alan Turing, the British genius who was one of the godfathers of computer science and artificial intelligence. In addition to being a key figure in breaking the German Enigma cipher in WWII, he devised the Turing Test in 1950, in which an examiner tries to determine whether they are conversing with a human or a machine. Turing also predicted that by the year 2000, machines with only about 100MB of storage would be able to pass this test. Although he was off on the storage size and the date, he was correct that computers would eventually be able to pass these tests.

Earlier in the article, we discussed how deep neural networks go steps beyond linear models and rely on loss functions, a vital aspect of machine learning because they measure how wrong a model’s predicted output is. This is important because it enables the neural network to become more accurate at classifying its input, or, what it sees. The type of neural network used for image recognition is a convolutional neural network. It assigns a number to each feature, or characteristic, of the object, in the previously mentioned process called vectorization. Those numbers are computed through each layer until we arrive at an output, such as “42” or “cat”.
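As a rough illustration of what a single convolutional layer does, the sketch below slides a hand-made vertical-edge filter over a tiny 5x5 "image" (in a real CNN the filter values are learned during training, not hand-set as they are here):

```python
import numpy as np

# A tiny 5x5 grayscale "image" with a vertical edge down the middle
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

# A 3x3 vertical-edge filter: one "feature detector"
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

def conv2d(img, k):
    """Valid 2D convolution (cross-correlation, as CNN layers compute it)."""
    kh, kw = k.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

feature_map = conv2d(image, kernel)
# The map is large where the window covers the edge and zero elsewhere.
# Stacking such layers, then feeding the final numbers into a classifier,
# is what eventually yields an output like "cat".
```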

Neural networks can be used for malicious purposes too, such as reading and then creating clickbait. Two researchers at ZeroFOX created SNAP_R, which scrapes what a Twitter user commonly clicks on and then generates a tweet targeted at that user based on this information. It was even able to beat humans at luring in victims[1]. Laser phishing is another scenario, in which an individual’s writing style is analyzed and mimicked in order to impersonate them and phish a victim known to them. All of this means that CAPTCHAs are vulnerable to attack, as multiple researchers have proven in recent years.

In fact, one such attack is being carried out in the real world, for free, and is completely approved by Google. Buster: Captcha Solver for Humans[2] is a Chrome extension that uses Google Cloud to pass Google’s image CAPTCHAs. It is able to do this because Google’s image CAPTCHAs offer an audio option for blind users. The extension completely bypasses the image recognition requirement and sends the audio to Google’s Speech-to-Text API, which then automatically fills in the CAPTCHA text field and verifies you are a human. This is the state of security today: you can use a vendor’s own technology to bypass their own security. For the truly lazy or non-technical, any search engine will turn up CAPTCHA-breaking-as-a-service companies that charge as little as one one-hundredth of a nickel per CAPTCHA. You can even rate the CAPTCHA breaker, just like you would rate your Lyft driver.

This is where adversarial protection using noise, like in the panda example, can come into play. As mentioned before, tiny perturbations in the pixels of an image are enough to trick the classifier. At this time, the Fast Gradient Sign Method is the simplest and most efficient way to carry out this task: it finds a small perturbation that maximally increases the classifier’s error. (The noise image is amplified and rendered in color for presentation purposes only; otherwise it would look like a gray box.) While considered an adversarial technique, this can enable resilience as well, by fooling attackers who try to bypass an image challenge with a bot running its own classifier. The perturbed image looks identical to the human eye, but it is completely different digitally. This concept of adding noise to an output can be applied wherever someone is trying to bypass a protection mechanism. As mentioned before, one possible defense is including adversarial examples in your training set, so your model has awareness of these types of examples.
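That last defense, adversarial training, can be sketched on a toy logistic model. Everything in this example is illustrative (the data, the ε budget, and the learning schedule are invented): at each step we craft fast-gradient-sign perturbations of the current data and train on the clean and perturbed samples together, so the hardened model still classifies perturbed points correctly.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy binary classification data: two clusters
X = np.vstack([rng.normal(-1, 0.5, (100, 2)), rng.normal(1, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

w, b, lr, epsilon = np.zeros(2), 0.0, 0.3, 0.3

for _ in range(300):
    # 1) Craft FGSM adversarial versions of the current data
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w          # loss gradient w.r.t. each input
    X_adv = X + epsilon * np.sign(grad_x)

    # 2) Train on clean AND adversarial samples together
    X_all = np.vstack([X, X_adv])
    y_all = np.concatenate([y, y])
    p_all = sigmoid(X_all @ w + b)
    w -= lr * X_all.T @ (p_all - y_all) / len(y_all)
    b -= lr * np.mean(p_all - y_all)

# Evaluate the hardened model on freshly perturbed versions of the points
p = sigmoid(X @ w + b)
grad_x = (p - y)[:, None] * w
X_eval_adv = X + epsilon * np.sign(grad_x)
adv_accuracy = np.mean((sigmoid(X_eval_adv @ w + b) > 0.5) == y)
```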

Key Takeaways

After all of this light reading, what have we learned about adversarial machine learning?

1)     Attack tools are readily available and cheap.
2)     Machine learning will have a role on both sides of the security arms race.
3)     The next frontier for machine learning will be solving identity and authentication challenges. In fact, it has already begun.

Citations:

[1] https://www.forbes.com/sites/thomasbrewster/2016/07/25/artificial-intelligence-phishing-twitter-bots/#2e41b23876e6
[2] https://chrome.google.com/webstore/detail/buster-captcha-solver-for/mpbjkejclgfgadiemmefgebjfooflfhl?hl=en


DISCLAIMER: This blog represents the opinions of the authors only, and does not represent an official BlackBerry Cylance endorsement of any companies, services or products mentioned herein.