pi Day: Machine Learning a la Mode

Rational Thoughts Around Irrational Behavior

"Did I ever tell you what the definition of insanity is? Insanity, is doing the exact same [expletive] thing over and over again, expecting [expletive] to change. That is crazy." - Vaas Montenegro,  Far Cry 3

Pretty rational thought from an irrational person, right? What does this have to do with pi? Well, pi is an irrational and transcendental number; its decimal expansion continues infinitely without ever settling into a repeating pattern.

Wait, it Doesn't Repeat Itself; That Makes it Not Crazy – Right?

Correct! But to see why that matters, put pi in the context of a DAT file.

What's a DAT File?

Put plainly, a DAT file is a data file created by a specific application. In the context of this security conversation, a DAT file is what legacy antivirus (AV) products commonly use to store signatures, such as the hashes of known malicious files; basically, it's a list.

Well, that seems rational. You need a place to keep a list of malicious things. 

Sure, when you hear or read it, that makes sense. A terrible malware attack happens, and then the legacy product adds that malware's hash (a fixed-length value computed from a file's contents, so identical files always produce the same hash) to its list to prevent future attacks. The legacy product then pushes its “known attacks” list to your endpoints, which triggers the program to scan your systems for the hashes on that list.
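To make the list idea concrete, here is a minimal sketch of a hash lookup, assuming SHA-256 as the hash and a plain Python set as the "DAT file". The sample bytes and the names `signature_list` and `is_blocked` are made up for illustration; no real AV product is being reproduced here.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of some file contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for a malicious file that has already been reported.
known_bad_sample = b"pretend this is a malicious payload"

# The "DAT file": nothing more than a list (here, a set) of known-bad hashes.
signature_list = {sha256_hex(known_bad_sample)}


def is_blocked(file_bytes: bytes, signatures: set) -> bool:
    """A file is blocked only if its exact hash is already on the list."""
    return sha256_hex(file_bytes) in signatures


# The reported sample is caught.
print(is_blocked(known_bad_sample, signature_list))

# A variant that differs by a single byte has a different hash,
# so the same list no longer recognizes it.
variant = known_bad_sample + b"!"
print(is_blocked(variant, signature_list))
```

The first lookup succeeds and the second fails, even though the two "files" are nearly identical. The list only knows what it has already seen.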

From there, it’s rinse and repeat. Day after day, week after week, and so on and so on. Make no mistake, these attacks are then blocked, protecting your machines. But here’s the rub:

According to the AV-Test Institute, over 250,000 new malicious programs are registered each and every day. Every twenty-four hours, a quarter of a million new malware variants have to be added to that list or examined to create a generic signature for groupings on that list. Day after day, week after week, and so on and so on. Indefinitely.
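For a sense of scale, here is the back-of-the-envelope arithmetic on that figure. It assumes the rate stays flat at exactly 250,000 per day, which is a simplification of the "over 250,000" quoted above.

```python
SECONDS_PER_DAY = 24 * 60 * 60
NEW_SAMPLES_PER_DAY = 250_000  # assumed flat rate, taken from the figure above

per_second = NEW_SAMPLES_PER_DAY / SECONDS_PER_DAY
per_year = NEW_SAMPLES_PER_DAY * 365

print(f"{per_second:.1f} new samples per second")  # 2.9
print(f"{per_year:,} new samples per year")        # 91,250,000
```

That is roughly three new entries every second, or more than 91 million a year, before a single old entry is retired.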

The list, in theory, becomes never-ending (see how I brought this back, full circle, towards pi?). Nothing can ever be dropped from it, either; otherwise, an older malicious file could end up on the system and wreak havoc.

This whole concept is irrational: ever-growing lists that demand never-ending updates and scans, all for threats that have a hash life of fewer than 58 seconds.

What this means is that, due to the lag between malware discovery and the signature update, legacy AV solutions simply can’t keep pace with all the new, emerging threats. They also offer no protection against new malicious files that haven’t yet been reported by “Patient Zero” (sacrificial lamb) customers, and will thus continue to let these new malware variants into your systems until a signature update has been created, tested, released, and installed by the end user.

Why bother keeping up with such a list, knowing that it will be a never-ending task to keep it updated and that the data will already be out of date by the time customers of these signature-based AV solutions finally install the updates?

"Then I started to see it everywhere I looked. Everywhere I looked all these [expletive] [people]. Everywhere I looked. Doing the exact same [expletive] thing, over and over and over and over again. Thinking ‘This time, it’s gonna be different.’" - Vaas Montenegro, Far Cry 3

Listen, there is a rational way to handle this problem, and it leverages artificial intelligence (AI) and, more specifically, one of its most influential branches: machine learning (ML).

How Do You Figure? 

Have you ever heard the phrase, "history repeats itself"? That may not be universally true, but history does often reveal similarities between events.

Picture it like this: if you knew about everything that was bad in the world and, at the same time, you knew about everything that was good, you would logically be able to discover the patterns that set the two apart. Well, that's the core concept of machine learning. The more the ML models ‘see’ over time, the more precise they become, because they can pick up patterns and make correlations between the common traits of “good” and “bad” files.
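Here is a toy sketch of that idea, and only a toy. It assumes two made-up features (byte entropy and the share of non-printable bytes), hypothetical training samples, and a simple nearest-centroid rule. It is not how CylancePROTECT or any production model works; real models learn from vastly more features and files.

```python
import math
import random
from collections import Counter


def features(data: bytes) -> tuple:
    """Two toy traits of a file: byte entropy (bits per byte) and
    the share of bytes outside the printable ASCII range."""
    n = len(data)
    counts = Counter(data)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    non_printable = sum(1 for b in data if b < 32 or b > 126) / n
    return (entropy, non_printable)


def centroid(rows):
    """Average each feature across a group of files."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def train(good_files, bad_files):
    """'Learning' here is just remembering the average traits of each group."""
    return {
        "good": centroid([features(f) for f in good_files]),
        "bad": centroid([features(f) for f in bad_files]),
    }


def classify(model, data: bytes) -> str:
    """Label a file by whichever group's average traits it sits closest to."""
    x = features(data)
    return min(model, key=lambda label: math.dist(x, model[label]))


# Hypothetical training data: readable text stands in for "good" files,
# random-looking bytes stand in for packed or obfuscated "bad" files.
rng = random.Random(0)
good_files = [
    b"Quarterly report: revenue grew and costs fell.",
    b"Dear team, the meeting has moved to Thursday at noon.",
    b"def add(a, b): return a + b  # a harmless helper",
]
bad_files = [bytes(rng.randrange(256) for _ in range(256)) for _ in range(3)]
model = train(good_files, bad_files)

# Neither of these files was in the training data, and no hash list is consulted.
unseen_bad = bytes(rng.randrange(256) for _ in range(256))
unseen_good = b"Lunch order: two pies, one apple and one cherry."
print(classify(model, unseen_bad))
print(classify(model, unseen_good))
```

The point of the sketch is the contrast with a hash list: the model judges a file by its traits, so it can label something it has never seen before.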

This pattern learning is the core concept behind how CylancePROTECT® works. In a similar way, having learned from the signature-based past, we can now see the DAT file method of threat detection for what it truly is: illogical and never-ending.

Happy pi Day!

Oh, by the way, did I ever tell you what the definition of insanity is?