Math > Malware

We have all experienced the creative high of building an innovative product that becomes popular or wins acclaim. These products are not just software applications. They could be as simple as the wooden helicopter I helped my toddler assemble and paint on Father's Day, or the desserts my wife conjures up when we have friends over. Math underpins nearly every creative activity, even when the builder (my son) is not old enough to add numbers yet.

 

In the case of his helicopter, the product was designed and documented so that a 3-year-old could assemble it with a little help from me. We laid out all the pieces on the floor, and he separated the longer parts (tail, rotor blade) from the shorter ones, which he used to assemble the cockpit. When it was time to paint, he accidentally dipped the green brush into the blue paint bottle. This yielded a color that was neither blue nor green, and I had to help him get the right proportion of colors for a uniform coat of paint.

 

[Image: raj-helicopter]

Am I stretching to make my point? Maybe. But, as a physics major, I have come to the conclusion that almost any problem can be reduced to a math equation and solved. Too simplistic a view? Not quite. At Cylance, we reduce the problem of finding malware to math. In my previous gig at Marketo, we helped marketers identify the best leads by looking at all the attributes tied to successful deals and using that information to predict who in the sales pipeline was most likely to buy. Math is everywhere, even when you don't see it or feel it.

Thus, math makes a world of difference in how you solve a problem. For decades, the security industry has played a cat-and-mouse game with the bad guys. I'll illustrate this with the example of anti-virus tools, which almost everyone is familiar with. The first implementations relied on signature-based detection of malware. This was later augmented with heuristic-based detection, file emulation and signatures in the cloud. All these techniques relied on a large database of known "bad" stuff. This is similar to the "no fly" lists that the government maintains. If you are a "known bad", you can't fly (i.e., execute). However, this approach fails to tackle the "unknown bad": a sample has to become known (i.e., caught in flagrante delicto) before it can be put on the "no fly" list. By the time the A/V vendors noticed a sample, it had already infected many endpoints.
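
To make the "no fly" list idea concrete, here is a minimal sketch in Python. It assumes a signature is nothing more than a SHA-256 hash of the file, which is a simplification (real signatures can also match byte patterns). The sample bytes and the names KNOWN_BAD_SAMPLE, NO_FLY_LIST and is_blocked are made up for illustration.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest that serves as the file's signature."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical payload standing in for a sample the vendor has already seen.
KNOWN_BAD_SAMPLE = b"example payload seen in the wild"

# The "no fly" list: signatures of known bad files only.
NO_FLY_LIST = {sha256_of(KNOWN_BAD_SAMPLE)}


def is_blocked(data: bytes) -> bool:
    """Block a file only if its signature is already on the list."""
    return sha256_of(data) in NO_FLY_LIST


# A one-byte change yields a different hash, so the variant is "unknown".
variant = KNOWN_BAD_SAMPLE + b"\x00"

known_blocked = is_blocked(KNOWN_BAD_SAMPLE)
variant_blocked = is_blocked(variant)
```

In this sketch the known sample is blocked, while the slightly modified variant is allowed to run because its signature is not on the list yet.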

Math can help redefine the problem 

As the number of malware samples kept increasing at an exponential rate, approaches like whitelisting tried to solve the inverse problem. The idea was to create a list of programs that are allowed to execute and to deny anything not on the whitelist. Sounds great in theory, but it poses a few operational challenges. The first challenge is how the original whitelist of allowed binaries is created. Most endpoints have programs that are not part of the "gold image" from the IT department. Creating an inventory of all programs and whitelisting the approved set is a herculean task for large deployments. A related issue is keeping the inventory and whitelist up to date. I installed Pixie, Balsamiq, Workrave, HipChat, Notepad++ and Skype on top of the "gold image" in my first week at work. If these products are not in my organization's whitelist, I'll have to wait for my IT team to approve them before I can use them. IT should enable businesses to succeed, not reduce productivity.
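
The whitelisting approach and its maintenance cost can be sketched in a few lines of Python. The file names, the contents of GOLD_IMAGE_WHITELIST and the function may_execute are assumptions for illustration, not any product's actual configuration.

```python
# Hypothetical whitelist built from the IT department's gold image.
GOLD_IMAGE_WHITELIST = {"winword.exe", "outlook.exe", "chrome.exe"}


def may_execute(program: str, whitelist: set) -> bool:
    """Default deny: a program runs only if it is on the whitelist."""
    return program.lower() in whitelist


# Programs found on an endpoint after the user's first week (illustrative).
installed = ["chrome.exe", "notepad++.exe", "skype.exe"]

# Everything not on the whitelist is blocked until IT approves it.
pending_approval = [p for p in installed if not may_execute(p, GOLD_IMAGE_WHITELIST)]
```

Here two of the three installed programs end up in pending_approval, which is the operational burden described above: every new tool means another request to IT.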

This brings us to the latest chapter of malware and advanced persistent threat protection: the technology we have built. We collected millions of files, good and bad, disassembled them and learned which patterns indicate good intent and which indicate bad. When the individual characteristics of a file are analyzed in aggregate, they form distinct clusters. For instance, all Trojans have similarities and form a cluster different from the group of adware or the collection of chat programs like Skype and HipChat. When a new sample enters the endpoint, we disassemble it and compare it to our models to check whether it falls in any of the bad clusters. This analysis is imperceptible to the user and can handle even the nefarious "unknown" files. Since the good files form their own clusters, there is no need to whitelist the gold image or keep the whitelist up to date. Good programs are allowed to execute, while the bad stuff is blocked. If this sounds too good to be true, we invite you to try our product to see why math is truly greater than malware.
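
As a rough illustration of the clustering idea, and not a description of our actual models, here is a toy nearest-centroid classifier in Python. The two-dimensional feature vectors, the cluster labels and all function names are invented for the example; real feature sets would be far larger.

```python
import math

# Hypothetical, hand-made feature vectors for already-labeled files.
TRAINING = {
    "trojan": [(0.90, 0.80), (0.85, 0.90), (0.95, 0.85)],
    "adware": [(0.60, 0.30), (0.65, 0.35), (0.55, 0.25)],
    "chat": [(0.20, 0.10), (0.25, 0.15), (0.15, 0.05)],
}

# Assumption for this sketch: which clusters count as bad.
BAD_CLUSTERS = {"trojan", "adware"}


def centroid(points):
    """Average the points of one cluster, dimension by dimension."""
    count = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / count for i in range(dims))


CENTROIDS = {label: centroid(points) for label, points in TRAINING.items()}


def nearest_cluster(features):
    """Return the label of the cluster whose centroid is closest."""
    return min(CENTROIDS, key=lambda label: math.dist(features, CENTROIDS[label]))


def allow_execution(features):
    """Allow a file unless it falls closest to a bad cluster."""
    return nearest_cluster(features) not in BAD_CLUSTERS


# A never-before-seen sample is judged by where it lands, not by a list.
unknown_sample = (0.88, 0.82)
unknown_label = nearest_cluster(unknown_sample)
unknown_allowed = allow_execution(unknown_sample)
```

The point of the sketch is that the unknown sample needs no prior entry on any list: it is blocked because its characteristics place it next to the Trojans, while a file that lands in the chat cluster would be allowed to run.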

 

Raj Rajamani
VP Product Management