BlackBerry Blog

A Study in Bots: Backoff

FEATURE / 08.25.14 / Brian Wallace

Point of sale (POS) malware has become a hot topic over the past 12 months, the most notable incident being the Target breach, which exposed up to 110 million records. And it doesn't stop there. On August 22nd, 2014, DHS reported that over 1,000 businesses had been compromised by a previously unknown malware family known as "Backoff".

What is Backoff

Backoff is POS malware developed for Windows, the most common operating system (OS) on these systems. The samples from these compromises were packed using a variety of novel methods, and at first glance they do not look related. Some are written in Visual Basic, while others are written in C, and each sample has a different size. Only after unpacking do we, the human analysts, see the relation between the samples. The Cylance engine, however, recognizes them as threats WITHOUT the need for detonation.

Detecting the "Undetectable"

Even the packed versions of this malware are a walk in the park for our mathematical detection engine, which detected each sample as malicious without having seen it before. One of the most powerful aspects of our engine is its ability to attach a "confidence rating" to each object detected, allowing for greater granularity and an improved ability to convict malicious files.

CylanceV 2.0 detecting Backoff samples

If we take one of these samples (d9ba782016e834bab365d72071a66c54aa3b6821d957908b2da316cc5b66a8bd) and look at its detection history, we can see it was first uploaded to malware forums in July of 2014.

The sample Cylance detected was compiled on April 29th, 2014. Our latest mathematical model–which powers the Cylance engine–was compiled on April 30th, 2014 and detected this sample as a threat without the need for a "sacrificial lamb". This means that our customers were protected from this threat one day after the sample was first created. When I dig into our model archives, I actually find that we detected this sample with a model that was generated on February 5th, 2014 with a score of -0.97 (bad with 97% confidence), nearly three months before the compilation date of the sample. Zero-days? What zero-days?

Mathematical Model from February detecting sample compiled in April

We were detecting this sample as a threat before it was even compiled, which conflicts with the claim in this NYTimes article that the malware was undetectable until July 31st.

Unpacking a Sample

For an unpacking example, I will use the most recent variant (11591204155db5eb5e9c5a3adbb23e99a75c3b25207d07d7e52a6407c7ad0165). The packer for this version is not written in Visual Basic, and it contains few artifacts that reveal what was used to develop it. This packer works by storing the real PE as an obfuscated string, along with code to finish unpacking. The packer code removes the obfuscation, then jumps into the appended code, which unmaps the original file from memory and replaces it with the original unpacked binary.

Instead of going into technical details about how it works, I will show how to unpack it with open source tools. Load it up in OllyDbg (in a virtual machine of course), and start stepping through the instructions. The unpacking is done a few functions deep, so we need to step into a few CALL instructions. The first one is right by the entry point.

Entry point of packed binary

The next one is after a loop, and has a fairly long string as a parameter.

Second function of packed binary

In the resulting function, we can see a VirtualAlloc call which we can step over.

VirtualAlloc we can step over

Then, we reach the unpacking function. Before the function runs, if we look at the memory segment at 0x20000, we can see it is readable, writable and executable. At this point it contains only zeros.

Empty memory segment where our unpacked PE will reside

Set a break point right after the next call. In this case, it is at 0x004010BA.

Setting break point to stop executing after malware is unpacked

Then we let the application run until it reaches that break point. Once it reaches that break point, we can jump back to the memory view of 0x20000. There we can see that section now has code in it, as well as a PE.

Unpacked malware in memory segment

We can now dump this memory segment to disk. All we need to do is right click on the window with the memory segment open, and select Backup -> Save data to file.

Writing memory segment to disk

Once we have this dumped to a file, we can use any means we wish to remove the bytes leading up to the beginning of the PE. I transferred the file to my Linux host. By using tail, hexdump, and head, we can search around for where the PE starts. We could also use grep.

Locating the PE in the dumped memory segment

Since we found the offset, we can use tail and redirection to write the PE to disk. This PE is no longer packed.

Extracting the PE
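The same search-and-carve step can also be scripted. The following is a minimal Python 3 sketch, not the method used above: it looks for an "MZ" signature whose e_lfanew field (at offset 0x3C of the DOS header) points to a "PE\0\0" signature, then keeps everything from that offset onward. The function names and the file names in the usage note are hypothetical.

```python
import struct


def find_pe_offset(dump: bytes) -> int:
    """Return the offset of the first plausible PE image in dump, or -1."""
    pos = dump.find(b"MZ")
    while pos != -1:
        # The DOS header is 0x40 bytes; e_lfanew sits at offset 0x3C.
        if pos + 0x40 <= len(dump):
            (e_lfanew,) = struct.unpack_from("<I", dump, pos + 0x3C)
            sig = pos + e_lfanew
            if dump[sig:sig + 4] == b"PE\x00\x00":
                return pos
        # Not a real header, keep scanning for the next "MZ".
        pos = dump.find(b"MZ", pos + 1)
    return -1


def carve_pe(dump: bytes) -> bytes:
    """Drop the bytes that precede the PE header."""
    offset = find_pe_offset(dump)
    if offset < 0:
        raise ValueError("no PE header found in dump")
    return dump[offset:]
```

Usage would look like `open("unpacked.exe", "wb").write(carve_pe(open("segment.bin", "rb").read()))`, where both file names are placeholders. Checking e_lfanew avoids stopping at a stray "MZ" byte pair inside the unpacking stub.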

If we search for hosts in this file with eh, we can see the command and control servers used by the attackers.

Using EH to extract host information

If we search for the string "php", we can find the path on the command and control server to which the bot reports back.

Grep for php references
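For analysts working without the command line tools, a rough equivalent of running strings and grep over the unpacked binary can be sketched in Python 3. This is an illustration only: it handles plain ASCII strings, the minimum length of 4 is an assumption, and the function names are hypothetical.

```python
import re


def ascii_strings(data, min_len=4):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def grep_strings(data, needle, min_len=4):
    """Return the extracted strings that contain needle."""
    return [s for s in ascii_strings(data, min_len) if needle in s]
```

Calling `grep_strings(binary, "php")` on the unpacked file's bytes would list candidate reporting paths.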

Static Analysis

Once the sample is unpacked, it is quite simple to analyze. We can quickly develop a Yara rule that can detect these samples.

    rule backoff {
        meta:
            author = "Brian Wallace @botnet_hunter"
            author_email = "bwallace@cylance.com"
            date = "2014-08-21"
            description = "Identify Backoff"
        strings:
            $s1 = "&op=%d&id=%s&ui=%s&wv=%d&gr=%s&bv=%s"
            $s2 = "%s @ %s"
            $s3 = "Upload KeyLogs"
        condition:
            all of them
    }

Signatures like these are useful for identification, but are far less effective than the protection provided by CylancePROTECT.
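Where Yara is not available, the rule's "all of them" condition can be approximated with a plain byte search. This Python 3 sketch assumes the three strings appear as case-sensitive ASCII in the unpacked binary, which is how Yara treats text strings by default. The function name is hypothetical.

```python
# The three strings from the Yara rule, as raw bytes.
BACKOFF_STRINGS = (
    b"&op=%d&id=%s&ui=%s&wv=%d&gr=%s&bv=%s",
    b"%s @ %s",
    b"Upload KeyLogs",
)


def looks_like_backoff(data: bytes) -> bool:
    """True only if every signature string is present in data."""
    return all(s in data for s in BACKOFF_STRINGS)
```

As with the rule itself, this only works on unpacked samples.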

Operation

These samples all copy themselves to new locations, then delete the files at the original locations. This is one of the first things these samples do. Different variations of the malware relocate to the following locations.

    %APPDATA%\AdobeFlashPlayer\mswinsvc.exe
    %APPDATA%\AdobeFlashPlayer\mswinhost.exe
    %APPDATA%\OracleJava\javaw.exe

The samples also use mutexes to avoid running more than one instance at a time. The following mutexes have been observed across different samples.

    uhYtntr56uisGst
    uyhnJmkuTgD
    Undsa8301nskal
    nuyhnJmkuTgD
    nUndsa8301nskal
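The install locations and mutex names above can serve as simple host indicators. The Python 3 sketch below is one way to check for them, under two assumptions: the caller supplies the %APPDATA% directory of the profile being examined, and the mutex names have already been collected by some other tool. All function names are hypothetical.

```python
from pathlib import Path

# Relative install locations under %APPDATA%, from the list above.
INSTALL_PATHS = (
    ("AdobeFlashPlayer", "mswinsvc.exe"),
    ("AdobeFlashPlayer", "mswinhost.exe"),
    ("OracleJava", "javaw.exe"),
)

# Mutex names observed across samples.
MUTEXES = {
    "uhYtntr56uisGst",
    "uyhnJmkuTgD",
    "Undsa8301nskal",
    "nuyhnJmkuTgD",
    "nUndsa8301nskal",
}


def find_installed_copies(appdata_root):
    """Return the known install paths that exist under appdata_root."""
    root = Path(appdata_root)
    candidates = [root.joinpath(*parts) for parts in INSTALL_PATHS]
    return [path for path in candidates if path.is_file()]


def match_mutexes(observed_names):
    """Return the observed mutex names that match known Backoff mutexes."""
    return sorted(set(observed_names) & MUTEXES)
```

A hit on a path alone is weak evidence, since the directory names imitate legitimate software. Combining it with a mutex match gives more confidence.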

 

The main operation of the bot occurs in three separate threads. These threads do not communicate directly with each other, which, from a software engineering standpoint, could indicate that Backoff was created by multiple developers without much cooperative development experience. The three threads handle credit card scraping, keystroke logging, and command and control communications.

Command and Control

The command and control thread regularly polls the command and control server for commands and uploads credit card data. The server side consists of at least one PHP script, reached over plain HTTP. The bot would require a code change to use HTTPS. Each request is a POST request carrying identifying information.

C2 POST PING in Wireshark

For this example, the bot is reporting home from a virtual machine named "LAB", where the user is "dexter". The "wv" parameter indicates the version of Windows. The "gr" parameter indicates which variant is reporting to the command and control server. The "bv" parameter is the malware version. The "id" parameter is used later.

Also in that example, I have my fake command and control returning a command of "Upload KeyLogs". The command and control server has various commands.

    Update - Downloads binary to replace itself
    Terminate - exits
    Uninstall - uninstalls itself and exits
    Download and Run - Downloads and executes binary
    Upload KeyLogs - Uploads data gathered from key logging
    Thanks! - Empty command

Credit card information is uploaded while the malware is operating regardless of the set command.
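When reviewing captured traffic, the POST body can be split into its named parameters with the standard library. The Python 3 sketch below assumes the body is an ordinary form-encoded string. The function name is hypothetical. Form decoding turns "+" into a space, which would corrupt the Base64 in the "data" parameter, so the sketch restores it.

```python
from urllib.parse import parse_qs


def parse_beacon(body: str) -> dict:
    """Split a captured POST body into a dict of parameter name to value."""
    # parse_qs returns a list per name; keep the first value of each.
    fields = {name: values[0] for name, values in parse_qs(body).items()}
    if "data" in fields:
        # Undo the "+" to space conversion so the Base64 stays intact.
        fields["data"] = fields["data"].replace(" ", "+")
    return fields
```

For a body such as `&op=1&id=NYGOsSB&ui=dexter @ LAB&wv=11&gr=lab&bv=0.0`, where only the "id" and "ui" values come from the example above and the rest are placeholders, the result maps each parameter name to its string value.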

Data Uploads

When information is uploaded, it is encrypted and encoded in the "data" parameter. The encoding is Base64 and the encryption is RC4. The key being used for the RC4 encryption changes between samples and installs of the malware. The "id" parameter and the "ui" parameter are both used along with a static string that varies between samples. The following are the known static strings so far.

    jhgtsd7fjmytkr
    ihasd3jasdhkas9
    ihasd3jasdhkas
    zXqW9JdWLM4urgjRkX

 

The RC4 key is derived by MD5 hashing these parts in the order id, static string, ui. As the script below shows, the uppercase hexadecimal digest is then used as the key. The following Python script can decrypt the encoded and encrypted data.

    #!/usr/bin/env python2
    import hashlib
    import base64
    import string

    # Update these variables with data to decrypt
    to_decrypt = "eOtiYEntho7IvwjfJ7beQnot4xfoscImZeoWG76nK38kYsSzRws5rjZZ38zinkQF9AG77aGhfG5zvZ5eTbmqng=="
    id = "NYGOsSB"
    ui = "dexter @ LAB"


    class rc4:
        def swap(self):
            t = self.state[self.i]
            self.state[self.i] = self.state[self.j]
            self.state[self.j] = t

        def __init__(self, key):
            self.state = [i for i in xrange(256)]

            self.j = 0
            for i in xrange(256):
                self.i = i
                self.j = (self.j + self.state[self.i] + ord(key[self.i % len(key)])) % 256
                self.swap()

            self.i = 0
            self.j = 0

        def next(self):
            self.i = (self.i + 1) % 256
            self.j = (self.j + self.state[self.i]) % 256
            self.swap()
            return self.state[(self.state[self.i] + self.state[self.j]) % 256]


    def decrypt_data(data, id, ui):
        data = data.replace(" ", "+")
        m = hashlib.md5()
        m.update(id)
        m.update(ui)
        rc4_key = m.hexdigest().upper()
        r = rc4(rc4_key)
        return "".join([chr(r.next() ^ ord(c)) for c in base64.b64decode(data)])

    static_strings = ["zXqW9JdWLM4urgjRkX", "ihasd3jasdhkas", "ihasd3jasdhkas9", "jhgtsd7fjmytkr"]

    for s in static_strings:
        result = decrypt_data(to_decrypt, id + s, ui)
        v = True
        for c in result:
            if c not in string.printable:
                v = False
                break
        if v:
            print result
            break

 

This Python script already contains the values for the following example POST request with the encrypted data in the "data" parameter.

C2 POST KeyLogging information in Wireshark

The script decrypts the data value as the following.

   [Untitled - Notepad] - [23/08/2014 00:52:04] hello world!!!           

This data is from the key logging, as I typed "hello world!!!" into an instance of notepad.exe.
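For analysts without a Python 2 interpreter, the same key derivation and decryption can be sketched in Python 3. This is a port under stated assumptions, not a replacement for the script above: the function names are hypothetical, and "id" and "ui" are assumed to contain only characters that encode cleanly as Latin-1, so they match the byte strings Python 2 would have hashed.

```python
import base64
import hashlib


def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4; encryption and decryption are the same operation."""
    state = list(range(256))
    j = 0
    for i in range(256):  # key scheduling
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    i = j = 0
    out = bytearray()
    for byte in data:  # keystream generation and XOR
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


def derive_key(bot_id: str, static: str, ui: str) -> bytes:
    """Uppercase hex MD5 of id + static string + ui, as in the script above."""
    material = (bot_id + static + ui).encode("latin-1")
    return hashlib.md5(material).hexdigest().upper().encode("ascii")


def decrypt_upload(data_param: str, bot_id: str, static: str, ui: str) -> bytes:
    """Base64-decode the "data" parameter and RC4-decrypt it."""
    raw = base64.b64decode(data_param.replace(" ", "+"))
    return rc4(derive_key(bot_id, static, ui), raw)
```

As in the original script, the caller would try each known static string in turn and keep the result that decodes to printable text.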

Credit Card Scraping

The credit card scraping is done in its own thread. Its functionality does not differ from the methods seen in other point of sale malware. It enumerates running processes, then attempts to open each one. Once a process handle is open, it attempts to enumerate the memory segments it can read. It reads each of these segments and parses it for credit card track information. It then uploads parts of the track data, just as it did with the key logging information.

C2 POST Credit card information in Wireshark

When we decrypt this information, we get the following.

    4111111111111111^^210510100000000000000011400000
    4111111111111111=210510100000114000?

 

This is fake credit card information I stored in an instance of notepad.exe. The code parsing for credit card information ignores any credit card that does not use normal rules for authorization processing (prepaid credit cards, temporary bank cards, etc). Specifically, it ignores any credit card track data in which the second digit of the Service Code is not 0. You can get more information on the specifics of credit card track information here.
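To check which decrypted records would pass the Service Code rule described above, the Track 2 style record can be parsed on the analyst's side. The Python 3 sketch below makes several assumptions: it handles only the "=" separated form shown above, the 13 to 19 digit account number length is assumed rather than taken from the malware, and the function names are hypothetical. It does not reproduce the malware's own parser.

```python
import re

# PAN, "=", expiry (YYMM), 3-digit service code, discretionary data.
TRACK2 = re.compile(r"(\d{13,19})=(\d{4})(\d{3})(\d*)\??")


def parse_track2(text):
    """Return a dict per Track 2 style record found in text."""
    return [
        {
            "pan": m.group(1),
            "expiry": m.group(2),
            "service_code": m.group(3),
            "discretionary": m.group(4),
        }
        for m in TRACK2.finditer(text)
    ]


def passes_service_code_filter(track):
    """Backoff ignores tracks whose Service Code second digit is not 0."""
    return track["service_code"][1] == "0"
```

For the fake record above, the Service Code is 101, so its second digit is 0 and the record is one Backoff would upload.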

Conclusion

The Backoff POS malware was unknown until it was discovered at over 1,000 businesses. Despite being similar to other POS malware, it managed to stay hidden, mostly thanks to novel packing methods that defeated the detection capabilities of some of the biggest cybersecurity vendors. CylancePROTECT and other Cylance products would have stopped this attack, as our mathematical models from February detected some of these samples before they were even compiled. Want to get plugged in to the power of PROTECT? Contact a Cylance expert today!

Samples Used in this Post

    12c9c0bc18fdf98189457a9d112eebfc
    17e1173f6fc7e920405f8dbde8c9ecac
    6a0e49c5e332df3af78823ca4a655ae8
    f5b4786c28ccf43e569cb21a6122a97e
    0607ce9793eea0a42819957528d92b02
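The hashes above are MD5 digests, so a file on disk can be compared against them with a few lines of Python 3. The function names here are hypothetical. A match identifies only these exact samples, since repacked variants will hash differently.

```python
import hashlib

# MD5 digests of the samples used in this post.
BACKOFF_MD5S = {
    "12c9c0bc18fdf98189457a9d112eebfc",
    "17e1173f6fc7e920405f8dbde8c9ecac",
    "6a0e49c5e332df3af78823ca4a655ae8",
    "f5b4786c28ccf43e569cb21a6122a97e",
    "0607ce9793eea0a42819957528d92b02",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_sample(path):
    """True if the file's MD5 matches one of the listed samples."""
    return md5_of_file(path) in BACKOFF_MD5S
```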

About Brian Wallace

Lead Security Data Scientist at Cylance

Brian Wallace is a data scientist, security researcher, malware analyst, threat actor investigator, cryptography enthusiast and software engineer. Brian acted as the leader and primary investigator for a deep investigation into Iranian offensive cyber activities which resulted in the Operation Cleaver report, coauthored with Stuart McClure.

Brian also authors the A Study in Bots blog series which covers malware families in depth providing novel research which benefits a wide audience.