Tuesday, June 14, 2011

BotMagnifier: Locating Spambots on the Internet

During the 20th USENIX Security Symposium, which will take place in San Francisco starting August 8, we will present our paper BotMagnifier: Locating Spambots on the Internet.

This paper tackles the problem of detecting bot-infected machines from a new perspective. The idea behind BotMagnifier is that bots belonging to the same botnet share the same codebase and take orders from the same set of C&C servers. Based on this insight, it should be possible to detect bot-infected machines by learning the spamming behavior of a subset of known bots, and then looking in a network traffic dataset for more machines (i.e., IP addresses) that behaved in the same way.

Having an extensive list of bot-infected machines is useful for many purposes: it helps track the size of the world's largest spamming botnets, and ISPs can use it to clean up their networks by removing or sanitizing infected machines.

We developed a system, called BotMagnifier, that grows bot populations starting from a subset of known spamming bots. In particular, for each day of analysis, our system builds collections of IP addresses, called seed pools, each of which is known to have carried out a specific spam campaign. To do this, we take advantage of a large spam trap operated by a US provider. In parallel, we also run malware samples and, when possible, label the campaigns we observe in the spam trap with the botnet that generated them.
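Building a day's seed pools amounts to grouping spam trap senders by campaign. The following is a minimal sketch, not BotMagnifier's actual code. It assumes each spam trap message has already been tagged with a campaign identifier (how campaigns are identified is not covered in this post). The `TrapMessage` record, the `build_seed_pools` and `label_pools` functions, and the `campaign_to_botnet` mapping (standing in for what we learn from running malware samples) are all hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TrapMessage:
    """One message caught by the spam trap (hypothetical record layout)."""
    day: date
    sender_ip: str
    campaign_id: str  # assumed to be assigned upstream by some campaign-identification step


def build_seed_pools(messages):
    """Return {(day, campaign_id): set of sender IPs}, i.e. one seed pool per campaign per day."""
    pools = defaultdict(set)
    for msg in messages:
        pools[(msg.day, msg.campaign_id)].add(msg.sender_ip)
    return dict(pools)


def label_pools(pools, campaign_to_botnet):
    """Attach a botnet label to each pool, or None when no malware run matched its campaign.

    campaign_to_botnet is a hypothetical mapping from campaign IDs observed while
    running malware samples to the botnet family of the sample that sent them.
    """
    return {
        key: (campaign_to_botnet.get(key[1]), ips)
        for key, ips in pools.items()
    }


# Illustrative usage with made-up data (documentation-range IPs, placeholder names).
messages = [
    TrapMessage(date(2010, 9, 1), "192.0.2.1", "campaign-A"),
    TrapMessage(date(2010, 9, 1), "192.0.2.2", "campaign-A"),
    TrapMessage(date(2010, 9, 1), "192.0.2.3", "campaign-B"),
]
pools = build_seed_pools(messages)
labeled = label_pools(pools, {"campaign-A": "botnet-X"})
```

Here `campaign-A` yields a two-IP seed pool labeled `botnet-X`, while `campaign-B` stays unlabeled because no malware run was seen sending it.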

Once we have the seed pools, we learn the spamming behavior of these IP addresses using a transaction log. A transaction log is a record of email transactions carried out on the Internet during the same time period used to generate the seed pools: it tells us which IP address sent an email to which destination, and at what time. We built our transaction log from the logs of our Spamhaus mirror at UCSB.

Many features can characterize the spamming behavior of a bot. Unfortunately, our transaction log is highly incomplete and shows us only a small fraction of the email transactions that actually happened. For this reason, we characterize the behavior of a botnet based only on the destinations (i.e., mail servers) its bots contacted during a certain time frame. First, we list the destinations that each seed pool contacted in the transaction log. Second, we look for additional IPs that contacted more than a threshold N of those destinations, and no others. This gives us a magnified pool of IP addresses that we believe belong to the same botnet.
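The two magnification steps described above can be sketched as follows. This is a simplified illustration of the rule as stated in this post, not BotMagnifier's implementation. It assumes the transaction log is a list of `(sender_ip, destination, timestamp)` tuples already restricted to the analysis time frame. The function names `destinations_by_ip` and `magnify`, and the parameter `n` (the threshold N), are our own hypothetical names.

```python
from collections import defaultdict


def destinations_by_ip(transactions):
    """Map each sender IP to the set of mail servers it contacted.

    transactions: iterable of (sender_ip, destination, timestamp) tuples
    (a hypothetical layout for the transaction log).
    """
    dests = defaultdict(set)
    for sender_ip, destination, _timestamp in transactions:
        dests[sender_ip].add(destination)
    return dests


def magnify(seed_pool, transactions, n):
    """Grow a seed pool by destination overlap.

    Step 1: collect every destination contacted by any IP in the seed pool.
    Step 2: add each other IP that contacted more than n of those destinations
    and no destination outside that set.
    """
    dests = destinations_by_ip(transactions)

    seed_dests = set()
    for ip in seed_pool:
        seed_dests |= dests.get(ip, set())

    magnified = set(seed_pool)
    for ip, contacted in dests.items():
        if ip in seed_pool:
            continue
        # Subset test enforces "and no others"; given that, len(contacted)
        # equals the number of seed-pool destinations this IP contacted.
        if contacted <= seed_dests and len(contacted) > n:
            magnified.add(ip)
    return magnified


# Illustrative usage with made-up data.
log = [
    ("192.0.2.1", "mx1", 0), ("192.0.2.1", "mx2", 1), ("192.0.2.1", "mx3", 2),
    ("198.51.100.5", "mx1", 3), ("198.51.100.5", "mx2", 4),
    ("198.51.100.6", "mx1", 5), ("198.51.100.6", "mx9", 6),
    ("198.51.100.7", "mx3", 7),
]
result = magnify({"192.0.2.1"}, log, n=1)
```

In this example the magnified pool is `{"192.0.2.1", "198.51.100.5"}`: `198.51.100.6` is excluded because it also contacted `mx9`, which lies outside the seed pool's destination set, and `198.51.100.7` contacted only one of those destinations, which does not exceed `n=1`.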

To validate our approach, we used the data stored on the Cutwail C&C servers that we captured last summer. We extracted a subset of the bot IPs, magnified it as described above, and checked how many of the resulting IPs actually connected to the C&C servers during the time of the experiment. Our results show, with good confidence, that our approach can effectively track botnets.
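The check against the C&C data can be expressed as a simple overlap measure. This is a minimal sketch, assuming we have the seed subset, the magnified pool, and the set of IPs seen connecting to the captured C&C servers during the experiment. The function `validate` and its return values are hypothetical, and this is not necessarily the exact metric reported in the paper.

```python
def validate(seed_subset, magnified, cnc_ips):
    """Compare IPs found by magnification against IPs seen at the C&C servers.

    Returns (confirmed, new_total, fraction), where new_total counts the IPs
    added beyond the seed subset and confirmed counts how many of those also
    appear in the C&C connection data.
    """
    new_ips = magnified - seed_subset
    confirmed = new_ips & cnc_ips
    fraction = len(confirmed) / len(new_ips) if new_ips else 0.0
    return len(confirmed), len(new_ips), fraction


# Illustrative usage with made-up data.
seed = {"192.0.2.1"}
grown = {"192.0.2.1", "198.51.100.5", "198.51.100.6", "198.51.100.7"}
seen_at_cnc = {"192.0.2.1", "198.51.100.5", "198.51.100.6"}
confirmed, new_total, fraction = validate(seed, grown, seen_at_cnc)
```

Seed IPs are excluded from the count so that the measure reflects only what magnification added: here two of the three new IPs are confirmed.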

We also ran BotMagnifier in the wild for four months. During this period, we were able to track the activity of the world's largest spamming botnets (Rustock, Lethic, and Cutwail), and we detected important events, such as the comeback of the Waledack botnet and the takedown of MegaD.

Although we relied on specific datasets to build the seed pools and the transaction log, we also show that the approach can work on other datasets by tweaking some parameters.