Friday, April 8, 2011

Why timing patterns and botnet detection don't work together

Last year I started working on a project whose goal was to spot behavioral patterns in the bots belonging to different botnets. The basic idea was that bots belonging to a botnet periodically get orders from a botmaster. The orders include an e-mail template and a list of addresses to send those e-mails to. After carrying out its task, a bot would wait for the next chunk of orders from the botmaster. From a traffic point of view, this behavior should show up as periods of high activity followed by periods of idleness.
The first dataset we used to study bot behavior was a spam trap set up by a large ISP. This spam trap is composed of 150k e-mail addresses, all belonging to the same domain. We logged the e-mails these addresses received for a while and clustered them into campaigns. Our assumption was that each campaign is carried out by a single botnet (although, of course, the same botnet can carry out different campaigns). By analyzing the different campaigns, we found that the spam trap addresses received e-mails in bursts at specific times, which resulted in spikes separated by long idle periods. We were happy about this discovery, which would have made it possible to detect bots just by observing their e-mail sending behavior.
We then moved to the logs from our Spamhaus mirror. By looking at the queries that mail servers send to our server, we can infer which IP address sent an e-mail to which server at a given point in time. In these logs, we found that the same IP addresses that showed nice timing patterns in the spam trap data appeared to be active all the time.
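A DNSBL lookup such as a Spamhaus query encodes the IPv4 address being checked with its octets reversed and prepended to the zone name, so each query in a mirror's log can be turned into a (time, mail server, sender) observation. The sketch below is a minimal illustration, not our actual tooling: it assumes a hypothetical log already split into a timestamp, the querying client's address and the query name, and the zone name and the `parse_query`/`Lookup` names are illustrative. It also assumes the querying client is the mail server itself; in practice, queries often arrive through a shared resolver.

```python
import ipaddress
from typing import NamedTuple, Optional

# Assumed DNSBL zone; a real mirror may serve several zones.
ZONE = "zen.spamhaus.org"

class Lookup(NamedTuple):
    timestamp: float  # when the query reached the mirror
    mail_server: str  # address of the client that sent the query (assumed to be the mail server)
    sender_ip: str    # IPv4 address being checked, i.e. the presumed e-mail sender

def parse_query(timestamp: float, client: str, qname: str, zone: str = ZONE) -> Optional[Lookup]:
    """Turn one DNSBL query into a (time, mail server, sender) observation.

    DNSBL queries encode the IPv4 address with its octets reversed,
    e.g. 4.3.2.1.zen.spamhaus.org asks about 1.2.3.4.
    """
    qname = qname.rstrip(".").lower()
    suffix = "." + zone
    if not qname.endswith(suffix):
        return None  # query for some other zone
    octets = qname[: -len(suffix)].split(".")
    if len(octets) != 4:
        return None  # not an IPv4 lookup (e.g. an IPv6 or domain-based query)
    try:
        sender = ipaddress.IPv4Address(".".join(reversed(octets)))
    except ValueError:
        return None  # malformed address, e.g. an octet above 255
    return Lookup(timestamp, client, str(sender))
```

For example, `parse_query(0.0, "192.0.2.10", "4.3.2.1.zen.spamhaus.org.")` yields an observation saying that 1.2.3.4 tried to deliver mail to the server at 192.0.2.10. Grouping these observations by `sender_ip` gives a per-IP activity timeline across many receiving servers, rather than the single-domain view a spam trap provides.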
To find out what was going on, we ran malware samples from the largest spamming botnets at the time. We found that the bots are indeed active all the time, with no meaningful idle periods. We then made another interesting discovery: bots usually get a chunk of a large e-mail list to send their spam to, and this list is often sorted alphabetically by domain. The timing patterns we were seeing were caused by the bots reaching the letter our spam trap domain starts with, and the idle periods were simply the bots sending mail to other domains!
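The effect is easy to reproduce with a toy simulation. The sketch below is illustrative only: the domain names, list sizes and sending rate are made up, and the helper names are hypothetical. It models a bot that sends at a constant rate and never pauses, working through each chunk of orders in alphabetical order of domain, and then looks at the same traffic from the point of view of a spam trap that only owns one domain.

```python
import string

def simulate_bot(chunks, rate=1.0):
    """Return (time, domain) pairs for a bot that sends without ever pausing.

    Each chunk of orders is a list of recipient domains; the bot works
    through it in alphabetical order, then moves straight to the next chunk.
    """
    sends, t = [], 0.0
    for chunk in chunks:
        for domain in sorted(chunk):
            sends.append((t, domain))
            t += 1.0 / rate  # constant rate: no idle time anywhere
    return sends

def arrivals_at(sends, domain):
    """Timestamps of the e-mails a single-domain spam trap would receive."""
    return [t for t, d in sends if d == domain]

def idle_gaps(times, threshold):
    """Gaps between consecutive timestamps longer than `threshold`."""
    return [b - a for a, b in zip(times, times[1:]) if b - a > threshold]

# Hypothetical orders: 26 domains (a.example ... z.example), 10 recipients each,
# handed out as three consecutive chunks.
domains = [f"{c}.example" for c in string.ascii_lowercase]
chunk = [d for d in domains for _ in range(10)]
sends = simulate_bot([chunk, chunk, chunk])

bot_gaps = idle_gaps([t for t, _ in sends], threshold=10)
trap_gaps = idle_gaps(arrivals_at(sends, "m.example"), threshold=10)
print(bot_gaps)   # → []  (the bot itself is never idle)
print(trap_gaps)  # → [251.0, 251.0]  (the trap sees bursts separated by long gaps)
```

The bot has no gaps at all, yet the trap for `m.example` sees a burst of ten e-mails per chunk followed by a long silence while the bot works through every other domain. This is the same illusion we ran into: the length of the "idle" period reflects the size of the rest of the recipient list, not any pause in the bot's activity.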
Mystery solved, and a good lesson learned.
