Wednesday, December 10, 2014

Careful When You Fake Your Twitter Compromise: We Can Detect It

As those who've followed our research know well, during the last four years we have worked on systems that can automatically detect malicious activity on online social networks. In particular, last year we presented COMPA, a system that learns the typical behaviour of social network accounts and raises an alert if an account posts a message that does not comply with that behaviour. We showed that COMPA is a reliable tool for detecting accounts that have been compromised, and that our behavioural modelling could have helped prevent high-profile social network compromises, such as the ones against the Skype Twitter account and the Associated Press Twitter account.

Fast forward a year or so: last week, Honda reported that their Twitter account had been hacked. The alleged culprit was the cartoon villain and Robot Chicken celebrity Skeletor.

Obviously, fictional characters do not hack Twitter accounts, and it soon became clear that this hack had been staged for promotional purposes. It is not the first time such a thing has happened: Chipotle did the same a little over a year ago. Apparently, Twitter compromises have become so mainstream that faking one is an attractive marketing technique. The trick even seems to work: on the day of its simulated compromise, Chipotle collected more than 4,000 followers, an order of magnitude more than it typically attracts.

The clever marketers at these companies did not take COMPA into account, though. Our system correctly assessed that nothing was anomalous about the "malicious" tweets sent by Honda, nor about the ones sent by Chipotle. In other words, our tool is not only useful for detecting messages sent by attackers who gained access to social network accounts, but can also detect compromises that are merely simulated.

Next time you stage a Twitter compromise, make sure that your messages look anomalous, otherwise we can detect your bluff.

Saturday, May 31, 2014

New Insights into Email Spam Operations

Our group has been studying spamming botnets for a while, and our efforts in developing mitigation techniques and taking down botnets have contributed to decreasing the amount of spam on the Internet. Over the last couple of years spam volumes have dropped significantly, but spam still remains a significant burden on the email infrastructure and on email users. Recently, we have been working on gaining a better understanding of spam operations and of the actors involved in this underground economy. We believe that shedding light on these topics can help researchers develop novel mitigation techniques, and identify which of the already-existing techniques are particularly effective in crippling spam operations and should therefore be widely deployed. Our efforts produced two papers.

The first paper, which will be presented at AsiaCCS next week, is a longitudinal study of the spam delivery pipeline. Previous research showed that to set up a spam operation a spammer has to interact with multiple specialized actors. In particular, he has to purchase a list of email addresses to target with his spam emails, and he needs a botnet to send the actual spam. Both services are provided by specialized entities that are active on the underground market, which we call "email harvesters" and "botmasters" respectively. In this paper, we studied the relations between the different actors in the spam ecosystem. We want to understand how widely email lists are sold, and to how many spammers, as well as how many botnets each spammer rents to set up their operations.

To perform our study, we proceeded as follows. First, we disseminated fake email addresses under our control on the web. We treat any visitor to the web pages hosting these addresses as a possible email harvester, and "fingerprint" it by logging its IP address and user agent. This way, every time we receive a spam email sent to one of these addresses, we can track which email harvester collected that address. Similarly, we can fingerprint the botnet that is sending the spam email by using a technique that we presented at USENIX Security in 2012, called SMTP dialects. In a nutshell, this technique leverages the fact that each implementation of the SMTP protocol used by spambots is different, and that it is possible to assess the family a bot belongs to just by looking at the sequence of SMTP messages it exchanges with the email server. Finally, we assume that a single spammer is responsible for each spam campaign, and cluster similar emails together.
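The bookkeeping behind the harvester fingerprinting step can be sketched as follows. This is a minimal illustration of the idea, not code from the paper; the class and method names are invented.

```python
# Sketch of harvester attribution: when a crawler fetches a honeypot page,
# we remember which (IP, user agent) scraped which fake addresses, so that
# spam arriving at one of those addresses can be traced back to the harvester.

class HarvesterTracker:
    def __init__(self):
        # fake address -> fingerprint of the crawler that scraped it
        self.address_to_harvester = {}

    def log_page_access(self, scraped_addresses, ip, user_agent):
        """Record which harvester (IP + user agent) collected which addresses."""
        fingerprint = (ip, user_agent)
        for addr in scraped_addresses:
            self.address_to_harvester[addr] = fingerprint

    def attribute_spam(self, recipient):
        """Given a spam email's recipient, return the harvester that scraped it."""
        return self.address_to_harvester.get(recipient)


tracker = HarvesterTracker()
tracker.log_page_access(["trap1@example.org"], "198.51.100.7", "crawler/1.0")
print(tracker.attribute_spam("trap1@example.org"))
# ('198.51.100.7', 'crawler/1.0')
```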

After collecting the aforementioned information, we can track a spam operation from its beginning to its end: we know which email list spammers used, as well as which botnet they took advantage of. Our results show that spammers develop some sort of "brand loyalty" both to email harvesters and to botmasters: each spammer that we observed used a single botnet over a period of six months, and kept using the same email list for a long period of time.

The second paper, which was presented at the International Workshop on Cyber Crime earlier this month, studies the elements that a spammer needs to tune to make his botnet perform well. We studied the statistics of 24 C&C servers belonging to the Cutwail botnet, looking at which elements differentiate successful spammers from failed ones. The first element is the number of bots that the spammer uses. Having too many bots connecting to the C&C server saturates its bandwidth and results in bad performance. Another element is the size of the email list used by spammers. "Good" spammers trim non-existent email addresses from their lists, so their bots do not waste time sending emails that will never be delivered. A third element is having bots retry sending an email multiple times after receiving a server error: since many bots have poor Internet connections, this helps keep the fraction of successfully sent emails high. The last, surprising finding is that the physical location of bots seems not to influence the performance of a spam campaign. As a side effect, successful spammers typically purchase bots located in developing countries, which are typically cheaper.
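The effect of the retry element can be seen with a back-of-envelope model. Assuming (my simplification, not the paper's) that each delivery attempt succeeds independently with some probability, the fraction of emails that eventually get through grows quickly with the number of retries:

```python
def delivery_fraction(p_success: float, max_attempts: int) -> float:
    """Fraction of emails eventually delivered if each attempt independently
    succeeds with probability p_success and the bot retries up to
    max_attempts times before giving up."""
    return 1 - (1 - p_success) ** max_attempts


# A bot on a poor connection, with a 40% per-attempt success rate:
print(round(delivery_fraction(0.4, 1), 3))  # 0.4   (no retries)
print(round(delivery_fraction(0.4, 3), 3))  # 0.784 (two retries)
```

Even two retries roughly double the delivery fraction for a poorly connected bot, which is consistent with why successful Cutwail spammers enable retries.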

The findings from this paper show us which elements spammers tune to make their operations perform well. Fortunately, a number of systems proposed by the research community target exactly these elements. We think that widely deploying these techniques could significantly cripple spam operations, possibly to the point of making them unprofitable. One example is B@BEL, a system that detects whether an email sender is reputable, and provides fake feedback on whether an email address exists whenever it detects the sender as a bot. Providing fake feedback would make it impossible for spammers to clean their lists of non-existent email addresses, compromising the performance of their operations.

Similarly, Beverly et al. proposed a system that flags senders as bots if network errors are too common. Such a system can be used as a direct countermeasure to spammers instructing their bots to keep retrying after errors. Finally, SNARE is a system that, among other features, looks at the geographical distance between sender and recipients to detect spam. Since spammers purchase their bots in countries that are typically far away from their victims (who are mostly located in western countries), this system could be very effective in fighting spam if widely deployed.

We hope that the insights provided in these two papers will provide researchers with new ideas to develop effective anti-spam techniques.

Tuesday, January 7, 2014

Detecting the Skype Twitter Compromise

On January 1, 2014, Microsoft-owned Skype added itself to the list of high-profile compromised Twitter accounts. The company's Twitter account posted the following tweet:

As can be seen, the tweet was even "signed" with the #SEA hashtag, which stands for Syrian Electronic Army. The SEA is a group of hackers that supports the Syrian regime and has been responsible for previous high-profile Twitter hacks. The tweet states "Don't use Microsoft emails(hotmail,outlook). They are monitoring your accounts and selling the data to the governments. More details soon #SEA". Basically, a classic case of political hacktivism.

The tweet looks anomalous at first sight, not only because of the odd content coming from a Microsoft-owned account, but also because of the hashtag attributing the tweet to the middle-eastern group. However, Twitter's automated defenses did not block the tweet as anomalous. Even worse, since the compromise happened on a holiday, it took Microsoft hours to take the tweet down.

It is to detect and prevent such incidents that we developed COMPA. COMPA learns the typical behavior of a Twitter account, and flags as anomalous any tweet that significantly deviates from the learned behavioral model. My colleague Manuel Egele and I checked the malicious tweet by the Syrian Electronic Army against the behavioral model built for the Skype Twitter account. The result was positive: had it been deployed on Twitter, COMPA would have detected and blocked the tweet, and saved Microsoft some public relations embarrassment.

In more detail, the Skype Twitter account always posts from the Sprinklr Twitter client, while the malicious tweet was sent from the regular Twitter web interface. By itself, this is already very suspicious. As a second element, the Skype Twitter account had never used the #SEA hashtag before. In addition, the malicious tweet did not contain a URL, whereas including one is common practice in Skype's tweets. Interestingly, the time of day at which the tweet was sent matches the typical posting patterns of the Skype Twitter account. However, this was not enough to evade detection by our system.
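The kind of feature-by-feature comparison described above can be sketched as a simple anomaly score. This is only an illustration of the idea; the features are the ones mentioned in the post, but the scoring scheme is invented and much simpler than COMPA's actual statistical model.

```python
# Toy behavioural check: count how many features of a new tweet deviate
# from an account's posting history (client, hashtags, URL use, hour of day).

def anomaly_score(tweet, history):
    score = 0
    # Source client: web interface vs. the usual Sprinklr client.
    if tweet["client"] not in {t["client"] for t in history}:
        score += 1
    # Hashtags: a never-before-seen hashtag such as #SEA is suspicious.
    known_hashtags = {h for t in history for h in t["hashtags"]}
    if any(h not in known_hashtags for h in tweet["hashtags"]):
        score += 1
    # URL presence: the account normally always includes a link.
    if not tweet["has_url"] and all(t["has_url"] for t in history):
        score += 1
    # Time of day: no anomaly points if the hour matches past behaviour.
    if tweet["hour"] not in {t["hour"] for t in history}:
        score += 1
    return score


history = [{"client": "Sprinklr", "hashtags": ["Skype"], "has_url": True, "hour": 15}]
malicious = {"client": "web", "hashtags": ["SEA"], "has_url": False, "hour": 15}
print(anomaly_score(malicious, history))  # 3: client, hashtag, and URL deviate
```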

This result shows that COMPA is effective in detecting and blocking tweets that are sent by compromised Twitter accounts. We strongly advocate that Twitter and other social networks implement similar techniques to keep their accounts protected, and block malicious tweets before they are posted.

Thursday, April 25, 2013

Could the AP Twitter hack have been prevented?

Twitter hacks can cause a lot of damage. This week, the Associated Press Twitter account got compromised and sent a tweet announcing that the White House had been hit by a terrorist attack, and that President Obama was injured. The dynamics of the hack are not clear yet, even though some sources claim that the AP people might have been victims of a spearphishing attack.

What is certain is that the hack had a huge, unprecedented effect on the stock market. Right after the malicious tweet was sent, the New York stock exchange suddenly fell more than 150 points. The market recovered shortly afterwards (once it was clear that the announcement was a hoax), but somebody could definitely have made a lot of money from this event.

This is the first time people have realized that tweets can have a large effect on financial markets. The question everyone is asking is: could this compromise have been avoided? The answer is maybe. At the last NDSS Symposium we presented a paper titled "COMPA: Detecting Compromised Accounts on Social Networks." The goal of the paper is to detect, and block, messages that are sent by compromised social network accounts, just like the AP one. Our system leverages a simple observation: people develop habits when using social networks. These habits include connecting to the network at specific times, using certain applications or clients to interact with the network, including links to specific domains in their messages, and so on. When an account gets compromised, the malicious messages that are sent are likely to deviate from this behavior. We developed a system, called COMPA, that learns the typical behavior of users on social networks, and raises an alert if a user sends a message that does not comply with the learned behavior.

We ran COMPA on the offending Tweet sent by the AP account. More precisely, we learned the historical behavior of the account, and we checked the malicious tweet against it. COMPA detected the tweet as anomalous. In particular, the tweet was sent from the web, while the AP operators typically use the SocialFlow app. In addition, the tweet does not include a URL, which is something that pretty much every news tweet contains. This is not a surprise to us. When the Fox News Politics account got hacked in 2011, COMPA was able to detect the offending tweet as anomalous too.

We think that the type of behavioral modelling we use in COMPA is how social networks should implement their compromised-account detection, and we hope to see such techniques deployed in the wild in the near future.

Wednesday, October 3, 2012

SMTP Dialects, or how to detect bots by looking at SMTP conversations

It is somewhat surprising that, in 2012, we are still struggling to fight spam. In fact, any victory we score against botnets is only temporary, and spam levels rise again after some time. As an example, the amount of spam received worldwide dropped dramatically when Microsoft shut down the Rustock botnet, but has been rising again since then.

For these reasons, we need new techniques to detect and block spam. Current techniques mostly fall into two categories: content analysis and origin analysis. Content analysis techniques look at what is being sent, and typically analyze the content of an email to see if it is indicative of spam (for example, if it contains words that are frequently linked to spam content). Origin analysis techniques, on the other hand, look at who is sending an email, and flag the email as spam if the sender (for example, the IP address the email is coming from) is known to be malicious. Both categories fall short in practice. Content analysis is usually very resource intensive, and cannot be run on every email sent to large, busy mail servers; it can also be evaded by carefully crafting the spam email. Origin analysis techniques, for their part, often have coverage problems, and fail to flag many sources that are actually sending out spam.

In our paper B@BEL: Leveraging Email Delivery for Spam Mitigation, which was presented at the USENIX Security Symposium last August, we propose to look at how emails are sent instead. The idea behind our approach is simple: the SMTP protocol, which is used to send emails on the Internet, follows Postel's Law, which states: "Be liberal in what you accept, but conservative in what you send". As a consequence, email software developers can come up with their own interpretation of the SMTP protocol and still successfully send emails. We call these variations of the protocol SMTP dialects. In the paper we show how it is possible to figure out which software (legitimate or malicious) sent a certain email just by looking at the SMTP messages exchanged between the client and the server. We also show how it is possible to enumerate the dialects spoken by spamming bots, and leverage them for spam mitigation.
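The core of the dialect idea can be sketched as matching an observed SMTP conversation against known command-sequence fingerprints. The fingerprints below are made up for illustration, not actual sequences from the B@BEL paper (which also considers message contents and server replies, not just command order):

```python
# Toy SMTP dialect classifier: each sender implementation is fingerprinted
# by the exact sequence of SMTP commands it issues during a delivery attempt.

DIALECTS = {
    "postfix-like": ("EHLO", "MAIL FROM", "RCPT TO", "DATA"),
    "spambot-A":    ("HELO", "MAIL FROM", "RCPT TO", "DATA"),
    "spambot-B":    ("EHLO", "RCPT TO", "MAIL FROM", "DATA"),
}

def classify_sender(observed_commands):
    """Return the dialect whose command sequence matches the observed conversation."""
    for name, sequence in DIALECTS.items():
        if tuple(observed_commands) == sequence:
            return name
    return "unknown"


print(classify_sender(["HELO", "MAIL FROM", "RCPT TO", "DATA"]))  # spambot-A
```

A mail server speaking with a client could run such a check as the conversation unfolds, and drop or deprioritize connections whose dialect matches a known spambot family.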

Although not perfect, this technique, used in conjunction with existing ones, allows us to catch more spam, and it is a useful advancement in the war against spamming botnets.

Wednesday, July 25, 2012

Fake followers on Twitter: my two cents

During the last few days, a huge fuss has been made about this report. The article, written by Italian professor Marco Camisani Calzolari, describes a system to detect fake followers on Twitter. It claims that many of the Twitter followers of corporations and celebrities (up to 45%) are actually fake. Among such celebrities are Italian public figures and politicians such as Beppe Grillo and Nichi Vendola. The news got a lot of attention in Italy, and was reported by the foreign press as well (most notably by the Guardian and the Huffington Post). Of course, a lot of outrage was generated by the supporters of this or that politician, and many people argued that the study wasn't correct. Today, Italian economics professor Francesco Sacco declared that the study actually has an error margin of 1%, and should be considered correct.

Now, I am a researcher, and I am not very interested in flame wars between opposing political factions. However, I am quite disappointed that the Italian press, as well as some foreign newspapers, treated this study as reputable without at least checking with an expert. As of today, a few days after the news was first published, the only person from academia who has reviewed the article is an economics professor. With all due respect, I think that somebody with a degree in computer science and some experience in machine learning and artificial intelligence would be a better person to review this article, and to judge how reasonable the proposed approach actually is.

I decided to write this blog post because I have been reading a lot of comments on this article, but most of them were just flames, and very few analyzed the proposed approach in detail. So I decided to analyze it myself. After all, I have been doing research in the field for quite a while now. In the academic world, we have a procedure called peer review. When somebody submits a paper to a journal or to a conference, the paper gets read by two to three other researchers, who evaluate the validity of the proposed approach and how reasonable the results sound. If the reviewers think the paper is good enough, it gets published. Otherwise, the author has to make some changes to the paper and submit elsewhere.

Camisani didn't go through this process, but just uploaded the paper to his website. For this reason, neither the approach nor the results have been vetted. Let's play what-if, and pretend that this paper actually got submitted to a conference, and that I had been assigned to review it. Here is what I would have written:

The paper proposes a method to detect fake Twitter accounts (i.e., bots) that follow popular accounts, such as the ones belonging to celebrities and corporations. To this end, the author identified a number of features that are typical of "human" activity, as well as ones that are indicative of automatic, "bot-like" activity. For each account taken into consideration, if the account shows features that are typical of a human, it gets "human points". Conversely, if it shows features that are typical of a bot, it gets "bot points". The totals of human and bot points are then passed to a decision procedure, which decides whether the account is real or not. Here comes the first problem with the article: the decision procedure is not described at all. How many "bot" points does an account need to score to be considered a bot? Is this compared to the "human" points? And how are accounts that lie in the grey area in the middle classified? Also, the classification features are not discussed. Why are those typical of human or bot activity? Why is posting from multiple applications a sign of being a human? On the contrary, this could be a sign of being a bot, since Twitter periodically blocks offending applications and miscreants have to create new ones. Moreover, the classification procedure seems to be ad hoc and unverified. Using a classification algorithm with a training phase on labeled data would have helped - a lot.
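To make the last point concrete, here is a minimal example of what "a training phase on labeled data" means, applied to the paper's points-based scheme. Everything here is invented for illustration (the scores, the labels, and the learning rule); the point is only that the decision cutoff is derived from examples rather than picked ad hoc:

```python
# Learn the bot/human decision threshold from labeled examples instead of
# choosing it arbitrarily: here, as the midpoint between the mean "bot points"
# score of known humans and that of known bots.

def train_threshold(labeled_scores):
    """labeled_scores: list of (bot_points, is_bot) pairs."""
    bots   = [s for s, is_bot in labeled_scores if is_bot]
    humans = [s for s, is_bot in labeled_scores if not is_bot]
    return (sum(bots) / len(bots) + sum(humans) / len(humans)) / 2


training = [(9, True), (8, True), (1, False), (2, False)]
cutoff = train_threshold(training)
print(cutoff)       # 5.0
print(7 > cutoff)   # True: an account scoring 7 bot points is classified as a bot
```

A real study would of course use a proper classifier and cross-validation, but even this toy version makes the decision rule explicit and reproducible, which is exactly what the article lacks.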

The second problem with the paper is that it is not clear how the followers for the analysis were chosen. Only "up to" 10,000 followers for each account were checked, allegedly selected by a random algorithm. This was done, I believe, because Twitter limits the number of queries that can be issued each hour. However, technical details on how the whole process was performed are missing. Without such details, it is impossible to evaluate how accurate the results are. Stating that half of somebody's followers are fake just means that, according to the algorithm, 5,000 followers are maybe fake.

A third problem is that it is impossible to check whether the detected accounts are actually fake. The problem is known to be very hard, because it is pretty much impossible to distinguish a bot from a fairly inactive account. Twitter itself relies on manual analysis to sort out these kinds of issues.

The last problem is that this paper doesn't cite any previous research in the field, and there has been a wealth of it. This makes it impossible to compare how sound the results are against the state of the art. But then, that was not the goal of the paper. The goal was to get publicity, and that worked perfectly.

My verdict? REJECT.

Friday, May 4, 2012

Poultry Markets: On the Underground Economy of Twitter Followers

Twitter has become such an important medium that companies and celebrities use it extensively to reach their customers and their fans. Nowadays, creating a large and engaged network of followers can determine the difference between succeeding and failing in marketing. However, creating such a network requires time, especially when the party building it does not have an established reputation among the public.

For this reason, a number of websites to help Twitter users create a large network of followers have emerged. These websites promise their subscribers to provide followers in exchange for a fee. In addition, some of these services offer to spread promotional messages in the network. We call this phenomenon Twitter Account Markets. We study this phenomenon in our paper "Poultry Markets: On the Underground Economy of Twitter Followers", that will appear at the SIGCOMM Workshop on Online Social Networks (WOSN) later this year.

Typically, the services offered by a Twitter Account Market are accessible through a webpage, similar to the one below. Customers can buy followers at rates between $20 and $100 per 1,000 followers. In addition, markets typically offer the possibility of having content sent out by a certain number of accounts, again in exchange for a fee.

All Twitter Account Markets we analyzed offer both "free" and "premium" versions of their services. While premium customers pay for their services, the free ones gain followers by giving away their Twitter credentials (a clever way of phishing). Once the market administrator gets the credentials for an account, he can make it follow other Twitter accounts (the free or premium customers of the market), or send out "promoted" content (typically spam). For convenience, the market administrator typically authorizes an OAuth application using his victims' stolen credentials. By doing this, he can easily administer a large number of accounts by leveraging the Twitter API.

Twitter Account Markets are a big problem on Twitter: first, an account with an inflated number of followers tends to look more trustworthy to the other social network users. Second, these services introduce spam in the network.

Of course, Twitter does not like this behavior. In fact, they introduced a clause in their Terms of Service that specifically forbids participating in Twitter Account Market operations, and they periodically suspend the OAuth applications that these markets use. However, since the market administrator has the credentials to his victims' accounts, he can simply authorize a new application and continue his operation.

In our paper, we propose techniques to detect both Twitter Account Market victims and customers. We believe that an effective way of mitigating this problem would be to focus on the customers, rather than on the victims. Since participating in a Twitter Account Market violates the Terms of Service, Twitter could suspend such accounts, and hit the market from the economic side.