During the last few days, a huge fuss has been made about this report. The article, written by Italian professor Marco Camisani Calzolari, describes a system to detect fake followers on Twitter. It claims that many of the Twitter followers of corporations and celebrities (up to 45%) are actually fake. Among these celebrities are Italian public figures and politicians such as Beppe Grillo and Nichi Vendola. The news got a lot of attention in Italy, and was reported by the foreign press as well (most notably by the Guardian and the Huffington Post). Of course, the supporters of this or that politician were outraged, and many people argued that the study wasn't correct. Today, Italian economics professor Francesco Sacco declared that the study actually has an error margin of 1%, and should be considered correct.
Now, I am a researcher, and I am not very interested in flame wars between opposing political factions. However, I am quite disappointed that the Italian press, as well as some foreign newspapers, treated this study as reputable without at least checking with an expert. As of today, a few days after the news was first published, the only person from academia who has reviewed the article is an economics professor. With all due respect, I think that somebody with a degree in computer science and some experience in machine learning and artificial intelligence would be a better person to review this article, and to judge how reasonable the proposed approach actually is.
I decided to write this blog post because I have been reading a lot of comments on this article, but most of them were just flames, and very few of them analyzed the proposed approach in detail. I decided to analyze it myself. After all, I have been doing research in the field for quite a while now. In the academic world, we have a procedure called peer review. When somebody submits a paper to a journal or to a conference, the paper gets read by two or three other researchers, who evaluate the validity of the proposed approach and how reasonable the results sound. If the reviewers think the paper is good enough, it will be published. Otherwise, the author will have to make some changes to the paper and submit it elsewhere.
Camisani didn't go through this process, but just uploaded the paper to his website. For this reason, neither the approach nor the results have been vetted. Let's play what-if, and pretend that this paper actually got submitted to a conference, and that I had been assigned to review it. Here is what I would have written:
The paper proposes a method to detect fake Twitter accounts (i.e., bots) that follow popular accounts, such as the ones belonging to celebrities and corporations. To this end, the author identified a number of features that are typical of "human" activity, as well as ones that are indicative of automatic, "bot-like" activity. For each account taken into consideration, if the account shows features that are typical of a human, it gets "human points". Conversely, if it shows features that are typical of a bot, it gets "bot points". The totals of human and bot points are then passed to decision software that decides whether the account is real or not.
Here comes the first problem with the article: the decision procedure is not described at all. How many "bot" points does an account need to score to be considered a bot? Is this score compared to the "human" points? And how are the accounts that lie in the grey area in the middle classified? Also, the choice of classification features is not discussed. Why are those typical of human or bot activity? Why is posting from multiple applications a sign of being a human? On the contrary, this could be a sign of being a bot, since Twitter periodically blocks offending applications and miscreants have to create new ones. Moreover, the classification procedure seems to be ad hoc and unverified. Using a classification algorithm and a training phase on labeled data would have helped - a lot.
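To make the questions above concrete, here is a minimal sketch of what a fully specified decision rule could look like. It is not the procedure used in the paper, which does not describe one. Apart from "posts from multiple applications", every feature, weight, and threshold below is a hypothetical choice made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical features; only "posts from multiple apps" is mentioned in the paper.
    has_profile_picture: bool
    posts_from_multiple_apps: bool
    tweet_count: int
    follower_count: int


def human_points(a: Account) -> int:
    """Sum of points for features assumed (not verified) to be human-like."""
    points = 0
    if a.has_profile_picture:
        points += 1
    if a.posts_from_multiple_apps:
        points += 1
    if a.tweet_count >= 50:  # arbitrary illustrative threshold
        points += 1
    return points


def bot_points(a: Account) -> int:
    """Sum of points for features assumed (not verified) to be bot-like."""
    points = 0
    if not a.has_profile_picture:
        points += 1
    if a.tweet_count == 0:
        points += 1
    if a.follower_count == 0:
        points += 1
    return points


def classify(a: Account, margin: int = 2) -> str:
    """Compare the two scores; anything within the margin stays undecided."""
    diff = human_points(a) - bot_points(a)
    if diff >= margin:
        return "human"
    if diff <= -margin:
        return "bot"
    return "uncertain"


print(classify(Account(True, True, 100, 50)))
print(classify(Account(False, False, 0, 0)))
print(classify(Account(True, False, 0, 10)))
```

Even a toy rule like this forces the author to state the thresholds and to say what happens to accounts in the grey area. Validating it would still require labeled data, which is exactly what a trained classifier would bring.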
The second problem with the paper is that it is not clear how the followers for the analysis have been chosen. Only "up to" 10,000 followers per account were checked, allegedly selected by a random algorithm. This has been done, I believe, because Twitter limits the number of queries that can be issued each hour. However, technical details on how the whole process has been performed are missing. Without such details, it is impossible to evaluate how accurate the results are. Stating that half of the followers of somebody are fake just means that, according to the algorithm, 5,000 of the sampled followers might be fake.
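For reference, this is the textbook way to quantify the uncertainty that comes from sampling alone. The sketch assumes a simple random sample and the normal approximation for a proportion, neither of which the paper documents. The function name is mine.

```python
import math


def sampling_margin(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a proportion.

    Assumes a simple random sample of size n (an assumption, not something
    the paper states) and uses the normal approximation.
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Worst case (p_hat = 0.5) with 10,000 sampled followers.
margin = sampling_margin(0.5, 10_000)
print(f"{margin:.4f}")
```

With 10,000 followers the sampling margin is roughly one percentage point, but that number only covers the choice of followers. It says nothing about whether each sampled account was labeled correctly, and that is the part the paper never validates.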
A third problem is that it is impossible to check whether the detected accounts are fake or not. The problem is known to be very hard, because it is pretty much impossible to tell a bot apart from a fairly inactive account. Twitter itself relies on manual analysis to sort out this kind of issue.
The last problem is that this paper doesn't cite any previous research in the field, and there has been a wealth of it. As a result, it is impossible to judge how sound the results are compared to the state of the art. However, this was not the goal of the paper. The goal was to get publicity, and this worked perfectly.
My verdict? REJECT.
Hi,
very nice article. Although the peer review process is not perfect, it definitely beats the process that was taken here.
Minor issue in your review: ''Conversely, if it shows features that are typical of a bot, it will get "human points"'' should be "bot points" I guess.
Corrected, thanks :)