Detecting fake and malicious Twitter accounts

April 25th, 2013

There has recently been a spike in the number of compromised Twitter accounts, which has increased concerns about the trustworthiness of information broadcast on Twitter and other social networks. Just yesterday, the Associated Press Twitter account (@AP) was hacked and used to send out a false Twitter post about explosions at the White House. Last weekend saw two CBS News Twitter accounts (@60minutes and @48hours) compromised. Corporate accounts belonging to Burger King and Jeep were also hacked in February this year.

We are working on techniques to predict whether a given account is “fake” (falsely appears to represent a person or organization) or has been compromised and is being used to spread malicious content. Our approach analyzes the account’s metadata, properties, network structure and the content of its posts. We also use both content and network analysis to identify the “real” account handle when multiple accounts appear or claim to represent the same person or organization on Twitter.

We recently analyzed a case where both @DeltaAssist and @flydeltassist appeared to represent Delta Airlines.  In February 2013, @flydeltaAssist, which turned out not to be associated with Delta, began tweeting an offer of free tickets if users “followed” them.  Eventually, the account was banned as a fake handle by Twitter. Our approach was able to answer the question “Which one of them belongs to the real Delta Airlines?” by analyzing the tweets and social network of these handles.
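To make the network side of that question concrete, here is a toy heuristic for illustration only; it should not be read as the approach described above. It ranks look-alike handles by the share of already-trusted accounts that follow each one. The function name `rank_by_trusted_followers` and every account name in the example data are hypothetical placeholders, and the follower sets are assumed to have been collected beforehand.

```python
from typing import Dict, List, Set, Tuple


def rank_by_trusted_followers(
    candidate_followers: Dict[str, Set[str]],
    trusted_accounts: Set[str],
) -> List[Tuple[str, float]]:
    """Rank candidate handles by the share of trusted accounts following them.

    Handles are compared case-insensitively, as on Twitter.
    """
    trusted = {name.lower() for name in trusted_accounts}
    scores = []
    for handle, followers in candidate_followers.items():
        followers_lower = {name.lower() for name in followers}
        # Fraction of the trusted set that follows this candidate.
        share = len(trusted & followers_lower) / len(trusted) if trusted else 0.0
        scores.append((handle, share))
    # Highest share first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-item[1], item[0].lower()))


# Hypothetical data: two look-alike handles and three accounts we already trust.
candidates = {
    "candidate_official": {"news_desk", "partner_airline", "random_user"},
    "candidate_lookalike": {"random_user", "spam_bot_1"},
}
trusted = {"news_desk", "partner_airline", "airport_authority"}
ranking = rank_by_trusted_followers(candidates, trusted)
```

A single signal like this is easy to game, so in practice it would be only one feature among many drawn from metadata, content and network structure.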

We are still in the process of writing up our research and evaluation results and hope to be able to post more about it soon.

The use and abuse of social media in elections

October 27th, 2012

The Pew Research Center reports that social media has become a feature of political and civic engagement for many in the U.S.

“Some 60% of American adults use either social networking sites like Facebook or Twitter and a new survey by the Pew Research Center’s Internet & American Life Project finds that 66% of those social media users—or 39% of all American adults—have done at least one of eight civic or political activities with social media.”

Wellesley computer science professor Panagiotis Metaxas has a short article in Science, Social Media and the Elections, on how social media can be abused in elections. An example he cites is the suspicious one-day spike of 110,000 Twitter followers received by a US presidential candidate recently and the subsequent analysis that showed that most of the followers were unlikely to be real people.

IEEE Spectrum has an interview with Professor Metaxas in which he discusses the issues surrounding social media and elections and mentions his recent paper, How (Not) To Predict Elections, which concludes that predicting election outcomes by applying the published research methods to Twitter data is no better than chance.

A novel use of social media to predict elections was shown by FiveOneNine Games, who crunched the data from use of their election-themed Facebook game Campaign Story to predict that President Barack Obama would be the winner.

Gingrich Twitter followers not fake, just inactive

August 25th, 2011

Three weeks ago, it was widely reported that an analysis by PeekYou concluded that more than 90% of Newt Gingrich’s 1.3M Twitter followers were fake accounts, probably purchased to make him appear more popular. Further analysis by Topsy supports Newt Gingrich’s assertion that his Twitter followers were real people and that his campaign did not purchase any.

“Former House Speaker and GOP presidential candidate Newt Gingrich was correct in his explanation for why he has relatively few active accounts among his 1.3 million Twitter followers, an analysis requested by Mashable has revealed.”

The initial analysis of his followers was apparently based on a few trivial features, mostly the fact that the vast majority of them were inactive. But most of his followers came from the early days of Twitter, when Gingrich’s account was on Twitter’s short list of suggestions for interesting people to follow. Mashable says:

“So there is no smoking gun to suggest that Gingrich, or any of these politicians, bought any of their followers. But what this kind of analysis also reveals, says Topsy, is how hard it is to say which Twitter accounts are for real and which aren’t. Spam bots are getting more sophisticated; many now have fake profile pictures, fake bios and generate fake tweets. “The fact is, a large proportion of all Twitter accounts are inactive anyway,” says Ghosh.

Sorting the humans from the fakes is a problem that companies like Topsy — and Twitter itself, which now has more than 200 million accounts — will be wrestling with for years to come.

Twitter at one billion tweets a week

March 15th, 2011

Twitter reports that its users sent an average of 140M tweets a day last month. That adds up to about a billion a week (140M × 7 = 980M). Another statistic their post cites is that last month saw an average of 460K new Twitter accounts per day. Both numbers are impressive.

Liz Gannes comments on the fact that Twitter does not report the total number of users it has or how many of these are active. The number of users is thought to be over 200M, but I recall data, now over a year old, estimating that 40% of the users have made no tweets and 80% have made fewer than 10 tweets. Maybe the bulk of those 460K new users a day are signing up to follow @charliesheen.

Twitter changes TOS; might hurt researchers

March 7th, 2011

ReadWriteWeb reports that Twitter recently made two changes to its Terms of Service: it will no longer grant requests for whitelisting, and it will no longer allow redistribution of its content for either commercial or non-commercial purposes. Whitelisting was a way of allowing developers and researchers to access large quantities of data via the REST API. Twitter will honor existing whitelisted developers but will not grant any further requests.

The second change concerns redistribution of content. Anyone gathering Twitter data, whether a developer or a researcher, can no longer share it with others, even for academic or non-commercial purposes. As ReadWriteWeb points out, these changes will most likely hurt researchers who depend on third-party organizations to provide data for their research.

“As part of the new Twitter terms of service, 140kit like other organizations can no longer offer exports of Twitter data for any purposes – whether that’s for profit or non-profit, whether that’s for developers or scholars. You could be writing the next killer app. Or you could be working on the final chapter of your PhD dissertation. (And let me interject right here and say that having your access to research data shut down as a PhD student is beyond devastating.) It doesn’t matter. Exporting Tweets now violates the TOS.”

It looks like Twitter just made it difficult for researchers to access data for their research.

Twitter turns to ads

October 10th, 2010

Sic transit gloria mundi.

After building a huge audience, Twitter turns to ads to cash in:

“In the last two weeks, the company has introduced several advertising plans, courted Madison Avenue at Advertising Week, the annual industry convention, and promoted Dick Costolo, who has led Twitter’s ad program, to chief executive — all signs that Twitter means business about business.

Advertisers pay for Promoted Tweets to appear at the top of search results. … Promoted Tweets will eventually show up in Twitter timelines, not just when people search, based on the interests of people that users follow. Twitter also sells Promoted Trends, so advertisers can show up in the list of topics most discussed on Twitter, for $100,000 a day.”

It seems like AdBlock already suppresses the Promoted Tweets, at least this one.

Twitter promoted tweet

Is Twitter’s plan to log all clicks a privacy loss?

September 2nd, 2010

Twitter’s planned shortening of all links via its service is about to happen. The initial motivation was security, according to Twitter:

“Twitter’s link service at is used to better protect users from malicious sites that engage in spreading malware, phishing attacks, and other harmful activity. A link converted by Twitter’s link service is checked against a list of potentially dangerous sites. When there’s a match, users can be warned before they continue.”

Declan McCullagh reports that Twitter announced in an email message that when someone clicks “on these links from or a Twitter application, Twitter will log that click.” Such information is extremely valuable. Given Twitter’s tens of millions of active users, just knowing how often certain URLs are clicked indicates what entities and topics are of interest at the moment.

“Our link service will also be used to measure information like how many times a link has been clicked. Eventually, this information will become an important quality signal for our Resonance algorithm—the way we determine if a Tweet is relevant and interesting.”

Associating the clicks with a user, IP address, location or device can yield even more information — like what you are interested in right now. Moreover, Twitter now has a way to associate arbitrary annotation metadata with each tweet. Analyzing all of this data can identify, for example, communities of users with common interests and the influential members within them.

Note that Twitter has not said it will do this or even that it will record and keep any user-identifiable information along with the clicks. They might just log the aggregate number of clicks in a window of time. But going the next step and capturing the additional information would be, in my mind, irresistible, even if there was no immediate plan to use it.
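To show what the privacy-preserving end of that spectrum could look like, here is a minimal sketch of aggregate-only logging: it counts clicks per short link per time window and never sees a user, IP address or device. Nothing here reflects how Twitter’s link service actually works; the class name `AggregateClickLog`, the one-hour window and the link identifiers are all assumptions made for illustration.

```python
from collections import Counter

WINDOW_SECONDS = 3600  # assumed window size: one hour


class AggregateClickLog:
    """Counts clicks per (short link, time window) and stores nothing else."""

    def __init__(self, window_seconds: int = WINDOW_SECONDS) -> None:
        self.window_seconds = window_seconds
        self.counts: Counter = Counter()

    def _window_start(self, timestamp: float) -> int:
        # Round the timestamp down to the start of its window.
        return int(timestamp // self.window_seconds) * self.window_seconds

    def record(self, short_link: str, timestamp: float) -> None:
        # User, IP address and device are deliberately never passed in,
        # so they cannot be retained.
        self.counts[(short_link, self._window_start(timestamp))] += 1

    def clicks(self, short_link: str, timestamp: float) -> int:
        # Number of clicks on the link in the window containing timestamp.
        return self.counts[(short_link, self._window_start(timestamp))]


# Hypothetical usage: three clicks on one link, spanning two hourly windows.
log = AggregateClickLog()
log.record("abc123", 10.0)
log.record("abc123", 3599.0)
log.record("abc123", 3600.0)
first_window = log.clicks("abc123", 0.0)
second_window = log.clicks("abc123", 4000.0)
```

Capturing more would take only a few extra fields in the key, which is why the temptation to do so is hard to resist.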

Search engines like Google already link clicks to users and IP addresses and use the information to improve their ranking algorithms and probably in many other ways. But what is troubling is the seemingly inexorable erosion of our online privacy. There will be no way to opt out of having your link wrapped by the service and no announced way to opt out of having your clicks logged.

Visualizing social media use in 16 countries

December 5th, 2009

Trendstream’s Global Web Index has a visualization, Global Map of Social Web, that shows the uptake of different social media systems in 16 countries around the world.

Map of the Social Web
Full size pdf

It’s a little busy, but it overlays a lot of information over the world map.

“The map visualises the number of active bloggers, social networkers, video sharers, photo uploaders and microbloggers. The length of the curve represents the penetration and the size represents the universe size. We have also included the actual numbers so you can use and apply the universe estimates.”

I was surprised to see the variation in popularity of the different modalities.

(via Mashable)

Twitter API enables geotagging

November 20th, 2009

Twitter turned on its API for geotagging tweets yesterday, as announced in a post on their blog, Think Globally, Tweet Locally. Currently, geographic information will only be associated with your tweets if you use an application that adds it and will only be used to display your tweets when viewed with an application that can exploit it. Here’s the way Twitter described it.

“This release is unique in that it’s API-only which means you won’t see any changes on, yet. Instead, Twitter applications like Birdfeed, Seesmic Web, Foursquare, Gowalla, Twidroid, Twittelator Pro and others are already supporting this new functionality (go try them out now!) in interesting ways that include geotagging your tweets and displaying the location from where a tweet was posted.”

Examining Twitter’s status update API description shows how one associates a location with a Tweet. Pretty simple.
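As a rough sketch of how simple it is, a geotagged update is an ordinary status update with two extra form fields. The code below only builds the form-encoded request body and does not contact Twitter. It assumes the parameter names `status`, `lat` and `long` from the API description; `build_geotagged_update` is a hypothetical helper rather than part of any Twitter library, and the coordinates are arbitrary example values.

```python
from urllib.parse import urlencode


def build_geotagged_update(status: str, lat: float, long: float) -> str:
    """Return a form-encoded body for a status update with a location."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be between -90 and 90")
    if not -180.0 <= long <= 180.0:
        raise ValueError("longitude must be between -180 and 180")
    # Assumed parameter names: "status", "lat" and "long".
    return urlencode({"status": status, "lat": lat, "long": long})


# Arbitrary example values; an application would send this body with an
# authenticated POST to the status update endpoint.
body = build_geotagged_update("Hello from campus", 37.78, -122.4)
```

Because the location travels as plain request parameters, it is the client application that decides whether to attach it, which is why the opt-in and disclosure rules matter.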

Since disclosing your location raises privacy concerns, Twitter has made geotagging an opt-in service and also allows users to delete all of the location information associated with their tweets. Moreover, their policy, as described here, says

“We require application developers to be upfront and obvious about when they are Geotagging an update. If you ever find that an application is doing it without notifying you, please let us know.”

Twitter has updated its privacy policy to cover location information.

You can read more on ReadWriteWeb and Techcrunch.

UK Teen on how teenagers consume media

July 13th, 2009

The Financial Times has an article, Note by ‘teenage scribbler’ causes sensation, on a research study written by a 15-year-old Morgan Stanley intern on the new and old media habits of UK youth.

“Morgan Stanley’s European media analysts asked Matthew Robson, one of the bank’s interns from a London school, to describe his friends’ media habits.

“Teenagers do not use Twitter,” he pronounced. Updating the micro-blogging service from mobile phones costs valuable credit, he wrote, and “they realise that no one is viewing their profile, so their tweets are pointless”.

His peers find it hard to make time for regular television, and would rather listen to advert-free music on websites such as than tune into traditional radio. Even online, teens find advertising “extremely annoying and pointless”.

Their time and money is spent instead on cinema, concerts and video game consoles which, he said, now double as a more attractive vehicle for chatting with friends than the phone.

Mr Robson had little comfort for struggling print publishers, saying no teenager he knew regularly reads a newspaper since most “cannot be bothered to read pages and pages of text” rather than see summaries online or on television.”

You can read his report on How Teenagers Consume Media online.

The Guardian also has a story today, Twitter is not for teens, on the intern’s report.

The Iranian revolution will be Twittered, not televised

June 15th, 2009

Social media systems share some aspects of television, but not all. They differ in that their content is created by their users. While the revolution will not be televised, it can be tweeted. It’s been more than 50 years since TV was the thing.

The NYT has an article on the role that social media sites are playing in the conflicts surrounding the Iran election, Social Networks Spread Iranian Defiance Online.

“Iranians are blogging, posting to Facebook and, most visibly, coordinating their protests on Twitter, the messaging service. Their activity has increased, not decreased, since the presidential elections on Friday and ensuing attempts by the government to restrict or censor their online communications.
     On Twitter, reports and links to photos from a peaceful mass march through Tehran on Monday, along with accounts of street fighting and casualties around the country, have become the most popular topic on the service worldwide, according to Twitter’s published statistics.
     A couple of Twitter feeds have become virtual media offices for the supporters of the leading opposition candidate, Mir Hussein Moussavi. One feed, mousavi1388, (1388 is the year in the Persian calendar) is filled with news of protests and exhortations to keep up the fight, in Persian and English. It has more than 7,000 followers. Mr. Moussavi’s fan group on Facebook has swelled to over 50,000 members, a significant increase since election day.”

The article also reports on efforts to encourage cyber attacks on Iranian sites:

“Some Twitter users were also going on the offensive. On Monday morning, an antigovernment activist using the Twitter account “DDOSIran” asked supporters to visit a Web site to participate in an online attack to try to crash government Web sites by overwhelming them with traffic. By Monday afternoon, many of those sites were not accessible, though it was not clear if the attack was responsible — and the Twitter account behind the attack had been removed. A Twitter spokeswoman said the company had no connection to the deletion of the account.”

A PHP script is still available on the web and can be found if you search for it.

Tweets from Iran good source of immediate information on #iranelection

June 15th, 2009

The urban areas of Iran are developed and many people there use social media, including Twitter. You can see their reactions to the election results, and to the public unrest that followed, via their tweets. Use this Twitter search query for a sample. This is an important example of how social media is having an impact on news.

Update: Also check out recent Flickr photos tagged with iranelection.

Update 2: Here are tweets geolocated to Tehran.