paper: Leveraging Attribute History to Link User Profiles across Online Social Networks

May 20th, 2016
Paridhi Jain, Ponnurangam Kumaraguru and Anupam Joshi, Other Times, Other Values: Leveraging Attribute History to Link User Profiles across Online Social Networks, 26th ACM Conference on Hypertext and Social Media (HT15), pp. 247-255, Sept 2015.

Profile linking is the ability to connect profiles of a user on different social networks. Linked profiles can help companies like Disney to build psychographics of potential customers and segment them for targeted marketing in a cost-effective way. Existing methods link profiles by observing high similarity between most recent (current) values of the attributes like name and username. However, for a section of users observed to evolve their attributes over time and choose dissimilar values across their profiles, these current values have low similarity. Existing methods then falsely conclude that profiles refer to different users. To reduce such false conclusions, we suggest to gather rich history of values assigned to an attribute over time and compare attribute histories to link user profiles across networks. We believe that attribute history highlights user preferences for creating attribute values on a social network. Co-existence of these preferences across profiles on different social networks result in alike attribute histories that suggests profiles potentially refer to a single user. Through a focused study on username, we quantify the importance of username history for profile linking on a dataset of real-world users with profiles on Twitter, Facebook, Instagram and Tumblr. We show that username history correctly links 44% more profile pairs with non-matching current values that are incorrectly unlinked by existing methods. We further explore if factors such as longevity and availability of username history on either profiles affect linking performance. To the best of our knowledge, this is the first study that explores viability of using an attribute history to link profiles on social networks.
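The idea of comparing attribute histories rather than only current values can be illustrated with a minimal sketch. This is not the paper's method: the usernames are invented, the similarity measure is Python's `difflib` ratio, and the 0.8 threshold is an arbitrary assumption chosen only to make the contrast visible.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] using difflib's ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_history_match(history_a: list[str], history_b: list[str]) -> float:
    """Highest similarity between any past or current value on either profile."""
    return max(similarity(a, b) for a in history_a for b in history_b)


# Hypothetical username histories, oldest first; the last entry is the current value.
twitter_history = ["jdoe_writes", "jane_doe88", "nightowl_jd"]
instagram_history = ["janedoe88", "jd.photos"]

THRESHOLD = 0.8  # assumed cut-off, not a value taken from the paper

current_only = similarity(twitter_history[-1], instagram_history[-1])
with_history = best_history_match(twitter_history, instagram_history)

print(f"current only: {current_only:.2f} -> linked: {current_only >= THRESHOLD}")
print(f"with history: {with_history:.2f} -> linked: {with_history >= THRESHOLD}")
```

In this toy case the current usernames look unrelated, while an older pair of values is nearly identical, which is the situation the abstract describes.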

Assessing credibility of content on Twitter using automated techniques

November 29th, 2015

Aditi Gupta

10:30am, Monday 30 November 2015, ITE 346

Online social media is a powerful platform for dissemination of information during real world events. Beyond the challenges of volume, variety and velocity of content generated on online social media, veracity poses a much greater challenge for effective utilization of this content by citizens, organizations, and authorities. Veracity of information refers to the trustworthiness / credibility / accuracy / completeness of the content. This work addressed the challenge of veracity or trustworthiness of content posted on social media. We focus our work on Twitter, which is one of the most popular microblogging web services today. We provided an in-depth analysis of misinformation spread on Twitter during real world events. We showed the effectiveness of automated techniques to detect misinformation on Twitter using a combination of content, meta-data, network, user profile and temporal features. We developed and deployed a novel framework, TweetCred, for providing an indication of trustworthiness / credibility of tweets posted during events. TweetCred, which was available as a browser plug-in, was installed and used by real Twitter users.

Dr. Aditi Gupta is a research associate in the Computer Science and Electrical Engineering Department at UMBC. She received her Ph.D. from the Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi) in 2015 for her dissertation on designing and evaluating techniques to mitigate misinformation spread on microblogging web services.

The NLP behind Facebook’s graph search

April 29th, 2013

Facebook engineers Xiao Li and Maxime Boucher describe the language processing techniques used to implement Facebook’s graph search in a recent post on the Facebook Engineering page (alternative for non-facebook-users via VentureBeat).

Users can enter a question like Which of my friends who went to school at the University of Illinois live in California? which is translated into a query over Facebook’s Open Graph. That data structure is an RDF-like graph of millions of entities and objects of various types that are connected by thousands of types of relations. This is a very interesting application of current human language technology to a highly visible and useful task!
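To make the translation concrete, here is a toy sketch of how the example question could become a query over a graph of typed relations. The triples, relation names and helper functions are all hypothetical and are not Facebook's schema or API; the point is only that the question reduces to intersecting a few relation lookups.

```python
# Hypothetical (subject, relation, object) triples; not Facebook's actual data model.
TRIPLES = {
    ("me", "friend", "alice"),
    ("me", "friend", "bob"),
    ("me", "friend", "carol"),
    ("alice", "attended", "University of Illinois"),
    ("bob", "attended", "University of Illinois"),
    ("carol", "attended", "UMBC"),
    ("alice", "lives_in", "California"),
    ("bob", "lives_in", "Maryland"),
    ("carol", "lives_in", "California"),
}


def objects(subject, relation):
    """All objects o such that (subject, relation, o) is in the graph."""
    return {o for s, r, o in TRIPLES if s == subject and r == relation}


def subjects(relation, obj):
    """All subjects s such that (s, relation, obj) is in the graph."""
    return {s for s, r, o in TRIPLES if r == relation and o == obj}


def friends_query(user, school, place):
    """Friends of `user` who attended `school` and live in `place`."""
    return (
        objects(user, "friend")
        & subjects("attended", school)
        & subjects("lives_in", place)
    )


result = friends_query("me", "University of Illinois", "California")
print(sorted(result))
```

The hard part described in the Facebook post is the step before this one: mapping free text onto the right entities and relations.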

Detecting fake and malicious Twitter accounts

April 25th, 2013

There has recently been a spike in the number of compromised Twitter accounts, which has increased concerns about the trustworthiness of information broadcast on Twitter and other social networks. Just yesterday, the Associated Press Twitter account (@AP) was hacked and used to send out a false Twitter post about explosions at the White House. Last weekend saw Twitter accounts of CBS News (@60minutes and @48hours) compromised. Corporate accounts belonging to Burger King and Jeep were also hacked in February this year.

We are working on techniques to predict that a given account is “fake” (falsely appears to represent a person or organization) or has been compromised and is being used to spread malicious content. Our approach analyses the account’s metadata, properties, network structure and the content in its posts. We also use both content and network analysis to identify the “real” account handle when multiple accounts appear or claim to represent the same person or organization on Twitter.

We recently analyzed a case where both @DeltaAssist and @flydeltassist appeared to represent Delta Airlines.  In February 2013, @flydeltaAssist, which turned out not to be associated with Delta, began tweeting an offer of free tickets if users “followed” them.  Eventually, the account was banned as a fake handle by Twitter. Our approach was able to answer the question “Which one of them belongs to the real Delta Airlines?” by analyzing the tweets and social network of these handles.

We are still in the process of writing up our research and evaluation results and hope to be able to post more about it soon.

Google Reader, we hardly knew ye

March 13th, 2013

We felt a great disturbance in the Force, as if millions of voices suddenly cried out in terror and were about to be silenced. We fear something terrible is about to happen. We went to access the blogs we follow on Google Reader and found this.

Powering Down Google Reader
3/13/2013 04:06:00 PM

Posted by Alan Green, Software Engineer

We have just announced on the Official Google Blog that we will soon retire Google Reader (the actual date is July 1, 2013). We know Reader has a devoted following who will be very sad to see it go. We’re sad too.

There are two simple reasons for this: usage of Google Reader has declined, and as a company we’re pouring all of our energy into fewer products. We think that kind of focus will make for a better user experience.

To ensure a smooth transition, we’re providing a three-month sunset period so you have sufficient time to find an alternative feed-reading solution. If you want to retain your Reader data, including subscriptions, you can do so through Google Takeout.

Thank you again for using Reader as your RSS platform.
Labels: reader, sunset

Where is old Bloglines now that we need him again? We should not have been so disloyal.

The use and abuse of social media in elections

October 27th, 2012

The Pew Research Center reports that social media has become a feature of political and civic engagement for many in the U.S.

“Some 60% of American adults use either social networking sites like Facebook or Twitter and a new survey by the Pew Research Center’s Internet & American Life Project finds that 66% of those social media users—or 39% of all American adults—have done at least one of eight civic or political activities with social media.”

Wellesley computer science professor Panagiotis Metaxas has a short article in Science, Social Media and the Elections, on how social media can be abused in elections. An example he cites is the suspicious one-day spike of 110,000 Twitter followers received by a US presidential candidate recently and the subsequent analysis that showed that most of the followers were unlikely to be real people.

IEEE Spectrum has an interview with Professor Metaxas in which he discusses the issues surrounding social media and elections and mentions his recent paper, How (Not) To Predict Elections, which concludes that predicting election outcomes using the published research methods on Twitter data is no better than chance.

A novel use of social media to predict elections was shown by FiveOneNine Games, who crunched the data from use of their election-themed Facebook game Campaign Story to predict President Barack Obama as the winner.

On Facebook it is 4.74 degrees of separation, not six

November 21st, 2011

On Facebook, it’s 4.74 degrees of separation, not six, according to a new study by researchers at Facebook and the University of Milan.

“Think back to the last time you were in a crowded airport or bus terminal far from home. Did you consider that the person sitting next to you probably knew a friend of a friend of a friend of yours? In the 1960s, social psychologist Stanley Milgram’s “small world experiment” famously tested the idea that any two people in the world are separated by only a small number of intermediate connections, arguably the first experimental study to reveal the surprising structure of social networks.

With the rise of modern computing, social networks are now being mapped in digital form, giving researchers the ability to study them on a much grander, even global, scale. Continuing this tradition of social network research, Facebook, in collaboration with researchers at the Università degli Studi di Milano, is today releasing two studies of the Facebook social graph.

First, we measured how many friends people have, and found that this distribution differs significantly from previous studies of large-scale social networks. Second, we found that the degrees of separation between any two Facebook users is smaller than the commonly cited six degrees, and has been shrinking over the past three years as Facebook has grown. Finally, we observed that while the entire world is only a few degrees away, a user’s friends are most likely to be of a similar age and come from the same country.”
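For readers who want to see what an average separation figure means operationally, the sketch below computes the mean number of hops between all connected pairs in a small, made-up friendship graph using breadth-first search. This exact approach only works at small scale and is not the method used in the study; the graph and function names are assumptions for illustration.

```python
from collections import deque


def distances_from(graph, source):
    """Breadth-first search: number of hops from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_separation(graph):
    """Mean hop count over all ordered pairs of distinct, connected nodes."""
    total = pairs = 0
    for source in graph:
        for target, d in distances_from(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs


# Hypothetical friendship graph: a chain a - b - c - d, plus e attached to b.
friends = {
    "a": ["b"],
    "b": ["a", "c", "e"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["b"],
}
print(average_separation(friends))
```

Running one breadth-first search per node scales poorly as the graph grows, which is why analyses of graphs with hundreds of millions of nodes rely on approximation techniques instead.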

A story in the New York Times, Separating You and Me? 4.74 Degrees, points out how the scale of social network studies has grown.

“The original “six degrees” finding, published in 1967 by the psychologist Stanley Milgram, was drawn from 296 volunteers who were asked to send a message by postcard, through friends and then friends of friends, to a specific person in a Boston suburb. The new research used a slightly bigger cohort: 721 million Facebook users, more than one-tenth of the world’s population.”

Got a problem? There’s a code for that

September 15th, 2011

The Wall Street Journal article Walked Into a Lamppost? Hurt While Crocheting? Help Is on the Way describes the International Classification of Diseases, 10th Revision that is used to describe medical problems.

“Today, hospitals and doctors use a system of about 18,000 codes to describe medical services in bills they send to insurers. Apparently, that doesn’t allow for quite enough nuance. A new federally mandated version will expand the number to around 140,000—adding codes that describe precisely what bone was broken, or which artery is receiving a stent. It will also have a code for recording that a patient’s injury occurred in a chicken coop.”

We want to see the search engine companies develop and support a Microdata vocabulary for ICD-10. An ICD-10 OWL DL ontology has already been done, but a Microdata version might add a lot of value. We could use it on our blogs and Facebook posts to catalog those annoying problems we encounter each day, like W59.22XA (Struck by turtle, initial encounter), or Y07.53 (Teacher or instructor, perpetrator of maltreatment and neglect).

Humor aside, a description logic representation (e.g., in OWL) makes the coding system seem less ridiculous. Instead of appearing as a catalog of 140K ground tags, it would emphasize that it is a collection of a much smaller number of classes that can be combined in productive ways to produce them or used to create general descriptions (e.g., bitten by an animal).
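The compositional point can be seen even without OWL. The sketch below splits an ICD-10-CM style code into a category, a detail part and an optional seventh character, so that one base concept combined with a small set of encounter types yields several ground codes. The field names and the parsing rules are a simplification assumed for illustration; in particular, the A/D/S reading of the seventh character applies to injury and external-cause codes, and other parts of the classification use it differently.

```python
# Assumed, simplified meanings of the 7th character for injury and
# external-cause codes; other chapters define it differently.
ENCOUNTER = {
    "A": "initial encounter",
    "D": "subsequent encounter",
    "S": "sequela",
}


def decompose(code):
    """Split an ICD-10-CM style code into category, detail and encounter type."""
    compact = code.replace(".", "").upper()
    category, rest = compact[:3], compact[3:]
    # A full-length code has four characters after the category; the last
    # one is the 7th-character extension.
    extension = rest[3] if len(rest) == 4 else None
    # 'X' is a placeholder that pads short codes out to the 7th position.
    detail = rest[:3].rstrip("X")
    return {
        "category": category,
        "detail": detail,
        "encounter": ENCOUNTER.get(extension),
    }


# One base concept (W59.22) combined with three encounter types.
for suffix in "ADS":
    print(decompose(f"W59.22X{suffix}"))

print(decompose("Y07.53"))  # no 7th character, so encounter is None
```

A description logic version would go further and define each part as a class, so that general queries such as "any injury caused by an animal" fall out of the class hierarchy.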

Detecting fake Google+ profiles with image search

September 11th, 2011

Many Google+ users have been reporting frequent notices about new followers whom they don’t know and who appear to be attractive young women. The suspicious followers have minimal profiles and no posts. These are obviously false accounts being created for some as yet unknown purpose, but how can one prove it?

I just got a notice, for example, that Janet Smith of Philadelphia is following me. Now Janet Smith is a common name and Philadelphia is a big place — there are probably hundreds of people who live in the Philadelphia area with that name. The 990 other people she’s following seem like a pretty random bunch, though I do know many and have more than a few in my own circles. Most seem to have a fair number of followers.

So there is not much to go on other than her profile image. This is a great use for Google’s new image search. I dragged the picture into the image search query field and Google identified its best guess for the image as Indian actress Koyel Mullick. Sure enough, if you search for images with her name, the precise Janet Smith image is result number 15.

Of course, there are still some subtle issues. This is just one kind of false profile — one created for one identity but using an image from a different one. It’s common on most social media systems, including G+, for some people to use a picture of someone or something other than themselves. But it’s obvious to a human viewer that using a picture of a rabbit, Marilyn Monroe or the mighty Thor on your profile is not meant to deceive. It will be challenging to automate the process of discriminating the intent to deceive from modesty, homage or an ironic gesture.

Gingrich Twitter followers not fake, just inactive

August 25th, 2011

Three weeks ago, it was widely reported that an analysis by PeekYou concluded that more than 90% of Newt Gingrich’s 1.3M Twitter followers were fake accounts, probably purchased to make him appear more popular. Further analysis by Topsy supports Newt Gingrich’s assertion that his Twitter followers were real people and that his campaign did not purchase any.

“Former House Speaker and GOP presidential candidate Newt Gingrich was correct in his explanation for why he has relatively few active accounts among his 1.3 million Twitter followers, an analysis requested by Mashable has revealed.”

The initial analysis of his followers was apparently based on a few trivial features, mostly the fact that the vast majority of them were inactive. But most of his followers came from the early days of Twitter when Gingrich’s account was on Twitter’s short list of suggestions for interesting people to follow. Mashable says:

“So there is no smoking gun to suggest that Gingrich, or any of these politicians, bought any of their followers. But what this kind of analysis also reveals, says Topsy, is how hard it is to say which Twitter accounts are for real and which aren’t. Spam bots are getting more sophisticated; many now have fake profile pictures, fake bios and generate fake tweets. “The fact is, a large proportion of all Twitter accounts are inactive anyway,” says Ghosh.

Sorting the humans from the fakes is a problem that companies like Topsy — and Twitter itself, which now has more than 200 million accounts — will be wrestling with for years to come.

Free online courses on AI, databases and machine learning

August 16th, 2011

Stanford is experimenting with an interesting idea — offering some of their most popular undergraduate computer science courses online for free and simultaneously with their regular offerings. An AI course was announced several weeks ago and now there are similar offerings for databases and machine learning. These are taught by first rate instructors (who are also top researchers!) and are the same courses that Stanford students take.

  • “A bold experiment in distributed education, “Introduction to Artificial Intelligence” will be offered free and online to students worldwide during the fall of 2011. The course will include feedback on progress and a statement of accomplishment. Taught by Sebastian Thrun and Peter Norvig, the curriculum draws from that used in Stanford’s introductory Artificial Intelligence course. The instructors will offer similar materials, assignments, and exams.”
  • “A bold experiment in distributed education, “Introduction to Databases” will be offered free and online to students worldwide during the fall of 2011. Students will have access to lecture videos, receive regular feedback on progress, and receive answers to questions. When you successfully complete this class, you will also receive a statement of accomplishment. Taught by Professor Jennifer Widom, the curriculum draws from Stanford’s popular Introduction to Databases course.”
  • “A bold experiment in distributed education, “Machine Learning” will be offered free and online to students worldwide during the fall of 2011. Students will have access to lecture videos, lecture notes, receive regular feedback on progress, and receive answers to questions. When you successfully complete the class, you will also receive a statement of accomplishment. Taught by Professor Andrew Ng, the curriculum draws from Stanford’s popular Machine Learning course.”

If successful, this might be a game changer. Two weeks after the online AI course was announced, 56,000 students had signed up! The approach might work for many disciplines, not just CS. The Khan Academy is a related effort.

Universities should keep an eye on them and think about how to adapt if they are successful. Most of our students will probably benefit from taking our traditional courses. If so, we should be able to explain the benefits from taking them (and make sure we deliver those benefits). At the same time, we may want to leverage the online material from these courses in a synergistic way.

JWS special issues: Semantic Sensing and Social Semantic Web

July 27th, 2011

The Journal of Web Semantics announced two new special issues, one on semantic sensing and another on the semantic and social web. Both will be published in 2012 with preprints made freely available online as papers are accepted.

The special issue on semantic sensing will be edited by Harith Alani, Oscar Corcho and Manfred Hauswirth. Papers will be reviewed on a rolling basis and authors are encouraged to submit before the final deadline of 20 December 2011.

The issue on the semantic and social web will be edited by John Breslin and Meena Nagarajan. Papers will be reviewed on a rolling basis and authors are encouraged to submit before the final deadline of 21 January 2012.

See the JWS Guide for Authors for details on the submission process.