The NLP behind Facebook’s graph search

April 29th, 2013

Facebook engineers Xiao Li and Maxime Boucher describe the language processing techniques used to implement Facebook’s graph search in a recent post on the Facebook Engineering page (an alternative for non-Facebook users is available via VentureBeat).

Users can enter a question like Which of my friends who went to school at the University of Illinois live in California? which is translated into a query over Facebook’s Open Graph. That data structure is an RDF-like graph of millions of entities and objects of various types that are connected by thousands of types of relations. This is a very interesting application of current human language technology to a highly visible and useful task!
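To make the idea concrete, here is a toy sketch of how such a question might look once it has been turned into a query over a graph. It is only an illustration and not Facebook's implementation: the triple store, the relation names (friend, attended, lives_in), the people and the helper functions are all made up for the example.

```python
# Toy graph stored as (subject, relation, object) triples.
# All entities and relation names are hypothetical.
TRIPLES = {
    ("me", "friend", "alice"),
    ("me", "friend", "bob"),
    ("me", "friend", "carol"),
    ("alice", "attended", "University of Illinois"),
    ("bob", "attended", "University of Illinois"),
    ("carol", "attended", "MIT"),
    ("alice", "lives_in", "California"),
    ("bob", "lives_in", "Texas"),
    ("carol", "lives_in", "California"),
}


def objects(subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return {o for s, r, o in TRIPLES if s == subject and r == relation}


def friends_matching(user, school, state):
    """Friends of `user` who attended `school` and live in `state`."""
    return sorted(
        friend
        for friend in objects(user, "friend")
        if school in objects(friend, "attended")
        and state in objects(friend, "lives_in")
    )


# "Which of my friends who went to school at the University of Illinois
# live in California?"
print(friends_matching("me", "University of Illinois", "California"))
```

The hard part described in the Facebook post is the step this sketch skips: getting from the free-text question to the structured call on the last line.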

Semantic and graph-based search

April 29th, 2013

Barbara Starr writes at Search Engine Land about the progress that “semantic search” has made in the past three years, in Semantic and Graph-Based Search: The Future Face Of Search:

“The prediction that search would become increasingly semantic and graph-based has certainly proven to be more than true. Not only have the search engines since adopted [schema.org] as a standard along with microdata as a syntax (Facebook RDFa and Open Graph are examples), but things are now elevated to the next level in this process of adoption.”

The schema.org vocabulary has been a big success and is being used by many popular content providers, but I’m less sure that Microdata is winning out over RDFa. I’ve seen reports that there is more data on the Web encoded in RDFa than in Microdata.

It seems like an easy choice to use RDFa Lite over Microdata, since it’s just as simple and easy to use and lets you later add more features from full RDFa. The biggest RDFa feature is, of course, the ability to include statements from multiple vocabularies.
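For readers who have not seen the two syntaxes side by side, the sketch below marks up the same made-up person in Microdata and in RDFa Lite, then pulls the property values out of each. The markup snippets and the person are invented for illustration, and the PropertyCollector class is a deliberately naive helper built on Python's standard html.parser, not a conforming Microdata or RDFa processor.

```python
from html.parser import HTMLParser

# The same hypothetical person, once in Microdata and once in RDFa Lite.
MICRODATA = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Alice Example</span>
  <span itemprop="jobTitle">Researcher</span>
</div>
"""

RDFA_LITE = """
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Alice Example</span>
  <span property="jobTitle">Researcher</span>
</div>
"""


class PropertyCollector(HTMLParser):
    """Naively map the value of one attribute to the element's text."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.current = None  # property name of the element being read
        self.found = {}

    def handle_starttag(self, tag, attrs):
        self.current = dict(attrs).get(self.attr_name)

    def handle_data(self, data):
        if self.current and data.strip():
            self.found[self.current] = data.strip()

    def handle_endtag(self, tag):
        self.current = None


def collect(markup, attr_name):
    """Return {property: text} for elements carrying `attr_name`."""
    parser = PropertyCollector(attr_name)
    parser.feed(markup)
    parser.close()
    return parser.found


# Both syntaxes carry the same statements in this simple case.
print(collect(MICRODATA, "itemprop") == collect(RDFA_LITE, "property"))
```

The two snippets differ only in attribute names (itemscope/itemtype/itemprop versus vocab/typeof/property), which is what makes the switch cheap. The sketch does not attempt to show mixing vocabularies.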

In the spirit of eating our own dog food, I hope to work on upgrading the ebiquity web site and blog to make fuller use of RDFa this summer.

Detecting fake and malicious Twitter accounts

April 25th, 2013

There has recently been a spike in the number of compromised Twitter accounts, which has increased concerns about the trustworthiness of information broadcast on Twitter and other social networks. Just yesterday, the Associated Press Twitter account (@AP) was hacked and used to send out a false Twitter post about explosions at the White House. Last weekend, two CBS News Twitter accounts (@60minutes and @48hours) were compromised. Corporate accounts belonging to Burger King and Jeep were also hacked in February this year.

We are working on techniques to predict whether a given account is “fake” (falsely appears to represent a person or organization) or has been compromised and is being used to spread malicious content. Our approach analyses the account’s metadata, properties, network structure and the content of its posts. We also use both content and network analysis to identify the “real” account handle when multiple accounts appear or claim to represent the same person or organization on Twitter.

We recently analyzed a case where both @DeltaAssist and @flydeltassist appeared to represent Delta Airlines. In February 2013, @flydeltaAssist, which turned out not to be associated with Delta, began tweeting an offer of free tickets to users who “followed” it. Twitter eventually banned the account as a fake handle. Our approach was able to answer the question “Which one of them belongs to the real Delta Airlines?” by analyzing the tweets and social network of these handles.

We are still in the process of writing up our research and evaluation results and hope to be able to post more about it soon.

SAP and open government data

April 13th, 2013

Heather McIlvaine from enterprise software giant SAP blogs about open data: “How are mobile apps, Big Data, and civic hacking changing the nature of open data in government? The Center for Technology in Government took a look at this topic and presents its findings”. See The Future Of Open Data.