UMBC ebiquity

Archive for the 'Social media' Category

On Facebook it is 4.74 degrees of separation, not six

November 21st, 2011, by Tim Finin, posted in Social media, Web

On Facebook, it’s 4.74 degrees of separation, not six, according to a new study by researchers at Facebook and the University of Milan.

“Think back to the last time you were in a crowded airport or bus terminal far from home. Did you consider that the person sitting next to you probably knew a friend of a friend of a friend of yours? In the 1960s, social psychologist Stanley Milgram’s “small world experiment” famously tested the idea that any two people in the world are separated by only a small number of intermediate connections, arguably the first experimental study to reveal the surprising structure of social networks.

With the rise of modern computing, social networks are now being mapped in digital form, giving researchers the ability to study them on a much grander, even global, scale. Continuing this tradition of social network research, Facebook, in collaboration with researchers at the Università degli Studi di Milano, is today releasing two studies of the Facebook social graph.

First, we measured how many friends people have, and found that this distribution differs significantly from previous studies of large-scale social networks. Second, we found that the degrees of separation between any two Facebook users is smaller than the commonly cited six degrees, and has been shrinking over the past three years as Facebook has grown. Finally, we observed that while the entire world is only a few degrees away, a user’s friends are most likely to be of a similar age and come from the same country.”

A story in the New York Times, “Separating You and Me? 4.74 Degrees,” points out how much the scale of social network studies has grown:

“The original “six degrees” finding, published in 1967 by the psychologist Stanley Milgram, was drawn from 296 volunteers who were asked to send a message by postcard, through friends and then friends of friends, to a specific person in a Boston suburb. The new research used a slightly bigger cohort: 721 million Facebook users, more than one-tenth of the world’s population.”
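“Degrees of separation” here is the average shortest-path length, in hops, between pairs of users in the friendship graph. A minimal sketch of that measurement, assuming an undirected graph stored as adjacency lists, is breadth-first search from every node on a tiny, hypothetical graph. This exact approach would be far too slow for 721 million users, so a study at that scale needs an approximate method; the names and graph below are made up for illustration.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return the hop count from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_separation(graph):
    """Mean shortest-path length over all ordered pairs of distinct, connected nodes."""
    total_hops, pairs = 0, 0
    for source in graph:
        for target, hops in bfs_distances(graph, source).items():
            if target != source:
                total_hops += hops
                pairs += 1
    return total_hops / pairs if pairs else 0.0


# Hypothetical friendship graph; each friendship is listed in both directions.
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
}
print(round(average_separation(friends), 2))  # 1.67
```

In this four-person chain the six pairs are 1, 2, 3, 1, 2 and 1 hops apart, so the average is 10/6.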

Got a problem? There’s a code for that

September 15th, 2011, by Tim Finin, posted in Google, KR, Ontologies, OWL, Semantic Web, Social media

The Wall Street Journal article “Walked Into a Lamppost? Hurt While Crocheting? Help Is on the Way” describes the International Classification of Diseases, 10th Revision (ICD-10), the system used to code medical problems.

“Today, hospitals and doctors use a system of about 18,000 codes to describe medical services in bills they send to insurers. Apparently, that doesn’t allow for quite enough nuance. A new federally mandated version will expand the number to around 140,000—adding codes that describe precisely what bone was broken, or which artery is receiving a stent. It will also have a code for recording that a patient’s injury occurred in a chicken coop.”

We’d like to see the search engine companies develop and support a Microdata vocabulary for ICD-10. An ICD-10 OWL DL ontology already exists, but a Microdata version might add a lot of value. We could use it on our blogs and Facebook posts to catalog those annoying problems we encounter each day, like W59.22XA (Struck by turtle, initial encounter) or Y07.53 (Teacher or instructor, perpetrator of maltreatment and neglect).

Humor aside, a description logic representation (e.g., in OWL) would make the coding system seem less ridiculous. Instead of appearing as a catalog of 140K ground tags, it would emphasize that the system is built from a much smaller number of classes that can be combined productively to produce those codes, or used to create general descriptions (e.g., bitten by an animal).
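The compositional point can be illustrated even without OWL. In the hypothetical sketch below, plain Python dictionaries stand in for description logic classes: two mechanism classes, each with a more general parent, are combined with the three ICD-10-CM seventh-character encounter extensions (A, D, S) to yield six ground codes, and a general description such as “bitten by an animal” becomes a query over classes rather than a hand-maintained list. The tiny vocabulary is illustrative only, not an extract of the real code tables.

```python
# Hypothetical mini-vocabulary for illustration; not an extract of the real ICD-10-CM tables.
MECHANISMS = {
    # base code: (label, more general parent class)
    "W59.21": ("Bitten by turtle", "bitten by animal"),
    "W59.22": ("Struck by turtle", "struck by animal"),
}
# ICD-10-CM seventh-character extensions used with injury and external-cause codes.
ENCOUNTERS = {"A": "initial encounter", "D": "subsequent encounter", "S": "sequela"}


def compose(base, encounter):
    """Combine a base code with an encounter extension into a full code and label.

    The base is padded with the placeholder X to six characters before the
    seventh character is appended.
    """
    chars = base.replace(".", "").ljust(6, "X")
    code = f"{chars[:3]}.{chars[3:]}{encounter}"
    label = f"{MECHANISMS[base][0]}, {ENCOUNTERS[encounter]}"
    return code, label


def codes_for(general_class):
    """List every composed code whose mechanism falls under a general class."""
    return [
        compose(base, enc)[0]
        for base, (_, parent) in MECHANISMS.items()
        if parent == general_class
        for enc in ENCOUNTERS
    ]


print(compose("W59.22", "A"))         # ('W59.22XA', 'Struck by turtle, initial encounter')
print(codes_for("bitten by animal"))  # ['W59.21XA', 'W59.21XD', 'W59.21XS']
```

Five small classes here generate six ground codes; with realistic numbers of body sites, mechanisms and extensions, the same multiplication is what inflates the tag count into the hundred thousands.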

Detecting fake Google+ profiles with image search

September 11th, 2011, by Tim Finin, posted in Machine Learning, Semantic Web, Social media

Many Google+ users have been reporting frequent notices about new followers whom they don’t know and who appear to be attractive young women. The suspicious followers have minimal profiles and no posts. These are obviously false accounts created for some as yet unknown purpose, but how can one prove it?

I just got a notice, for example, that Janet Smith of Philadelphia is following me. Now Janet Smith is a common name and Philadelphia is a big place — there are probably hundreds of people who live in the Philadelphia area with that name. The 990 other people she’s following seem like a pretty random bunch, though I do know many and have more than a few in my own circles. Most seem to have a fair number of followers.

So there is not much to go on other than her profile image. This is a great use for Google’s new image search. I dragged the picture into the image search query field, and Google’s best guess for the image was Indian actress Koyel Mullick. Sure enough, if you search for images using her name, the exact image from Janet Smith’s profile is result number 15.
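We don’t know how Google’s image search matches photos, but one simple, well-known technique for spotting a reused picture is a perceptual “average hash”: shrink the image to 8x8 grayscale, set one bit per pixel according to whether it is brighter than the mean, and compare hashes by Hamming distance. The sketch below assumes the image has already been downscaled to an 8x8 grid of values from 0 to 255 (a real pipeline would use an imaging library for that step); the distance threshold of 5 and the sample images are arbitrary choices.

```python
def average_hash(pixels):
    """Perceptual 'average hash' of an 8x8 grid of grayscale values (0 to 255).

    Each pixel contributes one bit: 1 if it is brighter than the grid's mean.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def looks_like_same_photo(pixels_a, pixels_b, max_distance=5):
    """Treat two images as the same photo if their hashes are nearly identical."""
    return hamming(average_hash(pixels_a), average_hash(pixels_b)) <= max_distance


# Hypothetical 8x8 "photo": a smooth gradient, plus a brightened copy and an inverted image.
photo = [[4 * (8 * r + c) for c in range(8)] for r in range(8)]
brighter = [[min(255, p + 20) for p in row] for row in photo]
inverted = [[255 - p for p in row] for row in photo]

print(looks_like_same_photo(photo, brighter))  # True
print(looks_like_same_photo(photo, inverted))  # False
```

Because each bit only records brighter-or-darker than the mean, a uniformly brightened or recompressed copy keeps the same hash, which is what makes this kind of hash useful for finding a profile picture lifted from elsewhere.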

Of course, there are still some subtle issues. This is just one kind of false profile: one created for one identity but using an image of someone else. It’s common on most social media systems, including G+, for some people to use a picture of someone or something other than themselves. But it’s obvious to a human viewer that using a picture of a rabbit, Marilyn Monroe or the mighty Thor on your profile is not meant to deceive. It will be challenging to automatically distinguish an intent to deceive from modesty, homage or an ironic gesture.

Gingrich Twitter followers not fake, just inactive

August 25th, 2011, by Tim Finin, posted in Social media, Twitter

Three weeks ago, it was widely reported that an analysis by PeekYou concluded that more than 90% of Newt Gingrich’s 1.3M Twitter followers were fake accounts, probably purchased to make him appear more popular. Further analysis by Topsy supports Newt Gingrich’s assertion that his Twitter followers were real people and that his campaign did not purchase any.

“Former House Speaker and GOP presidential candidate Newt Gingrich was correct in his explanation for why he has relatively few active accounts among his 1.3 million Twitter followers, an analysis requested by Mashable has revealed.”

The initial analysis of his followers was apparently based on a few trivial features, mostly the fact that the vast majority of them were inactive. But most of his followers came from the early days of Twitter, when Gingrich’s account was on Twitter’s short list of suggestions for interesting people to follow. Mashable says:

“So there is no smoking gun to suggest that Gingrich, or any of these politicians, bought any of their followers. But what this kind of analysis also reveals, says Topsy, is how hard it is to say which Twitter accounts are for real and which aren’t. Spam bots are getting more sophisticated; many now have fake profile pictures, fake bios and generate fake tweets. “The fact is, a large proportion of all Twitter accounts are inactive anyway,” says Ghosh.

Sorting the humans from the fakes is a problem that companies like Topsy (and Twitter itself, which now has more than 200 million accounts) will be wrestling with for years to come.”
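The weakness of the initial analysis can be sketched in a few lines. In the hypothetical example below, the fields, the 90-day idle threshold and the sample data are all made up and are not PeekYou’s or Topsy’s actual features. The naive heuristic labels every inactive follower as fake, while the alternative simply breaks inactive followers down by sign-up year, which is where a burst of early followers from the suggested-users list would show up.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    # Hypothetical fields; the analyses discussed above used their own features.
    tweet_count: int
    days_since_last_tweet: int
    joined_year: int


def is_inactive(follower, max_idle_days=90):
    """An account that never tweeted or has been idle longer than max_idle_days."""
    return follower.tweet_count == 0 or follower.days_since_last_tweet > max_idle_days


def naive_fake_share(followers):
    """The trivial heuristic: count every inactive follower as fake."""
    return sum(is_inactive(f) for f in followers) / len(followers)


def inactive_by_join_year(followers):
    """Break inactive followers down by sign-up year instead of labeling them fake."""
    counts = {}
    for f in followers:
        if is_inactive(f):
            counts[f.joined_year] = counts.get(f.joined_year, 0) + 1
    return dict(sorted(counts.items()))


# Made-up sample: three idle accounts from 2009 and one active account from 2011.
sample = [
    Follower(tweet_count=0, days_since_last_tweet=700, joined_year=2009),
    Follower(tweet_count=5, days_since_last_tweet=400, joined_year=2009),
    Follower(tweet_count=0, days_since_last_tweet=650, joined_year=2009),
    Follower(tweet_count=120, days_since_last_tweet=2, joined_year=2011),
]
print(naive_fake_share(sample))       # 0.75
print(inactive_by_join_year(sample))  # {2009: 3}
```

The same inactivity count reads very differently once it is tied to when the accounts joined, which is essentially the explanation Gingrich gave.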

Free online courses on AI, databases and machine learning

August 16th, 2011, by Tim Finin, posted in AI, CS, Database, Machine Learning, Social media, Web

Stanford is experimenting with an interesting idea — offering some of their most popular undergraduate computer science courses online for free and simultaneously with their regular offerings. An AI course was announced several weeks ago and now there are similar offerings for databases and machine learning. These are taught by first rate instructors (who are also top researchers!) and are the same courses that Stanford students take.

  • “A bold experiment in distributed education, “Introduction to Artificial Intelligence” will be offered free and online to students worldwide during the fall of 2011. The course will include feedback on progress and a statement of accomplishment. Taught by Sebastian Thrun and Peter Norvig, the curriculum draws from that used in Stanford’s introductory Artificial Intelligence course. The instructors will offer similar materials, assignments, and exams.”
  • “A bold experiment in distributed education, “Introduction to Databases” will be offered free and online to students worldwide during the fall of 2011. Students will have access to lecture videos, receive regular feedback on progress, and receive answers to questions. When you successfully complete this class, you will also receive a statement of accomplishment. Taught by Professor Jennifer Widom, the curriculum draws from Stanford’s popular Introduction to Databases course.”
  • “A bold experiment in distributed education, “Machine Learning” will be offered free and online to students worldwide during the fall of 2011. Students will have access to lecture videos, lecture notes, receive regular feedback on progress, and receive answers to questions. When you successfully complete the class, you will also receive a statement of accomplishment. Taught by Professor Andrew Ng, the curriculum draws from Stanford’s popular Machine Learning course.”

If successful, this might be a game changer. Two weeks after the online AI course was announced, 56,000 students had signed up! The approach might work for many disciplines, not just CS. The Khan Academy is a related effort.

Universities should keep an eye on these experiments and think about how to adapt if they succeed. Most of our students will probably benefit from taking our traditional courses. If so, we should be able to explain the benefits of taking them (and make sure we deliver those benefits). At the same time, we may want to draw on the online material from these courses to complement our own.

JWS special issues: Semantic Sensing and Social Semantic Web

July 27th, 2011, by Tim Finin, posted in AI, Ontologies, Semantic Web, Social media

The Journal of Web Semantics announced two new special issues, one on semantic sensing and another on the semantic and social web. Both will be published in 2012, with preprints made freely available online as papers are accepted.

The special issue on semantic sensing will be edited by Harith Alani, Oscar Corcho and Manfred Hauswirth. Papers will be reviewed on a rolling basis and authors are encouraged to submit before the final deadline of 20 December 2011.

The issue on the semantic and social web will be edited by John Breslin and Meena Nagarajan. Papers will be reviewed on a rolling basis and authors are encouraged to submit before the final deadline of 21 January 2012.

See the JWS Guide for Authors for details on the submission process.

Twitter at one billion tweets a week

March 15th, 2011, by Tim Finin, posted in Social media, Twitter

Twitter reports that its users sent an average of 140M tweets a day last month. That adds up to a billion a week, in round numbers. Another statistic their post cites is that last month saw an average of 460K new Twitter accounts per day. Both numbers are very impressive.

Liz Gannes comments on the fact that Twitter does not report the total number of users it has or how many of them are active. The number of users is thought to be over 200M, but I recall data, now over a year old, estimating that 40% of the users have made no tweets and 80% have made fewer than 10 tweets. Maybe the bulk of those 460K new users a day are signing up to follow @charliesheen.
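The billion-a-week figure is simply the reported daily average scaled up to seven days. A quick check, using only the two daily averages given in Twitter’s post:

```python
# Daily averages reported in Twitter's post.
TWEETS_PER_DAY = 140_000_000
NEW_ACCOUNTS_PER_DAY = 460_000


def per_week(daily_average):
    """Scale a daily average up to a seven-day total."""
    return daily_average * 7


weekly_tweets = per_week(TWEETS_PER_DAY)
weekly_accounts = per_week(NEW_ACCOUNTS_PER_DAY)
print(f"{weekly_tweets:,} tweets a week (about {weekly_tweets / 1e9:.2f} billion)")
print(f"{weekly_accounts:,} new accounts a week")
```

This prints 980,000,000 tweets a week (about 0.98 billion) and 3,220,000 new accounts a week, so “a billion a week” is a fair rounding.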

Twitter changes TOS; might hurt researchers

March 7th, 2011, by Varish Mulwad, posted in Social media, Twitter

ReadWriteWeb reports that Twitter recently changed its Terms of Service. Specifically, Twitter will no longer grant new requests for whitelisting, and it will no longer allow redistribution of its content for either commercial or non-commercial purposes. Twitter whitelisting was a way of allowing developers or researchers to access large quantities of data via the REST API. Although Twitter will continue to honor developers who are already whitelisted, it will not grant any further requests.

The second change in the Terms of Service concerns redistribution of content. Anyone gathering Twitter data, whether a developer or a researcher, can no longer share it with others, even for academic or non-commercial purposes. As ReadWriteWeb points out, these changes will most likely hurt researchers who depend on third-party organizations to provide data for their research.

“As part of the new Twitter terms of service, 140kit like other organizations can no longer offer exports of Twitter data for any purposes – whether that’s for profit or non-profit, whether that’s for developers or scholars. You could be writing the next killer app. Or you could be working on the final chapter of your PhD dissertation. (And let me interject right here and say that having your access to research data shut down as a PhD student is beyond devastating.) It doesn’t matter. Exporting Tweets now violates the TOS.”

It looks like Twitter has just made it much harder for researchers to get the data they need.

ICWSM 2011 Data Challenge with 3TB of social media data

February 23rd, 2011, by Tim Finin, posted in Datamining, NLP, Semantic Web, Social media

The Fifth International AAAI Conference on Weblogs and Social Media (ICWSM) is holding a data challenge using a new dataset of about three TB of social media data collected by Spinn3r between January 13 and February 14, 2011.

The dataset consists of over 386M blog posts, news articles, classifieds, forum posts and other social media content from a month that included events such as the Tunisian revolution and the Egyptian protests. The content includes the syndicated text, its original HTML as found on the web, annotations and metadata (e.g., author information, time of publication and source URL), and boilerplate/chrome extracted content. The data is formatted as Spinn3r’s protostreams, an extension to Google Protocol Buffers. It is also broken down by date, content type and language, making it easy to work with selected data.

See the ICWSM Data Challenge pages for more information on the challenge task, its associated ICWSM workshop and procedures for data access.

Science on Dealing with Data

February 12th, 2011, by Tim Finin, posted in Machine Learning, Semantic Web, Social media

The current (11 February 2011) issue of Science is a special issue on Dealing with Data. It includes a collection of free, online articles that “highlights both the challenges posed by the data deluge and the opportunities that can be realized if we can better organize and access the data.” Some of the articles are drawn from three sister publications: Science Signaling, Science Translational Medicine and Science Careers.

From the issue’s introduction:

“Scientific innovation has been called on to spur economic recovery; science and technology are essential to improving public health and welfare and to inform sustainability; and the scientific community has been criticized for not being sufficiently accountable and transparent. Data collection, curation, and access are central to all of these issues.

As you will discover, two themes appear repeatedly: Most scientific disciplines are finding the data deluge to be extremely challenging, and tremendous opportunities can be realized if we can better organize and access the data.”

One of the great things about the “data deluge” is that there is something in it for almost all computer science researchers, including areas like machine learning, data mining, NLP, visualization, semantic web, security and privacy, social media, high performance computing and HCI. Here are some of the articles that caught our eye:

and still more that look very interesting:

First annual Facebook Hacker Cup

December 9th, 2010, by Tim Finin, posted in Facebook, Social media

If you are good at solving hard problems and like to program, here is something you might do over your winter break: compete in Facebook’s first annual Hacker Cup.

The Hacker Cup will start in January and aims to “bring engineers from around the world together to compete in a multi-round programming competition.” Contestants will solve algorithmic problems to advance and will be ranked on their accuracy and speed in solving them. Winners will get cash prizes, and those who do well will probably get invitations to interview for jobs or internships.

Registration begins on Monday, December 20, and the first three online rounds will be held in January (7-10, 15-16, and 22). The top 25 contestants after the third round will be flown out to the Facebook campus in Palo Alto for the final competition, which will take place on March 11.

For practice, Facebook suggests you work on some of the problems from their Puzzle Master Page. See http://www.facebook.com/hackercup for more information.

Recorded Future analyses streaming Web data to predict the future

October 30th, 2010, by Tim Finin, posted in AI, Datamining, Google, Machine Learning, NLP, Search, Semantic Web, Social media

Recorded Future is a Boston-based startup, backed by Google and In-Q-Tel, that uses sophisticated linguistic and statistical algorithms to extract time-related information about entities and events from streams of Web data. Its goal is to help clients understand how the relationships between entities and events of interest change over time, and to make predictions about the future.

A recent Technology Review article, “See the Future with a Search,” describes it this way:

“Conventional search engines like Google use links to rank and connect different Web pages. Recorded Future’s software goes a level deeper by analyzing the content of pages to track the “invisible” connections between people, places, and events described online.

“That makes it possible for me to look for specific patterns, like product releases expected from Apple in the near future, or to identify when a company plans to invest or expand into India,” says Christopher Ahlberg, founder of the Boston-based firm.

A search for information about drug company Merck, for example, generates a timeline showing not only recent news on earnings but also when various drug trials registered with the website clinicaltrials.gov will end in coming years. Another search revealed when various news outlets predict that Facebook will make its initial public offering.

That is done using a constantly updated index of what Ahlberg calls “streaming data,” including news articles, filings with government regulators, Twitter updates, and transcripts from earnings calls or political and economic speeches. Recorded Future uses linguistic algorithms to identify specific types of events, such as product releases, mergers, or natural disasters, the date when those events will happen, and related entities such as people, companies, and countries. The tool can also track the sentiment of news coverage about companies, classifying it as either good or bad.”

Pricing for access to their online services and API starts at $149 a month, but there is a free Futures email alert service through which you can get the results of some standing queries on a daily or weekly basis. You can also explore the capabilities they offer through their page on the 2010 US Senate Races.

“Rather than attempt to predict how the races will turn out, we have drawn from our database the momentum, best characterized as online buzz, and sentiment, both positive and negative, associated with the coverage of the 29 candidates in 14 interesting races. This dashboard is meant to give the view of a campaign strategist, as it measures how well a campaign has done in getting the media to speak about the candidate, and whether that coverage has been positive, in comparison to the opponent.”

Their blog offers some insights into the technology they are using and much more about the business opportunities they see. The company is clearly leveraging named entity recognition, event recognition and sentiment analysis. A short white paper, “A White Paper on Temporal Analytics,” has some details on their overall approach.
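Recorded Future hasn’t published its algorithms in detail, so the following is only a toy illustration of two of the ingredients named above, event recognition and sentiment analysis, plus the date extraction the Technology Review article describes. The trigger words, word lists and month-year date pattern are hypothetical; a real system would use trained entity and event recognizers. Unlike the good/bad split in the article, this sketch also returns “neutral” when its lexicon finds nothing.

```python
import re

# Hypothetical trigger words and sentiment lexicon, for illustration only.
EVENT_TRIGGERS = {
    "product_release": ("launch", "release", "unveil"),
    "merger": ("acquire", "merge", "buyout"),
}
POSITIVE = {"beat", "growth", "strong", "record"}
NEGATIVE = {"miss", "decline", "weak", "lawsuit"}
MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
DATE_RE = re.compile(rf"\b(?:{MONTHS})\s+\d{{4}}\b")


def tag_sentence(sentence):
    """Tag one sentence with event types, month-year dates and a crude sentiment label."""
    words = re.findall(r"[a-z]+", sentence.lower())
    # A word counts as a trigger if it starts with a trigger stem ("launched", "merger").
    events = [
        event_type
        for event_type, stems in EVENT_TRIGGERS.items()
        if any(word.startswith(stem) for word in words for stem in stems)
    ]
    dates = DATE_RE.findall(sentence)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    sentiment = "good" if score > 0 else "bad" if score < 0 else "neutral"
    return {"events": events, "dates": dates, "sentiment": sentiment}


print(tag_sentence("Apple is expected to launch a new tablet in March 2011 after record growth."))
# {'events': ['product_release'], 'dates': ['March 2011'], 'sentiment': 'good'}
print(tag_sentence("Merck shares decline on weak earnings."))
# {'events': [], 'dates': [], 'sentiment': 'bad'}
```

Aggregating such tags over a stream of documents, per entity and per date, is one plausible way to build the kind of timelines and buzz/sentiment dashboards described above.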
