UMBC ebiquity
Web 2.0

Archive for the 'Web 2.0' Category

Call for ISWC 2008 Research Papers

March 6th, 2008, by Tim Finin, posted in iswc, Semantic Web, Social media, Web, Web 2.0

The call for research papers for ISWC 2008, the Seventh International Semantic Web Conference, is online. The track is co-chaired by Amit Sheth and Steffen Staab and has nineteen distinguished vice chairs and a program committee of experienced experts. Key dates for the research track are:

  • Abstracts due by 9 May 2008
  • Submissions due before 16 May 2008
  • Rebuttal phase during 14-16 June 2008
  • Notification sent by 11 July 2008
  • Camera ready due before 15 August 2008

Join the ICWSM community on CrowdVine

February 26th, 2008, by Tim Finin, posted in Social media, Web, Web 2.0

We invite you to join the ICWSM 2008 social networking community site hosted by CrowdVine. ICWSM 2008 is the Second International Conference on Weblogs and Social Media, which will take place in Seattle between March 30 and April 2. If you are coming to ICWSM next month, you can use this site to help plan and shape the event, find and connect with people at the conference, and share your ideas and comments. If you aren’t able to make it to Seattle, it will provide a way for you to engage even though you can’t be there. Joining the ICWSM community on CrowdVine is easy and free, so please check it out.

Wisdom of the crowd control?

February 24th, 2008, by Tim Finin, posted in Social media, Web, Web 2.0, Wikipedia

Slate has an interesting article, The Wisdom of the Chaperones — Digg, Wikipedia, and the myth of Web 2.0 democracy, that explores who controls some of the popular social media sites. It turns out that the social web is more hegemonic than we thought.


“Social-media sites like Wikipedia and Digg are celebrated as shining examples of Web democracy, places built by millions of Web users who all act as writers, editors, and voters. In reality, a small number of people are running the show. According to researchers in Palo Alto, 1 percent of Wikipedia users are responsible for about half of the site’s edits. The site also deploys bots—supervised by a special caste of devoted users—that help standardize format, prevent vandalism, and root out folks who flood the site with obscenities. This is not the wisdom of the crowd. This is the wisdom of the chaperones.” (link)

The work cited is by the Augmented Social Cognition research group at PARC. See, for example, their post on the behavior of the most active Wikipedians. Very interesting.

I think it’s even worse, in many ways, on Digg, which the article also discusses.

“The same undemocratic underpinnings of Web 2.0 are on display at Digg.com. Digg is a social-bookmarking hub where people submit stories and rate others’ submissions; the most popular links gravitate to the site’s front page. The site’s founders have never hidden that they use a “secret sauce”—a confidential algorithm that’s tweaked regularly—to determine which submissions make it to the front page. Historically, this algorithm appears to have favored the site’s most active participants. Last year, the top 100 Diggers submitted 44 percent of the site’s top stories. In 2006, they were responsible for 56 percent.” (link)

Will rule by the few always be the case? Who knows. The article does point out that the moderation system used by Slashdot helps to broaden the elite and also describes a simple “write one, rate two” policy used by Helium, a site new to me. Helium is a community for freelance writers that helps them connect with publishers who will pay for articles on their topics. The publishers are vetted, so students seeking to buy term papers will have to look elsewhere.

How to use XFN (XML Friends Network)

February 21st, 2008, by Tim Finin, posted in Semantic Web, Social media, Web 2.0

Brian Suda has a good, practical article on XFN on opera.dev — XFN encoding, extraction, and visualizations.

“In this article I will take a good look at XFN – the microformat for describing relationships between people. I will look briefly at what it is and the basic markup needed to add the information to your sites, before then going into depth, looking at the benefits you can get from that data by extracting it and using it in different ways.”

He covers the how and why of XFN and has good examples and code fragments. FOAF is only mentioned once in passing, however.

Approximating the Community Structure of the Long Tail

February 18th, 2008, by Akshay Java, posted in Machine Learning, Semantic Web, Social media, Web, Web 2.0

Social networks and Web graphs exhibit certain typical properties. The classic work by Barabási and Albert showed how nodes in such networks link preferentially: popular nodes often gain a disproportionately large share of the links. This is also known in other fields as the 80/20 rule or simply the “rich get richer” phenomenon. Another early work by Steve Borgatti studied social networks and found that they exhibit a core-periphery property: a small set of (popular) nodes forms the core and the rest make up the periphery. To the best of my knowledge, community detection algorithms have often worked independently of such underlying network properties.
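To make the "rich get richer" effect concrete, here is a minimal Python sketch of preferential attachment. It is a simplified variant in which each new node adds exactly one link; the function names, the seed, and the 20% cutoff are illustrative assumptions, not taken from the work cited above.

```python
import random


def preferential_attachment(n_nodes, seed=0):
    """Grow a graph node by node; each new node links to one existing node
    chosen with probability proportional to that node's current degree.
    Returns the list of node degrees."""
    rng = random.Random(seed)
    degrees = [1, 1]      # start from a single edge between nodes 0 and 1
    endpoints = [0, 1]    # one entry per edge endpoint, so a uniform draw
                          # from this list is degree-proportional
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)
        degrees.append(1)
        degrees[target] += 1
        endpoints.extend([new_node, target])
    return degrees


def top_share(degrees, fraction=0.2):
    """Fraction of all link endpoints held by the top `fraction` of nodes."""
    ranked = sorted(degrees, reverse=True)
    top = ranked[:max(1, int(len(ranked) * fraction))]
    return sum(top) / sum(degrees)


if __name__ == "__main__":
    degrees = preferential_attachment(1000)
    print("share of links held by the top 20% of nodes:", top_share(degrees))
```

Running it with different seeds and sizes gives a feel for how skewed the degree distribution becomes compared with an even 20% share.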

I have been exploring an idea that can utilize the core-periphery structure of social networks to approximately compute the communities in the graph. The intuition behind this method is really quite simple. The basic idea boils down to the following:

“The core of the social network typically defines the communities present in it. By looking at the link structure of the core and identifying how the rest of the network connects to the core we can efficiently compute communities in large graphs.”

This idea can be easily explained by considering the following network of email communication (obtained from Dr. Mark Newman’s site). The original adjacency matrix was permuted to order the nodes by degree. The core is thus represented by submatrix A, which is quite dense. The submatrix B corresponds to how the rest of the network links to the core. The submatrix C is a very sparse matrix that consists of links between nodes in the long tail. Since C is so sparse, it can be ignored without much degradation of the clustering/community detection results, which saves a significant amount of computation and storage. By using just the core of the social network (matrix A) and how other nodes link to the core (matrix B), we can approximate the community structure of the entire graph much more efficiently.

The rest boils down to the mathematical formulation of the above idea using spectral clustering techniques. You can read more about it in my poster paper that was recently accepted to ICWSM. (A Tech Report version with a more detailed analysis will be available shortly.)
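As a rough illustration of the intuition above, and not the spectral formulation from the poster paper, here is a minimal Python sketch using NumPy. It assumes an undirected graph given as a dense 0/1 adjacency matrix. The function names, the toy graph, the two-way split of the core via the Fiedler vector, and the majority-vote rule for peripheral nodes are all illustrative assumptions.

```python
import numpy as np


def split_core_periphery(adj, core_size):
    """Order nodes by descending degree and return the ordering plus the
    blocks A (core-core), B (periphery-core) and C (periphery-periphery)."""
    degrees = adj.sum(axis=1)
    order = np.argsort(-degrees, kind="stable")
    permuted = adj[np.ix_(order, order)]
    A = permuted[:core_size, :core_size]
    B = permuted[core_size:, :core_size]
    C = permuted[core_size:, core_size:]
    return order, A, B, C


def bisect_core(A):
    """Split the core in two using the sign of the Fiedler vector of the
    unnormalized graph Laplacian (an assumed, very simple spectral method)."""
    laplacian = np.diag(A.sum(axis=1)) - A
    _, vectors = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    return (vectors[:, 1] > 0).astype(int)


def assign_periphery(B, core_labels, n_communities=2):
    """Give each peripheral node the community that receives most of its
    links into the core. Nodes with no core links default to community 0."""
    votes = np.stack(
        [B[:, core_labels == c].sum(axis=1) for c in range(n_communities)],
        axis=1,
    )
    return votes.argmax(axis=1)


def approximate_communities(adj, core_size):
    """Label every node using only blocks A and B; block C is ignored."""
    order, A, B, _ = split_core_periphery(adj, core_size)
    core_labels = bisect_core(A)
    periphery_labels = assign_periphery(B, core_labels)
    labels = np.empty(adj.shape[0], dtype=int)
    labels[order[:core_size]] = core_labels
    labels[order[core_size:]] = periphery_labels
    return labels


def build_toy_graph():
    """Hypothetical 12-node graph: two triangles joined by one bridge edge
    form the core, and six degree-1 nodes hang off the core."""
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3),
             (0, 6), (0, 7), (1, 8), (4, 9), (5, 10), (5, 11)]
    adj = np.zeros((12, 12))
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0
    return adj


if __name__ == "__main__":
    print(approximate_communities(build_toy_graph(), core_size=6))
```

The sign of an eigenvector is arbitrary, so which group is labeled 0 and which is labeled 1 can differ between runs or platforms; only the grouping is meaningful. Choosing `core_size` is left open here and would need to be tuned for a real network.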

ICWSM early registration extended to 23:59 Monday 2/18

February 18th, 2008, by Tim Finin, posted in Blogging, Social media, Web, Web 2.0

The Second International Conference on Weblogs and Social Media (ICWSM 2008) will be held March 30 – April 2, 2008 at the Hilton in Seattle, Washington. The early registration deadline is Monday February 18. The program includes some great invited speakers: Bernardo Huberman (HP Labs), who will speak on “Social Dynamics in the Age of the Web,” David Sifry (Founder, Technorati, Sputnik, and Linuxcare), and Brad Fitzpatrick (Google, LiveJournal Founder). Two tutorials are planned, including “Subjectivity and Sentiment Analysis” by Jan Wiebe (Univ. of Pittsburgh) and “Graph Mining Techniques for Social Media Analysis” by Mary McGlohon and Christos Faloutsos (CMU). See the web site for details.

Reuters and the Semantic Web

February 10th, 2008, by Tim Finin, posted in NLP, Semantic Web, Web 2.0

Tim O’Reilly wrote in Reuters CEO sees “semantic web” in its future about Reuters’ motivations for embracing Semantic Web technology.

“At Money:Tech yesterday, I did an on-stage interview with Devin Wenig, the charismatic CEO-to-be of Reuters (following the still-not completed merger with Thomson). Devin highlighted what he considers two big trends hitting financial (and other professional) data: … The end of benefits from decreasing the time it takes for news to hit the market. … he increasingly sees Reuters’ job to be making connections, going from news to insight. He sees semantic markup to make it easier to follow paths of meaning through the data as an important part of Reuters’ future. … Ultimately, Reuters’ news is the raw material for analysis and application by investors and downstream news organizations. Adding metadata to make that job of analysis easier for those building additional value on top of your product is a really interesting way to view the publishing opportunity. If you don’t think of what you produce as the “final product” but rather as a step in an information pipeline, what do you do differently to add value for downstream consumers? In Reuters’ case, Devin thinks you add hooks to make your information more programmable.”

This provides some background for their recent announcement of the Reuters Calais information extraction service. It extracts named entities, events and relations from text and returns the information as RDF data.

Hypertable 0.9 alpha

February 8th, 2008, by Tim Finin, posted in Database, Semantic Web, Web, Web 2.0

Hypertable 0.9 alpha is out.

“Hypertable is a high performance distributed data storage system designed to support applications requiring maximum performance, scalability, and reliability. Hypertable will be particularly invaluable to any organization that needs to manage rapidly evolving data to support demanding real-time applications. Modeled after Google’s well known Bigtable project, Hypertable is designed to manage the storage and processing of information on a large cluster of commodity servers, providing resilience to machine and component failures. Hypertable seeks to set the open source standard for highly available, petabyte scale, database systems. ” (link)

Update: LinuxWorld has an article, Zvents releases open-source cluster database, on the release along with a podcast with Doug Judd, principal search architect for Zvents.

Reuters Calais: free text to Semantic Web services

February 2nd, 2008, by Tim Finin, posted in NLP, OWL, RDF, Semantic Web, Social media, Web, Web 2.0

Reuters has released an API for its Calais Web service. The free service discovers entities, events and relations in text and returns the results in the form of RDF data. The services use information extraction technology from ClearForest, which Reuters acquired in April 2007.

“The Calais web service automatically attaches rich semantic metadata to the content you submit – in well under a second. Using natural language processing, machine learning and other methods, Calais categorizes and links your document with entities (people, places, organizations, etc.), facts (person ‘x’ works for company ‘y’), and events (person ‘z’ was appointed chairman of company ‘y’ on date ‘x’). The metadata results are stored centrally and returned to you as industry-standard RDF constructs accompanied by a Globally Unique Identifier (GUID). Using the Calais GUID, any downstream consumer is able to retrieve this metadata via a simple call to Calais.” (link)

The semantic types it recognizes and uses in its annotations are a basic set typical of information extraction systems and include entities, facts, events and categories. See, for example, the description of the person entity type. The brief API documentation describes how to call the web services and interpret the results. As an example of the semantic metadata types supported by Calais, a preprocessed sample content set of about 350 Business and Economic news articles from WikiNews for the year 2007 is available.

The service is free for both commercial and non-commercial purposes with a limit, but a generous one, on the number of service calls a registered developer can make in a day. A sample Java application is available that reads input from STDIN, writes output to STDOUT and takes processing parameters from a configuration file.

    updates: The sample application requires Java 6 to run! Here’s an example of input and the RDF output.

Making such a service freely available on the Web has the potential to be a disruptive move. Reuters will sponsor “a number of contests and bounties for applications developed using the Calais API.” An initial “bounty” of $5,000 is offered for “A highly configurable plugin for WordPress that enriches a blog with several capabilities” based on OpenCalais.

The kind of content extraction that Calais does falls considerably short of full language understanding. However, it does represent the state of the art in scalable, domain-independent information extraction, is immediately useful, and is an important step toward the ultimate goal of full NLP.

Twine in the New York Times

February 2nd, 2008, by Tim Finin, posted in GENERAL, Semantic Web, Social media, Web 2.0

Tomorrow’s New York Times has a very positive story on Twine in the business section, An Online Organizer That Helps Connect the Dots.

“How often have you wasted time searching through page after page of e-mail messages, Web sites, notes, news feeds and YouTube videos on your computer, trying to find an important item? If the answer is “too often,” a San Francisco company, Radar Networks, is testing a free, Web-based application, called Twine, that may provide some robotic secretarial help in organizing and retrieving documents.”

Happily, the story mentions that Twine is using Semantic Web technology:

“Twine is based on technologies created for the developing semantic Web — foreseen as a smarter Web where machines may someday be able to process the meaning of words and phrases in documents and even routinely answer direct questions.”

Google social graph API

February 2nd, 2008, by Tim Finin, posted in Blogging, Semantic Web, Social media, Web, Web 2.0

Late this week Google released the Google social graph API, which provides structured access to information Google has extracted from public FOAF and XFN data on the Web. Google also says it mines the web for “other publicly declared connections”. I wonder what that means? Brad Fitzpatrick gives a three minute explanation in this video. This is exciting and likely to give a push to any number of emerging themes, including data portability, linked data, and the Semantic Web in general. There’s lots of comment from the usual suspects and also on the SWIG IRC.

By the way, Brad Fitzpatrick will give an invited talk at the 2008 International Conference on Weblogs and Social Media at the end of March in Seattle.

Here’s a simple call to the API starting with the ebiquity blog

  http://socialgraph.apis.google.com/lookup?q=ebiquity.umbc.edu%2Fblogger%2F&fme=1&pretty=1

You can see from the results that they are returned using JSON. The possible parameters and what they mean are given here.
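For anyone who wants to script such calls, here is a minimal Python sketch that rebuilds the lookup URL shown above and parses the JSON reply. The endpoint and the q, fme and pretty parameters are copied from that URL and treated as opaque flags; the function names are hypothetical, and the fetch only works while the service is reachable.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the example call above.
LOOKUP_ENDPOINT = "http://socialgraph.apis.google.com/lookup"


def build_lookup_url(node, **flags):
    """Build a lookup URL for one node, e.g. a blog address.
    Extra keyword arguments are passed through as query parameters."""
    params = {"q": node}
    params.update(flags)
    return LOOKUP_ENDPOINT + "?" + urlencode(params)


def lookup(node, **flags):
    """Fetch the lookup result and decode it from JSON.
    Assumes the service is online and answers with a JSON document."""
    with urlopen(build_lookup_url(node, **flags), timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints the URL only; call lookup(...) to actually query the service.
    print(build_lookup_url("ebiquity.umbc.edu/blogger/", fme=1, pretty=1))
```

Keeping URL construction separate from the network call makes it easy to check the query string without touching the service.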

Duncan Watts on influence, tipping points and marketing

January 27th, 2008, by Tim Finin, posted in Social media, Web, Web 2.0

Fast Company has a long article, Is the Tipping Point Toast?, on social-network researcher Duncan Watts, who’s on leave from his position as Professor of Sociology at Columbia and working for Yahoo Research. The article focuses on Watts’s challenges to the importance of “influentials” typified by Malcolm Gladwell’s popular book, The Tipping Point.

“In the past few years, Watts–a network-theory scientist who recently took a sabbatical from Columbia University and is now working for Yahoo –has performed a series of controversial, barn-burning experiments challenging the whole Influentials thesis. He has analyzed email patterns and found that highly connected people are not, in fact, crucial social hubs. He has written computer models of rumor spreading and found that your average slob is just as likely as a well-connected person to start a huge new trend. And last year, Watts demonstrated that even the breakout success of a hot new pop band might be nearly random. Any attempt to engineer success through Influentials, he argues, is almost certainly doomed to failure.” link

According to the article, Watts’s work at Yahoo Research is refining the concept of big-seed marketing that he and Jonah Peretti proposed in a note in HBR, Viral Marketing for the Real World. The idea is to marry “viral-marketing tools with old-fashioned mass media in a way that yields far more predictable results than “purely” viral approaches like word-of-mouth marketing”.
