UMBC ebiquity research group Building intelligent systems in open, heterogeneous, dynamic, distributed environments
11 May 2008, 19:31:33 EDT  
Archive for the 'Web 2.0' Category

Fonolo is Google for phone menus

April 30th, 2008, by Tim Finin, posted in sEARCH, Social media, Web 2.0, Mobile Computing, GENERAL

Remember when finding information on the Web was done by navigation using Gopher or Yahoo’s directory? It worked and we thought it was pretty good, at least until the search engines came along. Then we realized that search was much better than navigation for most tasks, especially as the size of the Web grew.

Recall how we get information from a big organization by phone today: we call customer service, navigate a confusing phone menu and, after 10 minutes, end up being told to dial a different department. Dealing with such IVR (interactive voice response) systems is part of the cost of living in our modern society. But maybe we can do better…

Fonolo offers a service that uses a search engine on their site to find the right spot on a company’s phone menu and connect you to it by a callback to your phone. You can even bookmark the point on the phone menu.

How do they do this? Here’s an explanation from IVR search: a ‘Google’ for phone menus?, a post on Telco2.0:

“And Fonolo wrote a web spider that visits large companies’ public phone numbers, and iterates through all the options on all the IVR menus from all the numbers, logging everything it finds. Then it’s just a matter of plotting it all on a directed graph, and making the whole thing searchable and available on the Web. And then the bit we like. You click on the bit you want to get through to, and their system uses the map to dial and navigate the IVRs for you, thus “deep dialing” the user directly to the point in the IVR they need. Every time someone dials through Fonolo, they use the interaction to re-validate that path through the IVR. The search terms that users submit tell them which companies they need to go spider.”
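The directed-graph idea in that description can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Fonolo's implementation: the menu map, the node labels and the function names `search_menu` and `deep_dial` are all hypothetical. Each node is a menu prompt, each edge is a keypress, and "deep dialing" is a shortest-path search from the main menu to the node the user picked.

```python
from collections import deque

# Hypothetical IVR map: each node is a menu prompt, each edge a keypress.
IVR_MAP = {
    "main menu": {"1": "billing", "2": "technical support", "0": "operator"},
    "billing": {"1": "pay a bill", "2": "dispute a charge"},
    "technical support": {"1": "internet outage", "2": "reset password"},
}


def search_menu(ivr_map, query):
    """Return every menu node whose label contains the query text."""
    nodes = set(ivr_map)
    for options in ivr_map.values():
        nodes.update(options.values())
    return sorted(n for n in nodes if query.lower() in n.lower())


def deep_dial(ivr_map, start, target):
    """Breadth-first search for the shortest keypress sequence to target."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, keys = queue.popleft()
        if node == target:
            return keys
        for key, child in ivr_map.get(node, {}).items():
            if child not in seen:
                seen.add(child)
                queue.append((child, keys + [key]))
    return None  # target is not reachable from start


if __name__ == "__main__":
    print(search_menu(IVR_MAP, "password"))
    print(deep_dial(IVR_MAP, "main menu", "reset password"))
```

A real system would also have to handle timing, speech prompts and menus that change, which is presumably why each call is used to re-validate the stored path.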

Fonolo is in a private beta mode, but you can sign up to be added to it on their web site. You can see a video presentation of the idea and some ppt slides.

Environmental detection/protection.

April 7th, 2008, by joel, posted in Social media, Ecoinformatics, Web 2.0, Blogging, Semantic Web, GENERAL

EPA is on a web 2.0 kick. They sponsored a 2-day monster mashup exercise last Fall, the Puget Sound Information Challenge, and are making plans for further efforts. EPA’s CIO Molly O’Neill talks a little about it here.

They’ve also been tracking and flirting with the semantic web, and are wondering how much effort to expend on a more full-on semantic engagement. I presented our semantic eco-blogging work at EPA headquarters in February, and was surprised at the turnout and enthusiasm. In response to a screen shot of a Fieldmarking post describing beach closings, a person from the Water Office related that he learned of the closing of his favorite Lake Erie swim-spot from a blog post. This made an impression on him, since, by rights, the closing should have been reported at the county level, up to the state level, and, ultimately, to his office in DC. It struck him that EPA should be systematically tapping the blogosphere for citizen sentiment and concern.

If they do this, they will, implicitly, be saying to the citizenry “If you can’t be bothered to fill out the right form in the right office, at least blog about it, and maybe the machinery of the blogosphere will direct your thoughts our way.” I kind of like that. (This particular example, finding information on beach closings in a given area, can probably be done fairly efficiently with Yahoo Pipes.)

EPA will be hosting this week’s meeting of the multilateral ecoinformatics cooperation, and there will be participation from a wide swathe of EPA - I’m curious to learn of their plans.

Call for ISWC 2008 Research Papers

March 6th, 2008, by Tim Finin, posted in iswc, Social media, Web 2.0, Web, Semantic Web

The call for ISWC 2008 research papers for the Seventh International Semantic Web Conference is online. The track is co-chaired by Amit Sheth and Steffen Staab and has nineteen distinguished vice chairs and a program committee of experienced experts. Key dates for the research track are:

  • Abstracts due by 9 May 2008
  • Submissions due before 16 May 2008
  • Rebuttal phase during 14-16 June 2008
  • Notification sent by 11 July 2008
  • Camera ready due before 15 August 2008

Join the ICWSM community on CrowdVine

February 26th, 2008, by Tim Finin, posted in Social media, Web 2.0, Web

We invite you to join the ICWSM 2008 social networking community site hosted by CrowdVine. ICWSM 2008 is the Second International Conference on Weblogs and Social Media, which will take place in Seattle between March 30 and April 2. If you are coming to ICWSM next month, you can use this site to help plan and shape the event, facilitate finding and connecting with people at the conference, and share your ideas and comments. If you aren’t able to make it to Seattle, it will provide a way for you to engage even though you can’t be there. Joining the ICWSM community on CrowdVine is easy and free, so please check it out.

Wisdom of the crowd control?

February 24th, 2008, by Tim Finin, posted in Wikipedia, Social media, Web 2.0, Web

Slate has an interesting article, The Wisdom of the Chaperones — Digg, Wikipedia, and the myth of Web 2.0 democracy, that explores who controls some of the popular social media sites. It turns out that the social web is more hegemonic than we thought.

“Social-media sites like Wikipedia and Digg are celebrated as shining examples of Web democracy, places built by millions of Web users who all act as writers, editors, and voters. In reality, a small number of people are running the show. According to researchers in Palo Alto, 1 percent of Wikipedia users are responsible for about half of the site’s edits. The site also deploys bots—supervised by a special caste of devoted users—that help standardize format, prevent vandalism, and root out folks who flood the site with obscenities. This is not the wisdom of the crowd. This is the wisdom of the chaperones.” (link)

The work cited is by the Augmented Social Cognition research group at PARC. See, for example, their post on the behavior of the most active Wikipedians. Very interesting.

I think it’s even worse, in many ways, on Digg, which the article also discusses.

“The same undemocratic underpinnings of Web 2.0 are on display at Digg.com. Digg is a social-bookmarking hub where people submit stories and rate others’ submissions; the most popular links gravitate to the site’s front page. The site’s founders have never hidden that they use a “secret sauce”—a confidential algorithm that’s tweaked regularly—to determine which submissions make it to the front page. Historically, this algorithm appears to have favored the site’s most active participants. Last year, the top 100 Diggers submitted 44 percent of the site’s top stories. In 2006, they were responsible for 56 percent.” (link)

Will rule by the few always be the case? Who knows. The article does point out that the moderation system used by Slashdot helps to broaden the elite and also describes a simple “write one, rate two” policy used by Helium, a site new to me. Helium is a community for freelance writers that helps them connect with publishers who will pay for articles on their topics. The publishers are vetted, so students seeking to buy term papers will have to look elsewhere.

How to use XFN (XML Friends Network)

February 21st, 2008, by Tim Finin, posted in Social media, Web 2.0, Semantic Web

Brian Suda has a good, practical article on XFN on opera.dev — XFN encoding, extraction, and visualizations.

“In this article I will take a good look at XFN - the microformat for describing relationships between people. I will look briefly at what it is and the basic markup needed to add the information to your sites, before then going into depth, looking at the benefits you can get from that data by extracting it and using it in different ways.”

He covers the how and why of XFN and has good examples and code fragments. FOAF is only mentioned once in passing, however.
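To give a flavor of the extraction side, here is a minimal sketch of my own (not taken from Suda's article) that pulls XFN relationships out of a page using Python's standard `html.parser`. XFN puts relationship values such as `friend`, `met` or `colleague` in the `rel` attribute of ordinary links. The sample markup, the `XFNExtractor` class and the `extract_xfn` function are made up for illustration.

```python
from html.parser import HTMLParser

# The relationship values defined by XFN 1.1.
XFN_VALUES = {
    "contact", "acquaintance", "friend", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}


class XFNExtractor(HTMLParser):
    """Collect (href, [xfn values]) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel is a space-separated list; keep only the XFN values.
        rel_values = set((attributes.get("rel") or "").lower().split())
        relations = sorted(rel_values & XFN_VALUES)
        if relations and attributes.get("href"):
            self.links.append((attributes["href"], relations))


def extract_xfn(html):
    parser = XFNExtractor()
    parser.feed(html)
    return parser.links


# Hypothetical blogroll markup for illustration.
SAMPLE = """
<a href="http://example.org/alice" rel="friend met">Alice</a>
<a href="http://example.org/bob" rel="colleague nofollow">Bob</a>
<a href="http://example.org/about">About</a>
"""

if __name__ == "__main__":
    for href, relations in extract_xfn(SAMPLE):
        print(href, relations)
```

Non-XFN values such as `nofollow` are dropped, and links without any XFN value are skipped.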

Approximating the Community Structure of the Long Tail

February 18th, 2008, by Akshay Java, posted in Social media, Web 2.0, Web, Machine Learning, Semantic Web

Social networks and Web graphs exhibit certain typical properties. The classic work by Barabási and Albert showed how nodes in such networks link preferentially: popular nodes often gain a disproportionately larger share of the links. This is also known in other fields as the 80/20 rule or simply the “rich get richer” phenomenon. Another early work by Steve Borgatti studied social networks and found that they exhibit a core-periphery property. A small set of (popular) nodes forms the core and the rest make up the periphery. To the best of my knowledge, community detection algorithms have often worked independently of such underlying network properties.

I have been exploring an idea that can utilize the core-periphery structure of social networks to approximately compute the communities in the graph. The intuition behind this method is really quite simple. The basic idea boils down to the following:

“The core of the social network typically defines the communities present in it. By looking at the link structure of the core and identifying how the rest of the network connects to the core we can efficiently compute communities in large graphs.”

This idea can be easily explained by considering the following network of email communication (obtained from Dr. Mark Newman’s site). The original adjacency matrix was permuted to order the nodes based on their degree. Thus the core is represented by submatrix A, which is quite dense. The submatrix B corresponds to how the rest of the network links to the core. The submatrix C is a very sparse matrix that consists of links between nodes in the long tail. Since C is quite sparse, it can be ignored without much degradation of the clustering/community detection results, which saves a significant amount of computation and storage. By utilizing just the core of the social network (matrix A) and how other nodes link to the core (matrix B), we can approximate the overall community structure of the entire graph much more efficiently.
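A toy version of the A/B/C split can be sketched as follows. To be clear, this is not the spectral formulation from the paper: it assumes the core has already been clustered by some method (the `CORE_LABELS` dictionary is simply given), and it uses a plain majority vote over the B links as a stand-in for the approximation step. The graph, the labels and the function names are all made up for illustration.

```python
from collections import Counter, defaultdict


def split_blocks(edges, core_size):
    """Rank nodes by degree and sort edges into core-core (A),
    core-periphery (B) and periphery-periphery (C) blocks."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Highest degree first; ties broken by name to keep runs repeatable.
    ordered = sorted(degree, key=lambda n: (-degree[n], n))
    core = set(ordered[:core_size])
    blocks = {"A": [], "B": [], "C": []}
    for u, v in edges:
        ends_in_core = int(u in core) + int(v in core)
        name = "A" if ends_in_core == 2 else "B" if ends_in_core == 1 else "C"
        blocks[name].append((u, v))
    return core, blocks


def assign_periphery(core_labels, b_edges):
    """Give each periphery node the community most common among the
    core nodes it links to. The C block is never consulted."""
    votes = defaultdict(Counter)
    for u, v in b_edges:
        core_node, other = (u, v) if u in core_labels else (v, u)
        votes[other][core_labels[core_node]] += 1
    labels = dict(core_labels)
    for node, counter in votes.items():
        labels[node] = counter.most_common(1)[0][0]
    return labels


# Hypothetical graph: two dense core groups joined by one bridge,
# plus four low-degree nodes in the long tail.
EDGES = [
    ("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
    ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
    ("a3", "b3"),
    ("a1", "p1"), ("a2", "p1"), ("a1", "p2"),
    ("b1", "p3"), ("b2", "p3"), ("b1", "p4"),
    ("p2", "p4"),
]

# Assumed output of clustering the core alone (by any method).
CORE_LABELS = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1, "b3": 1}

if __name__ == "__main__":
    core, blocks = split_blocks(EDGES, core_size=6)
    print({name: len(block) for name, block in blocks.items()})
    print(assign_periphery(CORE_LABELS, blocks["B"]))
```

In this toy graph the single tail-to-tail link (p2 to p4) crosses the two communities, so dropping C loses nothing useful. Real graphs will not always be this forgiving.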

The rest boils down to the mathematical formulation of the above idea using spectral clustering techniques. You can read more about it in my poster paper that was recently accepted to ICWSM. (A Tech Report version with a more detailed analysis will be available shortly.)

ICWSM early registration extended to 23:59 Monday 2/18

February 18th, 2008, by Tim Finin, posted in Social media, Web 2.0, Blogging, Web

The Second International Conference on Weblogs and Social Media (ICWSM 2008) will be held March 30 - April 2, 2008 at the Hilton in Seattle, Washington. The early registration deadline is Monday February 18. The program includes some great invited speakers: Bernardo Huberman (HP Labs), who will speak on “Social Dynamics in the Age of the Web,” David Sifry (Founder, Technorati, Sputnik, and Linuxcare), and Brad Fitzpatrick (Google, LiveJournal Founder). Two tutorials are planned, including “Subjectivity and Sentiment Analysis” by Jan Wiebe (Univ. of Pittsburgh) and “Graph Mining Techniques for Social Media Analysis” by Mary McGlohon and Christos Faloutsos (CMU). See the web site for details.

Reuters and the Semantic Web

February 10th, 2008, by Tim Finin, posted in Web 2.0, NLP, Semantic Web

Tim O’Reilly wrote in Reuters CEO sees “semantic web” in its future about Reuters’ motivations for embracing Semantic Web technology.

“At Money:Tech yesterday, I did an on-stage interview with Devin Wenig, the charismatic CEO-to-be of Reuters (following the still-not completed merger with Thomson). Devin highlighted what he considers two big trends hitting financial (and other professional) data: … The end of benefits from decreasing the time it takes for news to hit the market. … he increasingly sees Reuters’ job to be making connections, going from news to insight. He sees semantic markup to make it easier to follow paths of meaning through the data as an important part of Reuters’ future. … Ultimately, Reuters’ news is the raw material for analysis and application by investors and downstream news organizations. Adding metadata to make that job of analysis easier for those building additional value on top of your product is a really interesting way to view the publishing opportunity. If you don’t think of what you produce as the “final product” but rather as a step in an information pipeline, what do you do differently to add value for downstream consumers? In Reuters’ case, Devin thinks you add hooks to make your information more programmable.”

This provides some background for their recent announcement of the Reuters Calais information extraction service. It extracts named entities, events and relations from text and returns the information as RDF data.

Hypertable 0.9 alpha

February 8th, 2008, by Tim Finin, posted in Database, Web 2.0, Web, Semantic Web

Hypertable 0.9 alpha is out.

“Hypertable is a high performance distributed data storage system designed to support applications requiring maximum performance, scalability, and reliability. Hypertable will be particularly invaluable to any organization that needs to manage rapidly evolving data to support demanding real-time applications. Modeled after Google’s well known Bigtable project, Hypertable is designed to manage the storage and processing of information on a large cluster of commodity servers, providing resilience to machine and component failures. Hypertable seeks to set the open source standard for highly available, petabyte scale, database systems. ” (link)

Update: LinuxWorld has an article, Zvents releases open-source cluster database, on the release along with a podcast with Doug Judd, principal search architect for Zvents.

Reuters Calais: free text to Semantic Web services

February 2nd, 2008, by Tim Finin, posted in Web 2.0, Social media, OWL, RDF, Web, NLP, Semantic Web

Reuters has released an API for its Calais Web service. The free service discovers entities, events and relations in text and returns the results in the form of RDF data. The services use information extraction technology from ClearForest, which Reuters acquired in April 2007.

“The Calais web service automatically attaches rich semantic metadata to the content you submit – in well under a second. Using natural language processing, machine learning and other methods, Calais categorizes and links your document with entities (people, places, organizations, etc.), facts (person ‘x’ works for company ‘y’), and events (person ‘z’ was appointed chairman of company ‘y’ on date ‘x’). The metadata results are stored centrally and returned to you as industry-standard RDF constructs accompanied by a Globally Unique Identifier (GUID). Using the Calais GUID, any downstream consumer is able to retrieve this metadata via a simple call to Calais.” (link)

The semantic types it recognizes and uses in its annotations are a basic set typical of information extraction systems and include entities, facts, events and categories. See, for example, the description of the person entity type. The brief API documentation describes how to call the web services and interpret the results. As an example of the semantic metadata types supported by Calais, a preprocessed sample content set of about 350 Business and Economic news articles from WikiNews for the year 2007 is available.

The service is free for both commercial and non-commercial purposes with a limit, but a generous one, on the number of service calls a registered developer can make in a day. A sample Java application is available that reads input from STDIN, writes output to STDOUT and takes processing parameters from a configuration file.

    updates: The sample application requires Java 6 to run! Here’s an example of input and the RDF output.

Making such a service freely available on the Web has the potential to be a disruptive move. Reuters will sponsor “a number of contests and bounties for applications developed using the Calais API.” An initial “bounty” of $5,000 is offered for “A highly configurable plugin for WordPress that enriches a blog with several capabilities” based on OpenCalais.

The kind of content extraction that Calais does falls considerably short of full language understanding. However, it does represent the state of the art in scalable, domain-independent information extraction, is immediately useful, and is an important step toward the ultimate goal of full NLP.

Twine in the New York Times

February 2nd, 2008, by Tim Finin, posted in Social media, Web 2.0, Semantic Web, GENERAL

Tomorrow’s New York Times has a very positive story on Twine in the business section, An Online Organizer That Helps Connect the Dots.

“How often have you wasted time searching through page after page of e-mail messages, Web sites, notes, news feeds and YouTube videos on your computer, trying to find an important item? If the answer is “too often,” a San Francisco company, Radar Networks, is testing a free, Web-based application, called Twine, that may provide some robotic secretarial help in organizing and retrieving documents.”

Happily, the story mentions that Twine is using Semantic Web technology:

“Twine is based on technologies created for the developing semantic Web — foreseen as a smarter Web where machines may someday be able to process the meaning of words and phrases in documents and even routinely answer direct questions.”

