Web of Data, Services and Identities

April 18th, 2009

ReadWriteWeb has a post up on The Web of Data: Creating Machine-Accessible Information that focuses on Linked Open Data.

“In the coming years, we will see a revolution in the ability of machines to access, process, and apply information. This revolution will emerge from three distinct areas of activity connected to the Semantic Web: the Web of Data, the Web of Services, and the Web of Identity providers. These webs aim to make semantic knowledge of data accessible, semantic services available and connectable, and semantic knowledge of individuals processable, respectively. In this post, we will look at the first of these Webs (of Data) and see how making information accessible to machines will transform how we find information.”

I did find the three ‘Webs’ mentioned in their intro (data, services and identity providers) to be interesting. The first two are standard components of the envisioned future Web, but their third, a web of identity providers, less so. I am unsure whether it’s meant to refer to authentication services and protocols (e.g., OAuth) or maybe to some kind of named entity recognition service for text. The former is certainly necessary for web services and APIs to work more seamlessly, but doesn’t seem to me to be as significant a problem as developing highly interoperable and integrable Webs of data and services. Of course, I am probably unaware of the subtleties involved in getting this right while maintaining security and appropriate privacy. In any case, I look forward to the articles to follow.

Twitter-Calais mashup tracks IL-5 election buzz

February 24th, 2009

WindyCitizen.com is “a crowd-powered front page for the Windy City” that “brings Chicagoans the best of the local web by letting them share, rate and discuss their favorite local news, photos, videos and more.”

Their Windy City Twitter Tracker mashup uses Open Calais as a named entity recognizer to track Tweets about candidates in the special election to fill the US House seat for Chicago’s 5th district that Rahm Emanuel vacated. Calais might be overkill for this, since there is a small set of known candidates, but it’s an impressive semantic mashup nonetheless.
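To illustrate why a full entity recognizer may be more than this task needs, here is a minimal sketch of the simpler alternative: counting, per day, the tweets that mention any alias from a fixed candidate list. This is not how the Windy City mashup works (it uses Calais). The function name `count_mentions`, the candidate names, and the sample tweets are all made up for illustration.

```python
import re
from collections import Counter


def count_mentions(tweets, candidates):
    """Count tweets per day that mention each candidate.

    tweets: iterable of (day, text) pairs.
    candidates: dict mapping a display name to a list of aliases.
    Returns a dict mapping day -> Counter of display name -> tweet count.
    """
    # One case-insensitive, whole-word pattern per candidate.
    patterns = {
        name: re.compile(
            r"\b(?:" + "|".join(re.escape(a) for a in aliases) + r")\b",
            re.IGNORECASE,
        )
        for name, aliases in candidates.items()
    }
    buzz = {}
    for day, text in tweets:
        counts = buzz.setdefault(day, Counter())
        for name, pattern in patterns.items():
            if pattern.search(text):
                counts[name] += 1  # count each tweet once per candidate
    return buzz


# Hypothetical data, not real candidates or tweets.
candidates = {"Smith": ["smith", "jsmith"], "Jones": ["jones"]}
tweets = [
    ("2009-02-23", "Smith is leading in IL-5"),
    ("2009-02-23", "Jones vs. smith debate tonight"),
    ("2009-02-24", "Blacksmith convention this weekend"),
]
buzz = count_mentions(tweets, candidates)
```

Whole-word matching keeps "Blacksmith" from counting as a mention of Smith, but a keyword list like this cannot resolve ambiguous names or nicknames it has not been told about, which is where an entity recognizer would earn its keep.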

“We’re searching Twitter constantly to keep you up to date with the conversation about the IL-5 special election. The graph above lets you track buzz about the candidates over the last two weeks.”

The Windy City Twitter Tracker is probably written to be easily repurposed, judging from the Web site, which describes it as currently tracking the “Race for the 5th”. The mashup is credited to Whattech.

Amazon Remembers: See it, snap it, buy it

December 3rd, 2008

Just in time for Christmas, Amazon has released a new service via an iPhone app that lets you snap a photo of an object you desire and sometime later in the day find out how you can buy one like it on Amazon.

Here’s how the NYT Bits blog described it in Amazon.com Invades the Apple App Store:

“There is, however, one unusual and noteworthy aspect of the app called Amazon Remembers, which Amazon is calling “experimental.” The tool lets users take a photograph of any product they see in the real world. The photos are then uploaded to Amazon and turned over to the far-flung freelance workers in Amazon’s Mechanical Turk program, who will try to match them with products for sale on Amazon.com. The results will not be instantaneous (between 5 minutes and 24 hours, the company says), but the idea is to entice consumers to buy products from Amazon instead of its offline rivals.”

Too bad we are in a recession depression.

Jon Kleinberg named as one of 20 Best Brains Under 40 by Discover Magazine

November 28th, 2008

Discover magazine has named Jon Kleinberg as one of the 20 Best Brains Under 40 for his work on HITS and social networks.

“In the mid-1990s a Web search for, say, “DISCOVER magazine” meant wading through thousands of results presented in a very imperfect order. Then, in 1996, 24-year-old Jon Kleinberg developed an algorithm that revolutionized Web search. That is why today, that same search lists this magazine’s home page first. Kleinberg, now 37, created the Hyperlink-Induced Topic Search algorithm, which estimates a Web page’s value in both authority (quality of content and endorsement by other pages) and hub (whether it links to good pages).

Kleinberg continues to combine computer science, data analysis, and sociological research to help create better tools that link social networking sites. He envisions an increase in how we can see information move through space over time, in what he calls geographic hot spots on the Web, based on the interests of a particular region.

Our social network links and friendships depend on these geographic hot spots, Kleinberg says, which makes searching easier by “taking into account not just who and when, but where.” He is now studying how word-of-mouth phenomena like fads and rumors flow through groups of people, hoping to apply this knowledge to processes such as political mobilization.”
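The hub and authority idea in the quote can be sketched in a few lines. This is a simplified illustration, not Kleinberg's full method: the published algorithm runs on a query-focused subgraph of the Web, while this sketch just iterates the two mutually reinforcing score updates over whatever small link graph it is given. The function name `hits`, the fixed iteration count, and the toy graph are assumptions made for the example.

```python
from math import sqrt


def hits(links, iterations=50):
    """Compute hub and authority scores by power iteration.

    links: dict mapping a page to the list of pages it links to.
    Returns (hub, auth), two dicts of L2-normalized scores.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}

        # Hub score of a page: sum of the authority scores it links to.
        new_hub = {p: sum(auth[d] for d in links.get(p, ())) for p in pages}
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}

    return hub, auth


# Toy graph: "portal" links to both content pages, "blog" links to one.
web = {"portal": ["a", "b"], "blog": ["a"], "a": [], "b": []}
hub, auth = hits(web)
best_authority = max(auth, key=auth.get)  # "a": it has the most in-links from hubs
best_hub = max(hub, key=hub.get)          # "portal": it links to both authorities
```

Normalizing after each step keeps the scores bounded; only their relative order matters for ranking.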

Neologism Web-based RDFS vocabulary editor

November 27th, 2008

Neologism is a simple web-based RDF Schema vocabulary editor and publishing system under development at DERI. It looks like a great lightweight tool for developing Semantic Web vocabularies and publishing them on the Web following current best practices. Its goal is to “dramatically reduce the time required to create, publish and modify vocabularies for the Semantic Web.” The system is not yet open for use, but there is a good online Neologism demo as well as a screencast of how to use it.

Semantic Applications at age one

November 19th, 2008

After a year, Read/Write Web has revisited their review of 10 promising Semantic Web apps, producing 10 Semantic Apps to Watch – One Year Later.

“A lot can happen in one year on the Internet, so we thought we’d check back in with each of the 10 products and see how they’re progressing. What’s changed over the past year and what are these companies working on now? The products are, in no particular order: Freebase, Powerset, Twine, AdaptiveBlue, Hakia, Talis, TrueKnowledge, TripIt, Calais (was ClearForest), Spock.”

They plan to publish a completely new list of Semantic applications to watch as the next post in the series and ask people to leave suggestions in the post comments.

Maybe Read/Write Web will do like Michael Apted’s 7up series and report back to us on how the systems are doing each year, which I guess may be like seven Web-years.

3scale provides infrastructure for the programmable web

November 19th, 2008

3scale Networks is a Barcelona-based startup that is trying to fill a critical gap in helping organizations manage web services as a business or at least in a business-like manner.

“3scale provides a new generation of infrastructure for the web – point and click contract management, monitoring and billing for Web Services. The 3scale platform makes it easy for providers to launch their APIs, manage user access and, if desired, collect usage fees. Service users can discover services they need and sign up for plans on offer.” (source)

They have been operating a private beta system for a few months and just announced that their public beta is open. Currently signing up with 3scale and registering services is free and the only costs are commissions on transaction fees your service charges. Once you’ve registered a service, you can install one of several 3scale plugins for your programming environment to get your service talking to 3scale and configure one or more usage plans. 3scale uses Amazon’s EC2, S3 and Cloud Computing services.

3scale’s co-founder and technical lead is Steve Wilmott, who we worked with for many years when he was an academic doing research on multiagent systems. Several months ago he invited us to add Swoogle’s web service to 3scale’s private beta. We were pleased with how easy it was and look forward to exploring how else to use 3scale.

A story in yesterday’s Washington Post, Manage Your API Infrastructure With 3scale Networks, has some more information.

The Google does a UMBC drive by

November 7th, 2008

Google Maps has added street views of the greater Baltimore area. One thing I had never noticed before (I think it is new) is that if you click on the STREET VIEW button, the roads from which street view is available are marked in blue. This makes it easy to zoom out and get a sense of the coverage. See for example the street view coverage of the

You can see that the Google just did a quick drive-by of UMBC.

I also noticed that you can expand the street view to “full screen” and drive around interesting areas, like nearby Main Street in old Ellicott City.

Akshay Java on Mining Social Media Communities and Content

October 14th, 2008

Akshay Java will defend his dissertation, Mining Social Media Communities and Content, at 10:30am this Thursday in ITE 325. Here’s the abstract.

Social Media is changing the way we find information, share knowledge and communicate with each other. The important factor contributing to the growth of these technologies is the ability to easily produce “user-generated content”. Blogs, Twitter, Wikipedia, Flickr and YouTube are just a few examples of Web 2.0 tools that are drastically changing the Internet landscape today. These platforms allow users to produce, annotate and share information with their social network. Their combined content accounts for nearly four to five times that of edited text being produced each day on the Web. Given the vast amount of user-generated content and easy access to the underlying social graph, we can now begin to understand the nature of online communication and collaboration in social applications. This thesis presents a systematic study of the social media landscape through the combined analysis of its special properties, structure and content.

First, we have developed techniques to effectively mine content from the blogosphere. The BlogVox opinion retrieval system is a large scale blog indexing and content analysis engine. For a given query term, the system retrieves and ranks blog posts expressing sentiments (either positive or negative) towards the query terms. We evaluate the system on a large, standard corpus of blogs with available human-verified relevance assessments for opinions. Further, we have developed a framework to index and semantically analyze syndicated feeds from news websites. This system semantically analyzes news stories and builds a rich fact repository of knowledge extracted from real-time feeds.

Communities are an essential element of social media systems and detecting their structure and membership is critical in several real-world applications. Many algorithms for community detection are computationally expensive and generally, do not scale well for large networks. In this work we present an approach that benefits from the scale-free distribution of node degrees to extract communities efficiently. Social media sites frequently allow users to provide additional meta-data about the shared resources, usually in the form of tags or folksonomies. We have developed a new community detection algorithm that can combine information from tags and the structural information obtained from the graphs to detect communities. We demonstrate how structure and content analysis in social media can benefit from the availability of rich meta-data and special properties.

Finally, we study social media systems from the user perspective. We present an analysis of how a large population of users subscribes to and organizes the blog feeds that they read. It has revealed several interesting properties and characteristics of the way we consume information. With this understanding, we describe how social data can be leveraged for collaborative filtering, feed recommendation and clustering. Recent years have seen a number of new social tools emerge. Microblogging is a new form of communication in which users can describe their current status in short posts distributed by instant messages, mobile phones, email or the Web. We present our observations of the microblogging phenomenon and user intentions by studying the content, topological and geographical properties of such communities.

The course of this study spans an interesting period in the Web’s history. Social media is connecting people and building online communities by bridging the gap between content production and consumption. Through our research, we have highlighted how social media data can be leveraged to find sentiments, extract knowledge and identify communities. Ultimately, this helps us understand how we communicate and interact in online, social systems.

Aim for the fat part of the tail

July 18th, 2008

I like this image from Seth Godin’s post on The Long Tail and the Dip.

Profit pockets on the tail

He argues that shooting for the head (#1) takes too much effort and money, but it doesn’t mean you have to resign yourself to life out on the thin part of the long tail (#3) — you can aim for “profit pocket #2”.

HealthMap mines text for a global disease alert map

July 8th, 2008

HealthMap is an interesting Web site that displays a “global disease alert map” based on information extracted from a variety of text sources on the Web, including news, WHO and NGOs. HealthMap was developed as a research project by Clark Freifeld and John Brownstein of the Children’s Hospital Informatics Program, part of the Harvard-MIT Division of Health Sciences & Technology.


Their site says

“HealthMap brings together disparate data sources to achieve a unified and comprehensive view of the current global state of infectious diseases and their effect on human and animal health. This freely available Web site integrates outbreak data of varying reliability, ranging from news sources (such as Google News) to curated personal accounts (such as ProMED) to validated official alerts (such as World Health Organization). Through an automated text processing system, the data is aggregated by disease and displayed by location for user-friendly access to the original alert. HealthMap provides a jumping-off point for real-time information on emerging infectious diseases and has particular interest for public health officials and international travelers.”

The work was done in part with support from Google, as described in a story on ABC News, Researchers Track Disease With Google News, Google.org Money.

Twitterment, domain grabbing, and grad students who could have been rich!

July 8th, 2008

Here at Ebiquity, we’ve had a number of great grad students. One of them, Akshay Java, hacked out a search engine for twitter posts around early April last year, and named it twitterment. He blogged about it here first. He did it without the benefit of the XMPP updates, by parsing the public timeline. It got talked about in the blogosphere (including by Scoble), got some press, and there was an article in the MIT Tech Review that used his visualization of some of the twitter links. It even got talked about in Wired’s blog, something we found out only yesterday. We were also told that three days after the post in Wired’s blog, someone somewhere registered the domain twitterment.com (I won’t feed them pagerank by linking!) and set up a page that looks very similar to Akshay’s. It has Google AdSense, and of course just passes the query to Google with a site restriction to twitter. So they’re poaching coffee and cookie money from the students in our lab 🙂

So of course we played with Akshay’s hack and hosted it on one of our university boxes for a few months, but didn’t really have the bandwidth or compute (or time) resources to keep up. Startups such as summize appeared later and provided similar functionality. For the last week or two we’ve been moving the code of twitterment to Amazon’s cloud to restart the service. Of course, today comes the news that twitter might buy summize, quasi-confirmed by Om Malik. Lesson to you grad students: if you come up with something clever, file an invention disclosure with your university’s tech transfer folks. And don’t listen to your advisors if they think that there isn’t a paper in what you’ve hacked; there may yet be a few million dollars in it 🙂