Archive for March, 2007
March 12th, 2007, by Tim Finin, posted in Uncategorized
What are Freebase’s data and knowledge models?
There has been a lot of discussion about Metaweb’s Freebase, mostly centered on whether it has a centralized or distributed model. While that’s an interesting and significant question, there are other questions that are perhaps even more important.
Are there any details about the data and knowledge models that Freebase is using or will use? I’ve not seen any information or even any speculation.
Take the underlying data model — it could be relational (~SQL), object oriented (~Google Base), FOL based (~Common Logic), graph oriented (~RDF), tree based (XML), or something else. On top of that we might have a familiar knowledge model, something different, or no real knowledge model at all. Can it handle uncertainty? How about procedures? We’ll all (well, some of us) be disappointed if it’s a Wikified version of Google Base. That would be very interesting, but wouldn’t address many issues that the Semantic Web is facing.
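To make the contrast between these data models concrete, here is a minimal sketch of one small fact written in three of the styles listed above. It says nothing about what Freebase actually does; the `ex:` names, the table layout and the `objects` helper are all made up for illustration.

```python
# One small fact, "Metaweb develops Freebase", in three styles.
# All names (ex:..., table and column names) are hypothetical.

# Relational (~SQL): a row whose meaning depends on a fixed table schema.
product_columns = ("name", "developer")
product_row = ("Freebase", "Metaweb")

# Object oriented: a typed item carrying attribute/value pairs.
typed_item = {"type": "Product", "name": "Freebase", "developer": "Metaweb"}

# Graph oriented (~RDF): subject-predicate-object triples.
triples = [
    ("ex:Freebase", "rdf:type", "ex:Product"),
    ("ex:Freebase", "ex:developer", "ex:Metaweb"),
]


def objects(graph, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


# In the graph style a new kind of statement is just one more triple;
# no table schema has to change first.
triples.append(("ex:Metaweb", "ex:founder", "ex:Danny_Hillis"))

print(objects(triples, "ex:Freebase", "ex:developer"))
```

Which of these styles users can extend themselves, and how, is exactly the kind of detail the questions above are asking about.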
Freebase has a FAQ but it’s behind their registration screen, unfortunately.
Does anyone know of any descriptions, published or informal? How about guesses based on the backgrounds of Metaweb’s technical people?
March 12th, 2007, by Tim Finin, posted in Uncategorized
The current issue of Technology Review has a long article on the Semantic Web, A Smarter Web, with the theme
New technologies will make online search more intelligent–and may even lead to a “Web 3.0.”
The article tells the Semantic Web story using Eric Miller’s involvement as a thread and mentions many other people and companies along the way.
March 11th, 2007, by Akshay Java, posted in Uncategorized
TinyURL is a great service for shortening long URLs. This is particularly helpful on Twitter, where there is a character limit on the messages, due to the fact that many people use Twitter via SMS. In such situations TinyURL makes it easy to share links with friends. As described on Wikipedia:
Short URL aliases are seen as useful because they’re easier to write down, remember or pass around, are less error-prone to write, and also fit where space is limited such as IRC channel topics or email signatures. Also some email clients impose a maximum length at which they automatically break lines requiring the user to paste together a long URL rather than just clicking on it. A short URL alias is much less likely to become broken.
In fact, with the growth of micro-blogging and mobile applications, TinyURL seems to be an important service that is sure to become even more popular. I have found that, from the sample of logs of the public timeline in Twitter, about 68% of the URLs used by “twitterers” were resolved via TinyURL.
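A share like this can be estimated with a few lines of code. The sketch below is not the script used for the figure above; it assumes the timeline sample is available as a list of message strings, and the three sample messages are invented purely to show the calculation.

```python
import re
from urllib.parse import urlparse

# Rough pattern for URLs in short messages: everything up to the next space.
URL_PATTERN = re.compile(r"https?://\S+")

TINYURL_HOSTS = ("tinyurl.com", "www.tinyurl.com")


def tinyurl_share(messages):
    """Return the fraction of URLs in `messages` that point at TinyURL."""
    hosts = [
        urlparse(url).netloc.lower()
        for text in messages
        for url in URL_PATTERN.findall(text)
    ]
    if not hosts:
        return 0.0
    return sum(host in TINYURL_HOSTS for host in hosts) / len(hosts)


# Invented messages, not real timeline data.
sample = [
    "reading http://tinyurl.com/abc123",
    "lunch time, no link today",
    "new post http://example.org/blog/post and http://tinyurl.com/xyz789",
]

print(tinyurl_share(sample))
```

Matching on the host name alone is a simplification; a fuller analysis would also have to decide how to treat other shortening services and malformed links.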
While this is a great service, one thing that might be useful is to have a TinyURL API. As David Berlind puts it in his post, right now we are left “Desperately seeking: an API for TinyURL” :
Add an API to the service and let the growing contingent of mashup developers take over from there. That way, the TinyURL capability becomes equally available to other API enabled services and there’s no telling what sort of innovation will blossom out of some developers’ heads.
That’s quite true! I am hacking up a few toys (mashups) for Twitter and it would really help if I could have an API for TinyURL. I think it’s time we had an XML-RPC/SOAP interface to this great service! (Personally, it just does not seem right to screen-scrape in this day and age of Web 2.0.)
March 10th, 2007, by Tim Finin, posted in Uncategorized
Tagged under for what it’s worth…
Traders on the Intrade prediction market think it’s much more likely that I. Lewis (Scooter) Libby will receive a pardon by the end of President Bush’s term than by the end of 2007. The market for a 2007 pardon closed this week at 16 whereas the price for a 2008 pardon closed at 64.1.

Of course some of the difference is that the second future includes the first. But the difference is fourfold, which is significant.
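The arithmetic behind that comparison can be spelled out in a short sketch. It assumes the usual reading of such contracts, namely that they pay 100 points if the event happens, so a price divided by 100 approximates the market's probability; the function name is made up.

```python
def compare_contracts(price_short, price_long, payout=100.0):
    """Compare two nested prediction-market contracts.

    `price_short` covers the earlier deadline and `price_long` the later
    one, so the later contract's event includes the earlier one's.
    Assumes price / payout approximates the implied probability.
    """
    p_short = price_short / payout
    p_long = price_long / payout
    ratio = p_long / p_short      # how many times more likely
    p_between = p_long - p_short  # event after the first deadline only
    return ratio, p_between


# Closing prices quoted above: 16 for the 2007 contract, 64.1 for the later one.
ratio, p_between = compare_contracts(16.0, 64.1)
print(round(ratio, 2), round(p_between, 3))
```

Read this way, the gap between the two prices is what traders assign to a pardon that comes after 2007 but before the later deadline.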
March 9th, 2007, by Tim Finin, posted in Uncategorized
John Markoff writes on Freebase in Start-Up Aims for Database to Automate Web Searching in today’s New York Times. Freebase is an “open, shared database of the world’s knowledge” being developed by Metaweb, a Silicon Valley startup founded by Danny Hillis and others and recently funded with $15M from VC firms.
A new company founded by a longtime technologist is setting out to create a vast public database intended to be read by computers rather than people, paving the way for a more automated Internet in which machines will routinely share information. The company, Metaweb Technologies, is led by Danny Hillis, whose background includes a stint at Walt Disney Imagineering and who has long championed the idea of intelligent machines. (source)
Tim O’Reilly has seen early demonstrations and likes the idea.
“Its name is appropriate for many reasons. Yes, it is a free database, it is addictive, and its name is overloaded with multiple meanings, just like so many things we try to make sense of. But we have the ability to disambiguate those meanings, and to take them both in, with the overtones and conflicts actually giving additional meaning. Metaweb still has a long way to go, but it seems to me that they are pointing the way to a fascinating new chapter in the evolution of Web 2.0.” (source)
The idea of developing a vast repository of sharable data intended primarily to be accessed by programs is, of course, not new. It’s the goal of the W3C’s Semantic Web effort.
He says his latest effort, to be announced Friday, will help develop a realm frequently described as the “semantic Web” - a set of services that will give rise to software agents that automate many functions now performed manually in front of a Web browser. (source)
But there are differences. Fundamental ones.
The idea of a centralized database storing all of the world’s digital information is a fundamental shift away from today’s World Wide Web, which is akin to a library of linked digital documents stored separately on millions of computers where search engines serve as the equivalent of a card catalog. (source)
While some of Freebase’s data will be harvested from sources like MusicBrainz and Wikipedia, much of it is planned to be added by users. I assume that this means that users will also be able to define or extend the underlying conceptual schemas or ontologies.
I’m quite interested to see what approach Freebase is taking to a common data model and to building and maintaining shared ontologies, vocabularies and schemas. The W3C’s RDF approach is, IMHO, a sound one that can work, although it has both fans and detractors. Another open approach for knowledge sharing is Common Logic. You can sign up at freebase.com for a membership invitation. Having done so, I hope to receive it soon so I can find out how this is supposed to work.
March 8th, 2007, by Akshay Java, posted in Uncategorized
Lately, we have been bitten by the Twitter bug. Well, I just had an obvious thought, one that somebody would have come up with eventually. So in case no one has, here is my Twitter Google Maps mashup. It updates every minute and maps anybody in the public timeline who has set their location. It is interesting to see how Twitter has become a global phenomenon.
http://geotwitter.org
Njoy.
March 8th, 2007, by Tim Finin, posted in Uncategorized
The Economist has a special report, The rise and fall of corporate R&D: Out of the dusty labs (free to non-subscribers), that discusses the current state of industrial R&D with an emphasis on IT firms. The basic message of the article is that the dominant model of the past 50+ years (articulated in Science, The Endless Frontier), which separated relatively pure research from development, is dead.
“Modern technology firms are much less vertically integrated. They use networks of outsourced suppliers and assemblers, which has led to the splintering of research divisions. Even though big American firms still spend billions of dollars on R&D, none has any intention of filling the shoes left empty by Bell Labs or Xerox PARC. The research and development that Bush tore asunder are once again becoming entwined. Old-fashioned R&D is losing its ampersand. “The lesson learnt is that you don’t isolate researchers,” says Eric Schmidt, the boss of Google, who started his career as a computer scientist at Bell Labs and later at Xerox PARC. The “smart people on the hill” method no longer works, he adds. Instead, researchers have become intellectual mercenaries for product teams: they are there to solve immediate needs. This view is shared by other industry veterans. “The corporate research labs of the old days are really not going to be the basis of what is new,” says John Seely Brown, the director of Xerox PARC for over a decade until 2000. “This is getting to be a new kind of game.”
It’s a good article worth reading by all of us who are in, or expect to be in, the R&D business.
March 7th, 2007, by Tim Finin, posted in Uncategorized
South Korea to make robot abuse illegal, as in abuse both by and of robots.
A story in the BBC, Robotic age poses ethical dilemma, describes a government-backed effort to develop an ethical code for robots.
An ethical code to prevent humans abusing robots, and vice versa, is being drawn up by South Korea. The Robot Ethics Charter will cover standards for users and manufacturers and will be released later in 2007. It is being put together by a five member team of experts that includes futurists and a science fiction writer.
All of these efforts are based on, or at least pay homage to, Asimov’s three laws of robotics. The new Korean effort is also studying the Roboethics Roadmap developed in conjunction with the European Robotics Research Network.
While my first thought when seeing the BBC article was that it was fluff, looking into it a bit more convinces me that there are some real issues. An article in the Telegraph quotes a South Korean official
Even without the development of “conscious” robots capable of making their own decisions, Park Hye-Young, a ministry official, said it was necessary to consider issues such as data protection and security.
Although the next quote pushed me back into the ‘this is fluff’ territory
But she stressed that the code would also affect human attitudes to robots. “Imagine if some people treat androids as if the machines were their wives,” she said. “Others may get addicted to interacting with them, just as many Internet users get hooked to the cyberworld.”
With more autonomous and semi-autonomous machines around vacuuming our carpets and protecting our borders, some thought needs to be given to what to do when things go wrong.
March 6th, 2007, by Pranam Kolari, posted in Uncategorized
The growth of Twitter has been phenomenal over the last couple of months. While its utility is argued by some, current traction suggests this could be another Web 2.0 winner.
Though we were initially circumspect (as were many others), we decided to take the plunge last week. See what we are up to now on our blog sidebar, or follow us directly on Twitter.
March 6th, 2007, by Akshay Java, posted in Uncategorized
A few days ago Dr. Finin, Pranam and I were having a discussion about how tags are a pretty good representation of a person or an organization. A folksonomy is a simple, yet effective way to organize information around yourself, be it blogs published, feeds read or a collection of family pictures. A tag cloud is a culmination of the context a person lives in.
For instance, on the eBiquity site, we tag pretty much everything and this produces a nice tag cloud (also shown in the image). A quick glance at it can give a good picture of the kind of work we do. Similarly, here are a few snapshots of tag clouds from famous bloggers: Steve Rubel (from his site), Michael Arrington (TechCrunch’s tag cloud from Technorati) and Arianna Huffington (Huffington Post from Technorati). It was interesting to find that not many A-listers use the delicious tagroll. In contrast to tags in posts, delicious tags are a good way to measure what one reads. Thus, together, the tagspace one uses represents what we read, write and think. We are what we do and so in a sense “Tags are US!”
On a related note, one of the research areas I am studying is modeling influence on the blogosphere. We are planning on exploring how knowing a person’s ‘tag cloud’ can help in modeling the topics they are interested in. We’d like to acquire a collection of paired blogs and their authors’ tag clouds and seek people who would be willing to contribute their information (i.e., blog URL and public tag URL) to the collection.
We will use the information collected for research only. We will not use it for any commercial purposes. We will not disclose the URLs or their associated names in any publications. If you are willing, please consider submitting your blog and delicious URL by filling in this form.
The form allows you to check off whether you are willing to allow us to include the information in a dataset to be shared with researchers. If you choose not to let us share the data, we will use it in our own internal research and neither make it part of any shared dataset nor share it with anyone outside of our research group.
March 5th, 2007, by Tim Finin, posted in Uncategorized
The Semantic Web rests on a foundation of URIs that are used to denote things. It’s an idea that’s simple to understand and to implement. Or so one would think. After all, “What’s in a name? that which we call a rose by any other name would smell as sweet.”
It’s left as an exercise to the reader, unfortunately, to figure out how to choose good URIs for her resources and external objects. Once one chooses and starts using a URI, it should not change, so it’s important to do it right the first time. There are some complicated technical details like content negotiation, one big file for your terms or many small ones, HTTP 303 redirects, and choosing between hash and slash.
Leo Sauermann, Richard Cyganiak and Max Völkel, have written a note, Cool URIs for the Semantic Web, that discusses some of these issues.
“The Resource Description Framework RDF allows you to describe web documents and resources from the real world—people, organizations, things—in a computer-processable way. Publishing such descriptions on the web creates the semantic web. URIs are very important as the link between RDF and the web. This article presents guidelines for their effective use. We discuss two strategies, called 303 URIs and hash URIs. We give pointers to several web sites that use these solutions, and briefly discuss why several other proposals have problems.”
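The difference between the two strategies is easier to see in code. The following is a minimal sketch, not code from the note: with a hash URI the client drops the fragment before making the HTTP request, while with a 303 URI the server answers a request for the thing itself with a redirect to a document about it. The `/id/`, `/data/` and `/page/` paths and both function names are hypothetical.

```python
from typing import Tuple
from urllib.parse import urldefrag


def hash_uri_document(uri: str) -> str:
    """Return the document a client fetches for a hash URI.

    The fragment ('#alice') never reaches the server, so the thing and
    the document describing it automatically get different URIs.
    """
    document, _fragment = urldefrag(uri)
    return document


def respond_303(thing_uri: str, accept: str) -> Tuple[int, str]:
    """Sketch of a server rule for 303 URIs with content negotiation.

    Assumes a hypothetical layout in which things live under /id/,
    RDF descriptions under /data/ and HTML pages under /page/.
    """
    base, name = thing_uri.rsplit("/id/", 1)
    if "application/rdf+xml" in accept:
        return 303, f"{base}/data/{name}"
    return 303, f"{base}/page/{name}"


print(hash_uri_document("http://example.org/about#alice"))
print(respond_303("http://example.org/id/alice", "application/rdf+xml"))
print(respond_303("http://example.org/id/alice", "text/html"))
```

The hash approach needs no server configuration but tends to put many terms in one file, while the 303 approach costs an extra round trip per lookup; that is the "hash versus slash" trade-off mentioned above.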
March 5th, 2007, by Tim Finin, posted in Uncategorized
Today’s New York Times has an article, A Richer Trip to the Mall, Guided by Text Messages, on new services that connect shoppers with stores while they are at malls.
“Technology companies like NearbyNow of Los Altos, Calif., and GPShopper in New York have introduced mobile Internet applications that allow shoppers to use their cellphones and PDAs to search the inventory and prices at the local mall, save them wasted steps and, sometimes, turn up last-minute bargains and promotions.”
The field seems to have quite a few companies; in addition to NearbyNow and GPShopper, the article mentions Krillion (“find major appliances near you”) and BrandHabit (“Get your fashion fix…locally”). An outstanding issue that needs solving is how to maintain awareness of the available inventory at each location.
“One potential sticking point for services like these … is that retailers typically do not have the sophisticated technology to track their stores’ inventories and transfer that information to their own Web sites, let alone those of other companies. Web sites can ill-afford to tell shoppers that a product is available, only to have customers find out at the mall that it is not.”