September 7th, 2006, by Tim Finin, posted in Uncategorized
Philips has unveiled some very cool light-emitting fabrics. Check out the video to see them in action. So far most of the talk has centered on fun uses of this technology, like marketing or clubbing. I am trying to find an interesting and serious use for it. I feel like this could have a role to play in medicine.
September 6th, 2006, by Tim Finin, posted in Uncategorized
What does everyone do when there’s a new search service? They do an “ego search”. I was surprised at how many hits the new Google News Archive service found for my surname, which is relatively uncommon. Here’s the snippet from an article from the August 10, 1843 edition of The Ohio Repository:
I didn’t even know I had ancestors in Ohio! If I weren’t so cheap I’d buy access to the article to see what they were up to. I hope it didn’t involve an arrest.
It seems like many of the documents in news archives were scanned poorly and are unusable. I tried entering a query consisting of a random, meaningless four-letter sequence, GFHD, and got eleven results, like this one:
This isn’t Google’s fault, of course. But I wonder what fraction of the scanned content from newspapers and magazines is unusable because of poor-quality scanning. I suspect that smaller collections are more likely to suffer from this problem. Will their content owners be motivated to go back and rescan their collections? Maybe easy search services like Google’s will make it worth their while to do so.
September 6th, 2006, by Tim Finin, posted in Uncategorized
Google launched its archive service today, which had been tipped by registering domains like google-archives.com. It wasn’t what many had thought — an archive of old versions of Web pages. Here’s the description from the About News Archive page:
“News archive search provides an easy way to search and explore historical archives. Users can search for events, people, ideas and see how they have been described over time. In addition to searching for the most relevant articles for their query, users can get an historical overview of the results by browsing an automatically created timeline. Search results include both content that is accessible to all users and content that requires a fee. Articles related to a single story within a given time period are grouped together to allow users to see a broad perspective on the events.”
The index includes the entire contents of the archives of major US news sources, including the NYT and Washington Post and Time magazine. Many of the articles found will only be available to source subscribers or for a fee.
“Some of HighBeam’s 3,300 publications and 40 million documents will be available free, while in other cases users will see just the headline and the first 600 characters of a document. To see the whole thing, users must be subscribers to the firm’s service, which costs either $20 a month or a $100 annual fee. … With some publications, including The New York Times and The Washington Post, searchers will be sent to Web sites where they will be able to buy individual articles.”
An interesting feature of Google’s new service is that it can lay out search results on a time line.
“The new service is not encyclopedic, Mr. Acharya said, but instead presents users with a representative list of relevant articles that are arranged in a timeline fashion. The service tries to offer a pointer to the time period that is most relevant to the search query. For example, in the case of the search phrase “moon landing,” an arrow points the user to 1969.”
This looks like a great resource for scholars of all kinds, from professional to amateur.
September 4th, 2006, by Tim Finin, posted in Uncategorized
The New York Times has an article (New Web Sites Seeking Profit in Wiki Model) on attempts to figure out how to monetize (did I just say that?) a Wiki. WikiHow is one example mentioned in the article. Wiki.com is another. I like the idea of a hosted wiki service aimed at supporting small wikis. Dave Becket Dan Brickley recently suggested using a Wiki to let people suggest questions they would like to see answered by the Swoogle team. Maybe we can use http://swoogle.wiki.com/ for this.
September 3rd, 2006, by Tim Finin, posted in Uncategorized
Garett Rogers noticed that Google just registered a number of domains like google-archivesearch.com, suggesting a plan to offer a service to access old versions of Web pages (like the Internet Archive) or perhaps a subset of the Web, like old news articles. The newly registered domains Garett noticed include archivesearchgoogle.com/net, archivesssearchgoogle.com/net, google-archive-search.com/net, google-archive.com/net, google-archives-search.com/net, google-archives.com/net, google-archivesearch.com/net, and google-archivessearch.com/net.
The idea of a Web archive is a great one, and Brewster Kahle richly deserves credit for creating the first. If Google does provide a similar service, maybe they will spur new thinking about how to better support searching over archives and what new services can be built given one. For example, knowing how often a Web page changes can have an impact on how much you trust its information. Wikipedia’s archiving of old versions of articles has been used for this (see Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study).
I think there’s a case for having multiple archives of Web pages, focusing on certain kinds of Web pages. At the beginning of 2005 we started a Semantic Web archive by retaining copies of old versions of RDF documents when we discover and process a new version. We hope that it will provide a good resource that researchers can use to explore topics like how ontologies evolve.
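To make the idea of retaining old versions concrete, here is a minimal sketch in Python. It is not our actual archive code: the `VersionArchive` class, its method names, and the choice of a SHA-256 content hash to decide whether a document has changed are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


class VersionArchive:
    """Hypothetical in-memory archive that keeps every distinct version of a document."""

    def __init__(self):
        # url -> list of (digest, content), oldest first
        self._versions = defaultdict(list)

    def record(self, url: str, content: bytes) -> bool:
        """Store content if it differs from the latest stored version.

        Returns True when a new version was added, False when nothing changed.
        """
        digest = hashlib.sha256(content).hexdigest()
        history = self._versions[url]
        if history and history[-1][0] == digest:
            return False  # same bytes as the last crawl, nothing to keep
        history.append((digest, content))
        return True

    def versions(self, url: str) -> list:
        """Return all stored versions of a document, oldest first."""
        return [content for _digest, content in self._versions.get(url, [])]


archive = VersionArchive()
archive.record("http://example.org/onto.rdf", b"<rdf:RDF>v1</rdf:RDF>")
archive.record("http://example.org/onto.rdf", b"<rdf:RDF>v1</rdf:RDF>")  # unchanged
archive.record("http://example.org/onto.rdf", b"<rdf:RDF>v2</rdf:RDF>")
print(len(archive.versions("http://example.org/onto.rdf")))
```

Comparing hashes of the raw bytes is the simplest possible change test. A real archive of RDF documents might prefer to compare the parsed graphs, so that a reformatted but semantically identical file does not count as a new version.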
“This week’s question about real-world use of RDF metadata: is anybody using RDF Schema for the sake of RDF Schema, or has RDFS become little more than a layer of RDF/OWL? For example, we use rdfs:domain and rdfs:range in our RDF/OWL ontologies, but have owl:Class and owl:subClassOf completely replaced the use of rdfs:Class and rdfs:SubClassOf? In other words, has RDF/OWL, as an extension of RDFS, replaced the use of RDFS by itself, or is anyone still creating and using RDF Schemas that use nothing from the owl namespace?”
We wondered about that too and looked into it for a paper, Characterizing the Semantic Web on the Web, to appear in ISWC 2006. Here’s some text from the paper. The statistics mentioned were from data from earlier this summer, so are a bit out of date. I spot checked them and found that they are not too different today.
To what degree does the current Semantic Web make use of RDFS and OWL? One simple way of addressing this question is to examine the number of Semantic Web documents (SWDs) available on the Web that use the RDFS and OWL namespaces. The OWL namespace has been declared by 113K SWDs (8%) and actually used by 108K (7%). The RDFS namespace enjoys more use, being declared by 677K (47%) and used by 538K (37%) SWDs.
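The difference between declaring a namespace and actually using it can be illustrated with a short Python sketch. This is only an approximation made for this post, not the procedure used in the paper: the function name `namespace_usage` is hypothetical, it handles RDF/XML only, and it treats a namespace as "used" if any element name, attribute name, or attribute value in the document starts with the namespace URI.

```python
import io
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def namespace_usage(rdf_xml: str, namespaces=(OWL, RDFS)):
    """Return (declared, used) sets of namespace URIs for one RDF/XML document."""
    declared, used = set(), set()
    source = io.BytesIO(rdf_xml.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            # payload is a (prefix, uri) pair for each xmlns declaration
            _prefix, uri = payload
            if uri in namespaces:
                declared.add(uri)
        else:
            # payload is an element; tags and attribute names look like "{uri}local"
            names = [payload.tag, *payload.attrib.keys(), *payload.attrib.values()]
            for ns in namespaces:
                if any(n.startswith("{" + ns + "}") or n.startswith(ns) for n in names):
                    used.add(ns)
    return declared, used


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdfs:Class rdf:about="http://example.org/Person"/>
</rdf:RDF>"""

declared, used = namespace_usage(SAMPLE)
print(sorted(declared))  # both OWL and RDFS are declared
print(sorted(used))      # only RDFS is actually used
```

The sample document declares the OWL namespace but never uses a term from it, which is exactly the kind of document that makes the "declared" counts higher than the "used" counts.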
What about their terms? Not surprisingly, owl:Class is the most used term from the OWL namespace with ~ 1.8M instantiations in 68K SWDs. Contrasting this with rdfs:Class, which has 327K instantiations by 8.6K SWDs, seems to suggest that OWL is being more heavily used than RDFS. However, the relationship is not so simple. When examining properties, rdf:Property has 529K immediate instantiations from 59K SWDs, considerably more than the OWL property terms owl:ObjectProperty (170K assertions in 8K SWDs) and owl:DatatypeProperty (48K assertions in 4.6K SWDs).
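Figures like "1.8M instantiations in 68K SWDs" combine two different counts: how many times a class is instantiated overall, and how many distinct documents contain at least one instantiation. Here is a rough Python sketch of that bookkeeping. The `class_usage` function and the toy documents are made up for illustration and assume the triples have already been parsed into (subject, predicate, object) tuples.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"


def class_usage(documents):
    """Count class instantiations overall and per document.

    documents maps a document URL to an iterable of (s, p, o) triples.
    Returns (instantiations, doc_counts), both Counters keyed by class URI.
    """
    instantiations = Counter()  # total rdf:type assertions per class
    doc_counts = Counter()      # number of documents instantiating each class
    for _url, triples in documents.items():
        classes_here = [o for _s, p, o in triples if p == RDF_TYPE]
        instantiations.update(classes_here)
        doc_counts.update(set(classes_here))  # each document counts once per class
    return instantiations, doc_counts


# Toy data, not taken from the paper's dataset.
docs = {
    "http://example.org/a.rdf": [
        ("http://example.org/A", RDF_TYPE, OWL_CLASS),
        ("http://example.org/B", RDF_TYPE, OWL_CLASS),
    ],
    "http://example.org/b.rdf": [
        ("http://example.org/C", RDF_TYPE, RDFS_CLASS),
        ("http://example.org/D", RDF_TYPE, OWL_CLASS),
    ],
}

instantiations, doc_counts = class_usage(docs)
print(instantiations[OWL_CLASS], doc_counts[OWL_CLASS])
print(instantiations[RDFS_CLASS], doc_counts[RDFS_CLASS])
```

Keeping the two counters separate matters because a handful of large ontologies can account for most of a term's instantiations, so a high instantiation count does not by itself mean a term is widely adopted.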
For RDFS and OWL properties, the most used property is rdf:type, followed by some annotation properties such as rdfs:seeAlso and rdfs:label. Among those properties that are used as ontology constructs, owl:sameAs and rdfs:subClassOf are the most used. We also noticed significant use of two OWL equality assertions: owl:sameAs (280K assertions in 17K SWDs) and owl:equivalentClass (70K assertions in 4.3K SWDs). Their common use may be an indication of increased ontology alignment. We have found limited use of properties that require OWL DL or OWL Full reasoning support. The most common one in our dataset was owl:unionOf, which is used in only 2.5K SWDs.