Archive for August, 2006
August 22nd, 2006, by Tim Finin, posted in Uncategorized
Congratulations to Amit Sheth on his appointment as the LexisNexis Eminent Scholar for Advanced Data Management and Analysis at Wright State University. Amit is a good friend and colleague with whom we’ve collaborated on many projects for more than twenty years. He is well known for his wide-ranging work in advanced information systems which includes contributions to distributed databases, workflow management, digital libraries and the semantic web. He’s held positions at Honeywell, Unisys, Bellcore and the University of Georgia, where he founded the Large Scale Distributed Information Systems Laboratory.
August 21st, 2006, by Tim Finin, posted in Uncategorized
AOL’s chief technology officer, Maureen Govern, is reported to have resigned in the wake of privacy concerns stemming from the release of Web search data. It is also being reported that the researcher who released the data and that employee’s direct supervisor were forced out as well.
The AOL research Web site is currently offline (here is the Google cache version).
It’s a sad and unfortunate outcome of AOL’s release of Web search data that was intended to help the research community build better software systems.
August 21st, 2006, by Tim Finin, posted in Uncategorized
Several recent posts have mentioned our work on splogs and given out the general URL of our blog. We appreciate the references to our research. If you are interested in finding out more about it, you might visit our
August 21st, 2006, by Akshay Java, posted in Uncategorized
There seems to be something wrong: Is it some downtime? Very strange!

August 20th, 2006, by Tim Finin, posted in Uncategorized
We had hoped to write a paper for IJCAI that gave some data and observations on how ontologies are being used on the Semantic Web based on our experience with Swoogle. We ran out of time so have not really completed the paper, although we did get some of it written. Here, then, are our notes…
Untangling ontologies on the Semantic Web, Tim Finin and Li Ding, July 2006.
Ontologies are an essential part of the Semantic Web framework and vision. Intuitively, ontologies define terms, e.g., classes and properties, that are subsequently used to publish data and form queries. Like most aspects of the Web, the ways that people define, publish, modify, use and reuse ontologies have evolved from the simple models first envisioned. We have analyzed a collection of over 1.7 million RDF documents harvested from the Web by the Swoogle Semantic Web search engine of which about 1% are ontologies. This analysis has helped us to understand how ontologies are actually being used on the Semantic Web and suggest some new approaches to managing them as it grows. …MORE…
Comments are welcome, of course.
August 18th, 2006, by Tim Finin, posted in Uncategorized
The Washington Post continues to be in the forefront of MSM publishers who are exploring how to incorporate blogs and other consumer generated content. Their latest effort is Sponsored Blogroll, a program in which selected bloggers are invited to partner with the Post. Here’s how they describe it.
“A link to members’ blogs will be featured in our Sponsored Blogroll index, giving your writing promotional space on the washingtonpost.com home page and giving you an introduction to an audience of 8 million readers monthly. At the same time, our hardworking sales reps will help connect your signature musings with the huge number of advertisers we deal with every day who are looking for the next big, slightly-outside-the-mainstream idea.”
The project was designed by Jeff Burkett at WashingtonPost.Newsweek Interactive (WPNI) who announced the launch and discussed its motivation in his blog. Steve Rubel also blogs about the new program. What’s not clear, and apparently still undecided, is how the matching of ads to blogs will be done and how the finances will work for either advertisers or bloggers.
Having your blog listed on the Washington Post web site is an exciting idea, but today’s Sponsored Blogroll list looks rather weak. Based on the blog titles, I’d have guessed they were splogs! That said, I think this is a great experiment and am interested to see how it evolves. Full disclosure: we just submitted a request to have our blog considered for inclusion in the program.
August 17th, 2006, by Tim Finin, posted in Uncategorized
The Swoogle semantic web search system was down for part of today because we moved one of its servers into the ECS machine room. Alas, it will be down tomorrow from 21:00 Friday (GMT-5) to 12:00 Saturday while that machine room gets a major power upgrade.
August 17th, 2006, by Tim Finin, posted in Uncategorized
We are all familiar with wireless technology, mostly through the ubiquitous 802.11 networks found in our offices, homes and cafes. This technology does not really deliver a wireless network, however. It only provides “wireless access” to the conventional wired network, whose design is decades old.
A recent article in Network World discusses research aimed at combining concepts from MANETs, AI and cognitive radio to prototype the next generation of wireless networks.
Military research aims to develop self-configuring, secure wireless nets. Researchers develop military-grade intelligent wireless net. By Ryan DeBeasi, NetworkWorld.com, 08/16/06
Government, corporate and academic researchers are working on a network that would be able to configure itself, intelligently cache and route data, and allow for fast and reliable sharing of data, all while maintaining military-grade security.
The project is called Knowledge Based Networking and is under development by the Department of Defense Research Projects Agency (DARPA). … Academic concepts such as artificial intelligence and Tim Berners-Lee’s “Semantic Web,” combined with technologies such as the Mobile Ad-hoc Network (MANET), cognitive radio, and peer-to-peer networking, would provide the nuts and bolts of such a network. Although the project is intended for soldiers in the field, the resulting advances could trickle down to end users. “Military networks are going to converge as closely as we can to civil technologies,” says Preston Marshall, the program manager of DARPA’s Advanced Technology Office.
Some of this work is supported by the DARPA XG program and was featured at a workshop on Real-Time Knowledge Processing for Wireless Network Communications held in March 2006. Research on this and related topics will also be a focus of the new NSF Global Environment for Networking Innovations (GENI) initiative.
The next generation of wireless and wired networks will have requirements for many AI and AI-related concepts, algorithms, techniques and technologies. In addition to using Semantic Web languages, research groups are currently exploring the use of explicit declarative policies, self awareness and monitoring, learning and adaptation, and reasoning.
August 14th, 2006, by Tim Finin, posted in Uncategorized
Move over Turing Test. Step back Loebner Prize. The latest proposal for a simple test for machine intelligence is the Hutter Prize for Lossless Compression of Human Knowledge.
“Being able to compress well is closely related to intelligence as explained below. While intelligence is a slippery concept, file sizes are hard numbers. Wikipedia is an extensive snapshot of Human Knowledge. If you can compress the first 100MB of Wikipedia better than your predecessors, you(r compressor) likely has to be smart(er). The intention of this prize is to encourage development of intelligent compressors/programs.”
If you do manage to best the current record for compressing the first 100MB of Wikipedia, you won’t necessarily get the full prize. What you will win, besides 15 minutes of fame, will be based on how much better your program has done than the current best program. The payout is similar to that used for the Methuselah mouse prize awarded to researchers who extend the lifespan of a mouse to unprecedented lengths. For the Hutter prize, you need to create a self-extracting archive version of the 100MB file enwik8 of less than 18MB. In particular:
- Create a Linux or Windows executable archive8.exe of size S, which is less than L, the previous record (currently 18,324,887 bytes).
- When your archive8.exe is executed, it produces a 10^8-byte file identical to enwik8.
- Upon verification, you are eligible for a prize of 50,000*(1-S/L) euros, with a minimum claim of 500 euros.
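To make the payout rule concrete, here is a minimal Python sketch. It assumes the prize is 50,000*(1-S/L) euros and reads the 500 euro figure as a minimum claim rather than a cap; treating a smaller improvement as paying nothing is also an assumption. The function name `hutter_payout` and its parameters are hypothetical, introduced only for illustration.

```python
def hutter_payout(new_size: int, previous_record: int,
                  purse_eur: float = 50_000.0,
                  minimum_claim_eur: float = 500.0) -> float:
    """Return the payout in euros for an archive of new_size bytes.

    Assumption: payouts below minimum_claim_eur cannot be claimed,
    so they are reported here as 0.0.
    """
    if new_size >= previous_record:
        return 0.0  # no improvement over the previous record
    payout = purse_eur * (1 - new_size / previous_record)
    return payout if payout >= minimum_claim_eur else 0.0


# Previous record quoted in the post, in bytes.
PREVIOUS_RECORD = 18_324_887

# A hypothetical new entry of 17,000,000 bytes.
print(f"{hutter_payout(17_000_000, PREVIOUS_RECORD):.2f} euros")
```

Under these assumptions, halving the record would pay 25,000 euros, while shaving off only a few kilobytes would fall below the minimum claim.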
The initial 50K € purse is underwritten by Marcus Hutter of the Swiss Dalle Molle Institute for Artificial Intelligence.
The use of tests and challenges is always somewhat controversial in AI. Everyone sort of agrees that the Turing test is interesting and says something about what it means to be intelligent. But partly this is for historical reasons and out of respect to Alan Turing, on whose shoulders we all try to stand. The Loebner prize, however, is not taken seriously because its built-in simplifications and limitations encourage winning through clever hacks that are not likely to generalize. I’m afraid that the Hutter Prize is even farther out and is unlikely to be helpful in either advancing our understanding of intelligence or in developing new techniques that are useful in building more intelligent computer programs or systems.
Competitions can be very motivating for researchers and we’ve participated in a number of them. I think better models for AI competitions are the DARPA Grand Challenge, the Text Retrieval Conference challenge tasks, RoboCup and the Trading Agent Competition. These are all focused on simplified tasks that are very close to real world problems that people want solved. Some of these competitions have resulted in new ideas and algorithms that have already been applied to useful applications.
One reason why I think that the Hutter Prize is not a good AI problem is the requirement that the expanded file be identical to the original. Lossless compression is just not a good model for human memory, I’m afraid. Part of our knowledge and our intelligence is knowing what’s worth remembering and what is not. Another aspect is relating new knowledge and information to what we already know, which might result in a lossy, but more useful and compact, encoding of the information. The Hutter Prize may be useful in encouraging the development of better text compression algorithms, of course, as text compression is useful and probably important, even in a world where storage costs continue to decrease dramatically. I think it would also be stimulating and fun, especially for students.
See the Google discussion group for more information and discussion on this interesting challenge.
August 12th, 2006, by Tim Finin, posted in Uncategorized
Sören Auer started a thread on the semantic-web@w3.org mailing list asking “Is there real world RDF-S/OWL instance data?”:
“As an argument to stress the importance of the Semantic Web and as example data to evaluate tools it would be nice to have a library of real world RDF-S/OWL instance data available. My impression is, that there are many schemas around, but it’s harder to find real life instance data. This might be due to copyright issues or the fact, that the borderline between classes/schema and instances/data is not always clearly marked and some projects rather use a representation as classes than a representation as instance data.”
There are some tricky questions underlying this, like what counts as “real world” and what counts as “data”, but from our perspective, virtually all of the RDF content out there is at the instance level, not the schema level, at least formally.
Swoogle has a collection of over 1M error-free RDF documents collected from the Web and an additional ~700K documents that have embedded RDF, are malformed but appear to be RDF, or are no longer accessible. We’ve intentionally limited the number of simple RSS and FOAF documents in the current collection.
Only about 5% of these documents contain *any* triples that contribute to a definition. The rest consist entirely of data. We’ve determined that most of the 5% that contain definitional triples do so incorrectly and should be all data. Of the remaining ones, many are duplicates and copies. We estimate that only about 1% of the documents in Swoogle’s collection are proper ‘ontologies’ that are intended to (partially) define at least one named term.
For the ~1.7M Semantic Web Documents (SWDs), the following table shows the number and percentage of SWDs by the percent of their triples that are at the schema level.
| %def | # SWDs | %all SWDs |
|---|---|---|
| 0% | 1,676,874 | 94.70 |
| 0-10% | 1,679,153 | 94.83 |
| 10-20% | 2,512 | 0.14 |
| 20-30% | 3,209 | 0.18 |
| 30-40% | 35,526 | 2.01 |
| 40-50% | 16,384 | 0.93 |
| 50-60% | 1,817 | 0.10 |
| 60-70% | 5,556 | 0.31 |
| 70-80% | 4,063 | 0.23 |
| 80-90% | 1,599 | 0.09 |
| 90-100% | 5,108 | 0.29 |
| 100% | 15,756 | 0.89 |
That said, the vast majority of defined classes have no immediate instances and the majority of properties have never been used to assert a value. This table shows for both classes and properties, the number that have been defined either explicitly or implicitly through reference, the number that have been populated, and the percent that have been populated.
| type | def/ref | pop | %pop |
|---|---|---|---|
| classes | 1,386,272 | 34,018 | 2.45% |
| properties | 156,131 | 42,839 | 27.44% |
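As a quick check on the %pop column, the following Python sketch recomputes it from the def/ref and pop counts in the table. The names `term_counts` and `percent_populated` are hypothetical and used only for illustration; the counts themselves are copied from the table.

```python
# Counts copied from the table: (defined or referenced, populated).
term_counts = {
    "classes": (1_386_272, 34_018),
    "properties": (156_131, 42_839),
}


def percent_populated(defined: int, populated: int) -> float:
    """Percentage of defined/referenced terms that have been populated."""
    return 100.0 * populated / defined


for kind, (defined, populated) in term_counts.items():
    print(f"{kind}: {percent_populated(defined, populated):.2f}% populated")
```

Rounded to two decimals, this reproduces the 2.45% and 27.44% figures, so properties are populated roughly eleven times as often as classes.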
Based on this data, many classes have been introduced but not immediately instantiated. Properties are much more likely to be used to assert values of an instance once introduced. These statistics are probably influenced by a few large SWDs that define many classes that are intended to be used as data. WordNet is a good example. The usage pattern for RDF terms is not so surprising when you compare it to word use frequency in natural language (e.g., see Zipf’s law), which follows a power law curve.
August 12th, 2006, by Tim Finin, posted in Uncategorized
SEO Blackhat is reporting an interesting analysis enabled by the AOL search data that shows the click-through rate (CTR) of a search result by position.
| Rank | CTR |
|---|---|
| 1 | .421 |
| 2 | .119 |
| 3 | .085 |
| 4 | .061 |
| 5 | .049 |
| 6 | .041 |
| 7 | .034 |
| 8 | .030 |
| 9 | .028 |
| 10 | .030 |
| >10 | .113 |
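One way to read this table is cumulatively. The Python sketch below sums the CTR values for ranks 1 through 10, on the assumption that each value is the share of all clicks going to that position; the variable names are hypothetical.

```python
from itertools import accumulate

# CTR for result positions 1..10, copied from the table.
ctr_by_rank = [0.421, 0.119, 0.085, 0.061, 0.049,
               0.041, 0.034, 0.030, 0.028, 0.030]

# Running total: share of clicks captured by the top k results.
cumulative = list(accumulate(ctr_by_rank))

for rank, share in enumerate(cumulative, start=1):
    print(f"top {rank:>2}: {share:.3f}")
```

Read this way, the top three results account for about 62.5% of clicks and the first ten for about 90%.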
August 11th, 2006, by Tim Finin, posted in Uncategorized
The IEEE Computer Society Standards Committee FIPA will meet on Wednesday 13 September 2006 in conjunction with the Tenth International Workshop on Cooperative Information Agents. The meeting will be held at the University of Edinburgh, e-Science Institute, 15 South College Street, Edinburgh EH8 9AA, UK, Level 2, Room “Cramond”. For more information, contact Stefan Poslad or James Odell. The program is as follows:
| Time | Session |
|---|---|
| 09:30 | Overview of FIPA |
| 10:00 | Presentations by Individual Work Groups (WGs) |
| 11:45 | Coffee break |
| 12:10 | Discussion I within individual WGs / WG Coordination |
| 13:00 | Lunch |
| 14:00 | Discussions II within individual WGs |
| 15:30 | Coffee break |
| 16:00 | Summary of status and coordination of plans of individual WGs |
| 17:00 | Adjourn |
FIPA, the Foundation for Intelligent Physical Agents, is an IEEE Computer Society standards organization that promotes agent-based technology and the interoperability of its standards with other technologies. FIPA maintains a mature and implemented set of standards for communication languages, protocols and infrastructure for multiagent systems.