The ebiquity Matrix servers (neo, trinity, morpheus, logos, niobe, link) were successfully moved from the cybersecurity lab to one of the OIT machine rooms. It turned out to be quite a bit of work to get the computers into our rack. We had custom rack-mounting kits, but they required moving many of the components within each computer’s chassis. We still need some parts to get Logos in, since it’s a slightly different model. Many thanks to everyone who helped: Geoff, Brendan, Anand, Akshay, Pranam, Li, Andrej, Nimish, Sheetal and Lushan. We’ve uploaded some pictures from the tail end of the move to the ebiquity Flickr site. Add comments and notes if you have a Flickr account.
Archive for January, 2006
Russia’s state security service has accused British diplomats of spying in Moscow using electronic rocks. It’s an obvious hack when you think about it: a Bluetooth-enabled PDA in a hollowed-out rock could be used to drop off or pick up heavily encrypted documents from spies as they stroll by. The only problem would be power. Such a Bluetooth rock would be much better than Alger Hiss’s pumpkin patch.
In an infamous spy case from the early days of the cold war, US State Department official Alger Hiss was accused (by a young Richard Nixon!) of passing documents via rolls of microfilm secreted in a hollowed-out pumpkin on his Maryland farm. But, technology marches on, with wireless rocks replacing pumpkins.
The March of Progress
In 1948 Alger Hiss was accused of transferring secrets using microfilm in a hollowed-out pumpkin.
In 2006 the British were accused of transferring secrets using a wireless-enabled PDA in a hollowed-out rock.
Pumpkin:
models: Jack-o’-lantern, squash
vulnerable to: rodents, fungus, kids
pluses: organic, biodegradable
negatives: decay, rot
Rock:
models: igneous, sedimentary
vulnerable to: bluejacking, spyware
pluses: Tetris, plays MP3s
A group of UMBC students working with Professor Zary Segall has built a prototype music player that senses its user’s emotional state and level of activity and picks appropriate music. The prototype uses BodyMedia’s SenseWear, which continuously collects data from the wearer’s skin and wirelessly transmits the data stream to the XPod prototype. The physiological data include energy expenditure (calories burned), duration of physical activity, number of steps taken, and sleep/wake states. A neural network learns associations between these biometric parameters and the user’s music preferences, and the resulting model is then used to construct the XPod’s playlist dynamically. Read more about the XPod prototype in this recent paper:
XPod: A Human Activity and Emotion Aware Mobile Music Player, Sandor Dornbush, Kevin Fisher, Kyle McKay, Alex Prikhodko and Zary Segall.
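The post doesn’t describe the network’s architecture, so the following Python is only a toy stand-in for the idea: a single logistic unit trained by gradient descent on a hypothetical encoding of activity level and track tempo (both scaled to [0, 1]), then used to rank a playlist. The feature encoding, the training data and the names `features`, `train`, `score` and `build_playlist` are all our assumptions, not XPod’s design.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def features(activity: float, tempo: float) -> list[float]:
    # Hypothetical encoding: activity and tempo both scaled to [0, 1].
    # The two interaction terms let a single linear unit learn
    # "fast music when active, slow music when resting".
    return [activity * tempo, (1 - activity) * (1 - tempo)]


def train(samples, epochs=2000, lr=0.5):
    """samples: list of ((activity, tempo), liked) with liked in {0, 1}."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (activity, tempo), liked in samples:
            x = features(activity, tempo)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - liked  # gradient of the log-loss w.r.t. the pre-activation
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def score(model, activity: float, tempo: float) -> float:
    """Predicted probability that the user likes this track in this state."""
    w, b = model
    x = features(activity, tempo)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def build_playlist(model, activity: float, library: dict) -> list:
    """library maps track name -> tempo in [0, 1]; best match comes first."""
    return sorted(library, key=lambda t: score(model, activity, library[t]), reverse=True)
```

Trained on four examples in which the user liked fast music while active and slow music while resting, `build_playlist(model, 0.9, {"fast": 0.9, "slow": 0.1})` puts `"fast"` first, and the same call with an activity level of 0.1 puts `"slow"` first.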
A good read at http://stopbadware.org; it seems to be a MEGA campaign by Google, Lenovo and Sun Microsystems.
“Several academic institutions and major tech companies have teamed up to thwart ‘badware’, a phrase they have coined that encompasses spyware and adware. The new website, StopBadware.org, is promoted as a ‘Neighborhood Watch’ campaign and seeks to provide reliable, objective information about downloadable applications in order to help consumers to make better choices about what they download on to their computers. We want to work with both experts and the broader internet community (.orgs and .edus) to define and understand the problem.”
Harry Chen blogs about Web 2.0 Validator, an automated web tool that determines how 2.0ish your Web site is based on a set of Web 2.0 characteristics. While Harry reports that his site scored only 11, it now scores 31! No, I don’t think he’s been studying for the test so he could retake it. The jump appears to be due to Harry’s post on Web 2.0 Validator: just talking about Web 2.0 Validator makes your site look like a Web 2.0 site to Web 2.0 Validator. Or maybe this is somehow related to Russell’s paradox.
Anyway, this post should help raise our own Web 2.0 factor a bit, even though the site is not in public beta, uses PHP rather than Python and doesn’t use the tag, and we don’t really mention mash-ups, startups, Less is More, Dave Legg, the Web 2.0 Validator’s ruleset, Flickr, VC, VCs, Nitro, Firefox, Ruby or links to Slashdot.
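We don’t know how Web 2.0 Validator actually works, but a naive keyword matcher would behave exactly as Harry observed. The toy Python scorer below is a sketch under that assumption; its `RULES` list is hypothetical and is not the validator’s real ruleset.

```python
import re

# Hypothetical rules: (description, case-insensitive regex). These are NOT the
# real Web 2.0 Validator's rules, just a stand-in to show keyword matching.
RULES = [
    ("mentions Web 2.0", r"web\s*2\.0"),
    ("mentions mash-ups", r"mash-?ups?"),
    ("mentions Python", r"\bpython\b"),
    ("mentions Flickr", r"\bflickr\b"),
    ("mentions public beta", r"public\s+beta"),
]


def score(page_text: str) -> int:
    """Count the rules whose pattern appears anywhere in the page text."""
    return sum(
        1 for _, pattern in RULES if re.search(pattern, page_text, re.IGNORECASE)
    )
```

Because a matcher like this only looks for the words, a sentence saying the site does not use Python still earns the Python point, which is the self-referential effect Harry ran into.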
The six ebiquity Matrix servers (neo, morpheus, logos, etc.) are currently offline as they are being moved from their temporary home in the ITE cybersecurity lab to our rack in the main machine room in the ECS building next door. The ECS machine room is a better home for them as it’s a more controlled environment with robust power loss protection. We expect to have them back online sometime late Friday afternoon.
Recently Cláudio Fernandes asked on several semantic web mailing lists:
“Can someone point me to some huge owl/rdf files? I’m writing a owl parser with different tools, and I’d like to benchmark them all with some really really big files.”
I just ran some queries over Swoogle’s collection of 850K RDF documents collected from the web. Here are the 100 largest RDF documents and OWL documents, respectively. Document size was measured in terms of the number of triples. For this query, a document was considered to be an OWL document if it used a namespace that contained the string OWL.
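The OWL heuristic described in this post is easy to state in code. This Python sketch assumes each document record carries its URL, triple count and the namespaces it uses (the `Doc` record and the `largest` function are hypothetical, not Swoogle’s schema). It compares namespaces case-insensitively, which is our assumption, since the OWL namespace URI itself is lowercase (http://www.w3.org/2002/07/owl#).

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Doc:
    url: str
    triples: int  # document size, measured as the number of triples
    namespaces: list[str] = field(default_factory=list)


def is_owl(doc: Doc) -> bool:
    # Heuristic from the post: the document uses some namespace containing "OWL".
    # Compared case-insensitively here (our assumption).
    return any("owl" in ns.lower() for ns in doc.namespaces)


def largest(docs, k=100, owl_only=False):
    """Return the k largest documents by triple count, optionally OWL only."""
    candidates = (d for d in docs if is_owl(d) or not owl_only)
    return heapq.nlargest(k, candidates, key=lambda d: d.triples)
```

Like the original heuristic, this substring test will also match any unrelated namespace that happens to contain those three letters.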
Currently, the version of Swoogle you get by going to http://swoogle.umbc.edu/ is Swoogle 2. Its database has been trapped in amber since last summer, when it was corrupted, preventing us from adding new data. We put our efforts into a reimplementation, Swoogle 3, which will be released early next week. The data reported here are from Swoogle 3’s database.
Google has published a study of Web Authoring Statistics in which they analyzed the HTML use of over one billion web pages.
“In December 2005 we did an analysis of a sample of slightly over a billion documents, extracting information about popular class names, elements, attributes, and related metadata.”
The results have lots of interesting data on which attributes are commonly used (and misused) with which classes and, for some, what the popular values are. No sign of embedded RDF or even of microformats. Maybe next year.
We’ve used Swoogle to do a similar analysis for RDF documents in general, and for FOAF documents in particular. An interesting study would be to analyze which features of RDF, RDFS and OWL are used in Swoogle’s collection (about 850K documents with RDF content as of the beginning of this year). I don’t think we can do this from our database; we would have to go back and process the cached documents, probably with a special-purpose, lightweight parser/analyzer (which is what Google did in their HTML study).
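A special-purpose analyzer of this kind could be quite small. The Python sketch below assumes the cached documents are available as N-Triples (converting other serializations such as RDF/XML would be a separate step) and simply tallies every URI in the RDF, RDFS and OWL namespaces. It uses a regular expression rather than a full parser, so a literal containing something like `<...>` would be miscounted; `count_features` is a hypothetical name.

```python
import re
from collections import Counter

NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
URI = re.compile(r"<([^>]*)>")  # lightweight: any <...> token is taken as a URI


def count_features(ntriples_text: str) -> Counter:
    """Count uses of RDF/RDFS/OWL vocabulary terms in one N-Triples document."""
    counts = Counter()
    for line in ntriples_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        for uri in URI.findall(line):
            for prefix, ns in NAMESPACES.items():
                if uri.startswith(ns):
                    counts[f"{prefix}:{uri[len(ns):]}"] += 1
    return counts
```

Summing the per-document counters (for example with `Counter.update`) over the whole cache would give collection-wide usage figures.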
Google Scholar, it’s a good thing, as Martha Stewart would say.
As I’ve worked through our papers to verify and add their Google Scholar keys, other benefits are becoming apparent. In several cases I’ve discovered errors or omissions in our own metadata. Sometimes our own entries have had the title wrong! In other cases, I’ve found several Google Scholar entries for the same paper. Sometimes this is due to an error by the author of a citing paper, which can propagate.
I suspect that some of the errors originate with us. Here’s one scenario. When a paper is accepted for publication, the author is happy and excited and adds an entry to our database, along with a softcopy of the draft. People download and read the draft and, if it’s good, start citing it. Months later the paper is finalized, possibly with a different title and even a different author list. Ideally, our site is edited to reflect the final metadata and final softcopy. But sometimes this doesn’t happen, or the final softcopy is not uploaded for copyright reasons. In any case, the old, possibly incorrect metadata and draft may have escaped to roam the Internet.
Lately I’ve started to add a header to draft copies of papers posted to our site stating that they are drafts and where the final version will appear. I’ve found Acrobat’s ability to add a header to an existing PDF file very handy for this. I’ve also used Acrobat to extract the first page of an article for which we don’t hold the copyright, add a header pointing to its source, and post that on our site (as in this example).
Finally, one of the ideas underlying the current Semantic Web vision is that it’s very useful for things on the web to have good identifiers. The Uniform Resource Identifier (URI) is the Semantic Web’s favorite identifier, but we all recognize that just using URIs is too simple for many objects (e.g., people). OWL’s contribution here is the notion of an inverse functional property. If my ontology defines SSN as an inverse functional property, then two objects that share the same SSN must be the same. Along these lines, the googleScholarKey property should be inverse functional, with domain=publication and range=string.
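What an inverse functional property buys you is the ability to merge (“smush”) resources that share a value for it. The Python sketch below groups subjects with a union-find structure; the triples, the `ex:` prefix and the name `merge_by_ifp` are made up for illustration, and a real OWL reasoner does considerably more. One caveat: in OWL DL only object properties may be declared inverse functional, so doing this for a string-valued property like googleScholarKey puts the ontology in OWL Full (OWL 2 later added owl:hasKey for this case).

```python
def merge_by_ifp(triples, ifps):
    """Group subjects that share a value for any inverse functional property.

    triples: iterable of (subject, predicate, object) strings.
    ifps: set of predicates treated as inverse functional.
    Returns a dict mapping every subject to a canonical representative.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Deterministic choice: the lexicographically smaller id wins.
            parent[max(ra, rb)] = min(ra, rb)

    first_owner = {}  # (predicate, value) -> first subject seen with that value
    for s, p, o in triples:
        find(s)  # register every subject, even ones without an IFP value
        if p in ifps:
            key = (p, o)
            if key in first_owner:
                union(first_owner[key], s)  # same IFP value => same individual
            else:
                first_owner[key] = s
    return {x: find(x) for x in parent}
```

Two paper entries that carry the same Google Scholar key would thus collapse into one individual, which is exactly the kind of duplicate the post describes finding.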
Two years ago Bill Gates predicted that the spam problem would be solved by now, as this article in The Register reports.
Hey Bill, why am I still getting spam?
Junk mail outlives MS mortality prediction
By John Leyden, 24 January 2006
Two years ago today Bill Gates predicted that spam email would be eradicated as a problem within 24 months. The Microsoft chairman predicted the death of spam in a speech at the World Economic Forum on 24 January 2004.
Gates outlined a three-stage plan to eradicate spam within two years. Microsoft’s scheme calls for better filters to weed out spam messages and sender authentication via a form of challenge-response system. Secondly, Microsoft wants to see a form of tar-pitting so that emails coming from unknown senders are slowed down to a point where bulk mail runs become impractical.
Lastly, and most promisingly as far as Gates is concerned, is a digital equivalent of stamps for email, to be paid out only if the recipient considers an email to be spam. Blocking spam email would appear to be a simple problem but in practice is far trickier than Gates, or indeed the industry, first thought.
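Tar-pitting is simple to sketch. In this Python illustration, each further message from an unknown sender waits twice as long as the last, up to a cap, while known senders pass straight through. The doubling schedule, the 0.5-second base, the 300-second cap and the function names are our assumptions, not Microsoft’s design.

```python
def tarpit_delay(sender, known_senders, messages_seen, base=0.5, cap=300.0):
    """Seconds to stall before accepting the next message from `sender`.

    messages_seen: how many messages this sender has already sent recently.
    """
    if sender in known_senders:
        return 0.0  # known correspondents are never slowed down
    # Exponential backoff; clamp the exponent so a long run can't overflow a float.
    return min(cap, base * 2.0 ** min(messages_seen, 1000))


def total_delay(sender, known_senders, n_messages, **kwargs):
    """Total stall time for a run of n_messages from one sender."""
    return sum(
        tarpit_delay(sender, known_senders, i, **kwargs) for i in range(n_messages)
    )
```

With these numbers, a run of 1,000 messages from one unknown sender accumulates 297,511.5 seconds of delay, roughly 83 hours, while a single legitimate message from a stranger waits half a second.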
It’s tempting to think that we are close to solving the splog identification problem, which would enable blog search engines to weed splogs out of their indices. But I’ll bet that splogs, like spam, will be with us for a long time. Of course, we do have to work hard to keep them under control, just as we do with spam. If we don’t, the blogosphere will quickly be overrun and its promise squandered.
We noticed Jose Vidal using a great idea on his publication list, which we’ve adopted for the ebiquity site’s publication page. Jose augments his paper descriptions with data from Google Scholar (GS): a link to the GS data, the number of citing papers, and a list of the citing papers’ GS data.
We think GS is likely to become increasingly important in the academic/scholarly community. It’s a way to find papers, of course, but it also helps judge their significance to the field as measured by the number of citations. Citation counting is the traditional way of measuring the impact of a paper. Using Google Scholar’s citations to measure impact has its problems, a topic we’ve posted on before and one also discussed in bibliometric circles, but it’s free and convenient, a combination that’s hard to beat. (Writing this, I wonder if anyone has tried applying a recursive model like PageRank to citation graphs. If not, it would be an interesting experiment.)
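Applying PageRank to a citation graph is easy to try. The Python sketch below is standard PageRank computed by power iteration, with citations playing the role of links; 0.85 is PageRank’s conventional damping factor, papers that cite nothing spread their weight evenly, and the function name and example graph are hypothetical.

```python
def citation_rank(cites, damping=0.85, iterations=100):
    """PageRank over a citation graph.

    cites: dict mapping each paper to the list of papers it cites.
    Returns a dict of scores that sum to 1.
    """
    papers = set(cites) | {q for refs in cites.values() for q in refs}
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in papers}
        dangling = 0.0
        for p in papers:
            refs = cites.get(p, [])
            if refs:
                share = damping * rank[p] / len(refs)
                for q in refs:
                    new[q] += share  # a citation passes on part of the citer's rank
            else:
                dangling += damping * rank[p]
        for p in papers:
            new[p] += dangling / n  # papers citing nothing spread rank uniformly
        rank = new
    return rank
```

In a graph where A and B cite C, C cites E, and D cites F, papers E and F each have exactly one citation, but E ends up ranked higher because its citer is itself cited twice; plain citation counting would tie them.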
Here’s how our paper listings now work. We augmented the RGB paper ontology to give the paper class a new metadata property, googleKey, which is then used to derive the other properties: the number of citations and the links to the GS description and to the list of citing papers. Right now the GS key is entered manually, since automating the lookup reliably is not trivial. But there is a link on the paper display that makes the key easier to find by querying GS with the paper title and showing the results. If the paper is in GS, it will probably be on the first page.
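A title-search link like this one can be built by URL-encoding the title as a phrase query. This Python sketch assumes Google Scholar’s `scholar?q=` search URL; the function name is hypothetical, and this isn’t necessarily how our site builds its link.

```python
from urllib.parse import urlencode


def scholar_search_link(title: str) -> str:
    # Wrap the title in double quotes so Google Scholar treats it as a phrase;
    # urlencode escapes the quotes and turns spaces into '+'.
    return "http://scholar.google.com/scholar?" + urlencode({"q": f'"{title}"'})
```

For example, `scholar_search_link("XPod a b")` yields `http://scholar.google.com/scholar?q=%22XPod+a+b%22`.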
Every night, an agent (well, ok, a cron job) checks Google Scholar to update the citation counts for all of the papers that have a GS key.
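The nightly job might look something like the Python below. It’s a sketch: the paper-record layout and `fetch_count` (standing in for whatever actually queries Google Scholar) are hypothetical, and a crontab entry such as `0 3 * * * python3 update_citations.py` (script name made up) would run it once a night.

```python
from typing import Callable, Optional


def nightly_update(papers: dict, fetch_count: Callable[[str], Optional[int]]) -> int:
    """Refresh citation counts for papers that have a Google Scholar key.

    papers: paper id -> dict with an optional "googleKey" and a "citations" count.
    fetch_count: hypothetical lookup returning the current count for a key,
                 or None if the lookup failed (the old count is then kept).
    Returns the number of papers whose count changed.
    """
    changed = 0
    for paper in papers.values():
        key = paper.get("googleKey")
        if not key:
            continue  # recently entered papers may not be in Google Scholar yet
        count = fetch_count(key)
        if count is not None and count != paper.get("citations"):
            paper["citations"] = count
            changed += 1
    return changed
```

Papers without a key are simply skipped, so they start getting counts as soon as someone fills in their GS key.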
Our lab members tend to enter papers into the site’s database as soon as they are accepted for publication, which is long before they show up in Google Scholar and even longer before they begin to accrue citations. So authors will have to check recently entered papers periodically and add their GS keys when available. It will take some weeks or more to process all of the old papers and look up their GS keys. Once we’ve done so, I think the data should be easy to maintain.
ACM/SIGART Autonomous Agents Research Award
The selection committee for the ACM/SIGART Autonomous Agents Research Award is pleased to announce that Dr. Michael Wooldridge of the University of Liverpool, UK is the recipient of the 2006 award.
Dr. Wooldridge has made significant and sustained contributions to research on autonomous agents and multi-agent systems. In particular, Dr. Wooldridge has made seminal contributions to the logical foundations of multi-agent systems (especially formal theories of cooperation, teamwork and communication), computational complexity in multi-agent systems, and agent-oriented software engineering.
In addition to his substantial research contributions, Dr. Wooldridge has served the autonomous agents research community in a variety of ways, including founding the AgentLink Network of Excellence in 1997 and, most recently, serving as Technical Program co-chair of the Fourth International Conference on Autonomous Agents and Multi Agent Systems (AAMAS2005).