UMBC ebiquity research group Building intelligent systems in open, heterogeneous, dynamic, distributed environments

Archive for June, 2006

Swoogle hits 1.5M Semantic Web documents

June 5th, 2006, by Tim Finin, posted in Uncategorized

Sometime last night the Swoogle crawler found its 1,500,000th unique Semantic Web document. These 1.5M documents comprise about 1M RDF documents, 350K documents with embedded RDF data, and 150K documents that look like Semantic Web documents but are currently inaccessible or fail to parse properly. About 3000 new documents are discovered each day. We estimate that of the 1M RDF documents, about 1% (10K) are ontologies, as opposed to data, examples or test files. Swoogle does not currently process RDFa content, microformat data, or PDF and JPEG documents. The crawler also heavily throttles its crawl of some domains (e.g., livejournal.com) that have large numbers of FOAF and RSS documents, in order to maintain a more balanced and interesting collection.
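As a quick sanity check on the numbers above, here is a small Python sketch that adds up the three categories and applies the 1% ontology estimate. The variable names are ours, and the figures are the rounded ones quoted in this post, not exact crawler counts.

```python
# Rounded figures quoted in the post (June 2006); variable names are illustrative.
rdf_documents = 1_000_000            # pure RDF documents
embedded_rdf_documents = 350_000     # documents with embedded RDF data
inaccessible_or_unparsable = 150_000 # look like SWDs but cannot be fetched or parsed

total_documents = (
    rdf_documents + embedded_rdf_documents + inaccessible_or_unparsable
)

# About 1% of the pure RDF documents are estimated to be ontologies.
estimated_ontologies = rdf_documents // 100

print(total_documents)       # 1500000
print(estimated_ontologies)  # 10000
```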

Policy 2006 Workshop starts today

June 5th, 2006, by Pranam Kolari, posted in Uncategorized

The 7th International IEEE Workshop on Policies for Distributed Systems and Networks (Policy 2006) started today at the University of Western Ontario, London, ON, Canada.

Held every year since 1999, the Policy workshop is the primary forum for technical exchange on the research and standards related to policies for networks and distributed systems. Policy 2006 will present up-to-date approaches for policy specifications, integration with management systems and new applications of policies.

We will be presenting one of our papers on Policy Management of Enterprise Systems on Wednesday. Lalana Kagal, a UMBC alumna, has a presentation based on her work on policy delegation networks [pdf].

Computer science thesis splog

June 4th, 2006, by Tim Finin, posted in Uncategorized

Sploggers are everywhere and they are getting desperate. Today I ran across the Computer Science thesis splog. It mines posts from Google Blog Search via a query on computer+science+thesis. Talk about a niche market!

Studying this a bit shows how easy it is to build one. Google Blog Search lets you get an RSS feed for a given query. The feed’s description has an excerpt from the original posts showing the matching query words in bold and with images and links stripped out. It’s an easy task to extract the first N items from the feed to make a splog post.
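To make the "first N items" step concrete, here is a minimal Python sketch using only the standard library. The inline feed is a made-up stand-in for a query feed (its titles, links and excerpts are hypothetical), and `first_items` is our own helper name; understanding how little work this takes is also useful for anyone trying to detect splogs.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed standing in for a blog-search query feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>blog search: computer science thesis</title>
    <item>
      <title>Post A</title>
      <link>http://example.org/a</link>
      <description>Notes on my &lt;b&gt;computer science thesis&lt;/b&gt; draft</description>
    </item>
    <item>
      <title>Post B</title>
      <link>http://example.org/b</link>
      <description>Picking a &lt;b&gt;thesis&lt;/b&gt; topic</description>
    </item>
    <item>
      <title>Post C</title>
      <link>http://example.org/c</link>
      <description>Defending next week</description>
    </item>
  </channel>
</rss>"""


def first_items(feed_xml, n):
    """Return title, link and excerpt for the first n items of an RSS feed."""
    root = ET.fromstring(feed_xml)
    items = root.findall("channel/item")[:n]
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "excerpt": item.findtext("description", default=""),
        }
        for item in items
    ]


for entry in first_items(SAMPLE_FEED, 2):
    print(entry["title"], entry["link"])
```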

Ebiquity blog visualized

June 3rd, 2006, by Tim Finin, posted in Uncategorized

The ebiquity blog as visualized by Websites as Graphs on 3 June 2006.


Early AI hacker Alan Kotok passes away

June 3rd, 2006, by Tim Finin, posted in Uncategorized

I was saddened to read John Markoff’s NYT article on the death of Alan Kotok. He was in the group of TMRC hackers who helped found the MIT AI Lab. He was known for his work on the Spacewar game and on the first MIT chess program, developed with John McCarthy, which became his MIT BS thesis. After leaving MIT he joined DEC, where he designed the PDP-6 and was chief architect of the PDP-10. A generation (or two) of Computer Science grad students learned how to hack on those machines. (I had the pleasure of wasting many hours playing Spacewar on PDP-6 serial number 2.) Kotok joined the W3C as associate chairman in 1997.

Ask.com Bloglines releases new feed search service

June 1st, 2006, by Tim Finin, posted in Uncategorized

Bloglines, the web-based feed reader owned by Ask.com, has revealed a new feed search system that is much better than their old one. Several things set it apart from other blog/feed search services.

First, it is relatively splog-free. Bloglines’ feed collection is based on user subscriptions, so each of the ~1M feeds in Bloglines’ collection is subscribed to by at least one registered Bloglines user. There are splogs in the collection of course (e.g., this and this), since nothing prevents a splogger (or his splogbot) from subscribing to each new splog. But Bloglines knows how many subscribers each feed has, and you can filter the results by requiring that each one have one, two or many subscribers.
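The subscriber filter is simple to picture in code. The sketch below is only an illustration of the idea, not Bloglines’ implementation: the `FeedHit` type, the `filter_by_subscribers` helper and the sample URLs are all hypothetical.

```python
from typing import NamedTuple


class FeedHit(NamedTuple):
    """One search result: a feed URL and its subscriber count (illustrative)."""
    url: str
    subscribers: int


def filter_by_subscribers(hits, min_subscribers):
    """Keep only the hits with at least min_subscribers subscribers."""
    return [hit for hit in hits if hit.subscribers >= min_subscribers]


hits = [
    FeedHit("http://example.org/popular/feed", 240),
    FeedHit("http://example.org/niche/feed", 2),
    FeedHit("http://example.org/probable-splog/feed", 1),
]

# Requiring two or more subscribers drops feeds that only their owner follows.
for hit in filter_by_subscribers(hits, 2):
    print(hit.url, hit.subscribers)
```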

Second, Bloglines has a high quality way of judging a feed’s popularity. In general, ranking blogs is harder to do than ranking Web pages since the blogosphere is very dynamic and links to blogs are mediated by software infrastructure (e.g., Bloglines) and therefore completely or partially hidden. Moreover, there is much more of a temporal focus to search: users want to find yesterday’s posts rather than very popular posts from last year (see tailrank for a good attempt to find hot recent posts). Bloglines has a stable base of over 100K users, big enough to use their collective subscriptions as the basis for feed popularity ranking. The only way to undermine this is for sploggers and spammers to flood Bloglines with fake user registrations.
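Here is a rough sketch of what ranking by collective subscriptions could look like. We don’t know how Bloglines actually computes popularity, so treat this as an assumption: it just counts distinct subscribers per feed, and the `rank_feeds` helper, user names and URLs are made up for illustration.

```python
from collections import Counter


def rank_feeds(subscriptions):
    """Rank feeds by number of distinct subscribers, most popular first.

    subscriptions maps a user name to the feed URLs that user follows.
    Ties are broken alphabetically by URL so the order is deterministic.
    """
    counts = Counter()
    for feeds in subscriptions.values():
        counts.update(set(feeds))  # each user counts once per feed
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical subscription lists.
subscriptions = {
    "alice": ["http://example.org/a", "http://example.org/b"],
    "bob": ["http://example.org/a"],
    "carol": ["http://example.org/a", "http://example.org/c", "http://example.org/c"],
}

for url, subscriber_count in rank_feeds(subscriptions):
    print(subscriber_count, url)
```

A flood of fake registrations would inflate exactly these counts, which is why that is the obvious attack on this kind of ranking.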

Bloglines indexes feeds of all kinds, not just those from blogs. This has advantages and disadvantages. On the plus side, sometimes you don’t care where the information comes from: news sites, aggregators, community portals, data streams, corporate blogs, personal diary blogs, etc. Also, there isn’t a clear line between the categories: is Slashdot a blog? The downside, of course, is that sometimes you do care. If I want to find out what’s happening in Baghdad today, I probably want to search over news feeds and not MySpace diaries. Bloglines’ search does allow you to select between all feeds, just news feeds or all but news feeds. This is a good start.

Bloglines’ new search has some great advanced search features, too. You can restrict the search to posts and blogs in any combination of 20 different languages and also limit the search to a given time period.

We think Bloglines is a great service and their enhanced search system makes it even better.

See also Tech Crunch’s article and this one.

