RDF-based mind mapper

December 30th, 2005

MindRaider is a “mind mapping” application that uses RDF as its native representation. It’s an open source software project developed in Java by Czech programmer Martin Dvorak. The concept of a mind map is a simple one, but I’ve found it very useful over the years for organizing my thoughts. There is a natural fit with RDF and, in fact, with any graph-based representation. I’m not sure yet how MindRaider takes full advantage of RDF, but the possibilities are intriguing. (spotted on ltu).
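The “natural fit” between mind maps and RDF is easy to see in code. Here is a minimal sketch, not MindRaider’s actual data model: a mind map stored as RDF-style (subject, predicate, object) triples, with the namespace and predicate names being illustrative inventions.

```python
# A mind map as a set of RDF-style triples. The namespace and the
# hasChild/relatesTo predicates are hypothetical, for illustration only.

MM = "http://example.org/mindmap#"  # hypothetical namespace

triples = {
    (MM + "SemanticWeb", MM + "hasChild", MM + "RDF"),
    (MM + "SemanticWeb", MM + "hasChild", MM + "OWL"),
    (MM + "RDF", MM + "relatesTo", MM + "GraphTheory"),
}

def children(node):
    """Return all nodes linked from `node` by the hasChild predicate."""
    return sorted(o for s, p, o in triples
                  if s == node and p == MM + "hasChild")

print(children(MM + "SemanticWeb"))
# → ['http://example.org/mindmap#OWL', 'http://example.org/mindmap#RDF']
```

Because the map is just triples, cross-links between branches (like the relatesTo edge above) come for free, which is exactly where tree-shaped mind map formats struggle.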

Smart Car Knows How to Park Itself and More

December 25th, 2005

German engineers are working on a new smart car that knows how to find empty parking spaces and park itself.

Parkmate, which is expected to be available from 2008, is part of a battery of technology being developed by Siemens VDO, one of the world’s major suppliers of in-car electronics.

Talk Digger tracks blog conversations

December 24th, 2005

Talk Digger “helps users to find, follow and join conversations evolving on the Internet.” Put in a URL and it queries nine different search engines to find blog posts and other pages that reference the page. It’s a nicely done, useful web application developed by Frédérick Giasson. But here’s the most interesting part, at least from my point of view:

“What is the future of Talk Digger? It will evolve as a web service that will broadcast its results in RDF or even OWL. It would be the first step to do to enter Talk Digger into the Semantic Web age. Talk Digger will eventually have some sort of semantic analyzing capabilities and being able to use them on found conversations to clarify and optimize the returned results. It will then be able to semantically analyze conversations (I will not say more than that for the moment) and make the results of these computations available and understandable by other RDF/OWL reasonners and web applications. So, what is the future of Talk Digger? I hope it is the Semantic Web.”

Why no one goes to your web site

December 16th, 2005

Why no one goes to your web site

Three tech giants to finance research

December 15th, 2005

In today’s NYT, John Markoff writes:

With federal funds for basic computer science research at universities in decline, three of the industry’s leading companies are joining to help fill the void.

University of California computer scientists plan to announce on Thursday that the companies–Google, Microsoft and Sun Microsystems–will underwrite a $7.5 million laboratory on the Berkeley campus. The new research center, called the Reliable, Adaptive and Distributed Systems Laboratory, will focus on the design of more dependable computing systems.

The Berkeley researchers say that under the terms of their agreement with the three companies, the fruits of the research will be nonproprietary and freely licensed. Each company has agreed to support the project with $500,000 annually for five years.

While this is great for Berkeley, it may not be so good for the academic research computing community if (1) industry starts concentrating its research funding in the top 10 departments and (2) the government decides that its support for basic computing R&D is less necessary because industry will fund universities to do it.

Google to open R&D lab at CMU

December 15th, 2005

The Pittsburgh Post-Gazette writes:

“Internet goliath Google Inc. will open a research and development facility on Carnegie Mellon University’s campus, state economic development and university officials are expected to announce today.”

(via datamining)

Welcome to the Splogosphere: 75% of new pings are spings (splogs)

December 15th, 2005

In the blogosphere, pings are notifications sent by updated blogs to ping servers. A major issue recently has been unjustified pings, also known as spings, sent by splogs. Splogs have been discussed a lot recently, including an interesting thread on post piracy that Steve Rubel initiated on Micropersuasion.

The problem of splogs prompted us to analyze pings from weblogs.com, which publishes hourly pings as changes.xml. We have been collecting these pings over the last 4 weeks, for a total of 40 million pings from around 14 million (so claimed) blogs. To begin with, we applied a language identification technique implemented by James Mayfield, fetching each blog to identify its language. As expected, most of the pings were from blogs authored in English, but we were able to identify blogs in many other languages as well. For instance, the charts below show the distribution of pings from blogs authored in Italian — over a day and over a week. Each bar denotes the number of pings per hour.
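A first processing step for this kind of analysis is just parsing the ping feed. Below is a minimal sketch assuming the weblogs.com format: a `<weblogUpdates>` root containing one `<weblog name="..." url="..." when="..."/>` element per ping; the exact attribute names in the live feed may differ.

```python
# Parse a weblogs.com-style changes.xml and count pings vs. distinct
# blogs. The sample document and its attribute names are assumptions
# about the feed format, not copied from the live service.
import xml.etree.ElementTree as ET

sample = """<?xml version="1.0"?>
<weblogUpdates version="2" count="3">
  <weblog name="Blog A" url="http://a.example.com/" when="120"/>
  <weblog name="Blog B" url="http://b.example.com/" when="340"/>
  <weblog name="Blog A" url="http://a.example.com/" when="900"/>
</weblogUpdates>"""

root = ET.fromstring(sample)
pings = [(w.get("url"), int(w.get("when"))) for w in root.iter("weblog")]
unique_blogs = {url for url, _ in pings}

print(len(pings), len(unique_blogs))  # → 3 2
```

Bucketing the `when` timestamps by hour then gives exactly the pings-per-hour bars shown in the charts.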

Pings over a day
Pings over 8 days

All times are in GMT; clearly Italian authored blogs display a specific blogging pattern.

In the next step we used our work on splog detection to detect splogs (and hence spings) among the English blogs. Our detection mechanism is close to 90% accurate. As shown in the charts below, pings from blogs average around 8K per hour and those from splogs average around 25K.
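The post doesn’t describe the detection features, so here is only a toy stand-in to make the idea concrete: score a post by the fraction of its tokens that appear in a hand-made spam lexicon, then threshold. Real splog detectors (including the one described here) use richer features and learned models.

```python
# Toy splog scorer: fraction of tokens found in a spam lexicon.
# The lexicon and threshold are illustrative, not from the actual system.

SPAM_WORDS = {"viagra", "casino", "mortgage", "free", "cheap"}

def splog_score(text):
    """Fraction of whitespace-separated tokens that are spam words."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in SPAM_WORDS for t in tokens) / len(tokens)

def is_splog(text, threshold=0.2):
    return splog_score(text) >= threshold

print(is_splog("cheap mortgage free casino deals"))  # → True
print(is_splog("notes on semantic web research"))    # → False
```

Even this crude scheme shows why accuracy matters: at 75% spings, a small false-positive rate on legitimate blogs still mislabels thousands of pings per hour.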

Blog Pings
Splog Pings

Clearly almost 3 out of 4 pings are spings! Going back further to the source of these spings, we observed that more than 50% of claimed blogs pinging weblogs.com are splogs.

Given how interesting these preliminary statistics are, the scope for further analysis, and the interest in the resulting dataset, we decided to continuously monitor the pingosphere. We now do it “live” on updated blogs published by weblogs.com (delayed by an hour), and have made the results publicly available at http://memeta.umbc.edu. The site lists blogging patterns for many other languages and compares splogs with blogs. All of this is part of a larger project, Memeta, aimed at analyzing the content and structure of the blogosphere.

We hope our effort is a good complement to existing services (e.g., FightSplog, SplogReporter and SplogSpot) for combating splogs. We currently publish only simple ping statistics on this site, but stay tuned for fresh splog and classified blog dumps and much more!

UPDATE: Matthew Hurst from BlogPulse points us to an interesting analysis he has done on a day of weblogs.com pings.

Building your own search engine with Alexa

December 15th, 2005

As the size of the Web gets bigger and bigger, search engines such as Yahoo! and Google may be too general for building applications that focus on a particular domain of information. To address this, Alexa provides a web search platform that allows people to define their own search engine.

You have to pay for the service, but it definitely looks promising. Alexa’s crawl covers over 100 terabytes of Web content spanning 4 billion pages and 8 million sites, and supports a wide variety of content types from the Web (jpgs, gifs, mp3s, movies, text/html, and even metadata). How does Alexa work?
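To make the “define your own search engine” idea concrete, here is a toy sketch (not Alexa’s actual API, whose details the post doesn’t cover): a vertical search that filters a crawled page set down to a site whitelist plus a query term.

```python
# Toy vertical search over a crawled (url, text) page set.
# The crawl data, site whitelist, and function names are all hypothetical.

crawl = [
    ("http://recipes.example.com/tofu", "tofu stir fry recipe"),
    ("http://news.example.com/markets", "stock markets rally"),
    ("http://recipes.example.com/soup", "miso soup recipe"),
]

ALLOWED_SITES = ("recipes.example.com",)

def vertical_search(query, pages):
    """Return URLs from whitelisted sites whose text contains the query."""
    return [url for url, text in pages
            if any(site in url for site in ALLOWED_SITES)
            and query in text]

print(vertical_search("recipe", crawl))
```

The point of a platform like Alexa’s is that the expensive part — the 100-terabyte crawl standing in for the `crawl` list above — is already done for you; you supply only the filtering and ranking logic.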


del.icio.us goes to Yahoo!

December 9th, 2005

Discussion brewing, see tech.memeorandum. Yahoo wins the game, at least for now. Flickr, and now del.icio.us! Details at ysearchblog, and from Joshua.

Google transit trip planner

December 8th, 2005

Maybe Google Transit will succeed in making public transportation work in the US where others (e.g., Governments) have failed.

“Google Transit Trip Planner enables you to enter the specifics of your trip — where you’re starting, where you’re ending up, what time of day you’d like to leave and/or arrive — then uses all available public transportation schedules and information to plot out the most efficient possible step-by-step itinerary. You can even compare the cost of your trip with the cost of driving the same route!

At the moment we’re only offering this service for the Portland, Oregon metro area, but we plan to expand to cities throughout the United States and around the world.”

It does fill a gap. Public transportation in the US is provided by a mix of national, regional, local and commercial organizations. There’s no one to organize and integrate all of the information, or even to identify who the providers are. The semantics of travel is not overly complicated: there are a finite number of transportation modes, and they share much of the same ontology (begin and end times, schedules, costs, waypoints, etc.).
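That shared ontology is small enough to sketch directly. A minimal model, with illustrative field names (not Google Transit’s schema): every leg of a trip, regardless of mode, carries the same begin/end times, endpoints, and cost.

```python
# One shared record type covers every transportation mode, which is the
# point the post makes about the travel ontology. Names are illustrative.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Leg:
    mode: str            # e.g. "bus", "light_rail", "walk"
    origin: str
    destination: str
    depart: datetime
    arrive: datetime
    cost: float          # fare in dollars; 0 for walking

def total_cost(legs):
    return sum(l.cost for l in legs)

trip = [
    Leg("walk", "Home", "5th & Main",
        datetime(2005, 12, 8, 8, 0), datetime(2005, 12, 8, 8, 10), 0.0),
    Leg("bus", "5th & Main", "Downtown",
        datetime(2005, 12, 8, 8, 15), datetime(2005, 12, 8, 8, 40), 1.70),
]
print(total_cost(trip))  # → 1.7
```

The “compare with the cost of driving” feature then reduces to computing `total_cost` for a one-leg driving trip over the same endpoints.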

Google base: the killer app?

December 8th, 2005

Google Base was announced on November 15, 2005. It lets users publish their information in a Semantic Web way: (i) defining an instance of a class; (ii) creating and filling attribute-value pairs (values as text, though); (iii) adding keywords as tags; and (iv) allowing bulk upload. I wonder if R. Guha is behind it. It could be a killer app for web directory services, including classifieds like Craig’s List.
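Steps (i)–(iii) describe a simple data model. A sketch of it (field names are mine, not Google Base’s): an item is a type, a bag of free-form attribute-value pairs with text values, and a list of keyword tags.

```python
# Hypothetical model of a Google Base item as the post describes it:
# a class/type, text-valued attribute-value pairs, and keyword tags.

def make_item(item_type, attributes, tags):
    return {"type": item_type, "attributes": attributes, "tags": tags}

item = make_item(
    "recipe",
    {"main ingredient": "tofu", "cooking time": "30 minutes"},
    ["vegetarian", "quick"],
)

print(item["type"], item["attributes"]["main ingredient"])  # → recipe tofu
```

Because attribute values are plain text rather than typed literals, this sits somewhere short of full RDF — which is presumably what the post means by “in a Semantic Web way.”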

(source: http://www.micropersuasion.com/2005/11/google_to_go_ha.html)
UPDATE: Greg Yardley has evidence from a Microsoft blogger that the software giant is also targeting the online classified market with its new Fremont project.

I’m still hoping for some improvements to the beta version:
(1) show the total number of items for each type on the front page, like below (counts collected from today’s snapshot)

blogs (1193)   coupons (697)   course schedules (467)   events and activities (687)
jobs (431077)   news and articles (2879)   people profiles (46676)   products (785)
recipes (20257)   reference articles (2280)   reviews (5302)   services (11827)
vehicles (78716)   wanted ads (27131)   rentals (31853)   comic books (27)

(2) can I browse every item without being bound by the 1000-item limit?
(3) what happens if many people create many item types?
(4) can it recommend well-used attributes?
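Question (4) has a simple possible answer. A minimal sketch (hypothetical — I don’t know how Google Base would actually do this): count attribute names across existing items of the same type and suggest the most common ones to the next author.

```python
# Recommend attributes by frequency among existing items of a type.
# The sample items and the approach are illustrative assumptions.
from collections import Counter

existing_items = [
    {"price": "10", "condition": "new"},
    {"price": "25", "color": "red"},
    {"price": "5", "condition": "used"},
]

def recommend_attributes(items, top_n=2):
    """Return the top_n most frequently used attribute names."""
    counts = Counter(name for item in items for name in item)
    return [name for name, _ in counts.most_common(top_n)]

print(recommend_attributes(existing_items))  # → ['price', 'condition']
```

Nudging authors toward popular attributes would also ease question (3): the item-type vocabulary converges instead of fragmenting.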

The XKCD data died in a blogging accident

December 8th, 2005

The popular XKCD had another Web-related comic yesterday, but it turned out to be self-negating.


As was noted on Slashdot:

“As I noted yesterday (and was joined by many others)… in an offhand observation xkcd has singlehandedly changed a small section of the Internet. Changing the results from a Google search for “Died in a Blogging Accident” from 2 to (at this writing) over 7,170 in a little more than 24 hours.”

The number of results is now up to 13.3K. I guess something like the Heisenberg uncertainty principle applies to the Internet, too.