UMBC ebiquity research group Building intelligent systems in open, heterogeneous, dynamic, distributed environments
21 August 2008, 19:37:16 EDT  
Yea, but was it a fair fight?

Tim Finin, 1:02pm 16 August 2005

Several people have pointed out some interesting issues in the methodology used to estimate whether Google or Yahoo has indexed more documents.

Seth Finkelstein points out that a random two-word test search like “alkaloid’s observance” returns 15 hits on Google and none on Yahoo. But not one of the 15 pages Google found is really of interest: they are copies of word lists or spam blogs. It hardly seems fair to call a foul on Yahoo for not indexing useless documents. I’d pay extra for that service!

Eric Glover observes on Dave Farber’s IP list two implicit assumptions in the experiment:

#1: That both Google and Yahoo use the same relevance function to decide which results to include - or there is some way to post process this to compare equally.

#2: That the Yahoo crawler is biased in a way that is equally probable for results which are returned for the keywords in the study - or at least close.

I wonder if we could ever develop a consensus technique for such an experiment.
