August 29th, 2005
UCSD physicist Jorge Hirsch has proposed the h-index as a new bibliometric measure of a scholar’s impact, based on the number of papers published and how often each is cited. See this story in Physics World for an overview. The h-index can be defined as follows:
A person who has published N papers has h-index H iff they have H papers, each of which has at least H citations, and N-H papers with fewer than H citations.
You can easily estimate an author’s h-index using Google Scholar, since the results are ranked (more or less) by the number of citations, which are shown in the summaries. Try looking for papers authored by Turing. His 15 most cited papers all had at least 17 citations, and his 16th most cited paper had only 13 citations. So Alan Turing’s h-index is 15.
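The definition above is easy to turn into code. Here is a minimal Python sketch that computes an h-index from a list of per-paper citation counts (the counts below are invented for illustration):

```python
def h_index(citations):
    """Compute the h-index from a list of per-paper citation counts."""
    # Sort descending, then find the largest rank h such that the
    # h-th most cited paper has at least h citations.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Toy example: five papers with these citation counts
print(h_index([25, 8, 5, 3, 3]))  # → 3
```

Running it on the Turing example, fifteen papers with 17 citations each followed by one with 13 yields 15, as expected.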
This example, of course, shows one problem with basing the estimate on Google Scholar: it only takes into account papers it finds on the Web, a disadvantage for Turing. Another is that Google doesn’t eliminate “self citations”, citations where there is an author common to both the cited and citing papers. Accepting self citations invites gaming the system by always citing all of your earlier publications. CiteSeer is a web-based system that does eliminate self citations, as does ISI’s venerable citation database. But CiteSeer doesn’t rank author queries by citation number and also weights them by year. ISI’s coverage of Computer Science is not comprehensive, and access costs money. So Google Scholar seems to be the easiest way to play with the h-index idea for CS at present.
Google Scholar and CiteSeer automatically discover and index papers of all types — journal, conference, book chapter and even technical reports — unlike traditional citation databases like ISI’s. Should all of these contribute to a scholarly output metric? I think it’s not unreasonable. A technical report cited by 50 other papers has obviously had impact. Moreover, a paper’s visibility on the Web may become the dominant factor in its significance.
Hirsch argues that h is better than other commonly used single-number criteria for measuring a scholar’s output. He’s even suggested it could be used for tenure and promotion decisions.
Moreover, he goes on to propose that a researcher should be promoted to associate professor when they achieve an h-index of around 12, and to full professor when they reach an h of about 18. (Link)
What counts as a high number will vary across disciplines and even sub-fields within disciplines. Moshe Vardi tells me that Computer Scientists with h>50 are rare and Jeff Ullman’s number in the mid-60s is the highest he’s seen.
Finally, single number measures like this are always just shadows cast on the wall of a cave.
August 26th, 2005
A recent text-mining based approach builds the “Semantic Web Encyclopedia of Terms”, listing interesting terms about the Semantic Web.
Terms are ranked by their relevance to the Semantic Web and categorized by a hierarchical taxonomy. Each term comes with “popularity” and “density” ratings and a list of relevant terms. The top five are the following:
- Semantic Web – Popularity: 89.62%, Density: 4.69%
- Web Services – Popularity: 32.28%, Density: < 1%
- Tim Berners-Lee – Popularity: 25.28%, Density: < 1%
- World Wide Web – Popularity: 20.32%, Density: < 1%
- Resource Description Framework – Popularity: 16.93%, Density: < 1%
August 25th, 2005
I saw a link to Gartner’s Hype Cycle pages and thought I’d see what they say about Semantic Web technologies. You’ve seen these graphs before: they chart the ups and downs of the ‘visibility’ of an idea or technology over time.
Gartner’s roller coaster ride works like this. An idea first appears after a technological trigger and begins a steep rise to the top, the “peak of inflated expectations”. Not being a hill climber, or maybe just having no brakes, it just as quickly descends into the “trough of disillusionment”. Screaming, I suppose. Sadder but wiser, it makes a slow, gentle climb up the “slope of enlightenment” to reach its final place in life on the “plateau of productivity”. This plateau seems to be only about half as high as the initial peak. Eventually, the idea must disappear off the chart entirely, just as the shepherd’s sling vanished from the WMD hype chart.
The table of contents appears to mention the key items on the chart. For the Semantic Web these are:
- On the rise: XML Topic Map
- At the peak: Public Semantic Web
- Sliding into the Trough: (no items listed)
- Climbing the slope: (no items listed)
- Off the Hype Cycle: (no items listed)
Of course, you get what you pay for and this much is free. Who knows what these terms mean and on what basis these predictions are made. I was surprised to see topic maps just starting out though. Maybe it bought another ticket to ride.
August 22nd, 2005
UK thieves are using Bluetooth phones to scan for and detect Bluetooth-enabled laptops left in the trunks of cars. Detective Sergeant Al Funge, from Cambridge’s crime investigation unit, said:
“There have been a number of instances of this new technology being used to identify cars which have valuable electronics, including laptops, inside. The thieves are taking advantage of a relatively new technology, and people need to be aware that this is going on. ”
MORE (via Schneier on Security).
August 21st, 2005
Spam blogs (splogs) and spam comments on blogs are a growing problem. Most splogs seem to be hosted by Blogger, which makes it easy to automatically generate and populate them. Comment spam is a bane to all. Now Google is introducing some features to fight this, including a tool to require word verification for comments and a “flag as objectionable” feature on the Blogger Navbar that could be used to slam splogs. However, including the Navbar is optional on Blogger blogs, and it remains to be seen whether such a reputation-based scheme will work in this environment outside the lab. Spammers might try to defeat it with false accusations against good blogs if they can manage to have them come from many IP addresses. (via Slashdot)
We think there will be lots of research opportunities here as spammers continue to adapt and evolve their techniques to counter each new anti-spam measure. We’re developing a new project to study and model the structure and content of blogs, with just one application being to recognize splogs and comment spam.
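As a toy illustration of content-based filtering (not our project’s actual approach), one could score a comment by the fraction of its words that appear on a spam word list. The word list and threshold here are entirely hypothetical:

```python
# Hypothetical spam word list; a real detector would learn its model
# from labeled blog and comment data rather than use a fixed list.
SPAM_WORDS = {"casino", "mortgage", "cheap", "pills", "poker"}

def spam_score(text, spam_words=SPAM_WORDS):
    """Return the fraction of a comment's words found on the spam list."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in spam_words)
    return hits / len(words)

# A comment scoring above some threshold (say 0.3) would be flagged.
print(spam_score("cheap pills here"))  # → 0.6666666666666666
```

Real splog recognition would, of course, also exploit structural cues such as link patterns and posting behavior, not just word frequencies.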
August 20th, 2005
We’ve added a service (link) that shows the locations of recent visitors to our web site. You can get to it by clicking on the ABOUT US link in the header of any page and then clicking on the Recent web visitors link in the navigation menu on the left.
It’s fascinating to see the distribution and to zoom in and try to guess where each visitor is really from. Can you find your own tracks?
How well does it work at localizing your visitors? If 50 visitors from different IP addresses, all on UMBC’s campus, access our web site, only one shows up, since all of us will be reduced to a single long/lat, which is in downtown Baltimore. I guess this is where we connect to the backbone. My home machine gets mapped to Arlington, Virginia, presumably along with all 10,000 (est.) Comcast broadband customers in the greater Baltimore-DC area. A simple improvement to the gvisit service would be to keep a counter of the number of hits from a given long/lat, so I could see that, say, 5 hits came from MIT, 43 from UMBC and 54 from Comcast.
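The per-location counter suggested above is a one-liner with Python’s Counter; the coordinates and labels below are invented for illustration:

```python
from collections import Counter

# Hypothetical geolocated visit log: (latitude, longitude) per hit.
visits = [
    (39.29, -76.61),  # UMBC traffic, resolved to downtown Baltimore
    (39.29, -76.61),
    (38.88, -77.09),  # Comcast home users, mapped to Arlington VA
    (42.36, -71.09),  # MIT
]

# Count hits per coordinate instead of collapsing them into one marker.
counts = Counter(visits)
for (lat, lon), n in counts.most_common():
    print(f"{n} hit(s) from ({lat}, {lon})")
```

Each map marker could then be labeled or sized by its count, so UMBC’s 43 hits would no longer look like a single visitor.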
I’ve noticed some mismatches between the long/lat values and the place names, e.g., a location in Maryland that’s said to be in Florida. I’ve also noted some locations that surely must be off the grid, like one at the northernmost tip of Norway. More noise, I’m guessing, but please correct me if you are reading this and hail from there.
August 18th, 2005
When I typed ‘rdf’ into Google search, I got something different! A box had been inserted in the middle of the results page (see here).
Web Results 1 – 10 of about 22,900,000 for rdf [definition]. (0.57 seconds)
Resource Description Framework (RDF) / W3C Semantic Web Activity
Official pages from the World Wide Web Consortium; includes the specification, resources and news, and a links collection.
www.w3.org/RDF/ – 51k – Cached – Similar pages
See results for: rdf media
RDF Media | Home
RDF Media is one of Britain’s leading independent television production companies,
responsible for hit shows such as Wife Swap, Faking It, …
RDF Media Ltd (PACT) – London, UK
Send an e-mail to RDF Media Ltd (PACT) Email this company. My email address:.
My company:. I’d like to: … RDF Media Ltd (PACT). Telephone and fax …
www.rdf.co.uk. … CO.UK . . . . Showing 0 – 0 of 0 Items for. terms &
conditions :: © 2004 www.webpark.co.uk :: www.rdf.co.uk :: contact us.
The box demonstrates query expansion: Google sends you back the first three results of a highly relevant alternative query. This feature, I believe, must be based on long-term tracking of user behavior.
You can try more queries categorized as error correction, e.g., flicker, alo. A similar feature is the inclusion of image search results in the web search results, e.g., person.
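Mechanically, the behavior looks like a lookup from a query to a logged alternative. The table below is a made-up stand-in for whatever Google actually mines from its query logs:

```python
# Hypothetical query-to-alternative table; a real system would presumably
# mine such pairs from large-scale logs of user query reformulations.
ALTERNATIVES = {
    "rdf": "rdf media",
    "flicker": "flickr",
}

def expand_query(query, alternatives=ALTERNATIVES):
    """Return the original query plus any known alternative to also run."""
    alt = alternatives.get(query.lower())
    return [query] if alt is None else [query, alt]

print(expand_query("rdf"))  # → ['rdf', 'rdf media']
```

The results of the alternative query could then be inserted as a boxed block partway down the original result list, as in the example above.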
August 17th, 2005
Back in May, Donald Trump announced the establishment of Trump University as “a new business education company focused on providing lifelong learning programs for business professionals.”
Trump University will offer a rich mix of products and services, including online e-learning courses, multimedia home study programs, and a series of publications.
Most importantly, Trump University will deliver the experience, knowledge, and wisdom of Donald Trump himself. The real-estate mogul’s personal teachings, experiences and philosophies will be fully integrated into the curriculum. He will be featured in online video, the Trump University website, multimedia learning programs, and much more.
If that were not amazing enough, TU’s Chief Learning Officer (Provost?) is Roger Schank, the famous and sometimes controversial AI scientist.
Roger Schank, professor emeritus and founder of the Institute for Learning Sciences at Northwestern University and one of the world’s top researchers of artificial intelligence, learning theory and cognitive science, has been appointed Chief Learning Officer.
As CLO, Schank will oversee the design and implementation of the e-learning curriculum. Schank, who has also taught at Yale University among other top institutions and is author of some 25 books, is a pioneer and innovator in applying the Learning by Doing philosophy to online education. “People know that they learn by doing precisely because they know that they can learn nothing of value without constant practice” said Schank.
I wonder how one would represent “You’re fired!” using Schank’s conceptual dependency formalism? I think it would involve an MTRANS of an EXPEL of an ATRANS of … The rest is left as an exercise for the reader.
August 16th, 2005
Several people have pointed out some interesting issues in the methodology used to estimate whether Google or Yahoo has indexed more documents.
Seth Finkelstein points out that using a random two-word test search like “alkaloid’s observance” results in 15 hits on Google and none on Yahoo. But not one of the 15 pages Google found is really of interest: they are copies of word lists or spam blogs. It hardly seems fair to call a foul on Yahoo for not indexing useless documents. I’d pay extra for that service!
Eric Glover observes on Dave Farber’s IP list two implicit assumptions in the experiment:
#1: That both Google and Yahoo use the same relevance function to decide which results to include – or there is some way to post-process this to compare equally.
#2: That the Yahoo crawler is biased in a way that is equally probable for results returned for the keywords in the study – or at least close.
I wonder if we could ever develop a consensus technique for such an experiment.
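For concreteness, here is the shape of the estimate the experiment implies, with invented hit counts; only under assumptions #1 and #2 would the ratio of total hits over many random queries estimate the ratio of the two index sizes:

```python
# Hypothetical hit counts for the same random two-word queries
# on two engines (all numbers invented for illustration).
hits_a = [15, 120, 0, 43]
hits_b = [0, 95, 2, 51]

# If both engines reported every indexed match and crawled without bias,
# total reported hits would be roughly proportional to index size.
ratio = sum(hits_a) / sum(hits_b)
print(f"estimated size ratio A/B: {ratio:.2f}")  # → estimated size ratio A/B: 1.20
```

As the critiques above suggest, differing relevance filters or crawl biases would make this simple ratio meaningless, which is exactly the methodological worry.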
August 15th, 2005
Researchers at the University of Georgia’s LSDIS Lab have developed WSDL-S as a lightweight approach for adding semantics to Web services. This is an alternative to other schemes, including OWL-S and WSMO.
They have also released Radient, an Eclipse plugin that provides a UI for annotating existing WSDL documents into WSDL-S via an OWL ontology. Radient uses Harry Chen’s Cobra Ontology Viewer plugin.