A new measure of a researcher’s impact

August 29th, 2005

UCSD physicist Jorge Hirsch has proposed the h-index as a new bibliometric measure of a scholar’s impact, based on the number of publications and how often each is cited. See this story in Physics World for an overview. The h-index can be defined as follows:

A person who has published N papers has h-index H iff H of those papers have at least H citations each and the other N-H papers have at most H citations each.

You can easily estimate an author’s h-index using Google Scholar, since the results are ranked (more or less) by the number of citations, which is shown in each summary. Try looking for papers authored by Turing. His 15 most cited papers all had at least 17 citations. His 16th most cited paper had only 13 citations. So Alan Turing’s h-index is 15.
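The counting procedure is easy to sketch in code. The following is a minimal illustration rather than anything from Hirsch’s paper: the function name h_index is made up, and the citation list is a stand-in that matches only the two facts quoted above (the top 15 papers have at least 17 citations, the 16th has 13).

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # counts only get smaller from here on
    return h


# Hypothetical data: 15 papers at the quoted lower bound of 17 citations,
# plus a 16th paper with 13. The real counts for the top papers are higher.
turing_like = [17] * 15 + [13]
print(h_index(turing_like))  # prints 15
```

Because the list is sorted first, the loop can stop at the first paper whose citation count falls below its rank, which is exactly what you do by eye when scanning a Google Scholar results page.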

This example, of course, shows one problem with basing this on Google Scholar: it only takes into account papers it finds on the Web, a disadvantage for Turing. Another is that Google doesn’t eliminate “self citations”, that is, citations where there is an author common to both the cited and citing papers. Accepting self citations invites gaming the system by always citing all of your earlier publications. CiteSeer is a web-based system that does eliminate self citations, as does ISI’s venerable citation database. But CiteSeer doesn’t rank author queries by citation number, and it also weights them by year. ISI’s coverage for Computer Science is not comprehensive, and access costs money. So Google Scholar seems to be the easiest way to play with the h-index idea for CS at present.
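To see how much self citations can matter, here is a rough sketch that applies the definition given above (a citation is a self citation if the cited and citing papers share an author). The data layout, the function names, and the author names are all invented for illustration; this is not how CiteSeer or ISI actually do it.

```python
def count_external_citations(authors, citing_author_lists):
    """Count citing papers that share no author with the cited paper."""
    authors = set(authors)
    return sum(1 for citing in citing_author_lists if authors.isdisjoint(citing))


def h_index_excluding_self_citations(papers):
    """papers: list of (authors, citing_author_lists) pairs (a made-up layout)."""
    counts = sorted(
        (count_external_citations(authors, citing) for authors, citing in papers),
        reverse=True,
    )
    # With counts in descending order, count >= rank holds for exactly
    # the first h ranks, so counting the matches gives h.
    return sum(1 for rank, count in enumerate(counts, start=1) if count >= rank)


# Hypothetical example: two papers, each cited three times.
papers = [
    (["A"], [["B"], ["C"], ["A", "D"]]),   # one self citation (A cites A)
    (["A", "E"], [["A"], ["E"], ["F"]]),   # two self citations
]
print(h_index_excluding_self_citations(papers))  # prints 1
```

Counting every citation, both papers have three citations and the h-index would be 2. Dropping the self citations leaves two and one, and the h-index falls to 1.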

Google Scholar and CiteSeer automatically discover and index papers of all types (journal, conference, book chapter and even technical reports), unlike traditional citation databases like ISI’s. Should all of these contribute to a scholarly output metric? I think it’s not unreasonable. A technical report cited by 50 other papers has obviously had impact. Moreover, a paper’s visibility on the Web may become the dominant factor in its significance.

Hirsch argues that h is better than other commonly used single-number criteria for measuring a scholar’s output. He’s even suggested it could be used for tenure and promotion:

Moreover, he goes on to propose that a researcher should be promoted to associate professor when they achieve an h-index of around 12, and to full professor when they reach an h of about 18. (Link)

What counts as a high number will vary across disciplines and even sub-fields within disciplines. Moshe Vardi tells me that computer scientists with h>50 are rare and that Jeff Ullman’s number, in the mid-60s, is the highest he’s seen.

Finally, single number measures like this are always just shadows cast on the wall of a cave.