How to win 50,000 euros by compressing human knowledge

Tim Finin, 1:00pm 14 August 2006

Move over Turing Test. Step back Loebner Prize. The latest proposal for a simple test for machine intelligence is the Hutter Prize for Lossless Compression of Human Knowledge.

“Being able to compress well is closely related to intelligence as explained below. While intelligence is a slippery concept, file sizes are hard numbers. Wikipedia is an extensive snapshot of Human Knowledge. If you can compress the first 100MB of Wikipedia better than your predecessors, you(r compressor) likely has to be smart(er). The intention of this prize is to encourage development of intelligent compressors/programs.”

If you do manage to best the current record for compressing the first 100MB of Wikipedia, you won’t necessarily get the full prize. What you will win, besides 15 minutes of fame, depends on how much better your program does than the current best program. The payout scheme is similar to the one used for the Methuselah Mouse Prize, awarded to researchers who extend the lifespan of a mouse to unprecedented lengths. For the Hutter Prize, you need to create a self-extracting archive of the 100MB file enwik8 that is smaller than 18MB. In particular:

  • Create a Linux or Windows executable archive8.exe of size S, which is less than L, the previous record (currently 18,324,887 bytes).
  • When your archive8.exe is executed, it produces a 10^8 byte (100MB) file identical to enwik8.
  • Upon verification, you are eligible for a prize of 50,000*(1-S/L) euros, with a minimum claim of 500 euros.
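
To get a feel for the numbers, here is a small Python sketch of the payout rule. It assumes the 500 euro figure is a minimum claim (improvements worth less than that earn nothing) rather than a cap. The function and parameter names are illustrative and not part of the official rules.

```python
def hutter_payout(new_size: int, previous_record: int,
                  purse: float = 50_000.0,
                  minimum_claim: float = 500.0) -> float:
    """Return the payout in euros, or 0.0 if the entry does not qualify.

    Assumption: the 500 euro figure is a minimum claim threshold,
    so an improvement worth less than that pays nothing.
    """
    if new_size <= 0 or previous_record <= 0:
        raise ValueError("sizes must be positive byte counts")
    if new_size >= previous_record:
        return 0.0  # no improvement over the record, no prize
    payout = purse * (1 - new_size / previous_record)
    return payout if payout >= minimum_claim else 0.0


# L is the record quoted in the rules; S is a hypothetical new entry.
L = 18_324_887
S = 18_000_000
print(f"{hutter_payout(S, L):.2f} euros")
```

Under these assumptions, an 18,000,000 byte archive improves on the record by less than 2% and would be worth a little under 900 euros, so the full purse is only reachable in the limit of a vanishingly small archive.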

The initial 50K € purse is underwritten by Marcus Hutter of the Swiss Dalle Molle Institute for Artificial Intelligence.

The use of tests and challenges is always somewhat controversial in AI. Everyone sort of agrees that the Turing test is interesting and says something about what it means to be intelligent. But partly this is for historical reasons and out of respect for Alan Turing, on whose shoulders we all try to stand. The Loebner Prize, however, is not taken seriously because its built-in simplifications and limitations encourage winning through clever hacks that are not likely to generalize. I’m afraid that the Hutter Prize is even farther out and is unlikely to be helpful either in advancing our understanding of intelligence or in developing new techniques that are useful in building more intelligent computer programs or systems.

Competitions can be very motivating for researchers and we’ve participated in a number of them. I think better models for AI competitions are the DARPA Grand Challenge, the Text Retrieval Conference challenge tasks, RoboCup and the Trading Agent Competition. These are all focused on simplified tasks that are very close to real-world problems that people want solved. Some of these competitions have produced new ideas and algorithms that have already been put to use in practical applications.

One reason why I think that the Hutter Prize is not a good AI problem is the requirement that the expanded file be identical to the original. Lossless compression is just not a good model for human memory, I’m afraid. Part of our knowledge and our intelligence is knowing what’s worth remembering and what is not. Another aspect is relating new knowledge and information to what we already know, which might result in a lossy, but more useful and compact, encoding of the information. The Hutter Prize may be useful in encouraging the development of better text compression algorithms, of course, and text compression is useful and probably important, even in a world where storage costs continue to decrease dramatically. I think it would also be stimulating and fun, especially for students.
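
To make the lossless requirement concrete, here is a minimal Python sketch contrasting a byte-exact round trip with a lossy encoding that throws away details a reader would rarely miss. It uses the standard zlib module purely as a stand-in compressor. The sample text and the normalization rule (lowercasing and collapsing whitespace) are assumptions chosen for illustration and have nothing to do with how contest entries work.

```python
import zlib


def compress_lossless(data: bytes) -> bytes:
    """Compress with zlib at its highest level; fully reversible."""
    return zlib.compress(data, 9)


def normalize_lossy(text: str) -> str:
    """Discard case and whitespace layout: compact, but not reversible."""
    return " ".join(text.lower().split())


# Hypothetical sample text standing in for a slice of enwik8.
sample = "The  Hutter Prize\n\nrewards   LOSSLESS compression.\n" * 50
original = sample.encode("utf-8")

# Lossless: the restored bytes must match the original exactly.
restored = zlib.decompress(compress_lossless(original))
print("round trip identical:", restored == original)

# Lossy: shorter and arguably just as informative, but it would fail
# the contest's byte-for-byte comparison.
lossy = normalize_lossy(sample).encode("utf-8")
print("original bytes:", len(original), "lossy bytes:", len(lossy))
print("lossy identical to original:", lossy == original)
```

The lossy version keeps every word yet can never be expanded back into the original bytes, which is exactly the kind of encoding the contest rules exclude.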

See the Google discussion group for more information and discussion on this interesting challenge.

