
The Encyclopedia of Life (EOL) tried to launch yesterday but was immediately crippled by unexpectedly large crowds of visitors. David Shorthouse writes in the EOL Blog (which does still work):
We’re too Popular!
David Shorthouse
February 26th, 2008
You may have noticed that the EOL site has been flaky at best since approximately 12 EST this afternoon. Although we are serving the site from a load balanced cluster of several machines, we are experiencing phenomenal loads.
I just churned through the web logs from web machines in this cluster and there were 5.8M hits in the span of 3 hours. Most of these happened within 1 hour. We were down (and continue to experience intermittent access) for a few hours, then flipped the machines back on. Since then, there were an additional 5.7M hits, totaling 11.5M hits since 9AM this morning and it is now 2:45PM here. Wow!
We are working hard to resolve the issue so stay tuned and please have patience! I’ll post updates here as the day progresses.
I haven’t gotten a chance to see the site yet. My sources told me a month ago that it was done and they were stress testing it. I’m sorry they didn’t have the network infrastructure to handle the massive reaction from the public. On the one hand, it is embarrassing to be caught unprepared like this. On the other hand, it is testimony to the public demand for this kind of information (although one wonders how many journalists it takes to crash a website).
On the positive side, I expect the kinks to be worked out in the next few days. Unlike a failed NASA mission, the show can and will go on. The data are all still there and lessons learned can be applied next time. However, EOL anticipates a total re-engineering and so should expect many more bumpy roads ahead. For example, imagine the possible problems when the site goes semantic and is dynamically drawing information from other sites which are not nearly as well funded (it isn’t clear to me how much of the current implementation is dynamic).
Rod Page admits he is intentionally hypercritical in his review. Much of what he calls for is already planned, though he is concerned about the team’s ability to deliver:
I think the first release of EOL should have, at a minimum, provided at least as much information that I can get from iSpecies and Wikipedia. Other projects, such as Freebase, have pre-populated their databases with content from Wikipedia and other sources. Why didn’t EOL? If the argument is that they want authenticated content, then this doesn’t wash. Their authenticated content is minimal, and waiting for authentication will, in my view, cripple EOL.
EOL’s web site has no mechanism for people to extract data (e.g., RSS feeds, microformats, links to RDF, etc.). It’s intended to be read by humans, not machines. This greatly diminishes its utility.
The real question is how much the issues I’ve raised are things which are easy to fix given time, or whether they reflect underlying problems with the way the project is conceived.
I would point out that yes, EOL is intended for humans, not machines. The original sources from which the data come ought to be machine readable in the first place in order for EOL to get the data. Making them so will be a huge challenge in itself, and a place where EOL can help. EOL eventually will be generating RDF, which itself is not difficult if you know how you want it to look. And then data harvesters will have to sort out which source is the best when the same data appear in multiple places.
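To make the harvesting problem concrete, here is a minimal sketch in Python of what "sorting out which source is best" might look like. Everything in it is an assumption for illustration: the `Claim` record, the `SOURCE_RANK` trust ordering, and the example.org URIs are hypothetical and do not reflect anything EOL has published. It keeps one value per taxon and property, preferring the best-ranked source, and writes the result as a line of N-Triples, one of the simplest RDF serializations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One harvested statement about a taxon (hypothetical record layout)."""
    subject: str    # URI identifying the taxon
    predicate: str  # URI identifying the property
    value: str      # literal value of the property
    source: str     # label of the place the claim was harvested from


# Assumed trust ordering, lower number wins. A real harvester would need
# a far more careful policy than this.
SOURCE_RANK = {"original-database": 0, "aggregator": 1, "blog": 2}


def pick_best(claims):
    """Keep one claim per (subject, predicate), preferring the best-ranked source.

    Unknown sources rank last; ties keep the claim seen first.
    """
    best = {}
    for claim in claims:
        key = (claim.subject, claim.predicate)
        rank = SOURCE_RANK.get(claim.source, len(SOURCE_RANK))
        if key not in best or rank < best[key][0]:
            best[key] = (rank, claim)
    return [claim for _, claim in best.values()]


def to_ntriples(claim):
    """Serialize a claim as one N-Triples line with a plain string literal."""
    escaped = (
        claim.value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'<{claim.subject}> <{claim.predicate}> "{escaped}" .'


if __name__ == "__main__":
    harvested = [
        Claim("http://example.org/taxon/1", "http://example.org/prop/commonName",
              "honey badger", "blog"),
        Claim("http://example.org/taxon/1", "http://example.org/prop/commonName",
              "ratel", "original-database"),
    ]
    for claim in pick_best(harvested):
        print(to_ntriples(claim))
```

The serialization really is the easy part. The hard part is hidden in `SOURCE_RANK`: someone has to decide which sources to trust, and that decision is a matter of policy rather than code.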
Carl Zimmer, who wrote the New York Times blurb, sounds much more optimistic in his blog entry:
I would not be surprised that the interests of communities within biology drive a lot of the growth of the encyclopedia. If the kinks are worked out, it could be a tool that a group of people interested in, say, orchids, could use to store and study their data. Seen that way, it wouldn’t have to hit all 1.8 million species pages to achieve something important.
I could not agree more. The challenge, as I’ve stated before, is engaging those communities and providing tools (perhaps more than just one option!) so that they can not only easily create and moderate the content, but get some payback from it themselves. They don’t need to be on board with EOL directly, but they do need to be producing content that plays nicely with EOL. Note that I have a vested interest. All of the projects I know have a hard time getting their communities on board, and they all have distinct aims and system architectures. We are all poised to see how we can funnel our efforts toward EOL without bankrupting ourselves. Can we use EOL to leverage success on our projects? It isn’t going to be easy, or cheap.
I do notice that the blog contains several observations I could get onto the semantic web by making SPOTs for them. Examples include YouTube videos of honey badgers making tools in India, being assisted by honeyguides, and allegedly causing problems in Basra, Iraq. Because these are very far removed from the original sources and have poor locality data, they are low-quality observations. However, for demonstration purposes, they might be useful.