It’s popular to ask “What Would Google Do” these days — The Google reports over 7,000 results for the phrase. Of course, it’s not just about Google, which we all use as the archetype for a new Web way of building and thinking about information systems. Asking WWGD can be productive, but only if we know how to implement and exploit the insights the answer gives us. This in turn requires us (well, some of us, anyway) to understand the algorithms, techniques, and software technology that Google and other large scale Web-oriented companies use. We need to ask “How Would Google Do It”.
Michael Nielsen has a nice post on using your laptop to compute PageRank for millions of webpages. His posts reviews PageRank and how to compute it and shows a short, but reasonably efficient, Python program that can easily do a graph with a few million nodes. While not sufficient for many applications, like the Web, there are lots of interesting and significant graphs this small Python program can handle — Wikipedia pages, DBLP publications, RDF namespaces, BGP routers, Twitter followers, etc.
The post is part of a series Nielsen is making on the Google Technology Stack including PageRank, MapReduce, BigTable, and GFS. The posts are a byproduct of a series of weekly lectures he’s giving starting earlier this month in Waterloo. Here’s the way that Nielsen describes the series.
“Part of what makes Google such an amazing engine of innovation is their internal technology stack: a set of powerful proprietary technologies that makes it easy for Google developers to generate and process enormous quantities of data. According to a senior Microsoft developer who moved to Google, Googlers work and think at a higher level of abstraction than do developers at many other companies, including Microsoft: “Google uses Bayesian filtering the way Microsoft uses the if statement” (Credit: Joel Spolsky). This series of posts describes some of the technologies that make this high level of abstraction possible.”
Videos of the first two lectures, Introducion to PageRank and Building our PageRank Intuition) are available online. Nielsen illustrates the concepts and algorithms with well-written Python code and provides exercises to help readers master the material as well as “more challenging and often open-ended problems” which he has worked on but not completely solved.
Nielsen was trained as a as a theoretical Physicist but has shifted his attention to “the development of new tools for scientific collaboration and publication”. As far as I can see, he is offering these as free public lectures out of a desire to share his knowledge and also to help (or maybe force) him to deepen his own understanding of the topics and develop better ways of explaining them. In both cases, it an admirable and inspiring example for us all and appropriate for the holiday season. Merry Christmas!