April 28th, 2009
Guido van Rossum has been blogging about the lack of support for optimizing tail recursion in Python (he’s agin it). His most recent post, Final Words on Tail Calls, includes this paragraph near the end.
‘And here it ends. One other thing I learned is that some in the academic world scornfully refer to Python as “the Basic of the future”. Personally, I rather see that as a badge of honor, and it gives me an opportunity to plug a book of interviews with language designers to which I contributed, side by side with the creators of Basic, C++, Perl, Java, and other academically scorned languages — as well as those of ML and Haskell, I hasten to add. (Apparently the creators of Scheme were too busy arguing whether to say “tail call optimization” or “proper tail recursion.” :-)’
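For readers unfamiliar with the debate, the issue is that CPython allocates a new stack frame for every call, even when the recursive call is the function's final action. A minimal sketch (the function names here are illustrative, not from Guido's posts) of a tail-recursive factorial next to the loop a tail-call-optimizing implementation would effectively run it as:

```python
import sys

def fact_tail(n, acc=1):
    # Tail-recursive: the recursive call is the last operation,
    # but CPython still pushes a fresh frame for every call.
    if n <= 1:
        return acc
    return fact_tail(n - 1, acc * n)

def fact_iter(n):
    # The equivalent loop, using constant stack space -- roughly what
    # "proper tail recursion" would give you automatically.
    acc = 1
    while n > 1:
        acc *= n
        n -= 1
    return acc

# Both agree on small inputs...
assert fact_tail(10) == fact_iter(10) == 3628800

# ...but deep recursion exhausts CPython's stack, while the loop does not.
try:
    fact_tail(sys.getrecursionlimit() + 100)
except RecursionError:
    pass  # frames are not reused, so the recursion limit is hit
```

Guido's position, in short, is that keeping every frame on the stack is a feature: it preserves complete tracebacks for debugging.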
April 21st, 2009
The 4th annual UMBC Digital Entertainment Conference will be held from 10:00am to 6:00pm on Saturday, April 25, 2009 in Lecture Hall 2. This event is organized by the UMBC Game Developers Club and is free and open to the public. This year’s conference will feature speakers from local studios who will talk about programming, game design and art in game development, including:
- Justin Boswell, Senior Programmer, Firaxis
- Barry Caudill, Executive Producer, Firaxis
- Dave Inscore, Studio Art Director, Big Huge Games
- Eric Jordan, Programmer, Firaxis
- Martin Kau, Concept Artist, Big Huge Games
- Jon Shafer, Designer/Programmer, Firaxis
You can find more information and RSVP on the Facebook DEC page.
April 18th, 2009
ReadWriteWeb has a post up on The Web of Data: Creating Machine-Accessible Information that focuses on Linked Open Data.
“In the coming years, we will see a revolution in the ability of machines to access, process, and apply information. This revolution will emerge from three distinct areas of activity connected to the Semantic Web: the Web of Data, the Web of Services, and the Web of Identity providers. These webs aim to make semantic knowledge of data accessible, semantic services available and connectable, and semantic knowledge of individuals processable, respectively. In this post, we will look at the first of these Webs (of Data) and see how making information accessible to machines will transform how we find information.”
I did find the three ‘Webs’ mentioned in their intro — data, services, and identity providers — to be interesting. The first two are standard components of the envisioned future Web, but the third, a web of identity providers, less so. I am unsure whether it’s meant to refer to authentication services and protocols (e.g., OAuth) or perhaps some kind of named entity recognition service for text. The former is certainly necessary for web services and APIs to work more seamlessly, but it doesn’t seem to me to be as significant a problem as developing highly interoperable and integrable Webs of data and services. Of course, I am probably unaware of the subtleties involved in getting this right while maintaining security and appropriate privacy. In any case, I look forward to the articles to follow.
April 16th, 2009
Here’s an interesting paper that will appear in SIGMOD’09 comparing the MapReduce paradigm to parallel conventional databases. The benchmark study described in the paper showed that the parallel database approach performed significantly faster, although it took longer to load the data.
A Comparison of Approaches to Large-Scale Data Analysis, by Pavlo, Paulson, Rasin, Abadi, DeWitt, Madden, and Stonebraker.
There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system’s performance for various degrees of parallelism on a cluster of 100 nodes. Our results reveal some interesting trade-offs. Although the process to load data into and tune the execution of parallel DBMSs took much longer than the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.
The benchmark details are available so that others can recreate the trials.
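The "basic control flow" the abstract refers to is just map, shuffle, and reduce. A toy single-process word count sketches it (this is an illustration of the paradigm only, not the Hadoop system the paper benchmarks):

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: fold the grouped values into a single count per word.
    return key, sum(values)

docs = ["map reduce map", "reduce shuffle reduce"]
mapped = chain.from_iterable(map_phase(d) for d in docs)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
# counts == {"map": 2, "reduce": 3, "shuffle": 1}
```

A parallel DBMS expresses the same computation declaratively (roughly `SELECT word, COUNT(*) ... GROUP BY word`), which is one reason the paper finds the DBMSs faster once the data is loaded and tuned: the planner, indexes, and compression do work that the MR job repeats on every run.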