UMBC Global Game Jam teams at work

January 31st, 2009

UMBC is hosting a site for the Global Game Jam (GGJ), organized locally by Professors Marc Olano and Neal McDonald. The GGJ is a 48-hour event in which more than 1,750 people at 53 international sites work to design, test and implement computer games based on a few loose constraints. The event started at 5:00pm Friday (local time) and ends at 5:00pm Sunday.

On the UMBC GAIM blog, Marc reports that

“After three hours of brainstorming, we pulled together four good teams. You can watch our live progress on our feed.”

The final games must be uploaded to the GGJ central site by 3:00pm, after which the four local teams will present and demonstrate their games in the game lab (ECS 005) and/or Lecture Hall 5 (ECS building).

Check out the feed to see the 22 UMBC participants at work and view their online chat or see some photos on Flickr. Visitors are welcome to stop by the ECS building Sunday around 4:00pm to see the final presentations and demonstrations (ECS LH5 and/or ECS 005a).


Ad placement software with a sense of humor

January 31st, 2009

Maybe I should turn off the Firefox Adblock plugin and enjoy this new form of machine humor.

When ad placement software goes wild

From M. Turk via Time’s Swampland.

Warning: Google thinks every site may harm your computer

January 31st, 2009

The Google has flipped out. Starting a few minutes ago, any Google search result I click on sends me to Google's malware warning page. The one below appeared when I tried to click through to the first result of a search for “google”. It is obviously an error in Google's software, and one that surely will be fixed shortly, if it has not been already. Since Google is highly distributed, it's possible that only some of their sites are in error.

Once you get the “Warning – visiting this web site may harm your computer!” page, the only way to continue on to the result is to manually select the URL text on the warning page and paste it into your browser's address field.

Some experimentation shows that the problem exists for the default search service as well as image search, but not for searches over blogs, news, video, scholarly papers or shopping.

I suppose this could be the world's safest CYA disclaimer, but if so they may as well add “Do not taunt Happy Fun Ball.”

Update: This seems to have been fixed around 10:15am GMT-5.

Update 2: Here is Google’s post about the problem.

Martin Kay: When is a Translation not a Translation, 4:30 Tue 2/3, JHU

January 30th, 2009

Next week the JHU Center for Language and Speech Processing will host a talk by Martin Kay of Stanford University, When is a Translation not a Translation? at 4:30pm Tuesday, 3 February 2009. From the announcement:

“A translation is generally taken to be a text that expresses the same meaning as another text in a different language. But the products of the best translators reflect a different, if more elusive, goal. I will seek a somewhat more adequate characterization of translation as it is actually practiced and discuss its consequences for machine translation.

Martin Kay is a professor of linguistics and computer science at Stanford University. For many years, he was also a research fellow at the Xerox Palo Alto Research Center. He made a number of fundamental contributions to computational linguistics, including chart parsing, unification grammar, and applications of finite-state technology, notably in phonology. He has been an intermittent worker on, and skeptical observer of, machine translation since 1958.”

For a preview of what he will probably talk about, you might look at a paper on Professor Kay’s web site that he describes as “some unfinished musings on the nature of translation”.

This is a chance to hear someone who has made many important contributions to several areas of computational linguistics and computer science over a long career.

Extracting Wikipedia infobox values from text

January 27th, 2009

This year’s Text Analysis Conference (TAC) has an interesting track focused on processing text to populate Wikipedia infoboxes, both for existing entities with missing values and for newly discovered entities.

TAC is run by the US National Institute of Standards and Technology (NIST) to encourage research in natural language processing and related applications. As in the NIST-sponsored MUC, TREC and ACE workshops, this is done by providing a large test collection, common evaluation procedures, and a forum for organizations to share their results. The first TAC was held in 2008 and included 65 teams from 20 countries, who participated in three tracks: question answering, summarization and recognizing textual entailment.

TAC 2009 will include a new track on Knowledge Base Population coordinated by Paul McNamee of the Johns Hopkins University Human Language Technology Center of Excellence.

“The goal of the new Knowledge Base Population track is to augment an existing knowledge representation with information about entities that is discovered from a collection of documents. A snapshot of Wikipedia infoboxes will be used as the original knowledge source, and participants will be expected to fill in empty slots for entities that do exist, add missing entities and their learnable attributes, and provide links between entities and references to text supporting extracted information. The KBP task lies at the intersection of Question Answering and Information Extraction and is expected to be of particular interest to groups that have participated in ACE or TREC QA.”

This is an exciting task, and doing well at it will require a mixture of language processing, knowledge-based processing and (probably) machine learning.

The TAC 2009 workshop will be co-located with TREC and held 16-17 November in Gaithersburg, MD. If you are interested in participating, you should register by March 3.

NA Computational Linguistics Olympiad at UMBC

January 27th, 2009

NSF has a press release out on the upcoming North American Computational Linguistics Olympiad. UMBC is hosting a site for the first round, which will take place on February 4. You can still sign up by February 3 if space is available.

“Early next month, high school students from across the United States and Canada will begin the first rounds of the North American Computational Linguistics Olympiad (NACLO). Although the competition aims to identify students to represent the United States at the 2009 International Linguistics Olympiad, it is also a chance for young people to explore their interests in linguistics, math or computer science and pick up some useful new skills.”

NSF has produced a nice video for NACLO that explains computational linguistics, NACLO and their relevance today.

Variants of Semantic Web Languages in the Real World

January 26th, 2009

WWW2009 will include a workshop on Semantics for the Rest of Us: Variants of Semantic Web Languages in the Real World on 20 April 2009 in Madrid, Spain.

“The Semantic Web is a broad vision of the future of personal computing, emphasizing the use of sophisticated knowledge representation as the basis for end-user applications’ data modeling and management needs. Key to the pervasive adoption of Semantic Web technologies is a good set of fundamental “building blocks” – the most important of these are representation languages themselves. W3C’s standard languages for the Semantic Web, RDF and OWL, have been around for several years; instead of strict standards compliance, we see “variants” of these languages emerge in applications, often tailored to a particular application’s needs. These variants are often either subsets of OWL or supersets of RDF, typically with fragments of OWL added. Extensions based on rules, such as SWRL and N3 logic, have been developed as well as enhancements to the SPARQL query language and protocol.
    In this workshop we will explore the landscape of RDF, OWL and SPARQL variants, specifically from the standpoint of “real-world semantics”. Are there commonalities in these variants that might suggest new standards or new versions of the existing standards? We hope to identify common requirements of applications consuming Semantic Web data and understand the pros and cons of a strictly formal approach to modeling data versus a “scruffier” approach where semantics are based on application requirements and implementation restrictions.”

Full papers and position papers should be submitted by 15 February.

Less is more breaks many business models

January 26th, 2009

The NYT has a story on technology downsizing, $200 Laptops Break a Business Model. It leads with an anecdote, a common and effective hook.

“The global credit crisis may have caused the decline in consumer and business spending that is assaulting the giants of high tech. But as the dominant technology companies try to emerge from this slump, they may find themselves blaming people like David Title just as much as they blame Wall Street. Mr. Title, a 35-year-old new-media manager at a film production company in New York, has dropped his cable subscription and moved to watching most of his television online — free. While shopping for a new laptop for his girlfriend recently, he sidestepped more expensive full-featured computers and picked a bare-bones, $200 Asus EeePC laptop, also known as a netbook.”

While I’m not sure about the $200 laptop — I paid $400 for what I considered a usable Asus Eee last year — this trend is real. My sense is that we are all looking around, asking “Do I really need this?” and, in many cases, answering no. Whether this is good or bad for the economy I don’t know, but it is good for the soul. Less is more is an idea that takes hold on a regular basis, probably as a natural corrective, and one that seems very appropriate now.

How to choose the right chart for your data

January 25th, 2009

There are lots of good systems, including Excel and other spreadsheet tools, that can visualize your data in various kinds of graphs. It can sometimes be a little daunting, however, to figure out which kind of chart to use. The version of Excel running on my laptop, for example, asks me to choose from more than 70 kinds of charts. Of course, many of the variations are purely stylistic — 2D vs. 3D bar charts — but there are still a lot of options.

A link to a great data visualization cheat sheet on How to choose a chart is doing well on Hacker News today. The graphic was created by Andrew Abela and posted on his blog in Choosing a good chart over three years ago.

“Here’s something we came up with to help you consider which chart to use. It was inspired by the table in Gene Zelazny’s classic work Saying It With Charts (p. 27 in the 4th. ed)”


Abela developed this aid as part of his Extreme Presentation method for “designing presentations that drive action”. On his Extreme Presentation blog you can find versions of this chart aid that have been translated into other languages.
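Abela's diagram starts from a single question ("What would you like to show?") and branches on four purposes: comparison, distribution, composition and relationship. A minimal sketch of that decision in Python follows; the specific chart picks here are my own illustrative simplifications, not a reproduction of the full diagram:

```python
# Sketch of Abela-style chart choosing: map the purpose of a graphic to a
# chart family. The four purposes come from his diagram; the suggested
# charts are simplified illustrations, not his complete recommendations.
CHART_FOR_PURPOSE = {
    "comparison": "bar chart (or line chart when comparing over time)",
    "distribution": "histogram (or scatter plot for two variables)",
    "composition": "stacked bar chart (or pie chart for a single snapshot)",
    "relationship": "scatter plot (or bubble chart for three variables)",
}

def suggest_chart(purpose: str) -> str:
    """Return a chart suggestion for a purpose such as 'comparison'."""
    try:
        return CHART_FOR_PURPOSE[purpose.lower()]
    except KeyError:
        raise ValueError(
            f"unknown purpose {purpose!r}; "
            f"choose one of {sorted(CHART_FOR_PURPOSE)}"
        )

print(suggest_chart("comparison"))
# → bar chart (or line chart when comparing over time)
```

The point of the original diagram is the same as this toy table: decide what you want the reader to see before opening the chart menu.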

The White House blog

January 20th, 2009

The White House blog went live today with a post by Macon Phillips, Change has come to

“Welcome to the new I’m Macon Phillips, the Director of New Media for the White House and one of the people who will be contributing to the blog.”

This is an interesting, albeit minor, aspect of an historic event that everyone hopes will lead to a better world.

The feed is in an odd place, however. If you put the blog’s address into Google Reader, for example, it can’t find the feed. Bloglines, however, does manage to find the feed given the blog’s URL.
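Readers that do find a feed from a bare blog address typically rely on feed autodiscovery: they fetch the page's HTML and look for a `<link rel="alternate">` tag pointing at the feed. A minimal sketch of that check, using only the Python standard library (the sample page and feed path are invented for illustration):

```python
from html.parser import HTMLParser

class FeedLinkFinder(HTMLParser):
    """Collects <link rel="alternate"> feed URLs from a page's markup."""
    FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        if a.get("rel", "").lower() == "alternate" and a.get("type") in self.FEED_TYPES:
            self.feeds.append(a.get("href"))

# A hypothetical page head; a reader honoring autodiscovery would fetch
# the blog's HTML and look for a tag like this one.
page = """<html><head>
<link rel="alternate" type="application/rss+xml"
      title="RSS" href="/feed/rss.xml">
</head><body>...</body></html>"""

finder = FeedLinkFinder()
finder.feed(page)
print(finder.feeds)  # → ['/feed/rss.xml']
```

If a site omits that tag, or puts the feed at an unadvertised address, a reader has no standard way to discover it, which would explain the behavior described above.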

When will video dominate text on the Web?

January 18th, 2009

Information on the Web comes in many forms, including text, images, services, data, games, and video. I’ve always considered text to be the essential type, possibly because it was the first, but also because so much of our Web experience has been shaped by search engines, which still operate mostly on text. But just as television and film dominate books and other forms of text in popular culture, maybe video-oriented modalities will become the preferred form of Web content.

Today’s New York Times has an article, At First, Funny Videos. Now, a Reference Tool, about how many people search for information on YouTube first and turn to text search engines only when their YouTube results are inadequate.

“FACED with writing a school report on an Australian animal, Tyler Kennedy began where many students begin these days: by searching the Internet. But Tyler didn’t use Google or Yahoo. He searched for information about the platypus on YouTube.

“I found some videos that gave me pretty good information about how it mates, how it survives, what it eats,” Tyler said. Similarly, when Tyler gets stuck on one of his favorite games on the Wii, he searches YouTube for tips on how to move forward. And when he wants to explore the ins and outs of collecting Bakugan Battle Brawlers cards, which are linked to a Japanese anime television series, he goes to YouTube again.

While he favors YouTube for searches, he said he also turns to Google from time to time. “When they don’t have really good results on YouTube, then I use Google,” said Tyler, who is 9 and lives in Alameda, Calif.”

The article reports that the number of YouTube searches recently exceeded those on Yahoo, which had been number two.

“In November, Americans conducted nearly 2.8 billion searches on YouTube, about 200 million more than on Yahoo, according to comScore.”

You can see this trend in comScore’s December 2008 Search Engine Rankings report.

It’s hard to say where this is going. Video is great for some kinds of information (e.g., demonstrations, events) and less good for others (e.g., recipes, careful arguments). We can easily link information in text to related information, but can’t (yet) for videos. And we can more easily write programs to process text and even extract semantic information from it.
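As a toy illustration of why text remains easier for programs to process, here is a sketch that pulls a simple "X is a Y" fact out of a sentence with a regular expression; the pattern and example sentence are made up for illustration, and real information extraction is of course far more involved:

```python
import re

# A toy "is-a" fact extractor: matches patterns like "the platypus is a
# mammal". Nothing this simple is possible on raw video frames without
# first recognizing speech or objects in the footage.
IS_A = re.compile(r"\b(?:the\s+)?(\w+)\s+is\s+an?\s+(\w+)", re.IGNORECASE)

def extract_is_a(text):
    """Return (subject, category) pairs found in the text."""
    return [(m.group(1).lower(), m.group(2).lower())
            for m in IS_A.finditer(text)]

facts = extract_is_a("The platypus is a mammal that lays eggs.")
print(facts)  # → [('platypus', 'mammal')]
```

Even this crude pattern yields machine-usable structure from a sentence; extracting the same fact from a video of a platypus remains a research problem.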

But I have a feeling that nine-year-old Tyler Kennedy is a sign of things to come.

US House stimulus plan: NSF += $3B

January 15th, 2009

The CRA reports that the US science and technology research community may get its own little bailout. The House Appropriations Committee released details of its American Recovery and Reinvestment economic stimulus package, which includes funds for scientific research.

NSF is slated to get $3B in new money:

“including $2 billion for expanding employment opportunities in fundamental science and engineering to meet environmental challenges and to improve global economic competitiveness, $400 million to build major research facilities that perform cutting edge science, $300 million for major research equipment shared by institutions of higher education and other scientists, $200 million to repair and modernize science and engineering research facilities at the nation’s institutions of higher education and other science labs, and $100 million is also included to improve instruction in science, math and engineering”

The plan also calls for new research money for NIH, DOE, NASA, NIST and other government organizations as well as $6B for broadband deployment.

While this is not large as bailouts go, we must keep in mind that it comes without a crisis brought about by the rampant use of research breakthrough default swap instruments or scholarly paper citation pyramid schemes. Maybe we should have gotten MBAs.

Update 1/16: The CRA policy blog has some more details on how the funds will be allocated within some of the agencies.