Training Examples QA: stackoverflow for NLP and ML

June 30th, 2010

Training Examples QA is a site created by Joseph Turian where “data geeks ask and answer questions on machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization!”

It’s a close knock-off of the popular Stack Overflow site and appears to be very well done.

If it catches on in the relevant research communities, it could be a very useful resource. (via LingPipe blog)


Semantic Overflow, a collaboratively edited question and answer site for the Semantic Web

June 20th, 2010

Semantic Overflow is a great way for the Semantic Web community to help one another with questions, problems and education. It was started in November 2009 using the Stack Overflow framework hosted by Stack Exchange.

It’s still growing, with 261 questions submitted and just over 450 registered users, about a third of whom have enough reputation to vote. Here’s an example: Ian Davis of Talis asked “What is a good elevator pitch for Linked Data?” and got 17 answers.


Like the parent Stack Overflow system, Semantic Overflow is a blend of a forum, wiki and recommendation site. It lets users ask, tag and answer questions, and it also allows those with a sufficient reputation score to vote on, and even edit, both the questions and the community-submitted answers.

The traditional way of asking a community technical questions is a mailing list or a Web-based forum. The Stack Overflow model offers many advantages, so I hope this site continues to gain traction.

If you want to monitor the site for new questions, you’ll find the feed of the 30 most recently submitted questions useful.
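If you would rather process that feed in a script than in a feed reader, here is a minimal Python sketch that extracts the question titles. It assumes the feed is in Atom format (as Stack Exchange feeds generally are) and that you have already downloaded the XML as a string; the function name recent_question_titles is our own, not part of any Semantic Overflow API.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def recent_question_titles(feed_xml: str, limit: int = 30) -> list[str]:
    """Return up to `limit` entry titles from an Atom feed document."""
    root = ET.fromstring(feed_xml)
    # Each question is assumed to be one <entry> directly under <feed>.
    entries = root.findall("atom:entry", ATOM_NS)[:limit]
    return [
        entry.findtext("atom:title", default="", namespaces=ATOM_NS).strip()
        for entry in entries
    ]
```

The XML itself could be fetched with urllib.request.urlopen(url).read().decode("utf-8"), where url is the feed's address; polling it periodically and remembering titles you have already seen is a simple way to spot new questions.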

Journal of Web Semantics 2009 impact factor highest yet: 3.412

June 18th, 2010

Journal of Web Semantics

Thomson Reuters has released their 2009 Journal Citation Report and we are happy to see that the Journal of Web Semantics impact factor has risen to 3.412, the fifth highest among the 116 journals in its category, “Computer Science, Information systems.” This is the highest impact factor score for the JWS to date.

Thomson Reuters’ journal impact factor is a measure of the frequency with which the average article in a journal has been cited in a particular year. The 2009 impact factor is computed as the citations received in 2009 from journals indexed by Thomson to all JWS articles published in 2007 and 2008, divided by the number of JWS “source items” published in 2007 and 2008.
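Written as a formula (the notation here is ours, not Thomson Reuters’), the 2009 calculation is:

```latex
\mathrm{IF}_{2009} = \frac{C(2007) + C(2008)}{N(2007) + N(2008)}
```

where C(y) is the number of citations received in 2009, from journals indexed by Thomson, to JWS articles published in year y, and N(y) is the number of JWS source items published in year y.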

Journal impact factors are not a perfect measure of a journal’s quality, but they are generally regarded as meaningful and are perhaps the best objective measure we have. Our high ranking is a very good sign and a testament to the hard work of many people: area editors, editorial board, advisory board, reviewers, authors and especially Silke Werger, who runs the JWS editorial office. We thank everyone for all of their work; it’s made a difference.

Infochimps provides API for their Twitter and Census datasets

June 15th, 2010

Infochimps now offers a query API for two interesting datasets: a Twitter collection and US Census data.

The Twitter data covers 500M tweets from 35M users collected between March 2006 and November 2009. The API currently includes the following services:

  • Trstrank – a trust metric for Twitter users based on network centrality (see

  • Wordbag – returns the 100 tokens (i.e., words) that a particular Twitter user tweets more often than the average Twitter user.

  • Influencer metrics – replies in/out and retweets in/out for a given user

  • Conversations – find interactions between two users. Currently this just yields direct messages but will include retweets and mentions later. For example, check out conversations between Lady Gaga and Sarah Palin:
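The post doesn’t say how Wordbag is computed. One simple way to get a “words this user favors” list is to compare each token’s relative frequency in the user’s tweets with its relative frequency across all tweets. The Python sketch below does that with add-alpha smoothing; the scoring rule, the tokenizer and the function names are our assumptions, not the Infochimps implementation.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase runs of letters, digits, '#', '@' and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9#@']+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def wordbag(user_tweets, all_tweets, k=100, alpha=1.0):
    """Return up to k tokens the user uses relatively more often than the corpus does."""
    user = Counter(tok for tweet in user_tweets for tok in tokenize(tweet))
    corpus = Counter(tok for tweet in all_tweets for tok in tokenize(tweet))
    vocab_size = len(set(user) | set(corpus))
    # Smoothed denominators so unseen tokens never divide by zero.
    user_total = sum(user.values()) + alpha * vocab_size
    corpus_total = sum(corpus.values()) + alpha * vocab_size

    scores = {}
    for tok, count in user.items():
        user_rate = (count + alpha) / user_total
        corpus_rate = (corpus[tok] + alpha) / corpus_total
        ratio = user_rate / corpus_rate
        if ratio > 1.0:  # keep only tokens the user over-uses
            scores[tok] = ratio
    # Highest ratio first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

Here all_tweets is meant to be the whole collection, including the user’s own tweets. The smoothing keeps a word the user tweeted once from dominating the list just because it is rare everywhere else.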

Pricing varies with use and ranges from “Baboon” (free for 100K calls/month) to “Golden Ape” ($4000/month for 15M calls/month).

Free WIFI at Starbucks starting July 1, 2010

June 14th, 2010

Starbucks announced via Twitter that it will begin offering free wireless Internet access to everyone at all of its US locations on July 1. You no longer have to be a registered, card-carrying Starbucks customer or have an AT&T account. See this Washington Post note for more information.

Film classic Metropolis opens tonight in Baltimore

June 11th, 2010

The newly restored, complete version of Metropolis opens tonight at the Senator Theater for a one-week run. If you like movies about robots or dystopian futures, or just like classic films that made a difference, it is well worth seeing.

Baltimore is lucky to be one of about ten US cities screening it this summer, and the only one on the East Coast outside of Boston. From the Baltimore Sun:

“This “complete” version of Lang’s silent sci-fi extravaganza restores all of its subplots and nearly all of its surging imagery. With Gottfried Huppertz’s soaring romantic score heard in full for the first time, “Metropolis” offers an engulfing audiovisual experience. It leaves you shaking your head in wonder and disbelief.

Those new to the film can sit back and be overwhelmed. Those who’ve seen it have the additional pleasure of watching a puzzle solved before your eyes. Roughly 25 minutes longer than the 2002 version, this print of “Metropolis” uses footage from a 16-millimeter dupe negative found in Buenos Aires to fill in some major bits and pieces — and some minor ones.

You can tell the 16-mm footage from the drop in picture quality. But the effect is thrilling, not jarring. This print combines the ecstasy of seeing a peak accomplishment in pristine form with the frisson movie-lovers get from viewing films as artifacts of their time, aging the way gardens or buildings do.”

Ralph Semmel, CSEE alumnus, named director of JHU APL

June 10th, 2010

UMBC Computer Science alumnus Ralph Semmel (Ph.D., 1991) has just been named the next director of the Johns Hopkins University Applied Physics Laboratory. APL has a staff of 4,600 and annual funding of about $980 million. Dr. Semmel’s dissertation, A Knowledge-Based Approach to Automatic Query Formulation, developed novel techniques for disambiguating conceptual queries against a relational database. His dissertation research was supervised by his mentor, Computer Science Professor James Mayfield. We congratulate Ralph and wish him well in his new position.

Speed up your Web access with namebench

June 5th, 2010

Here’s a quick trick that could significantly speed up your Web surfing: download and run the open source namebench on your computer. It thoroughly tests your current DNS servers along with some popular global and regional alternatives, produces a good report, and recommends which ones you should use.

Here is how namebench describes what it does:

“namebench looks for the fastest DNS (Domain Name System) servers accessible to your computer. You can think of a DNS server as a phone book: When you want to dial a company on the phone, you may have to flip through a phone book by name to find their phone number. On the Internet, when you want to visit “”, a DNS server needs to looks up the correct IP Address for you.

Over the course of loading a single web page, your computer may need to look up a dozen of these addresses. While your Internet provider usually automatically assigns you one of their servers to handle looking up these addresses, there may be others that are significantly faster. namebench finds them.”
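To make the idea concrete, here is a minimal Python sketch that times one DNS lookup against a specific server by hand-building a standard DNS query (a single A-record question with recursion desired) and sending it over UDP port 53. It is not namebench’s code, which issues many queries per server and analyzes the results far more carefully; build_query and time_query are hypothetical names.

```python
import random
import socket
import struct
import time


def build_query(hostname: str, query_id: int) -> bytes:
    """Build a DNS query packet asking for the A record of `hostname`."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in hostname.rstrip(".").split("."):
        encoded = label.encode("ascii")
        if not 0 < len(encoded) <= 63:
            raise ValueError(f"invalid DNS label: {label!r}")
        qname += bytes([len(encoded)]) + encoded  # length-prefixed label
    qname += b"\x00"  # zero-length root label ends the name
    # QTYPE = 1 (A record), QCLASS = 1 (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


def time_query(server_ip: str, hostname: str, timeout: float = 2.0):
    """Return seconds until `server_ip` answers, or None on timeout/error."""
    query_id = random.randint(0, 0xFFFF)
    packet = build_query(hostname, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.sendto(packet, (server_ip, 53))
            while True:
                data, _ = sock.recvfrom(512)
                # Skip stray datagrams whose ID doesn't match our query.
                if len(data) >= 2 and struct.unpack("!H", data[:2])[0] == query_id:
                    return time.perf_counter() - start
        except OSError:  # covers socket.timeout and network errors
            return None
```

For example, time_query("8.8.8.8", "umbc.edu") returns the elapsed time in seconds, or None if no matching reply arrives in time. Repeating it for several names per candidate server and comparing the medians would roughly approximate what namebench automates.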

Namebench also points out which DNS servers do DNS hijacking, typically by intercepting the error response produced when you enter a mistyped URL (e.g., http://umbc.edo/) and redirecting you to a page full of ads and “helpful” search results. Some name servers, like OpenDNS, will also automatically correct some mistyped URLs, e.g., guessing that when you typed http://umbc.edi/ you meant to type (Shades of DWIM!) It’s not dangerous, and it is one way that private DNS services like OpenDNS get revenue to support the service and make a profit.
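One common way to detect NXDOMAIN hijacking is to look up a randomly generated name that almost certainly isn’t registered: an honest resolver reports that the name doesn’t exist, while a hijacking one hands back an address (usually its own ad server). The Python sketch below applies that idea through whatever resolver your system is configured to use. It illustrates the general technique, not namebench’s actual check, and the function names are hypothetical; the resolver is a parameter so it can be replaced by a stub.

```python
import secrets
import socket


def random_bogus_name(tld: str = "com") -> str:
    # A 24-hex-character random label is very unlikely to be registered.
    return f"{secrets.token_hex(12)}.{tld}"


def hijacks_nxdomain(resolve=socket.gethostbyname, trials: int = 3) -> bool:
    """Return True if names that should not exist nonetheless resolve."""
    for _ in range(trials):
        try:
            resolve(random_bogus_name())
        except socket.gaierror:
            continue  # lookup failed, as it should for a nonexistent name
        return True  # got an address back for a name that shouldn't exist
    return False
```

Calling hijacks_nxdomain() with no arguments tests your currently configured DNS server; a result of True means it is rewriting “no such domain” answers, as OpenDNS does.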

I have been using OpenDNS because it’s the fastest (for me) and I don’t mind their NXDOMAIN hijacking. But I learned from namebench that OpenDNS reroutes to That site redirects HTTP GET requests to and then from there onto And Google itself redirects HTTP GET requests for to I’ll admit I am a bit confused by this. I imagine they do this to capture queries sent to Google, which provide very useful information even in the aggregate. OpenDNS says that they are doing this to correct a problem with Google-specific software installed on Dell computers. They do not seem to be doing this for Microsoft’s Bing search engine, which does lend some credence to the claim. I plan on digging into this more to fully understand what is going on and why.

Namebench runs on Macs, Windows and UNIX, and has both a command-line and a graphical user interface. See the namebench FAQ for more information.