UMBC eBiquity Blog
Tim Finin, 11:08am 30 January 2009
Next week the JHU Center for Language and Speech Processing will host a talk by Martin Kay of Stanford University, When is a Translation not a Translation? at 4:30pm Tuesday, 3 February 2009. From the announcement:
“A translation is generally taken to be a text that expresses the same meaning as another text in a different language. But the products of the best translators reflects a different, if more illusive, goal. I will seek a somewhat more adequate characterization of translation as it is actually practiced and discuss its consequences for machine translation.
Martin Kay is a professor of linguistics and computer science at Stanford University. For many years, he was also a research fellow at the Xerox Palo Alto Research Center. He made a number of fundamental contributions to computational linguistics, including chart parsing, unification grammar, and applications of finite-state technology, notably in phonology. He has been an intermittent worker on, and skeptical observer of, machine translation since 1958.”
For a preview of what he will probably talk about, you might look at a paper on Professor Kay’s web site that he describes as “some unfinished musings on the nature of translation”.
This is a chance to hear someone who has made many important contributions to several areas of computational linguistics and computer science over a long career.
Categories: NLP,
Tags: NLP,
Related posts: • McNamee: Textual Representations for Corpus-Based Bilingual Retrieval, 9am Mon 11/24; • DARPA speech-to-speech research; • Call for ISWC 2008 tutorial proposals; Comments: none
Tim Finin, 3:39pm 27 January 2009
This year’s Text Analysis Conference (TAC) has an interesting track focused on processing text to populate Wikipedia infoboxes, both for existing entities with missing values and for newly discovered entities.
TAC has been run by the US National Institute of Standards and Technology (NIST) to encourage research in natural language processing and related applications. As in the NIST-sponsored MUC, TREC and ACE workshops, this is done by providing a large test collection, common evaluation procedures, and a forum for organizations to share their results. The first TAC was held last year and included 65 teams from 20 countries who participated in three tracks: question answering, summarization and recognizing textual entailments.
TAC 2009 will include a new track on Knowledge Base Population coordinated by Paul McNamee of the Johns Hopkins University Human Language Technology Center of Excellence.
“The goal of the new Knowledge Base Population track is to augment an existing knowledge representation with information about entities that is discovered from a collection of documents. A snapshot of Wikipedia infoboxes will be used as the original knowledge source, and participants will be expected to fill in empty slots for entities that do exist, add missing entities and their learnable attributes, and provide links between entities and references to text supporting extracted information. The KBP task lies at the intersection of Question Answering and Information Extraction and is expected to be of particular interest to groups that have participated in ACE or TREC QA.”
This is an exciting task, and doing well in it will require a mixture of language processing, knowledge-based processing and (probably) machine learning.
The TAC 2009 workshop will be co-located with TREC and held 16-17 November in Gaithersburg, MD. If you are interested in participating, you should register by March 3.
Categories: NLP, Semantic Web, Social media, Wikipedia,
Related posts: • Wikipedia infobox template coherence; • Google’s HTML Statistics; • W3C publishes working drafts based on “Simple Knowledge Organisation System”; Comments: one
Tim Finin, 8:49am 27 January 2009
NSF has a press release out on the upcoming North American Computational Linguistics Olympiad. UMBC is hosting a site for the first round, which will take place on February 4. You can still sign up by February 3 if space is available.
“Early next month, high school students from across the United States and Canada will begin the first rounds of the North American Computational Linguistics Olympiad (NACLO). Although the competition aims to identify students to represent the United States at the 2009 International Linguistics Olympiad, it is also a chance for young people to explore their interests in linguistics, math or computer science and pick up some useful new skills.”
NSF has produced a nice video for NACLO that explains computational linguistics, NACLO and their relevance today.
Categories: GENERAL,
Related posts: • N. American Computational Linguistics Olympiad at UMBC; • Martin Kay: When is a Translation not a Translation, 4:30 Tue 2/3, JHU; • AAAI Sympoium on Computational Approaches to Analyzing Weblogs; Comments: none
Tim Finin, 11:13pm 26 January 2009
WWW2009 will include a workshop on Semantics for the Rest of Us: Variants of Semantic Web Languages in the Real World on 20 April 2009 in Madrid, Spain.
“The Semantic Web is a broad vision of the future of personal computing, emphasizing the use of sophisticated knowledge representation as the basis for end-user applications’ data modeling and management needs. Key to the pervasive adoption of Semantic Web technologies is a good set of fundamental “building blocks” – the most important of these are representation languages themselves. W3C’s standard languages for the Semantic Web, RDF and OWL, have been around for several years; instead of strict standards compliance, we see “variants” of these languages emerge in applications, often tailored to a particular application’s needs. These variants are often either subsets of OWL or supersets of RDF, typically with fragments OWL added. Extensions based on rules, such as SWRL and N3 logic, have been developed as well as enhancements to the SPARQL query language and protocol.
In this workshop we will explore the landscape of RDF, OWL and SPARQL variants, specifically from the standpoint of “real-world semantics”. Are there commonalities in these variants that might suggest new standards or new versions of the existing standards? We hope to identify common requirements of applications consuming Semantic Web data and understand the pros and cons of a strictly formal approach to modeling data versus a “scruffier” approach where semantics are based on application requirements and implementation restrictions.”
Full papers and position papers should be submitted by 15 February.
Categories: Semantic Web,
Related posts: • CFP: Semantics for the rest of us Workshop at 8th Int. Semantic Web Conference; • List of Semantic web tools; • Shirky: The Semantic Web, Syllogism, and Worldview; Comments: none
Tim Finin, 12:16am 26 January 2009
The NYT has a story on technology downsizing, $200 Laptops Break a Business Model. It leads with an anecdote, a common and effective hook.
“The global credit crisis may have caused the decline in consumer and business spending that is assaulting the giants of high tech. But as the dominant technology companies try to emerge from this slump, they may find themselves blaming people like David Title just as much as they blame Wall Street. Mr. Title, a 35-year-old new-media manager at a film production company in New York, has dropped his cable subscription and moved to watching most of his television online — free. While shopping for a new laptop for his girlfriend recently, he sidestepped more expensive full-featured computers and picked a bare-bones, $200 Asus EeePC laptop, also known as a netbook.”
While I’m not sure about the $200 laptop — I paid $400 for what I considered a usable Asus Eee last year — this trend is real. My sense is that we are all looking around and asking “Do I really need this?” and answering, in many cases, in the negative. Whether this is good or bad for the economy I don’t know. But it is good for the soul. Less is more seems to be an idea that takes hold on a regular basis, probably as a natural corrective action. One that seems very appropriate now.
Categories: GENERAL,
Related posts: • Models? We don’t need no stinking models!; • Freebase’s data and knowledge models; • BlogWise and Google Maps; Comments: one
Tim Finin, 10:01am 25 January 2009
There are lots of good systems, including Excel and other spreadsheet tools, that can visualize your data in various kinds of graphs. It can sometimes be a little daunting, however, to figure out which kind of chart to use. The version of Excel running on my laptop, for example, asks me to choose from more than 70 kinds of charts. Of course, many of the variations are obviously stylistic — 2D vs 3D bar charts — but there are still a lot of options.
A link to a great data visualization cheat sheet on How to choose a chart is doing well on Hacker News today. The graphic was created by Andrew Abela and posted on his blog in Choosing a good chart over three years ago.
“Here’s something we came up with to help you consider which chart to use. It was inspired by the table in Gene Zelazny’s classic work Saying It With Charts (p. 27 in the 4th. ed)”

Abela developed this aid as part of his Extreme Presentation method for “designing presentations that drive action”. On his Extreme Presentation blog you can find versions of this chart aid that have been translated into other languages.
Categories: GENERAL,
Tags: data; visualization,
Related posts: • Prisoners Dilemma and the Golden Balls game show; • The Semantic web’s place on the Hype Cycle; • Sifry on the state of the Blogosphere; Comments: none
Tim Finin, 12:48pm 20 January 2009
The White House blog went live today with a post by Macon Phillips, Change has come to WhiteHouse.gov.
“Welcome to the new WhiteHouse.gov. I’m Macon Phillips, the Director of New Media for the White House and one of the people who will be contributing to the blog.
…”
This is an interesting, albeit minor, aspect of an historic event that everyone hopes will lead to a better world.
The feed is in an odd place, however. If you put the blog’s address into Google Reader, for example, it can’t find the feed, which is at http://www.whitehouse.gov/feed/blog. Bloglines, however, does manage to find the feed given the blog’s URL.
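I don’t know exactly how either reader looks for feeds, but one common convention is feed autodiscovery: the page’s HTML head carries a `<link rel="alternate">` element whose `type` names a feed format and whose `href` points to the feed. Here is a minimal Python sketch of that lookup, offered as an illustration only. The `find_feeds` helper and both sample pages are hypothetical; the markup is not copied from WhiteHouse.gov.

```python
from html.parser import HTMLParser

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects the href of every <link rel="alternate"> that advertises a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. <link rel>), so normalize to "".
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        is_feed = a.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and a.get("href"):
            self.feeds.append(a["href"])


def find_feeds(html):
    """Return the feed URLs advertised in an HTML document (hypothetical helper)."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


# Hypothetical page that advertises its feed in the head.
sample_page = """<html><head>
<title>Hypothetical blog page</title>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml"
      href="http://www.whitehouse.gov/feed/blog">
</head><body></body></html>"""

# Hypothetical page with no autodiscovery hint at all.
no_hint_page = "<html><head><title>No feed link</title></head><body></body></html>"

if __name__ == "__main__":
    print(find_feeds(sample_page))
    print(find_feeds(no_hint_page))
```

A reader that relies only on this hint comes up empty on a page like the second one, while a reader that also tries other strategies might still locate the feed. That could explain the difference I saw, though I haven’t verified it.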
Categories: GENERAL,
Tags: history; White House,
Related posts: • US targets NSF and NIH budgets; • US House stimulus plan: NSF += $3B; • On the importance of metadata; Comments: one
Tim Finin, 8:31pm 18 January 2009
Information on the Web comes in many forms, including text, images, services, data, games, and video. I’ve always considered text to be the essential type, possibly because it was the first, but also because so much of our Web experience has been shaped by search engines, which still operate mostly on text. But just as television and film dominate books and other forms of text in popular culture, maybe video-oriented modalities will become the preferred form of Web content.
Today’s New York Times has an article, At First, Funny Videos. Now, a Reference Tool, about how many search for information on YouTube first and turn to text search engines only when their YouTube results are inadequate.
“FACED with writing a school report on an Australian animal, Tyler Kennedy began where many students begin these days: by searching the Internet. But Tyler didn’t use Google or Yahoo. He searched for information about the platypus on YouTube.
“I found some videos that gave me pretty good information about how it mates, how it survives, what it eats,” Tyler said. Similarly, when Tyler gets stuck on one of his favorite games on the Wii, he searches YouTube for tips on how to move forward. And when he wants to explore the ins and outs of collecting Bakugan Battle Brawlers cards, which are linked to a Japanese anime television series, he goes to YouTube again.
While he favors YouTube for searches, he said he also turns to Google from time to time. “When they don’t have really good results on YouTube, then I use Google,” said Tyler, who is 9 and lives in Alameda, Calif.”
The article reports that the number of YouTube searches recently exceeded the number on Yahoo, which had been number two.
“In November, Americans conducted nearly 2.8 billion searches on YouTube, about 200 million more than on Yahoo, according to comScore.”
You can see this trend in comScore’s December 2008 Search Engine Rankings report.
It’s hard to say where this is going. Video is great for some kinds of information (e.g., demonstrations, events) and less good for others (e.g., recipes, careful arguments). We can easily link information in text to related information, but can’t (yet) for videos. We can more easily write programs to process text and even extract semantic information from it.
But I have a feeling that nine-year-old Tyler Kennedy is a sign of things to come.
Categories: Google, Web, Search,
Related posts: • For teens, social media is not technology, it’s just life; • Video game consoles with cutting-edge technology; • That Funny Steve Ballmer Video Thing; Comments: 4
Tim Finin, 6:15pm 15 January 2009
The CRA reports that the US science and technology research community may get its own little bailout. The House Appropriations Committee released details of its American Recovery and Reinvestment economic stimulus package, which includes funds for scientific research.
NSF is slated to get $3B in new money:
“including $2 billion for expanding employment opportunities in fundamental science and engineering to meet environmental challenges and to improve global economic competitiveness, $400 million to build major research facilities that perform cutting edge science, $300 million for major research equipment shared by institutions of higher education and other scientists, $200 million to repair and modernize science and engineering research facilities at the nation’s institutions of higher education and other science labs, and $100 million is also included to improve instruction in science, math and engineering”
The plan also calls for new research money for NIH, DOE, NASA, NIST and other government organizations as well as $6B for broadband deployment.
While this is not large as bailouts go, we must keep in mind it was done without a crisis brought about by the rampant use of research breakthrough default swap instruments or scholarly paper citation pyramid schemes. Maybe we should have gotten MBAs.
Update 1/16: The CRA policy blog has some more details on how the funds will be allocated within some of the agencies.
Categories: Computing Research, Funding,
Related posts: • Senate plan: less stimulus for NSF, NIST, other science agencies; • Stimulus Watch: propose and vote on shovel ready projects; • NSF and science increments survive stimulus conference; Comments: 4
Tim Finin, 3:12pm 15 January 2009
mc schraefel and Lloyd Rutledge are editing a special issue of the Journal of Web Semantics on “Exploring New Interaction Designs Made Possible by the Semantic Web”. The call for submissions described the topic this way.
“In this special issue of the Journal of Web Semantics we seek papers that look at the challenges and innovate possible solutions for everyday computer users to be able to produce, publish, integrate, represent and share, on demand, information from and to heterogeneous data sources. Challenges touch on interface designs to support end-user programming for discovery and manipulation of such sources, visualization and navigation approaches for capturing, gathering and displaying and annotating data from multiple sources, and user-oriented tools to support both data publication and data exchange. The common thread among accepted papers will be their focus on such user interaction designs/solutions oriented linked web of data challenges. Papers are expected to be motivated by a user focus and methods evaluated in terms of usability to support approaches pursued.” 
In addition to full-length research papers, they will also consider submissions of short (4-6 page) demonstration papers with evaluations of new tools that address any of the above challenges and brief (1-2 page) forward-looking, speculative papers addressing challenges. Submissions are due by 20 April 2009. Accepted papers are expected to appear online in preprint form in Summer 2009, online in final form by the end of 2009 and in print in 2010.
Categories: Semantic Web,
Tags: JWS; the Journal of Web Semantics,
Related posts: • CFP: JWS special issue on semantic search; • JWS special issue on The Web of Data; • Follow the Journal of Web Semantics on facebook and twitter; Comments: none
Tim Finin, 11:14am 13 January 2009
Elsevier has made the January 2009 Journal of Web Semantics special issue on the Semantic Web and Policy our new sample issue, which means that its papers are freely available online until a new sample issue is selected. The special issue editors, Lalana Kagal, Tim Berners-Lee and James Hendler, wrote in the introduction:
“As Semantic Web technologies mature and become more accepted by researchers and developers alike, the widespread growth of the Semantic Web seems inevitable. However, this growth is currently hampered by the lack of well-defined security protocols and specifications. Though the Web does include fairly robust security mechanisms, they do not translate appropriately to the Semantic Web as they do not support autonomous machine access to data and resources and usually require some kind of human input. Also, the ease of retrieval and aggregation of distributed information made possible by the Semantic Web raises privacy questions as it is not always possible to prevent misuse of sensitive information. In order to realize it’s full potential as a powerful distributed model for publishing, utilizing, and extending information, it is important to develop security and privacy mechanisms for the Semantic Web. Policy frameworks built around machine-understandable policy languages, with their promise of flexibility, expressivity and automatable enforcement appear to be the obvious choice.
…
It is clear that these two technologies – Semantic Web and Policy – complement each other and together will give rise to security infrastructures that provide more flexible management, are able to accommodate heterogeneous information, have improved communication, and are able to dynamically adapt to variations in the environment. These infrastructures could be used for a wide spectrum of applications ranging from network management, quality of information, to security, privacy and trust. This special issue of the Journal of Web Semantics is focused on the impact of Semantic Web technologies on policy management, and the specification, analysis and application of these Semantic Web-based policy frameworks.”
In addition to the editors’ Introduction, the special issue includes five papers:
Categories: Policy, Semantic Web,
Tags: JWS; Web Semantics,
Related posts: • CFP: JWS special issue on semantic search; • This Thanksgiving Do Some Blog Reading; • JWS special issue on The Web of Data; Comments: one
Tim Finin, 5:14pm 7 January 2009
If you are a high school or middle school student who is interested in computers and also in languages, you should consider participating in the 2009 North American Computational Linguistics Olympiad (NACLO). This might be the first step on a path that could lead to your helping to create the next Google!
NACLO is a competition for middle-school and high-school students focused on solving problems involving linguistics and computational linguistics. Working the problems only requires keen analytical ability and good problem-solving skills — no prior background in linguistics, foreign languages or computer science is required.
NACLO consists of two rounds — an initial round on February 4 open to all students and a subsequent invitational round on March 11 for contestants who have advanced from the first. Winners of the second round will be invited to participate in the International Linguistics Olympiad. Last year, two US teams went to Bulgaria to compete in the sixth International Linguistics Olympiad and won gold medals in individual and team events.
Support for NACLO is provided by Google, the Association for Computational Linguistics, and the National Science Foundation, which said in an August press release:
“Aside from being a fun intellectual challenge, the Olympiad mimics the skills used by researchers and scholars in the field of computational linguistics, which is increasingly important for the United States and other countries. Using computational linguistics, these experts can develop automated technologies such as translation software that cut down on the time and training needed to work with other languages, or software that automatically produces informative English summaries of documents in other languages or answer questions about information in these documents. In an increasingly global economy where businesses operate across borders and languages, having a strong pool of computational linguists is a competitive advantage. With threats emerging from different parts of the world, developing computational linguistics skills has also been identified as vital to national defense in the 21st century.” (src)
Students can participate at the NACLO site at UMBC, which is sponsored by the UMBC Institute for Language in Information Technology. Check out their poster and sample problem. If you like this kind of puzzle and others like it, sign up to be part of this exciting competition.
Students should register online by January 20. Late registrations may be accepted up to February 3 if space is available. The UMBC NACLO event will take place on Wednesday February 4 in room 312 of the University Center. For more information, contact one of the local organizers: Professors Marjorie McShane (marge@umbc.edu), Sergei Nirenburg (sergei@umbc.edu) and Margaret A. Russell (margaret.a.russell@gmail.com).
Categories: AI, NLP, Semantic Web, UMBC,
Tags: computational linguistics; naclo,
Related posts: • NA Computational Linguistics Olympiad at UMBC; • Martin Kay: When is a Translation not a Translation, 4:30 Tue 2/3, JHU; • AAAI Sympoium on Computational Approaches to Analyzing Weblogs; Comments: 2