Barny Pell video: POWERSET – Natural Language and the Semantic Web

November 17th, 2007

Barny Pell of Powerset gave one of the keynote talks at the (joint) 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference (ISWC 2007) held in Busan, South Korea this past week. His title was “POWERSET – Natural Language and the Semantic Web”.

The talk was captured on video [1] and put online by VideoLectures.net, a service run by the Jozef Stefan Institute, which records, edits and publishes videos of computer science talks, lectures and tutorials.

It was a good talk and the first time I’ve had a chance to hear some of the technical details underlying Powerset’s approach. For those who don’t follow such things, Powerset is building a Web search engine that uses NLP technology from Xerox PARC to process both the text on Web pages and the text of search queries. The result should be a system that performs significantly better than conventional search engines for many retrieval tasks.

The talk had *some* details but left lots of questions. In particular, there were no performance numbers, either for the query system or for scalability. Of course, this is not surprising given the commercial nature of the effort. On the whole, though, I was impressed by what I saw.

Videos from ISWC 2007 online

November 17th, 2007

VideoLectures.net was recording talks at the 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference (ISWC 2007) held in Busan, South Korea this week. It already has two of the three keynote talks and one tutorial online.

This is a great service run by the Jozef Stefan Institute. The videos of computer science talks, lectures and tutorials are professionally recorded and edited and provide a wonderful resource to the international computer science community.

The ISWC 2007 talks that have already been put online are:

See VideoLectures.net’s ISWC 2007 page for the videos posted so far. I expect to see many more from ISWC 2007 in the coming weeks.

ISWC 2009 to be held in the Washington DC area

November 12th, 2007

At ISWC 2007’s opening session, Rudi Studer (chair of the Semantic Web Science Association) announced that the 2009 International Semantic Web Conference will be held in the Washington DC area. The details are not yet final, but the conference is expected to take place at the end of October or the beginning of November 2009.

ISWC 2008 site available

November 10th, 2007

The Seventh International Semantic Web Conference (ISWC 2008) will be held 26-30 October 2008 in Karlsruhe, Germany. The initial website is up and includes the preliminary ISWC 2008 call for papers. The site will be updated as ISWC 2008 approaches and will feature all the information you need to make your visit to Karlsruhe a complete success.

Cleverset recommendation engine uses statistical relational learning

November 8th, 2007

Technology Review has a short article, “A Better Recommendation Engine”, on Cleverset, a Seattle company that offers recommendation services for e-commerce.

“Now a Seattle-based startup called Cleverset thinks it has the secret to the next-generation recommendation system: a type of computer modeling found mainly in artificial-intelligence research labs. Cleverset’s system weighs the importance of the relationship among individual shoppers, their behavior on the site, the behavior of similar shoppers, and external factors such as seasons, holidays, and events like the Super Bowl. Using these ever-changing relationships, Cleverset’s system serves up products that are statistically likely to match what the customer will find interesting.” (link)

Cleverset was founded in 2000 by Bruce D’Ambrosio of Oregon State University. Their approach is based on statistical relational learning.

“Cleverset uses an approach called statistical relational modeling, developed in the past decade, in which each piece of information in a data set is linked together based on its relationship to every other piece of information. This contrasts with the previous view of looking at data as if in an Excel spreadsheet, where everything carries an equal weight.” (link)
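Cleverset’s actual models are not described beyond the quotes above. As a minimal sketch of the relational idea only, the weighted-vote relational-neighbor classifier below estimates each shopper’s interest in a product from the interests of linked shoppers, rather than treating each shopper as an independent spreadsheet row. The shopper graph, edge weights and 0.5 prior are hypothetical assumptions, and this simple method is far short of full statistical relational learning; it is not Cleverset’s system.

```python
from typing import Dict, List, Tuple

# Hypothetical edge: (shopper, shopper, strength of their behavioral link)
Edge = Tuple[str, str, float]


def relational_neighbor(edges: List[Edge], known: Dict[str, float],
                        iterations: int = 20) -> Dict[str, float]:
    """Weighted-vote relational-neighbor estimate of P(interested) per shopper.

    Shoppers in `known` keep their observed value (1.0 = bought, 0.0 = did not).
    Every other shopper starts at a 0.5 prior and is repeatedly set to the
    weighted average of its neighbors' current estimates, so evidence
    propagates along chains of related shoppers.
    """
    neighbors: Dict[str, List[Tuple[str, float]]] = {}
    for a, b, w in edges:
        neighbors.setdefault(a, []).append((b, w))
        neighbors.setdefault(b, []).append((a, w))

    estimate = {node: known.get(node, 0.5) for node in neighbors}
    for _ in range(iterations):
        updated = dict(estimate)
        for node, nbrs in neighbors.items():
            if node in known:
                continue  # observed behavior stays fixed
            total = sum(w for _, w in nbrs)
            updated[node] = sum(estimate[n] * w for n, w in nbrs) / total
        estimate = updated
    return estimate


if __name__ == "__main__":
    edges = [("ann", "bob", 2.0), ("bob", "cat", 1.0), ("cat", "dan", 1.0)]
    known = {"ann": 1.0, "dan": 0.0}
    for shopper, p in sorted(relational_neighbor(edges, known).items()):
        print(f"{shopper}: {p:.2f}")
```

Here bob, strongly linked to a buyer, ends up near 0.8, while cat, one step further away and linked to a non-buyer, ends up near 0.4: the prediction for one shopper depends on the whole neighborhood, not just that shopper’s own row.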

AskWiki uses Wikipedia for semantic search

November 3rd, 2007

AskWiki is a service that answers simple questions with passages from Wikipedia. It is the product of a collaboration between the Wikimedia Foundation and AskMeNow, a company that uses NLP technology to answer simple questions sent via mobile phones. AskWiki classifies what it is doing as a kind of ‘semantic search’, which it defines (citing Wikipedia) as:

“Semantic Search attempts to augment and improve traditional Research Searches by leveraging XML data from semantic networks to disambiguate semantic search queries and web text in order to increase relevancy of results.” (link)

After playing with AskWiki a bit, I was impressed by many of its answers and disappointed by others, which is about par for the course with this kind of system. But probing AskWiki’s performance and trying to reverse engineer what it is doing is a lot of fun. Mostly the system appears to work by selecting the one sentence-level passage from Wikipedia that best matches the question. The question and the sentences are probably pre-processed in various ways to improve the matching, and the match itself is probably based on relevance scores from an underlying information retrieval (IR) engine.

What makes AskWiki perform so well, I’d guess, is the breadth, style and quality of Wikipedia’s content.
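The guessed pipeline (pick the single sentence that best matches the question by IR relevance) can be sketched with TF-IDF weighting and cosine similarity. This is a minimal sketch under that assumption only: AskWiki’s real preprocessing and ranking are unknown, and the stopword list, smoothing and toy sentences here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical stopword list; a real system would use a larger one
STOPWORDS = {"the", "is", "a", "an", "of", "in", "what", "how", "to", "at", "and"}


def tokenize(text: str) -> List[str]:
    # Lowercase, keep alphanumeric runs, drop stopwords
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def idf_table(sentences: List[str]) -> Dict[str, float]:
    n = len(sentences)
    df = Counter(term for s in sentences for term in set(tokenize(s)))
    # Smoothed IDF: rarer terms get higher weight, every seen term stays > 0
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    tf = Counter(tokens)
    # Terms never seen in the sentence pool get weight 0
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def best_sentence(question: str, sentences: List[str]) -> str:
    """Return the single sentence whose TF-IDF vector is closest to the question's."""
    idf = idf_table(sentences)
    q = tfidf(tokenize(question), idf)
    return max(sentences, key=lambda s: cosine(q, tfidf(tokenize(s), idf)))


if __name__ == "__main__":
    sentences = [
        "The Empire State Building rises to 381 m at the 102nd floor.",
        "The Empire State Building was completed in 1931.",
        "Boiled eggs are typically boiled for 10-11 minutes.",
    ]
    print(best_sentence("When was the Empire State Building completed?", sentences))
```

Note that without stemming, “boil an egg” would not match “boiled eggs” at all, which is one reason such systems pre-process both questions and sentences.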

For more information, see “AskWiki: Wikipedia Knows What You’re Thinking” or, better yet, sit down and talk with AskWiki yourself. Here’s a transcript of my session with it, along with some comments.

Q: how high is the empire state building?
A: Antenna spire: 449 m (1,472 ft)
From: Empire State Building
Comment: This answer was extracted from the Infobox.

Q: empire state building height
A: The Empire State Building rises to 381 m (1,250 feet) at the 102nd floor, and including the 203ft pinnacle its full height reaches 443 m (1,453 ft and 8 9/16th in).
From: Empire State Building
Comment: This sentence is from the article’s text, and seems better than the first one.

Q: How heavy is a bowling ball?
A: Most people can easily buy a bowling ball that is at least 1 to 2 pounds heavier than they used as a house ball.
From: Bowling ball
Comment: Nice try, but it’s a spare.

Q: How long should I cook a hard boiled egg?
A: Studies done at the American Egg Board substantiate that unopened containers of brined or pickled eggs (marinated, hard-cooked eggs) keep for several months on the shelf.
From: Boiled egg
Comment: If the system could suggest three possible answers it might do better. For example, the boiled egg article does have this sentence, which would have been a nice answer: Boiled eggs are typically boiled for 10-11 minutes or until they float, depending on the size of the egg.

Q: What is the longest river in the world?
A: This is a list of the longest rivers on Earth. It includes river systems over 1,000 kilometers.
Comment: This might have been a good answer if it had not been truncated.

Q: What is the longest river in Africa
A: Nile.
From: Wikipedia:Nile
Comment: Good answer.

Q: What is the longest river in Africa ?
A: The length of a river between source and mouth may be hard to determine because of a lack of precise maps. In these cases, the measured length of a river will depend on the scale of the map on which the measurement is based; in general, due to the fractal quality of a river, the larger the scale, the longer the resulting length measurement.
From: List of rivers by length
Comment: Adding a space and a question mark changed the answer. I have no idea why.

Q: what is the longest river in Africa?
A: For example, although the Congo River appears to be a natural geographic boundary, there were groups that otherwise shared a language, culture or other similarity living on both sides. The division of the land between Belgium and France along the river isolated these groups from each other.
Comment: Yet another answer. Maybe it selects a random one from the top results.

Q: does P equal NP
A: A nurse practitioner is a registered nurse who has completed specific advanced nursing education (generally a master’s degree) and training in the diagnosis and management of common medical conditions. Nurse practitioners provide a broad range of health care services.
Comment: Maybe it’s a dodge.

Q: Who directed “They live by Night”?
A: Director: Nicholas Ray
Comment: That’s the right answer for a somewhat obscure film, which was Ray’s directing debut.

Q: Will there be peace in our lifetime?
A: No answer found for the question “Will there be peace in our lifetime?”
Comment: No comment.

Q: What is the value of pi?
A: Pi or π is the ratio of a circle’s circumference to its diameter in Euclidean geometry, approximately 3.14159.
Comment: correct, but an easy question.

Q: What is the square root of 16?
A: Year 16 was a leap year starting on Wednesday (link will display the full calendar) of the Julian calendar.
From: 16
Comment: This is a nice example of a fundamental weakness. AskWiki can only select an answer that seems appropriate from among its several billion sentences; it cannot compute a new one, such as 4. Even though several billion is a large number, the number of ‘facts’ that people know is infinite.

Q: What is an emu
A: The Emu (pronunciation:), Dromaius novaehollandiae, is the largest bird native to Australia and the only extant member of the genus Dromaius. It is also the second-largest extant bird in the world by height, after its ratite relative, the ostrich. The soft-feathered, brown, flightless birds reach up to 2 m (6 ft) in height. The Emu is common over most of mainland Australia, although it avoids heavily populated areas, dense forest and arid areas. Emus can travel great distances at a fast, economical trot and, if necessary, can sprint at 50 km/h (30 mph) for some distance at a time.
From: Emu

Q: Can an emu fly?
A: The pelvic limb muscles of Emus have a similar contribution to total body mass as the flight muscles of flying birds.
From: Emu
Comment: While not a good answer, the sentence does at least imply that emus cannot fly.

Q: How many popes have there been
A: Where there are sins, there are also divisions, schisms, heresies and disputes. Where there is virtue, however, there also are harmony and unity, from which arise the one heart and one soul of all believers (Cf CIC, can751.).
From: Pope
Comment: Inscrutable answer.

Q: Is there a god
A: Many historians of religion hold that monotheism may be of relatively recent historical origins – although comparison is difficult as many religions claim to be ancient.
Comment: Hmmmm…

Q: Where is Tanzania
A: Tanzania officially the United Republic of Tanzania (Swahili: Jamhuri ya Muungano wa Tanzania), is a country in East Africa bordered by Kenya and Uganda on the north, Rwanda, Burundi and the Democratic Republic of the Congo on the west, and Zambia, Malawi and Mozambique on the south. To the east it borders the Indian Ocean.
Comment: good answer

Q: Where is Brown University
A: Brown University is a private university located in Providence, Rhode Island. Founded in 1764 as the College of Rhode Island, it is the third-oldest institution of higher education in New England and the seventh-oldest in the United States. It is a member of the Ivy League. Pembroke College, Brown University’s all women’s college, merged with The College, in 1971.
From: Brown University
Comment: Also good.

Q: Why is the sky blue?
A: The light from the sky is a result of diffuse sky radiation which results in a light blue color being perceived by the human eye. On a sunny day the earth sky usually looks as a blue gradient — dark in the zenith, light near the horizon (due to Rayleigh scattering). It can turn a multitude of colors such as purple (especially near sunset and sunrise) and black at night. Scattering effects also partially polarize light from the sky.
From: Wikipedia:Sky
Comment: Not bad for this old chestnut.

Q: When was Charles Darwin born
A: Birth date: February 12, 1809
From: Charles Darwin
Comment: good!

Q: What is the capital of Belgium?
A: Capital: Brussels
From: Belgium
Comment: Right!

Q: What is the population of Brussels?
A: Population metro: 1,975,000
From: Brussels
Comment: Right!!

Q: what is the population of the capital of Belgium?
A: Population metro: 1,975,000
From: Brussels
Comment: I expected this one to fail!

Q: Is no your answer to this question?
A: This depends on the context, as do several other possibilities: Sometimes “I don’t know” is an acceptable answer, sometimes even a right answer. The same is true of “None of the above” and “There is no answer”. An answer is the, or a, right answer, if it presents true information which falls within the determined range of alternatives. Questions of this simplest sort usually begin with Who, what, which, where, when, does/do, is/are.
From: Question
Comment: Wow!

Q: is waterboarding torture?
A: Recent torture methods include waterboarding, sexual humiliation and sexual abuse, and the use of dogs against prisoners.
From: Torture
Comment: No moral confusion here!

Q: What is the airspeed velocity of an unladen swallow?
A: Monty Python and the Holy Grail is a 1975 film written and performed by the comedy group Monty Python (Graham Chapman, John Cleese, Terry Gilliam, Eric Idle, Terry Jones, and Michael Palin), and directed by Gilliam and Jones. It was made during a gap between the third and fourth seasons of their popular BBC television series Monty Python’s Flying Circus.
From: Wikipedia:Monty Python and the Holy Grail
Comment: Maybe the selection process is done in two steps: find a relevant article and then find a good answer within it.

Q: Can a computer think?
A: However, computers cannot “think” for themselves in the sense that they only solve problems in exactly the way they are programmed to.
Comment: This answer must have been painful for poor old AskWiki to give.

Q: who is jimbo wales?
A: Jimmy Donal “Jimbo” Wales (born 7 August 1966 in Huntsville, Alabama) is an American Internet entrepreneur known for his role in founding Wikipedia and other wiki-related
Comment: Know your master.

Q: Is jimbo wales a genius or what?
A: No answer found for the question “Is jimbo wales a genius or what?”
Comment: Ahhh, AskWiki may be showing some tact.
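The two-step guess in the swallow comment (first find a relevant article, then pick the best sentence within it) would explain answers that are on topic at the article level but miss the question. Here is a minimal sketch of that hypothesis, using plain word overlap instead of a real IR score for brevity; the mini-articles, stopword list and scoring are hypothetical and are not AskWiki’s data or method.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical stopword list
STOPWORDS = {"the", "is", "a", "an", "of", "what", "in", "and", "by"}


def terms(text: str) -> Set[str]:
    # Distinct lowercase alphanumeric terms, minus stopwords
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def overlap(question: str, text: str) -> int:
    return len(terms(question) & terms(text))


def two_step_answer(question: str, articles: Dict[str, List[str]]) -> Tuple[str, str]:
    # Step 1: choose the article (title plus all sentences) sharing the most terms
    title = max(articles, key=lambda t: overlap(question, t + " " + " ".join(articles[t])))
    # Step 2: choose the best sentence from that article only, even if it is weak
    sentence = max(articles[title], key=lambda s: overlap(question, s))
    return title, sentence


if __name__ == "__main__":
    articles = {
        "Nile": [
            "The Nile is a major river in Africa.",
            "It flows north into the Mediterranean Sea.",
        ],
        "Congo River": [
            "The Congo River is a river in Africa.",
            "It crosses the equator twice.",
        ],
    }
    print(two_step_answer("Which sea does the Nile flow into?", articles))
```

The design choice that matters is that step 2 never looks outside the article chosen in step 1. If step 1 lands on an article that mentions the question’s words but does not contain the answer, as the Monty Python article does for the swallow question, step 2 still returns that article’s best-scoring sentence.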

Blogrunner: the New York Times robot in the newsroom

November 1st, 2007

The New York Times has incorporated Blogrunner into its Web site. TechCrunch characterizes Blogrunner as a “Techmeme Killer”:

“Last night, the New York Times quietly launched Blogrunner on the technology section of its main site. Blogrunner was one of many Techmeme copycat sites, until the New York Times bought it last year. Like Techmeme, Blogrunner is a service that keeps track of the latest news and blog posts on a range of topics (Politics, Technology, Media, Business, Economy, Law, Health, Movies, Books, Religion, Iraq, Entertainment). Now those links are appearing on the New York Time’s main site, starting with the technology section, in a middle column titled “Technology Headlines from Around the Web.” (link)

Here’s the NYT Bits blog on Blogrunner:

“The biggest change is the feature in the middle column of the technology page titled “Technology Headlines From Around the Web.” It presents a constantly updated list of hot technology stories. Notice what we are not worried about. … Even more interesting to me is how this list gets generated. It is mainly created by an automated algorithm developed by Philippe Lourier, the developer of Blogrunner, a Web site The New York Times Co. bought last year. It has something in common with Digg, the site on which readers vote on what articles they find interesting. But for Blogrunner, votes are links from blogs or other Web sites. This approach, of course, is what powers the PageRank algorithm of Google, and Techmeme, an excellent technology news site.” (link)

I wonder what is taught at J-schools about this these days.
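The links-as-votes idea in the Bits quote can be illustrated with a PageRank-style power iteration over a tiny link graph. This is a generic sketch of PageRank, not Blogrunner’s algorithm, which the quote does not describe in detail; the blog and story names are hypothetical, and the damping factor of 0.85 is the conventional choice, assumed here.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank: each link passes a share of its source's rank."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets the same baseline from the random-jump term
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A link is a weighted vote: the source's rank split over its links
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node (no outgoing links): spread its rank evenly
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    links = {
        "blog_a": ["story_x", "story_y"],
        "blog_b": ["story_x"],
        "blog_c": ["story_x", "blog_a"],
    }
    for node, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
        print(f"{node}: {score:.3f}")
```

Unlike a raw Digg-style vote count, a link from a highly ranked source counts for more than a link from an obscure one, which is the property the Bits post attributes to PageRank and Techmeme.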