Google recipe search exploits semantic web data in RDFa

February 26th, 2011, by Tim Finin, posted in AI, Google, Semantic Web

Many people now use the Web to find recipes rather than their own collection of cookbooks, and it is estimated that about one percent of all Google searches are for recipes. This past Thursday, Google released Recipe View in the US, letting you limit results to pages that are recipes and further narrow your search by ingredients, cooking time and calories. This feature is powered by semantic metadata encoded in RDFa and other formats.

Google describes the new recipe search in a post on the Official Google Blog:

“Recipe View lets you narrow your search results to show only recipes, and helps you choose the right recipe amongst the search results by showing clearly marked ratings, ingredients and pictures. To get to Recipe View, click on the “Recipes” link in the left-hand panel when searching for a recipe. You can search for specific recipes like [chocolate chip cookies], or more open-ended topics—like [strawberry] to find recipes that feature strawberries, or even a holiday or event, like [cinco de mayo]. In fact, you can try searching for all kinds of things and still find interesting results: a favorite chef like [ina garten], something very specific like [spicy vegetarian curry with coconut and tofu] or even something obscure like [strange salad].”

Recipe View extracts data embedded in Web pages that is encoded in Google’s rich snippets format. This includes the W3C Semantic Web standard RDFa as well as microformats. Google recognizes a simple recipe vocabulary with fourteen properties.
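To make the idea concrete, here is a minimal sketch of how a program might pull recipe properties out of RDFa-style markup. The sample page fragment, the "v:" prefix and the property names (name, cookTime, ingredient) are assumptions made for illustration, and the helper extract_recipe_properties is hypothetical. It is a simple attribute scan using Python's standard library, not a full RDFa processor and not a description of how Google's extractor works.

```python
from html.parser import HTMLParser

# Hypothetical recipe fragment. The namespace, prefix and property names
# are assumptions for illustration; they are not taken from the post.
SAMPLE = """
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Recipe">
  <h1 property="v:name">Chocolate Chip Cookies</h1>
  <span property="v:cookTime" content="PT12M">12 minutes</span>
  <span property="v:ingredient">flour</span>
  <span property="v:ingredient">chocolate chips</span>
</div>
"""


class RecipePropertyParser(HTMLParser):
    """Collects values of elements that carry an RDFa 'property' attribute.

    Simplification: prefixes are not resolved and nesting is ignored.
    """

    def __init__(self):
        super().__init__()
        self.properties = {}   # property name -> list of values
        self._current = None   # property whose text content we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # A machine-readable value in 'content' wins over the visible text.
            self.properties.setdefault(prop, []).append(attrs["content"])
        else:
            self._current = prop

    def handle_data(self, data):
        text = data.strip()
        if self._current is not None and text:
            self.properties.setdefault(self._current, []).append(text)
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def extract_recipe_properties(html):
    """Return a dict mapping each property name to the list of values found."""
    parser = RecipePropertyParser()
    parser.feed(html)
    parser.close()
    return parser.properties


if __name__ == "__main__":
    for name, values in extract_recipe_properties(SAMPLE).items():
        print(name, values)
```

With structured values like these in hand, filtering recipes by ingredient or cooking time becomes a matter of comparing fields rather than guessing from free text.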

This is a great example of the potential of semantic web technology that can be understood and appreciated by anyone with an interest in cooking. Or eating.

CloudCamp Baltimore, 6-10pm Wed Mar 9, 2011

February 24th, 2011, by Tim Finin, posted in cloud computing, High performance computing

There will be a free CloudCamp meeting in Baltimore from 6:00pm to 10:00pm on Wednesday, March 9th at the Baltimore Marriott Waterfront. CloudCamps are participant-driven unconferences where users of cloud computing technologies meet to network and share ideas, experiences, challenges and solutions. The event is free, but participants are asked to register to ensure there is enough food and refreshments.

Here is the current, tentative schedule:

6:00pm – Registration & Networking (food/drink)
6:30pm – Opening Introductions
6:45pm – Lightning Talks (5 minutes each)
7:30pm – Unpanel
8:00pm – Organize Unconference
8:15pm – Unconference Breakout Session Round 1
9:00pm – Unconference Breakout Session Round 2
9:45pm – Wrap-up
10:00pm – Find somewhere for post-event networking

Contact the organizers if you are interested in giving a five-minute lightning talk or leading a breakout session.

ICWSM 2011 Data Challenge with 3TB of social media data

February 23rd, 2011, by Tim Finin, posted in Datamining, NLP, Semantic Web, Social media

The Fifth International AAAI Conference on Weblogs and Social Media is holding a new data challenge using a new dataset that includes about three TB of social media data collected by Spinn3r between January 13th and February 14th, 2011.

The dataset consists of over 386M blog posts, news articles, classifieds, forum posts and social media content in a month including events such as the Tunisian revolution and the Egyptian protests. The content includes the syndicated text, its original HTML as found on the web, annotations and metadata (e.g., author information, time of publication and source URL), and boilerplate/chrome extracted content. The data is formatted as Spinn3r’s protostreams – an extension to Google protobuffers. It is also broken down by date, content type and language making it easy to work with selected data.

See the ICWSM Data Challenge pages for more information on the challenge task, its associated ICWSM workshop and procedures for data access.

Did Watson enjoy a head start on Jeopardy?

February 22nd, 2011, by Tim Finin, posted in AI, Machine Learning, Semantic Web

IBM's Watson on Jeopardy!

IBM’s Watson’s performance in last week’s Jeopardy Challenge was an amazing accomplishment and a demonstration of how our computer systems are becoming more intelligent and capable of solving difficult tasks.

But I wonder if the way that questions were given to the human players and Watson gave Watson a short but significant head start. According to the New York Times:

“During the sparring matches, Watson received the questions as electronic texts at the same moment they were made visible to the human players;”

Once Watson received a query, it could process it immediately. While the human contestants got to see the query as written text at the same time, Alex Trebek also started reading the question aloud. When I was watching Jeopardy, I found it almost impossible to read and understand the question more quickly than it was being spoken, and I suspect that Ken Jennings and Brad Rutter did too. It’s often observed that people find it very difficult to simultaneously process two language streams. While it took Trebek only a second or two to read the short Jeopardy queries, that could have given Watson a significant head start, enabling it to determine that it had a good answer and press its buzzer before the competition.

If this is the case, I am not sure if it is an unfair advantage. People and computers each have native advantages and disadvantages. If Jennings and Rutter got the questions as text without them being simultaneously read aloud, Watson might still have had the advantage of a quicker start.

Computer Science publication culture

February 14th, 2011, by Tim Finin, posted in Computing Research, CS, Semantic Web

There has been an ongoing discussion on the publication culture within the computer science research community in CACM, carried out through a series of editorials, opinion pieces, articles and letters. It covers the usual topics — the best role of workshops, conferences and journals, reviewer responsibility, the effect of deadlines on publications, etc. All important issues.

Jonathan Grudin has an opinion piece in the current (Feb) CACM:

Technology, conferences, and community. J. Grudin, 2011. Comm. of the ACM, 54, 2, 41-43.

He has also made available a list of the 16 recent CACM articles (with links) on the topic. It’s a list of papers worth reading.

Six lessons for the age of machines

February 13th, 2011, by Tim Finin, posted in AI, Datamining, Machine Learning, NLP, Semantic Web

On the eve of the big Jeopardy! match, Peter Norvig’s opinion piece in today’s New York Post (!), The Machine Age, looks at AI’s progress over the past sixty years and lays out six surprising lessons we’ve learned.

  • The things we thought were hard turned out to be easier.
  • Dealing with uncertainty turned out to be more important than thinking with logical precision.
  • Learning turned out to be more important than knowing.
  • Current systems are more likely to be built from examples than from logical rules.
  • The focus shifted from replacing humans to augmenting them.
  • The partnership between human and machine is stronger than either one alone.

When I took Pat Winston’s undergraduate AI class in 1970, only the first of those ideas was current. It’s a good essay.

Of course, after we’ve exploited the new data-driven, statistical paradigm for the next decade or so, we’ll probably have to go back to figuring out how to get logic back into the framework.

Science on Dealing with Data

February 12th, 2011, by Tim Finin, posted in Machine Learning, Semantic Web, Social media

The current (11 February 2011) issue of Science is a special issue on Dealing with Data. It includes a collection of free, online articles that “highlights both the challenges posed by the data deluge and the opportunities that can be realized if we can better organize and access the data.” Some of the articles are drawn from three sister publications: Science Signaling, Science Translational Medicine and Science Careers.

From the issue’s introduction:

“Scientific innovation has been called on to spur economic recovery; science and technology are essential to improving public health and welfare and to inform sustainability; and the scientific community has been criticized for not being sufficiently accountable and transparent. Data collection, curation, and access are central to all of these issues.

As you will discover, two themes appear repeatedly: Most scientific disciplines are finding the data deluge to be extremely challenging, and tremendous opportunities can be realized if we can better organize and access the data.”

One of the great things about the “data deluge” is that there is something in it for almost all computer science researchers including areas like machine learning, data mining, NLP, visualization, semantic web, security and privacy, social media, high performance computing, HCI, etc. Here are some of the articles that caught our eye:

and still more that look very interesting:

Data Citation, Peer Review and Provenance

February 8th, 2011, by Tim Finin, posted in Semantic Web

In today’s ebiquity meeting, Curt Tilmes showed an interesting figure illustrating how often a particular dataset (MODIS snow cover data) was mentioned in a paper vs. how often it was formally cited. It’s a good example of how far we still need to go in formally capturing the provenance of data and the information derived from it.

Data Citation and Peer Review

The figure is from:

Parsons, Mark A.; Duerr, Ruth; Minster, Jean-Bernard. Data Citation and Peer Review. Eos, Transactions American Geophysical Union, Volume 91, Issue 34, p. 297-298. 2010.

Maryland Cyber Challenge and Conference

February 7th, 2011, by Tim Finin, posted in cybersecurity, UMBC

UMBC, SAIC, the National Cyber Security Alliance, the Tech Council of Maryland, and the Maryland Department of Business and Economic Development have joined to hold the Maryland Cyber Challenge and Conference on October 21-22, 2011. The event is designed to increase awareness of cyber security as a career choice in Maryland, improve the appreciation for cyber-oriented curriculum in colleges and high schools, and convey cyber defense as a sport to increase interest in careers involving cyber security.

The competition will be divided into high school, collegiate and professional divisions. Qualifying rounds take place over the Internet between April and August 2011 using SAIC's Cyber Network Exercise System (CyberNEXS), a scalable training, exercise and certification system. The top eight teams in each division will meet at the MDC3 event in October for the final round, followed by an award ceremony at UMBC. MDC3 participants will also be able to learn from and network with other cybersecurity professionals, researchers, and scholars at the conference, which will include presentations, a career fair and a vendor exhibition.

For more information see this press release and the SAIC MDC3 site.

The State of Cyber Security in 2011

February 6th, 2011, by Tim Finin, posted in cybersecurity, Security

Charles Croom of Lockheed Martin will talk about "The State of Cyber Security 2011" at the UMBC Visionaries in IT Forum at 8:00am on Wednesday, February 23rd at the BWI Airport Marriott. The event is free, but registration is requested.

Croom joined Lockheed Martin Information Systems & Global Solutions as Vice President of Cyber Security Solutions in October of 2008. In this capacity, he shapes the corporation’s cyber security strategy with insight from his 35 years of distinguished service, leadership, and technology experience from the U.S. Air Force. He co-chaired a National Security Telecommunications Advisory Committee Task Force on “Strengthening Government and Private Sector Collaboration” which issued a May 2009 report recommending that the President direct the establishment of a Joint Coordinating Center. He currently serves on the Boards of the National Cyber Security Alliance (NCSA) and the Internet Security Alliance (ISA).

Croom retired as a U.S. Air Force Lieutenant General, Director of the Defense Information Systems Agency (DISA), and the Commander of the Joint Task Force for Global Network Operations in September 2008. While at DISA, he led a worldwide organization of more than 6,600 military and civilian personnel to serve the information technology and telecommunications needs of the President, Secretary of Defense, Joint Chiefs of Staff, combatant commanders, and other Department of Defense stakeholders.

Computer Science lecturer position available at UMBC

February 5th, 2011, by Tim Finin, posted in UMBC

The UMBC CSEE Department invites applications for a non-tenure track, full-time lecturer position to teach a variety of undergraduate computer science courses. Both a demonstrated ability to teach such courses and a strong interest in teaching undergraduates are essential. Applicants must have, or be about to receive, an M.S. or Ph.D. degree in Computer Science or a related discipline. Applications should be submitted by 15 March 2011 and the position will start on 23 August 2011.

UMBC to host Maryland FIRST Lego League championship

February 2nd, 2011, by Tim Finin, posted in AI, UMBC

UMBC will host the 2011 FIRST Lego League Maryland State Championship on Saturday, February 26 in the UMBC Retriever Activities Center.

FIRST Lego League (FLL) is an international competition for elementary and middle school students that is run by the FIRST organization with support from Lego. FLL teams use Lego Mindstorms kits to build small autonomous robots, with a limited number of sensors and motors, that compete to perform predefined challenge tasks.

"Guided by adult mentors and their own imaginations, FLL students solve real-world engineering challenges, develop important life skills, and learn to make positive contributions to society. FLL provides students age 9-14 with an opportunity to challenge their math and science skills in an internationally recognized competitive environment. FLL combines a hands-on, interactive robotics program with a sports-like atmosphere. Teams of up to 10 players focus on team building, problem solving, creativity, and analytical thinking to develop a well thought out solution to a problem currently facing the world – the Challenge."

The UMBC organizers, led by UMBC Mechanical Engineering Professor Anne Spence, need volunteers from the UMBC community to help on the tournament day as well as to help set up on Friday. If you are interested in helping, please register online. Volunteering to help in the Maryland FLL championship is a great way to help engage young people in science and technology and have some fun doing it.

