Open Government Knowledge: AI Opportunities and Challenges (OGK2011)

March 29th, 2011

The 2011 AAAI Fall Symposium on Open Government Knowledge: AI Opportunities and Challenges (OGK2011) seeks papers on all aspects of publishing public government data as reusable knowledge on the Web. We invite both long papers presenting research results and shorter papers describing late-breaking work, outlining implemented systems, identifying new research challenges, or articulating a position. Submissions are due by June 3, notifications will be sent by July 15, and the final camera-ready copy must be provided by September 9.

Open government data websites aim to improve government transparency, increase accountability, and encourage public participation by publishing public government data online. Although this data has been used for some intriguing applications, it is difficult for citizens to understand and use. This symposium will explore how AI technologies such as the Semantic Web, information extraction, statistical analysis and machine learning can be used to make the knowledge embedded in the data more explicit, accessible and reusable. The symposium’s location in Washington, DC will facilitate the participation of U.S. federal government agency members and enable interchange between researchers and practitioners. We also expect international open government data players to attend, e.g., from the UK and Australia.

Relevant topics include the automatic and semi-automatic creation of linked data resources, ontologies for government data, entity linking and co-reference detection between linked data resources, adding temporal qualifications to government data, creating mash-ups with open government data, linked open government data analysis, metadata for provenance, certainty and trust, policies for information sharing, privacy and use, social networks and government data, machine learning applied to government data, data visualization techniques, and applications.

This symposium will include a mix of invited talks, paper presentations, panels, system demonstrations, a poster session, and discussions. We plan to have several invited speakers drawn from government, academia and industry. We will run panels on emerging challenges and best practices, including (i) how to enhance transparency and interoperability within an agency and across different agencies and countries, and (ii) how to promote a nationwide health information network that effectively integrates government-curated public records and citizens’ personal health data.

The symposium organizers are Li Ding (RPI), Tim Finin (UMBC), Lalana Kagal (MIT) and Deborah McGuinness (RPI). Program committee members and additional information are listed on the OGK2011 symposium site. For more information about the symposium, send email inquiries to

Important Dates

  • Workshop: 4-6 November 2011 in Arlington, Virginia USA
  • Submissions due: 3 June 2011
  • Decisions by: 15 July 2011
  • Camera ready by: 9 September 2011

Microsoft Speller Challenge

March 15th, 2011

Microsoft Research and Bing are jointly hosting the Speller Challenge. The goal is to build the best service that could propose alternative spellings for search queries submitted to Bing. Entries must be submitted for the challenge in the form of a REST-based web service, and they will be judged based on their expected F1 score against a test set sampled from real Bing queries.

For development purposes, they are making available a TREC evaluation dataset through their Web-NGram service.  Refer to this page for detailed evaluation measures and REST service specs.
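
As a rough illustration of the kind of measure involved, F1 is the harmonic mean of precision and recall. The sketch below computes an "expected" variant for a speller that returns candidate spellings with probabilities. The function name `expected_f1`, the data layout, and the exact weighting are assumptions made for illustration only; the challenge page remains the authoritative definition.

```python
from typing import Dict, Set


def expected_f1(
    suggestions: Dict[str, Dict[str, float]],
    gold: Dict[str, Set[str]],
) -> float:
    """Harmonic mean of expected precision and expected recall.

    suggestions: query -> {candidate spelling: probability}  (assumed layout)
    gold:        query -> set of acceptable spellings        (assumed layout)
    """
    n = len(gold)
    if n == 0:
        return 0.0

    precision_sum = 0.0
    recall_sum = 0.0
    for query, truths in gold.items():
        candidates = suggestions.get(query, {})
        # Expected precision: probability mass placed on acceptable spellings.
        precision_sum += sum(p for c, p in candidates.items() if c in truths)
        # Recall: fraction of acceptable spellings that were proposed at all.
        if truths:
            recall_sum += sum(1 for t in truths if t in candidates) / len(truths)

    ep = precision_sum / n
    er = recall_sum / n
    return 0.0 if ep + er == 0 else 2 * ep * er / (ep + er)


if __name__ == "__main__":
    # Hypothetical queries and candidates, not taken from the challenge data.
    suggestions = {
        "recieve": {"receive": 0.9, "recieve": 0.1},
        "teh": {"the": 0.5, "ten": 0.5},
    }
    gold = {"recieve": {"receive"}, "teh": {"the"}}
    print(round(expected_f1(suggestions, gold), 4))
```

In this sketch, a speller that hedges across many candidates keeps its recall but loses precision, because only the probability mass on acceptable spellings counts.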

Time to start implementing!

Twitter at one billion tweets a week

March 15th, 2011

Twitter reports that its users sent an average of 140M tweets a day last month. That adds up to a billion a week, in round numbers. Their post also cites an average of 460K new Twitter accounts per day over the same month. Both numbers are very impressive.

Liz Gannes comments on the fact that Twitter does not report the total number of users it has or how many of them are active. The number of users is thought to be over 200M, but I recall data, now over a year old, estimating that 40% of users have made no tweets and 80% have made fewer than 10 tweets. Maybe the bulk of those 460K new users a day are signing up to follow @charliesheen.

AAAI-11 Workshop on Activity Context Representation: Techniques and Languages

March 14th, 2011

Mobile devices can provide better services if they can model, recognize and adapt to their users' context.

Pervasive, context-aware computing technologies can significantly enhance and improve the coming generation of devices and applications for consumer electronics as well as devices for work places, schools and hospitals. Context-aware cognitive support requires activity and context information to be captured, reasoned with and shared across devices — efficiently, securely, adhering to privacy policies, and with multidevice interoperability.

The AAAI-11 conference will host a two-day workshop on Activity Context Representation: Techniques and Languages, focused on techniques and systems that allow mobile devices to model and recognize the activities and context of people and groups and then exploit those models to provide better services. The workshop will be held on August 7th and 8th in San Francisco as part of AAAI-11, the Twenty-Fifth AAAI Conference on Artificial Intelligence. Submissions of research papers and position statements are due by 22 April 2011.

The workshop intends to lay the groundwork for techniques to represent context within activity models, using a synthesis of HCI/CSCW and AI approaches to reduce demands on people, such as the cognitive load inherent in activity/context switching, and to enhance human and device performance. It will explore activity and context modeling issues of capture, representation, standardization and interoperability for creating context-aware and activity-based assistive cognition tools, with topics including, but not limited to, the following:

  • Activity modeling, representation, detection
  • Context representation within activities
  • Semantic activity reasoning, search
  • Security and privacy
  • Information integration from multiple sources, ontologies
  • Context capture

There are three intended end results of the workshop: (1) Develop two or three key themes for research with specific opportunities for collaborative work. (2) Create a core research group forming an international academic and industrial consortium to significantly augment existing standards/drafts/proposals and create fresh initiatives to enable capture, transfer, and recall of activity context across multiple devices and platforms used by people individually and collectively. (3) Review and revise an initial draft of the structure of an activity context exchange language (ACEL), including identification of use cases, domain-specific instantiations needed, and drafts of initial reasoning schemes and algorithms.

For more information, see the workshop call for papers.

Twitter changes TOS; might hurt researchers

March 7th, 2011

ReadWriteWeb reports that Twitter recently made changes to its Terms of Service. Specifically, Twitter will no longer grant requests for whitelisting, and it will no longer allow redistribution of its content for either commercial or non-commercial purposes. Twitter whitelisting was a way of allowing developers or researchers to access large quantities of data via the REST API. Although Twitter will continue to honor developers who are already whitelisted, it will not grant any further requests.

The second change in the Terms of Service concerns redistribution of content. It means that anyone who is gathering Twitter data, whether a developer or a researcher, can no longer share it with others, even for academic or non-commercial purposes. As ReadWriteWeb points out, these changes will most likely hurt researchers who depend on third-party organizations to provide data for their research.

As part of the new Twitter terms of service, 140kit like other organizations can no longer offer exports of Twitter data for any purposes – whether that’s for profit or non-profit, whether that’s for developers or scholars. You could be writing the next killer app. Or you could be working on the final chapter of your PhD dissertation. (And let me interject right here and say that having your access to research data shut down as a PhD student is beyond devastating.) It doesn’t matter. Exporting Tweets now violates the TOS.

It looks like Twitter just made it difficult for researchers to access data for their research.

Journal of Web Semantics special issues on context and mobility

March 6th, 2011

The Journal of Web Semantics has announced two new special issues to be published in 2012.

An issue on Reasoning with context in the Semantic Web seeks papers by June 15, 2011 and will be published in the Spring of 2012. The special issue will be edited by Alan Bundy and Jos Lehmann of the University of Edinburgh and Ivan Varzinczak of the Meraka Institute.

An issue on The Semantic Web in a Mobile World will accept submissions until October 1, 2011 and will be published in September 2012. The special issue will be edited by Ansgar Scherp of the University of Koblenz-Landau and Anupam Joshi of the University of Maryland, Baltimore County.

Free linked data book by Tom Heath and Chris Bizer

March 2nd, 2011

Congratulations to Tom Heath and Christian Bizer on the publication of their new book, Linked Data: Evolving the Web into a Global Data Space. It’s published by Morgan & Claypool in the series Synthesis Lectures on the Semantic Web: Theory and Technology edited by Jim Hendler and Frank van Harmelen.

“This book provides a conceptual and technical introduction to the field of Linked Data. It is intended for anyone who cares about data – using it, managing it, sharing it, interacting with it – and is passionate about the Web. We think this will include data geeks, managers and owners of data sets, system implementors and Web developers. We hope that students and teachers of information management and computer science will find the book a suitable reference point for courses that explore topics in Web development and data management. Established practitioners of Linked Data will find in this book a distillation of much of their knowledge and experience, and a reference work that can bring this to all those who follow in their footsteps.”

More importantly, we should all thank them and Morgan & Claypool for making a free HTML version available on the Web.