Tim Finin
Author Archive
June 12th, 2011, by Tim Finin, posted in AI

Back in May, it was reported that a robot explorer sent through the Great Pyramid of Giza discovered mysterious hieroglyphs in the 4,500-year-old mausoleum behind one of its mysterious doors. The images transmitted by the robot showed hieroglyphs written in red paint that had not been seen by human eyes since the construction of the pyramid.
This week, the reports are that the three red ochre figures painted on the floor of a hidden chamber at the end of a tunnel deep inside the pyramid are just numbers. The builders of the pyramid simply recorded the total length of the southern shaft from the Queen’s Chamber: 121 cubits.
While not exactly graffiti, it reminds me that when I’ve worked on an older house, I’ve often found notes left by the original workers who built it, like sketches with dimensions on the plaster covered up by wallpaper.
June 3rd, 2011, by Tim Finin, posted in Semantic Web
The Semantic Web community is still unsure what to think of microdata.
The schema.rdfs.org site provides static RDFS documents of the schema.org terms, serialized in Turtle, RDF/XML and N-Triples as well as in JSON.
Mike Bergman argues that the microdata effort will also boost RDF.
Yahoo!’s Peter Mika is still an RDFa fan, but also has a pragmatic appreciation for the agreement of the big three search companies on a standard for semantic data.
“Given the above history, I’m extremely glad that cooperation prevailed in the end and hopefully schema.org will become a central point for vocabularies for the Semantic Web for a long time to come. Note that it will almost certainly not be the only one. schema.org covers the core interests of search providers, i.e. the stuff that people search for the most (hence the somewhat awkward term ‘search vocabularies’). As the simple needs are the most common in search logs, this includes things like addresses of businesses, reviews and recipes. schema.org will hopefully evolve with extensions over time but it may never cover complex domains such as biotechnology, e-government or others where people have been using Semantic Web technology with success.”
June 3rd, 2011, by Tim Finin, posted in Semantic Web
The submission deadline for OGK2011 has been extended to 17 June 2011.
AAAI 2011 Fall Symposium
Open Government Knowledge: AI Opportunities and Challenges
4-6 November 2011 • Arlington, Virginia USA
http://tw.rpi.edu/ogk2011
The 2011 AAAI Fall Symposium on Open Government Knowledge: AI Opportunities and Challenges seeks papers on all aspects of publishing public government data as reusable knowledge on the Web. Both long papers presenting research results and shorter papers describing late breaking work, outlining implemented systems, identifying new research challenges, or articulating a position are invited. Submissions are due by June 17, notifications will be sent by July 15, and the final camera-ready copy must be provided by September 9, 2011.
June 2nd, 2011, by Tim Finin, posted in RDF, Search, Semantic Web, Web
Google, Bing and Yahoo! are cooperating on an approach to representing structured data in Web pages via the launch of schema.org. The approach is microdata and the schema.org site documents the schemas that are supported today.
“This site provides a collection of schemas, i.e., html tags, that webmasters can use to markup their pages in ways recognized by major search providers. Search engines including Bing, Google and Yahoo! rely on this markup to improve the display of search results, making it easier for people to find the right web pages. Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure. A shared markup vocabulary makes it easier for webmasters to decide on a markup schema and get the maximum benefit for their efforts. So, in the spirit of sitemaps.org, Bing, Google and Yahoo! have come together to provide a shared collection of schemas that webmasters can use.”
That’s the good news. The bad news, or at least the less good news, is that it is based on microdata and not RDFa. Microdata is a relatively new way to embed semantic information in HTML and is designed to be part of the HTML5 suite. It is less expressive than RDFa but also simpler. Its main advantage over microformats is that it is extensible: you can define new semantic vocabulary terms. Here is how the three companies described the choice.
Google: “Historically, we’ve supported three different standards for structured data markup: microdata, microformats, and RDFa. We’ve decided to focus on just one format for schema.org to create a simpler story for webmasters and to improve consistency across search engines relying on the data.”
Yahoo!: “Today’s announcement offers tremendous opportunity for growth. In addition to consolidating the schemas for the vocabularies we already support, there are schemas for more than a hundred newly created categories including movies, music, organizations, TV shows, products, places and more. We will continue to expand these categories by listening to feedback from the community and will continue publishing new schemas on a regular basis. Don’t worry if your site has already added RDFa or microformats currently supported by our Enhanced Displays program, that site will still appear with an Enhanced Display on Yahoo! – no changes required.”
Bing: “At Bing we understand the significant investment required to implement markup, and feel strongly that by partnering with Google and Yahoo! on standard schemas webmasters can be more efficient with the time they invest… Bing accepts a wide variety of markup formats today (Open Graph, microformat, etc.) for features like Tiles and will continue to do so, but by standardizing on schema.org we are looking to simplify the markup choices for webmasters and amplify the value they receive in return.”
The schema.org site has a FAQ that includes the question “Q: Why microdata? Why not RDFa or microformats?” which is answered thusly:
“Focusing on microdata was a pragmatic decision. Supporting multiple syntaxes makes documentation for webmasters more complex and introduces more overhead in terms of defining new formats. Microformats are concise and easy to understand, but they don’t offer an open extensibility mechanism and the reuse of the class tag can cause conflicts with website CSS. RDFa is extensible and very expressive, but the substantial complexity of the language has contributed to slower adoption. Microdata is the most recent well-known standard, created along with HTML5. It strikes a balance between extensibility and simplicity, and is most suitable for building the schema.org. Google and Yahoo! have in the past supported both microformats and RDFa for certain schemas and will continue to support these syntaxes for those schemas. We will also be monitoring the web for RDFa and microformats adoption and if they pick up, we will look into supporting these syntaxes. Also read the section on the data model for more on RDFa.”
Guha has a generous comment in his post on the official Google blog:
“While this collaborative initiative is new, we draw heavily from the decades of work in the database and knowledge representation communities, from projects such as Jim Gray’s SDSS Skyserver, Cyc and from ongoing efforts such as dbpedia.org and linked data. We feel privileged to build upon this great work. We look forward to seeing structured markup continue to grow on the web, powering richer search results and new kinds of applications.”
I’ve not studied microdata yet, so I don’t know how I feel about the expressiveness/simplicity tradeoffs it has made. I wonder if it is possible to add an OWL-like layer on top of microdata, for example.
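For a rough sense of what microdata markup involves, here is a minimal sketch in Python. The HTML fragment is a made-up example that uses the microdata attributes `itemscope`, `itemtype` and `itemprop`, and the `ItempropCollector` class and `extract_item` function are hypothetical helpers written for this illustration, not part of any schema.org tooling. The sketch only handles flat, text-valued properties and ignores nested items and attribute-valued properties, so it shows the idea without being a conforming microdata parser.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with microdata attributes;
# the movie details are illustrative only.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <span itemprop="genre">Science fiction</span>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collect (property, text) pairs from elements carrying itemprop.

    Simplification: only flat, text-valued properties are handled.
    """

    def __init__(self):
        super().__init__()
        self.item_type = None   # value of itemtype on the itemscope element
        self.properties = []    # list of (itemprop name, text) pairs
        self._current = None    # itemprop waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        if "itemprop" in attrs:
            self._current = attrs["itemprop"]

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.properties.append((self._current, text))
            self._current = None


def extract_item(html):
    """Return (item type, list of properties) for a single flat item."""
    parser = ItempropCollector()
    parser.feed(html)
    return parser.item_type, parser.properties


if __name__ == "__main__":
    item_type, properties = extract_item(SAMPLE_HTML)
    print(item_type)
    for name, value in properties:
        print(f"  {name}: {value}")
```

The extracted pairs are essentially property/value statements about a typed item, which is why a mapping from microdata to RDF seems plausible.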
May 15th, 2011, by Tim Finin, posted in AI, Semantic Web
The 2011 AAAI Fall Symposium on Open Government Knowledge: AI Opportunities and Challenges (OGK2011) seeks papers on all aspects of publishing public government data as reusable knowledge on the Web. Both long papers presenting research results and shorter papers describing late breaking work, outlining implemented systems, identifying new research challenges, or articulating a position are invited. Submissions are due by June 3, notifications will be sent by July 15, and the final camera-ready copy must be provided by September 9 for the November 4-6 workshop.
Relevant topics include the automatic and semi-automatic creation of linked data resources, ontologies for government data, entity linking and co-reference detection between linked data resources, adding temporal qualifications to government data, creating mash-ups with open government data, linked open government data analysis, metadata for provenance, certainty and trust, policies for information sharing, privacy and use, social networks and government data, machine learning applied to government data, data visualization techniques, and applications. The symposium organizers are Li Ding (RPI), Tim Finin (UMBC), Lalana Kagal (MIT) and Deborah McGuinness (RPI). Program committee members and additional information are listed on the OGK2011 symposium site.
May 11th, 2011, by Tim Finin, posted in Agents, AI, Google
A story in yesterday’s NYT, Google Lobbies Nevada To Allow Self-Driving Cars, reports that Google has hired a Nevada lobbyist to promote two bills related to autonomous vehicles that are expected to be voted on this summer.
“Google hired David Goldwater, a lobbyist based in Las Vegas, to promote the two measures, which are expected to come to a vote before the Legislature’s session ends in June. One is an amendment to an electric-vehicle bill providing for the licensing and testing of autonomous vehicles, and the other is the exemption that would permit texting.”
Arguments the lobbyist offered included that “the autonomous technology would be safer than human drivers, offer more fuel-efficient cars and promote economic development.”
I’d add that the Google Bot has a clean driving record, exhibits an excellent sense of direction, will obey any laws inserted into a state’s robots.txt, and does not drink. However, the Google Bot’s current cars are all Toyotas and an Audi. Maybe the Nevada legislature should find a way to encourage it to support the US auto industry and buy some American cars.
I liked project leader Sebastian Thrun’s example of a potential benefit of autonomous vehicles.
“In frequent public statements, he has said robotic vehicles would increase energy efficiency while reducing road injuries and deaths. And he has called for sophisticated systems for car sharing that, he says, could cut the number of cars in the United States in half. “What if I could take out my phone and say, ‘Zipcar, come here,’ ” he asked an industry conference last year, “and a moment later the Zipcar came around the corner?””
April 12th, 2011, by Tim Finin, posted in AI, Semantic Web
The new Journal of Web Semantics preprint server is now online. Final drafts of accepted papers will be added to the preprint server as papers are accepted for publication, making a preprint available as soon as possible.
We are loading papers from back issues into the preprint server as time permits. The preprint server is based on the Open Journal Systems software and hosted by Gesis, the Leibniz Institute for the Social Sciences.
After drafts are on the preprint server, they enter Elsevier’s production pipeline in which they are professionally copy edited, formatted for the journal, and proofed by the authors. The result is assigned a DOI and put online as a JWS article in press available to individual and institutional subscribers. When the article is assigned to an issue and printed, the final copy will be available online to subscribers in Elsevier’s Science Direct system.
We would like to thank the people who helped stand up the new preprint server, including Ute Koch of Gesis, Kaixuan Wang of the University of Manchester, and Silke Werger of the University of Koblenz and Landau.
April 6th, 2011, by Tim Finin, posted in Machine Learning
Publishing Trends has a good post describing a new variation on spam: creating low-quality ebooks from plagiarized or public-domain content and selling them in ebook markets like Amazon’s Kindle store. If you want to MAKE.MONEY.FAST there are people willing to help.
Automatically detecting these spam ebooks might be a good machine learning project. One problem is that to use features of the ebook itself (e.g., poor formatting) might require purchasing it. But there are sure to be many useful features that the ebook store provides that might support an effective classifier.
(h/t Bruce Schneier)
April 5th, 2011, by Tim Finin, posted in AI, GAIM, Machine Learning
DARPA is developing a new component to track “quiet submarines” to be part of the Navy’s Anti Submarine Warfare toolkit and is using a software game to collect effective strategies for its use.
“Before autonomous software is developed for ACTUV’s computers, DARPA needs to determine what approaches and methods are most effective. To gather information from a broad spectrum of users, ACTUV has been integrated into the Dangerous Waters™ game. DARPA is offering this new ACTUV Tactics Simulator for free public download.
This software has been written to simulate actual evasion techniques used by submarines, challenging each player to track them successfully. Your tracking vessel is not the only ship at sea, so you’ll need to safely navigate among commercial shipping traffic as you attempt to track the submarine, whose driver has some tricks up his sleeve. You will earn points as you complete mission objectives, and will have the opportunity to see how you rank against the competition on DARPA’s leaderboard page. You can also share your experiences and insights from playing the simulator with others.”
This is a kind of crowdsourcing — leveraging the experiences of a large number of people playing a game. Applying various kinds of machine learning algorithms to the simulator data could be an effective way to train an autonomous tool for this task.
March 29th, 2011, by Tim Finin, posted in AI, Semantic Web, Web
The 2011 AAAI Fall Symposium on Open Government Knowledge: AI Opportunities and Challenges (OGK2011) seeks papers on all aspects of publishing public government data as reusable knowledge on the Web. Both long papers presenting research results and shorter papers describing late breaking work, outlining implemented systems, identifying new research challenges, or articulating a position are invited. Submissions are due by June 3, notifications will be sent by July 15, and the final camera-ready copy must be provided by September 9.
Websites like data.gov, research.gov and USASpending.gov aim to improve government transparency, increase accountability, and encourage public participation by publishing public government data online. Although this data has been used for some intriguing applications, it is difficult for citizens to understand and use. This symposium will explore how AI technologies such as the Semantic Web, information extraction, statistical analysis and machine learning can be used to make the knowledge embedded in the data more explicit, accessible and reusable. The symposium’s location of Washington, DC will facilitate the participation of U.S. federal government agency members and enable interchange between researchers and practitioners. We also expect international open government data players to attend, for example from the UK and Australia.
Relevant topics include the automatic and semi-automatic creation of linked data resources, ontologies for government data, entity linking and co-reference detection between linked data resources, adding temporal qualifications to government data, creating mash-ups with open government data, linked open government data analysis, metadata for provenance, certainty and trust, policies for information sharing, privacy and use, social networks and government data, machine learning applied to government data, data visualization techniques, and applications.
This symposium will include a mix of invited talks, paper presentations, panels, system demonstrations, a poster session, and discussions. We plan to have several invited speakers drawn from government, academia and industry. We will run panels on the emerging challenges and best practices, including (i) how to enhance transparency and interoperability within an agency and across different agencies/countries, and (ii) how to promote a nationwide health information network that effectively integrates government-curated public records and citizens’ personal health data.
The symposium organizers are Li Ding (RPI), Tim Finin (UMBC), Lalana Kagal (MIT) and Deborah McGuinness (RPI). Program committee members and additional information are listed on the OGK2011 symposium site. For more information about the symposium, send email inquiries to ogk11-info@googlegroups.com.
Important Dates
- Workshop: 4-6 November 2011 in Arlington, Virginia USA
- Submissions due: 3 June 2011
- Decisions by: 15 July 2011
- Camera ready by: 9 September 2011
March 15th, 2011, by Tim Finin, posted in Social media, Twitter

Twitter reports that its users sent an average of 140M tweets a day last month. That adds up to about a billion a week, in round numbers. Another statistic their post cites is that last month saw an average of 460K new Twitter accounts per day. Both numbers are very impressive.
Liz Gannes comments on the fact that Twitter does not report on the total number of users it has or how many of these are active. The number of users is thought to be over 200M, but I recall data that is now over a year old estimating that 40% of the users have made no tweets and 80% have made fewer than 10 tweets. Maybe the bulk of those 460K new users a day are signing up to follow @charliesheen.
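The weekly figure is easy to check. The short sketch below uses only the two daily averages quoted above; the constant and function names are made up for this illustration.

```python
# Daily averages as quoted in the post; names are illustrative only.
TWEETS_PER_DAY = 140_000_000
NEW_ACCOUNTS_PER_DAY = 460_000


def per_week(daily_count):
    """Scale a daily average to a seven-day week."""
    return daily_count * 7


if __name__ == "__main__":
    # 140M a day is 980M a week, i.e. roughly a billion.
    print(f"tweets per week: {per_week(TWEETS_PER_DAY):,}")
    print(f"new accounts per week: {per_week(NEW_ACCOUNTS_PER_DAY):,}")
```

So "a billion a week" rounds up from 980 million, and the same scaling puts new sign-ups at a little over three million a week.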
March 14th, 2011, by Tim Finin, posted in Agents, AI, KR, Mobile Computing, Pervasive Computing, Semantic Web

Pervasive, context-aware computing technologies can significantly enhance and improve the coming generation of devices and applications for consumer electronics as well as devices for work places, schools and hospitals. Context-aware cognitive support requires activity and context information to be captured, reasoned with and shared across devices — efficiently, securely, adhering to privacy policies, and with multidevice interoperability.
The AAAI-11 conference will host a two-day workshop on Activity Context Representation: Techniques and Languages focused on techniques and systems to allow mobile devices to model and recognize the activities and context of people and groups and then exploit those models to provide better services. The workshop will be held on August 7th and 8th in San Francisco as part of AAAI-11, the Twenty-Fifth Conference on Artificial Intelligence. Submissions of research papers and position statements are due by 22 April 2011.
The workshop intends to lay the groundwork for techniques to represent context within activity models using a synthesis of HCI/CSCW and AI approaches to reduce demands on people, such as the cognitive load inherent in activity/context switching, and to enhance human and device performance. It will explore activity and context modeling issues of capture, representation, standardization and interoperability for creating context-aware and activity-based assistive cognition tools with topics including, but not limited to, the following:
- Activity modeling, representation, detection
- Context representation within activities
- Semantic activity reasoning, search
- Security and privacy
- Information integration from multiple sources, ontologies
- Context capture
There are three intended end results of the workshop: (1) Develop two or three key themes for research with specific opportunities for collaborative work. (2) Create a core research group forming an international academic and industrial consortium to significantly augment existing standards/drafts/proposals and create fresh initiatives to enable capture, transfer, and recall of activity context across multiple devices and platforms used by people individually and collectively. (3) Review and revise an initial draft of the structure of an activity context exchange language (ACEL) including identification of use cases, domain-specific instantiations needed, and drafts of initial reasoning schemes and algorithms.
For more information, see the workshop call for papers.