FieldMarking: creating the global human sensor net

November 17th, 2005

We’ve been conducting a pilot study towards creating a Global Human Sensor Net: people all over the world collaboratively reporting, tagging, and thus exchanging information about their observations of the natural world. Such information is already piling up as casual text in blogs and discussion forums, but it is not very accessible to scientists there.

A variety of efforts are underway to address this general problem of how to share unstructured information: simple tagging, microformats, datablogging, structured blogging, and semantic web browsers.

The FieldMarking concept is to let people freely report what they see in unstructured text, while providing them with appropriate data fields to structure or annotate their own, or somebody else’s, observations. Text scrapers and existing ontologies would suggest appropriate markup. The structured data would be published in RDF so that it can be intelligently retrieved and aggregated, and so that scientists can be alerted, for example, to invasive species or emerging diseases. Interactive graphing tools would allow both citizens and scientists to visually mine the data.
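To make the "publish the structured data in RDF" step concrete, here is a minimal sketch of turning one annotated observation into N-Triples, a simple line-based RDF serialization. It is an illustration only, not the prototype's actual output: the `example.org` namespace, the function names, and the sample record are all hypothetical, and field names are assumed to be safe to append to a URI.

```python
# Hypothetical vocabulary namespace; not a real FieldMarking schema.
FM = "http://example.org/fieldmarking#"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and line breaks for an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def observation_to_ntriples(subject_uri: str, fields: dict) -> str:
    """Emit one triple per data field, with every value as a plain string literal.

    Assumes each field name can be appended to the namespace as-is.
    """
    lines = []
    for name, value in sorted(fields.items()):
        literal = escape_literal(str(value))
        lines.append(f'<{subject_uri}> <{FM}{name}> "{literal}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample record for illustration.
    record = {"species": "Lythrum salicaria", "count": 12, "location": "pond edge"}
    print(observation_to_ntriples("http://example.org/obs/1", record))
```

Each data field becomes one triple about the observation, so an aggregator could collect triples from many blogs and query them by predicate, for instance every `species` value reported in a given week.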

FieldMarking combines observation in the “field” with the idea of filling out data “fields” or creating semantic “markup.”

The current prototype, FieldMarking, uses datablogging technology. Thus we can take advantage of RSS syndication, mobile posting, and graphable data fields from shared templates. Datablogging also does not require users to install any special plug-ins. Our testing suggests that, in addition to some bugginess in the software, this approach has some limitations. We need to be able to apply multiple data records to a single text entry, because it often makes sense to report many observations, or many kinds of observations, in one paragraph. We also need to allow data records from other users, who may dispute the original markup. Customized log types can be shared with other users, but we’ll want to distribute them more broadly across multiple platforms.
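The two requirements above (several data records per text entry, and records added by other users who may disagree) could be modeled roughly as follows. This is a sketch under assumptions, not the prototype's data model: the class names, the `annotate` and `disputed_fields` helpers, and the sample sightings are all hypothetical, and field values are assumed to be hashable.

```python
from dataclasses import dataclass, field


@dataclass
class DataRecord:
    """One structured annotation attached to a text entry (hypothetical model)."""
    author: str
    log_type: str
    fields: dict


@dataclass
class TextEntry:
    """An unstructured post that can carry any number of data records."""
    author: str
    text: str
    records: list = field(default_factory=list)

    def annotate(self, author: str, log_type: str, **fields) -> DataRecord:
        """Attach a record; the annotator need not be the entry's author."""
        record = DataRecord(author, log_type, fields)
        self.records.append(record)
        return record

    def disputed_fields(self, log_type: str) -> list:
        """Return field names whose values differ across records of one log type."""
        seen = {}
        for record in self.records:
            if record.log_type != log_type:
                continue
            for name, value in record.fields.items():
                seen.setdefault(name, set()).add(value)
        return sorted(name for name, values in seen.items() if len(values) > 1)


if __name__ == "__main__":
    entry = TextEntry("alice", "Saw a small hawk over the pond, and loosestrife along the bank.")
    entry.annotate("alice", "bird", species="Accipiter striatus", count=1)
    entry.annotate("alice", "plant", species="Lythrum salicaria")
    entry.annotate("bob", "bird", species="Accipiter cooperii", count=1)
    print(entry.disputed_fields("bird"))
```

Keeping records separate from the text, each with its own author, means a dispute is simply a second record of the same log type rather than an edit to the original markup.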

All the same, the potential is enormous and we will continue to gather pilot data on the kinds of biological information available in these unstructured data sources, the willingness of people to structure it, and the technologies that will make it possible.