Preprint: Interpreting Medical Tables as Linked Data to Generate Meta-Analysis Reports

July 17th, 2014


Varish Mulwad, Tim Finin and Anupam Joshi, Interpreting Medical Tables as Linked Data to Generate Meta-Analysis Reports, 15th IEEE Int. Conf. on Information Reuse and Integration, Aug 2014.

Evidence-based medicine is the application of current medical evidence to patient care and typically uses quantitative data from research studies. It is increasingly driven by data on the efficacy of drug dosages and the correlations between various medical factors that are assembled and integrated through meta-analyses (i.e., systematic reviews) of data in tables from publications and clinical trial studies. We describe an important component of a system to automatically produce evidence reports that performs two key functions: (i) understanding the meaning of data in medical tables and (ii) identifying and retrieving relevant tables given an input query. We present modifications to our existing framework for inferring the semantics of tables and an ontology developed to model and represent medical tables in RDF. Representing medical tables as RDF makes it easier to automatically extract, integrate and reuse data from multiple studies, which is essential for generating meta-analysis reports. We show how relevant tables can be identified by querying over their RDF representations and describe two evaluation experiments: one on mapping medical tables to linked data and another on identifying tables relevant to a retrieval query.
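The paper's ontology and pipeline are not reproduced here, but the core idea of storing table data as RDF triples and then retrieving relevant tables by query can be sketched in a few lines. All the vocabulary terms (e.g., med:reportsDrug) and sample data below are hypothetical; a real system would use an RDF store and SPARQL rather than Python tuples and filters:

```python
# Minimal sketch: rows from two medical tables represented as
# (subject, predicate, object) triples. URIs and values are made up.
triples = [
    ("ex:table1", "rdf:type",        "med:DosageTable"),
    ("ex:table1", "med:reportsDrug", "dbpedia:Aspirin"),
    ("ex:table1", "med:dosageMg",    "325"),
    ("ex:table2", "rdf:type",        "med:DosageTable"),
    ("ex:table2", "med:reportsDrug", "dbpedia:Warfarin"),
    ("ex:table2", "med:dosageMg",    "5"),
]

def tables_reporting(drug):
    """Return the tables whose triples mention the given drug entity."""
    return [s for (s, p, o) in triples
            if p == "med:reportsDrug" and o == drug]

print(tables_reporting("dbpedia:Aspirin"))  # -> ['ex:table1']
```

In SPARQL terms, the retrieval function corresponds to a query with a `?table med:reportsDrug dbpedia:Aspirin` pattern run over the tables' RDF representations.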

Entity Disambiguation in Google Auto-complete

September 23rd, 2012

Google has added an “entity disambiguation” feature along with auto-complete when you type in your search query. For example, when I search for George Bush, I get the following additional information in auto-complete.

As you can see, Google is able to identify that there are two George Bushes — the 41st and the 43rd Presidents — and accordingly suggests that the user select the appropriate one. Similarly, if you search for Johns Hopkins, you get suggestions for Johns Hopkins the University, the Entrepreneur and the Hospital. In the case of the Hopkins query, it's the same entity name with different types, so Google appends the entity type to the entity name.

However, searching for Michael Jordan produces no entity disambiguation. If you are looking for Michael Jordan, the UC Berkeley professor, you will have to search for “Michael I Jordan”. Other examples that Google does not yet handle include queries such as apple {fruit, company} and jaguar {animal, car}. It seems that Google only includes disambiguation between popular entities in its auto-complete: while there are six different George Bushes and ten different Michael Jordans on Wikipedia, Google includes only two when disambiguating George Bush and none for Michael Jordan.

Google talked about using its knowledge graph to produce this information. One can envision the knowledge graph maintaining a unique identity for each entity in its collection, which allows it to disambiguate entities with similar names (in the Semantic Web world, we call this assigning a unique URI to each unique thing or entity). The Hopkins query also shows that the knowledge graph maintains entity type information along with each entity (e.g., Person, City, University, Sports Team). While folks at Google have tried to steer clear of the Semantic Web, one can draw parallels between the underlying principles of the Semantic Web and those used in constructing the Google knowledge graph.
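A toy version of this disambiguation behavior, assuming a hypothetical in-memory “knowledge graph” keyed by unique identifiers (the entries below are illustrative, not Google's actual data):

```python
# Each entity has a unique ID plus a label and a type, so two entities
# can share a label without being conflated.
knowledge_graph = {
    "kg:george_bush_41":  {"label": "George Bush",   "type": "41st U.S. President"},
    "kg:george_bush_43":  {"label": "George Bush",   "type": "43rd U.S. President"},
    "kg:johns_hopkins_u": {"label": "Johns Hopkins", "type": "University"},
    "kg:johns_hopkins_h": {"label": "Johns Hopkins", "type": "Hospital"},
}

def suggest(query):
    """Return auto-complete suggestions, appending the type to
    disambiguate when several entities share the queried label."""
    matches = [e for e in knowledge_graph.values()
               if e["label"].lower() == query.lower()]
    if len(matches) <= 1:
        return [e["label"] for e in matches]
    return [f'{e["label"]} ({e["type"]})' for e in matches]

print(suggest("George Bush"))
# -> ['George Bush (41st U.S. President)', 'George Bush (43rd U.S. President)']
```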

AAAI Symposium on Open Government Knowledge, 4-6 Nov 2011, Arlington VA

November 2nd, 2011

If you are in the DC area this weekend and are interested in using Semantic Web technologies, you should come to the AAAI 2011 Fall Symposium on Open Government Knowledge: AI Opportunities and Challenges. It runs from Friday to Sunday midday at the Westin Arlington Gateway in Arlington, Virginia.

Join us to meet thought leaders from government and business in US open government data activities and discuss the challenges. The symposium features a government day on Friday (Nov 4), with speakers on open government data activities at NIH/NCI and NASA, and an R&D day on Saturday (Nov 5), with speakers from industry, including Google and Microsoft, as well as international researchers.

This symposium will explore how AI technologies such as the Semantic Web, information extraction, statistical analysis and machine learning, can be used to make the valuable knowledge embedded in open government data more explicit, accessible and reusable.

See the OGK website for complete details.

Open Government Knowledge: AI Opportunities and Challenges (OGK2011)

March 29th, 2011

The 2011 AAAI Fall Symposium on Open Government Knowledge: AI Opportunities and Challenges (OGK2011) seeks papers on all aspects of publishing public government data as reusable knowledge on the Web. Both long papers presenting research results and shorter papers describing late breaking work, outlining implemented systems, identifying new research challenges, or articulating a position are invited. Submissions are due by June 3, notifications will be sent by July 15, and the final camera-ready copy must be provided by September 9.

Websites like data.gov and data.gov.uk aim to improve government transparency, increase accountability, and encourage public participation by publishing public government data online. Although this data has been used for some intriguing applications, it is difficult for citizens to understand and use. This symposium will explore how AI technologies such as the Semantic Web, information extraction, statistical analysis and machine learning can be used to make the knowledge embedded in the data more explicit, accessible and reusable. The symposium’s location in Washington, DC will facilitate the participation of U.S. federal government agency members and enable interchange between researchers and practitioners. We also expect attendance by international open government data players from, e.g., the UK and Australia.

Relevant topics include the automatic and semi-automatic creation of linked data resources, ontologies for government data, entity linking and co-reference detection between linked data resources, adding temporal qualifications to government data, creating mash-ups with open government data, linked open government data analysis, metadata for provenance, certainty and trust, policies for information sharing, privacy and use, social networks and government data, machine learning applied to government data, data visualization techniques, and applications.

This symposium will include a mix of invited talks, paper presentations, panels, system demonstrations, a poster session, and discussions. We plan to have several invited speakers drawn from government, academia and industry. We will run panels on the emerging challenges and best practices, including (i) how to enhance transparency and interoperability within an agency and across different agencies and countries, and (ii) how to promote a nationwide health information network that effectively integrates government-curated public records and citizens’ personal health data.

The symposium organizers are Li Ding (RPI), Tim Finin (UMBC), Lalana Kagal (MIT) and Deborah McGuinness (RPI). Program committee members and additional information are listed on the OGK2011 symposium site. For more information about the symposium, send email inquiries to the organizers.

Important Dates

  • Workshop: 4-6 November 2011 in Arlington, Virginia USA
  • Submissions due: 3 June 2011
  • Decisions by: 15 July 2011
  • Camera ready by: 9 September 2011

From tables to 5 star linked data

December 25th, 2010

The goal and vision of the Semantic Web is to create a Web of connected, interlinked data (items) which can be shared and reused by all. Sharing and opening up “raw data” is great, but the Semantic Web isn’t just about sharing data: to create a Web of data, the data must be interlinked. In 2006, Sir Tim Berners-Lee introduced the notion of linked data, outlining the best practices for creating and sharing data on the Web. To encourage people and governments to share data, he recently developed the following star rating system:

  • ★ data is available on the Web, in any format, under an open license
  • ★★ data is available as machine-readable structured data (e.g., an Excel spreadsheet)
  • ★★★ data is available in a non-proprietary format (e.g., CSV)
  • ★★★★ data uses open W3C standards (RDF and SPARQL) and URIs to identify things
  • ★★★★★ data is linked to other people’s data to provide context

The highest rating is for data that links to other people’s data to provide context. While the Semantic Web has been growing steadily, a lot of data is still in raw form. A study by Google researchers found 154 million tables containing high-quality relational data on the World Wide Web. The US government, along with 7 other nations, has started sharing data publicly, but not all of this data is RDF or conforms to the best practices for publishing and sharing linked data.

Here in the Ebiquity Research Lab, we have been focusing on converting data in tables and spreadsheets into RDF; but our focus is not on generating just RDF, but rather on generating high-quality linked data (or, as Berners-Lee now calls it, “5 star data”). Our goal is to build a completely automated framework for interpreting tables and generating linked data from them.

As part of our preliminary research, we have already developed a baseline framework which can link table column headers to classes from ontologies in the linked data cloud, link table cells to entities in the linked data cloud, and identify relations between table columns and map them to linked data properties. You can read papers related to our preliminary research at [1]. We will use this blog as a medium to publish updates in our pursuit of creating “5-star” data for the Semantic Web.
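As a rough illustration of the column-mapping step, consider interpreting one table column. The ontology terms, entity URIs and fixed lookup tables below are hypothetical stand-ins for the candidate rankings a real framework infers automatically from the linked data cloud:

```python
# Sketch: map a column header to an ontology class and each cell to a
# linked-data entity, emitting rdf:type triples. Lookups are illustrative.
header_to_class = {"City": "dbpedia-owl:City", "Country": "dbpedia-owl:Country"}
cell_to_entity  = {"Baltimore": "dbpedia:Baltimore", "Boston": "dbpedia:Boston"}

def interpret_column(header, cells):
    """Emit (entity, rdf:type, class) triples for one table column."""
    cls = header_to_class.get(header)
    result = []
    for cell in cells:
        entity = cell_to_entity.get(cell, cell)  # fall back to the literal
        result.append((entity, "rdf:type", cls))
    return result

print(interpret_column("City", ["Baltimore", "Boston"]))
```

A third step, not shown, would test pairs of columns for relations (e.g., city-to-country) and map them to properties from the linked data cloud.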

If you are a data publisher, go grab some Linked Data star badges at [2]. You can show your support for the open data movement by getting t-shirts, mugs and bumper stickers from [3] (all profits go to the W3C)!

Happy Holidays! Let 2011 be yet another step forward in the open data movement!

[1] –

[2] –

[3] –

RPI exports information as linked data

November 6th, 2009

UMBC alumnus Joab Jackson has an article in Government Computer News, Tim Berners-Lee: Machine-readable Web still a ways off, reporting on the International Semantic Web Conference held outside of Washington DC at the end of October. The article uses Data.gov to illustrate the challenges and opportunities for the Semantic Web. Data.gov is a site whose purpose “is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government.”

Jackson quotes Tim Berners-Lee:

“When you look at putting government data on the Web, one of the concerns is … to not just put it out there on Excel files on Data.gov,” he said. “You should put these things in” the Resource Description Framework.

and later describes a project at RPI, led by another UMBC alumnus, Li Ding, to republish Data.gov information in RDF.

“Our goal is to make the whole thing shareable and replicable for others to re-use,” said project researcher Li Ding. By rendering data into RDF, it can be more easily interposed with other sets of data to create entirely new datasets and visualizations, Ding said. He showed a Google Map-based graphic that interposed RDF-versions of two different data sources from the Environmental Protection Agency, originally rendered in CSV files.

Video from Tim Berners-Lee 2009 TED talk on linked data

March 14th, 2009

Here is the video of the talk that Tim Berners-Lee gave at the TED2009 conference on linked data.

You can see the slides that TBL used on the W3C site.

I may have missed it, but I don’t think he mentioned the phrase “Semantic Web” once during the 16-minute talk.

Reuters Calais to support Semantic Web Linked Data in next release

November 14th, 2008

Thomson Reuters announced on their blog (Life in the Linked Data Cloud: Calais Release 4) that the next release of their Calais web-based information extraction services will support linked data.

“In that release we’ll go beyond the ability to extract semantic data from your content. We will link that extracted semantic data to datasets from dozens of other information sources, from Wikipedia to Freebase to the CIA World Fact Book. In short – instead of being limited to the contents of the document you’re processing, you’ll be able to develop solutions that leverage a large and rapidly growing information asset: the Linked Data Cloud.”

The new capabilities will be available in release 4, which is expected out on 9 January 2009.

The change is based on Calais returning dereferenceable URIs for the entities it finds. Accessing those URIs will produce RDF with links to corresponding entities in DBpedia, Freebase and other sources of “Semantic Web” data. It will be very interesting to see how well their system does at mapping document entities (e.g., “secretary Rice”) to entities in the LOD cloud, such as the DBpedia resource for Condoleezza Rice. Accessing that URI with a request for content type application/rdf+xml returns RDF that includes assertions extracted by DBpedia from Wikipedia.
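Dereferencing works through HTTP content negotiation: the client asks for RDF via the Accept header instead of the HTML page a browser would get. A minimal sketch with Python’s standard library (the DBpedia URI is shown for illustration; no request is actually sent here):

```python
import urllib.request

# Build a request for a linked-data URI, asking for RDF/XML.
uri = "http://dbpedia.org/resource/Condoleezza_Rice"
req = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})

print(req.get_header("Accept"))  # -> application/rdf+xml

# To actually fetch the RDF (network access required):
#   with urllib.request.urlopen(req) as resp:
#       rdf_xml = resp.read()
```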