Archive for the 'KR' Category
May 2nd, 2016, by Tim Finin, posted in KR, Ontologies, Semantic Web
He’s dead, Jim.
Google recently shut down the query interface to Freebase. All that is left of this innovative service is the ability to download a few final data dumps.
Freebase was launched nine years ago by Metaweb as an online source of structured data collected from Wikipedia and many other sources, including individual, user-submitted uploads and edits. Metaweb was acquired by Google in July 2010 and Freebase subsequently grew to have more than 2.4 billion facts about 44 million subjects. In December 2014, Google announced that it was closing Freebase and four months later it became read-only. Sometime this week the query interface was shut down.
I’ve enjoyed using Freebase in various projects in the past two years and found that it complemented DBpedia in many ways. Although its native semantics differed from that of RDF and OWL, it was close enough to allow all of Freebase to be exported as RDF. Its schema was larger than DBpedia’s and the data tended to be a bit cleaner.
Google generously decided to donate the data to the Wikidata project, which began migrating Freebase’s data to Wikidata in 2015. The Freebase data also lives on as part of Google’s Knowledge Graph. Google recently allowed very limited querying of its knowledge graph and my limited experimenting with it suggests that has Freebase data at its core.
May 1st, 2016, by Tim Finin, posted in KR, NLP, Ontologies, Semantic Web
Representing and Reasoning with Temporal
Properties/Relations in OWL/RDF
10:30-11:30 Monday, 2 May 2016, ITE346
OWL ontologies offer the means for modeling real-world domains by representing their high-level concepts, properties and interrelationships. These concepts and their properties are connected by means of binary relations. However, this assumes that the model of the domain is either a set of static objects and relationships that do not change over time, or a snapshot of these objects at a particular point in time. In general, relationships between objects that change over time (dynamic properties) are not binary relations, since they involve a temporal interval in addition to the object and the subject. Representing and querying information evolving in time requires careful consideration of how to use OWL constructs to model dynamic relationships and how the semantics and reasoning capabilities within that architecture are affected.
December 16th, 2015, by Tim Finin, posted in cybersecurity, KR, Ontologies, Semantic Web
Zareen Syed, Ankur Padia, Tim Finin, Lisa Mathews and Anupam Joshi, UCO: Unified Cybersecurity Ontology
, AAAI Workshop on Artificial Intelligence for Cyber Security (AICS), February 2016.
In this paper we describe the Unified Cybersecurity Ontology (UCO) that is intended to support information integration and cyber situational awareness in cybersecurity systems. The ontology incorporates and integrates heterogeneous data and knowledge schemas from different cybersecurity systems and most commonly used cybersecurity standards for information sharing and exchange. The UCO ontology has also been mapped to a number of existing cybersecurity ontologies as well as concepts in the Linked Open Data cloud. Similar to DBpedia which serves as the core for general knowledge in Linked Open Data cloud, we envision UCO to serve as the core for cybersecurity domain, which would evolve and grow with the passage of time with additional cybersecurity data sets as they become available. We also present a prototype system and concrete use cases supported by the UCO ontology. To the best of our knowledge, this is the first cybersecurity ontology that has been mapped to general world ontologies to support broader and diverse security use cases. We compare the resulting ontology with previous efforts, discuss its strengths and limitations, and describe potential future work directions.
June 8th, 2015, by Tim Finin, posted in AI, KR, NLP, NLP, Ontologies
Coldstart is a task in the NIST Text Analysis Conference’s Knowledge Base Population suite that combines entity linking and slot filling to populate an empty knowledge base using a predefined ontology for the facts and relations. This paper describes a system developed by the Human Language Technology Center of Excellence at Johns Hopkins University for the 2014 Coldstart task.
Tim Finin, Paul McNamee, Dawn Lawrie, James Mayfield and Craig Harman, Hot Stuff at Cold Start: HLTCOE participation at TAC 2014, 7th Text Analysis Conference, National Institute of Standards and Technology, Nov. 2014.
The JHU HLTCOE participated in the Cold Start task in this year’s Text Analysis Conference Knowledge Base Population evaluation. This is our third year of participation in the task, and we continued our research with the KELVIN system. We submitted experimental variants that explore use of forward-chaining inference, slightly more aggressive entity clustering, refined multiple within-document conference, and prioritization of relations extracted from news sources.
June 8th, 2015, by Tim Finin, posted in AI, KR, Machine Learning, Mobile Computing, Ontologies
The NSF-sponsored Platys project explored the idea that places are more than just GPS coordinates. They are concepts rich with semantic information, including people, activities, roles, functions, time and purpose. Our mobile phones can learn to recognize the places we are in and use information about them to provide better services.
Laura Zavala, Pradeep K. Murukannaiah, Nithyananthan Poosamani, Tim Finin, Anupam Joshi, Injong Rhee and Munindar P. Singh, Platys: From Position to Place-Oriented Mobile Computing, AI Magazine, v36, n2, 2015.
The Platys project focuses on developing a high-level, semantic notion of location called place. A place, unlike a geospatial position, derives its meaning from a user’s actions and interactions in addition to the physical location where it occurs. Our aim is to enable the construction of a large variety of applications that take advantage of place to render relevant content and functionality and, thus, improve user experience. We consider elements of context that are particularly related to mobile computing. The main problems we have addressed to realize our place-oriented mobile computing vision are representing places, recognizing places, and engineering place-aware applications. We describe the approaches we have developed for addressing these problems and related subproblems. A key element of our work is the use of collaborative information sharing where users’ devices share and integrate knowledge about places. Our place ontology facilitates such collaboration. Declarative privacy policies allow users to specify contextual features under which they prefer to share or not share their information.
June 5th, 2015, by Tim Finin, posted in Big data, KR, Machine Learning, Semantic Web
New paper: Zareen Syed, Tim Finin, Muhammad Rahman, James Kukla and Jeehye Yun, Discovering and Querying Hybrid Linked Data, Third Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data, held in conjunction with the 12th Extended Semantic Web Conference, Portoroz Slovenia, June 2015.
In this paper, we present a unified framework for discovering and querying hybrid linked data. We describe our approach to developing a natural language query interface for a hybrid knowledge base Wikitology, and present that as a case study for accessing hybrid information sources with structured and unstructured data through natural language queries. We evaluate our system on a publicly available dataset and demonstrate improvements over a baseline system. We describe limitations of our approach and also discuss cases where our system can complement other structured data querying systems by retrieving additional answers not available in structured sources.
December 29th, 2014, by Tim Finin, posted in KR, Machine Learning, NLP, Ontologies, Semantic Web
TABEL — A Domain Independent and Extensible Framework
for Inferring the Semantics of Tables
8:00am Thursday, 8 January 2015, ITE325b
Tables are an integral part of documents, reports and Web pages in many scientific and technical domains, compactly encoding important information that can be difficult to express in text. Table-like structures outside documents, such as spreadsheets, CSV files, log files and databases, are widely used to represent and share information. However, tables remain beyond the scope of regular text processing systems which often treat them like free text.
This dissertation presents TABEL — a domain independent and extensible framework to infer the semantics of tables and represent them as RDF Linked Data. TABEL captures the intended meaning of a table by mapping header cells to classes, data cell values to existing entities and pair of columns to relations from an given ontology and knowledge base. The core of the framework consists of a module that represents a table as a graphical model to jointly infer the semantics of headers, data cells and relation between headers. We also introduce a novel Semantic Message Passing scheme, which incorporates semantics into message passing, to perform joint inference over the probabilistic graphical model. We also develop and explore a “human-in-the-loop” paradigm, presenting plausible models of user interaction with our framework and its impact on the quality of inferred semantics.
We present techniques that are both extensible and domain agnostic. Our framework supports the addition of preprocessing modules without affecting existing ones, making TABEL extensible. It also allows background knowledge bases to be adapted and changed based on the domains of the tables, thus making it domain independent. We demonstrate the extensibility and domain independence of our techniques by developing an application of TABEL in the healthcare domain. We develop a proof of concept for an application to generate meta-analysis reports automatically, which is built on top of the semantics inferred from tables found in medical literature.
A thorough evaluation with experiments over dataset of tables from the Web and medical research reports presents promising results.
Committee: Drs. Tim Finin (chair), Tim Oates, Anupam Joshi, Yun Peng, Indrajit Bhattacharya (IBM Research) and L. V. Subramaniam (IBM Research)
July 15th, 2014, by Tim Finin, posted in Big data, KR, Ontologies, RDF, Semantic Web
In The trouble with DBpedia, Paul Houle talks about the problems he sees in DBpedia, Freebase and Wikidata and offers up :BaseKB as a better “generic database” that models concepts that are in people’s shared consciousness.
:BaseKB is a purified version of Freebase which is compatible with industry-standard RDF tools. By removing hundreds of millions of duplicate, invalid, or unnecessary facts, :BaseKB users speed up their development cycles dramatically when compared to the source Freebase dumps.
:BaseKB is available for commercial and academic use under a CC-BY license. Weekly versions (:BaseKB Now) can be downloaded from Amazon S3 on a “requester-paid basis”, estimated at $3.00US per download. There are also BaseKB Gold releases which are periodic :BaseKB Now snapshots. These can be downloaded free via Bittorrent or purchased as a Blu Ray disc.
It looks like it’s worth checking out!
January 30th, 2014, by Tim Finin, posted in KR, Ontologies, Semantic Web
Today’s online meeting (Jan 30, 12:30-2:30 EST) in the 2014 Ontology Summit series is part of the Tools, Services, and Techniques track and features presentations by
- Dr. ChrisWelty (IBM Research) on “Inside the Mind of Watson – a Natural Language Question Answering Service Powered by the Web of Data and Ontologies”
- Prof. AlanRector (U. Manchester) on “Axioms & Templates: Distinctions and Transformations amongst Ontologies, Frames, & Information Models
- Professor TillMossakowski (U. Magdeburg) on “Challenges in Scaling Tools for Ontologies to the Semantic Web: Experiences with Hets and OntoHub”
Audio via phone (206-402-0100) or Skype. See the session page for details and access to slides.
January 23rd, 2014, by Tim Finin, posted in KR, Ontologies, Semantic Web
The first online session of the 2014 Ontology Summit on “Big Data and Semantic Web Meet Applied Ontology” takes place today (Thurday January 23) from 12:30pm to 2:30pm (EST, UTC-5) with topic Common Reusable Semantic Content — The Problems and Efforts to Address Them. The session will include four presentations:
followed by discussion.
Audio connection is via phone (206-402-0100, 141184#) or Skype with a shared screen and participant chatroom. See the session page for more details.
January 14th, 2014, by Tim Finin, posted in Big data, KR, Ontologies, Semantic Web
The ninth Ontology Summit starts on Thursday, January 16 with the theme “Big Data and Semantic Web Meet Applied Ontology.” The event kicks off a three month series of weekly online meetings on Thursdays that feature presentations from expert panels and discussions with all of the participants. The series will culminate with a two day symposium on April 28-29 in Arlington VA. The sessions are free and open to all, including researchers, practitioners and students.
The first virtual meeting will be held 12:30-
2:00 2:30 (EST) on Thursday, January 16 and will introduce the nine different topical tracks in the series, their goals and organizers. Audio connection is via phone (206-402-0100, 141184#) or Skype with a shared screen and participant chatroom. See the session page for more details.
This year’s Ontology Summit is an opportunity for building bridges between the Semantic Web, Linked Data, Big Data, and Applied Ontology communities. On the one hand, the Semantic Web, Linked Data, and Big Data communities can bring a wide array of real problems (such as performance and scalability challenges and the variety problem in Big Data) and technologies (automated reasoning tools) that can make use of ontologies. On the other hand, the Applied Ontology community can bring a large body of common reusable content (ontologies) and ontological analysis techniques. Identifying and overcoming ontology engineering bottlenecks is critical for all communities.
The 2014 Ontology Summit is chaired by Michael Gruninger and Leo Obrst.
May 23rd, 2013, by Tim Finin, posted in AI, Google, KR, NLP, OWL, Semantic Web
Top Charts is a new feature for Google Trends that identifies the popular searches within a category, i.e., books or actors. What’s interesting about it, from a technology standpoint, is that it uses Google’s Knowledge Graph to provide a universe of things and the categories into which they belong. This is a great example of “Things, not strings”, Google’s clever slogan to explain the importance of the Knowledge Graph.
Here’s how it’s explained in in the Trends Top Charts FAQ.
“Top Charts relies on technology from the Knowledge Graph to identify when search queries seem to be about particular real-world people, places and things. The Knowledge Graph enables our technology to connect searches with real-world entities and their attributes. For example, if you search for ice ice baby, you’re probably searching for information about the musician Vanilla Ice or his music. Whereas if you search for vanilla ice cream recipe, you’re probably looking for information about the tasty dessert. Top Charts builds on work we’ve done so our systems do a better job finding you the information you’re actually looking for, whether tasty desserts or musicians.”
One thing to note is that the Knowledge Graph, which is said to have more than 18 billion facts about 570 million objects, is that its objects include more than the traditional named entities (e.g., people, places, things). For example, there is a top chart for Animals that shows that dogs are the most popular animal in Google searches followed by cats (no surprises here) with chickens at number three on the list (could their high rank be due to recipe searches?). The dog object, in most knowledge representation schemes, would be modeled as a concept or class as opposed to an object or instance. In some representation systems, the same term (e.g., dog) can be used to refer to both a class of instances (a class that includes Lassie) and also to an instance (e.g., an instance of the class animal types). Which sense of the term dog is meant (class vs. instance) is determined by the context. In the semantic web representation language OWL 2, the ability to use the same term to refer to a class or a related instance is called punning.
Of course, when doing this kind of mapping of terms to objects, we only want to consider concepts that commonly have words or short phrases used to denote them. Not all concepts do, such as animals that from a long way off look like flies.
A second observation is that once you have a nice knowledge base like the Knowledge Graph, you have a new problem: how can you recognize mentions of its instances in text. In the DBpedia knowledge based (derived from Wikipedia) there are nine individuals named Michael Jordan and two of them were professional basketball players in the NBA. So, when you enter a search query like “When did Michael Jordan play for Penn”, we have to use information in the query, its context and what we know about the possible referents (e.g., those nine Michael Jordans) to decide (1) if this is likely to be a reference to any of the objects in our knowledge base, and (2) if so, to which one. This task, which is a fundamental one in language processing, is not trivial, but luckily, in applications like Top Charts, we don’t have to do it with perfect accuracy.
Google’s Top Charts is a simple, but effective, example that demonstrates the potential usefulness of semantic technology to make our information systems better in the near future.