Archive for the 'OWL' Category
May 15th, 2017, by Tim Finin, posted in cybersecurity, Machine Learning, NLP, OWL, Semantic Web
Ph.D. Dissertation Proposal
Modeling and Extracting Information about Cybersecurity Events from Text
Tuesday, 16 May 2017, ITE 325, UMBC
People rely on the Internet to carry out many of their daily activities, such as banking, ordering food and socializing with family and friends. The technology makes our lives easier, but it also brings problems, including cybercrime, stolen data and identity theft. With the large and increasing number of transactions done every day, the frequency of cybercrime events is also increasing. Since the number of security-related events is too high for manual review and monitoring, we need to train machines to detect and gather data about potential cybersecurity threats. To support machines that can identify and understand threats, we need standard models for storing cybersecurity information, and information extraction systems that can populate those models with data from text.
This dissertation will make two major contributions. The first is to extend our current cybersecurity ontologies with better models for relevant events, ranging from atomic events like a login attempt, to an extended but related series of events that make up a campaign, to generalized events, such as an increase in denial-of-service attacks originating from a particular region of the world and targeting U.S. financial institutions. The second is the design and implementation of an event extraction system that can extract information about cybersecurity events from text and populate a knowledge graph using our cybersecurity event ontology. We will extend our previous work on event extraction, which detected human activity events in news and discussion forums. A new set of features and learning algorithms will be introduced to improve performance and adapt the system to the cybersecurity domain. We believe the resulting system will be useful for cybersecurity management, since it will be able to quickly extract cybersecurity events from text and populate the event ontology.
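As a rough illustration of the second contribution, the sketch below turns trigger words in text into event triples for a knowledge graph. It assumes a hypothetical trigger lexicon and made-up event class names (DataBreach, DenialOfServiceAttack, and so on); it is not the proposed system, which will rely on learned features rather than a fixed word list.

```python
import re

# Hypothetical trigger lexicon: surface word -> event class in an assumed
# cybersecurity event ontology (class names are illustrative only).
TRIGGERS = {
    "breach": "DataBreach",
    "phishing": "PhishingAttack",
    "ddos": "DenialOfServiceAttack",
    "login": "LoginAttempt",
}


def extract_events(text):
    """Return (subject, predicate, object) triples for each trigger word found."""
    triples = []
    # Naive sentence split on terminal punctuation.
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    for i, sentence in enumerate(sentences):
        for token in re.findall(r"[a-z]+", sentence.lower()):
            if token in TRIGGERS:
                event = f"event{i}_{token}"  # one event node per trigger mention
                triples.append((event, "rdf:type", TRIGGERS[token]))
                triples.append((event, "mentionedIn", sentence))
    return triples


for triple in extract_events("Hackers launched a DDoS attack on the bank. No breach was reported."):
    print(triple)
```

Note that the negated second sentence still yields a DataBreach event; handling negation, event arguments and coreference is part of what makes a learned extractor necessary.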
Committee: Drs. Tim Finin (chair), Anupam Joshi, Tim Oates and Karuna Joshi
March 17th, 2017, by Tim Finin, posted in AI, KR, NLP, Ontologies, OWL, RDF, Semantic Web
The Semantics Toolkit
Paul Cuddihy and Justin McHugh
GE Global Research Center, Niskayuna, NY
10:00-11:00 Tuesday, 4 April 2017, ITE 346, UMBC
Paul Cuddihy is a senior computer scientist and software systems architect in AI and Learning Systems at the GE Global Research Center in Niskayuna, NY. He earned an M.S. in Computer Science from Rochester Institute of Technology. The focus of his twenty-year career at GE Research has ranged from machine learning for medical imaging equipment diagnostics, monitoring and diagnostic techniques for commercial aircraft engines, and modeling techniques for monitoring seniors living independently in their own homes, to parallel execution of simulation and prediction tasks, and big data ontologies. He is one of the creators of the open source software “Semantics Toolkit” (SemTK), which provides a simplified interface to the semantic tech stack, opening its use to a broader set of users by providing features such as drag-and-drop query generation and data ingestion. Paul holds over twenty U.S. patents.
Justin McHugh is a computer scientist and software systems architect working in the AI and Learning Systems group at GE Global Research in Niskayuna, NY. Justin attended the State University of New York at Albany, where he earned an M.S. in Computer Science. He worked as a systems architect and programmer on large-scale reporting systems before moving into the research sector. In the six years since, he has worked on complex system integration, Big Data systems and knowledge representation/querying systems. Justin is one of the architects and creators of SemTK (the Semantics Toolkit), a toolkit aimed at making the power of the Semantic Web stack available to programmers, automation and subject matter experts without requiring them to be deeply invested in the workings of the Semantic Web.
March 4th, 2017, by Tim Finin, posted in KR, Ontologies, OWL, RDF, Semantic Web
SADL – Semantic Application Design Language
Dr. Andrew W. Crapo
GE Global Research
10:00 Tuesday, 7 March 2017
The Web Ontology Language (OWL) has gained considerable acceptance over the past decade. Building on prior work in Description Logics, OWL has sufficient expressivity to be useful in many modeling applications. However, its various serializations do not seem intuitive to subject matter experts in many domains of interest to GE. Consequently, we have developed a controlled-English language and development environment that attempts to make OWL plus rules more accessible to those with knowledge to share but limited interest in studying formal representations. The result is the Semantic Application Design Language (SADL). This talk will review the foundational underpinnings of OWL and introduce the SADL constructs meant to capture, validate, and maintain semantic models over their lifecycle.
Dr. Crapo has been part of GE’s Global Research staff for over 35 years. As an Information Scientist he has built performance and diagnostic models of mechanical, chemical, and electrical systems, and has specialized in human-computer interfaces, decision support systems, machine reasoning and learning, and semantic representation and modeling. His work has included a graphical expert system language (GEN-X), a graphical environment for procedural programming (Fuselet Development Environment), and a semantic-model-driven user-interface for decision support systems (ACUITy). Most recently Andy has been active in developing the Semantic Application Design Language (SADL), enabling GE to leverage worldwide advances and emerging standards in semantic technology and bring them to bear on diverse problems from equipment maintenance optimization to information security.
April 3rd, 2016, by Tim Finin, posted in cybersecurity, Ontologies, OWL, RDF, Security, Semantic Web
Policies for Oblivious Cloud Storage Using Semantic Web Technologies
10:30am, Monday, 4 April 2016, ITE 346, UMBC
Consumers want to ensure that their enterprise data is stored securely and obliviously in the public cloud, so that neither the data objects nor their access patterns are revealed to anyone, including the cloud provider. We have created a detailed ontology describing oblivious cloud storage models and the role-based access controls that should be in place to manage this risk. We have also implemented the ObliviCloudManager application, which allows users to manage their cloud data using oblivious data structures. The application uses a role-based access control model and collection-based document management to store and retrieve data efficiently. Cloud consumers can use our system to define policies for storing data obliviously and to manage storage on untrusted cloud platforms, even if they are not familiar with the underlying technology and concepts of oblivious data structures.
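The abstract does not give ObliviCloudManager's policy format. A minimal sketch of the kind of role-based check such a policy implies is shown below, assuming hypothetical role, user and collection names; the oblivious storage layer itself, which hides access patterns from the provider, is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePolicy:
    """Hypothetical role-based policy: roles grant actions on document collections."""
    grants: dict = field(default_factory=dict)       # role -> {(action, collection)}
    assignments: dict = field(default_factory=dict)  # user -> {role}

    def allow(self, role, action, collection):
        self.grants.setdefault(role, set()).add((action, collection))

    def assign(self, user, role):
        self.assignments.setdefault(user, set()).add(role)

    def permits(self, user, action, collection):
        # A request is allowed if any of the user's roles grants it.
        return any((action, collection) in self.grants.get(role, set())
                   for role in self.assignments.get(user, set()))


policy = StoragePolicy()
policy.allow("analyst", "read", "finance")
policy.allow("manager", "write", "finance")
policy.assign("alice", "analyst")
print(policy.permits("alice", "read", "finance"))   # True
print(policy.permits("alice", "write", "finance"))  # False
```

In a real deployment this check would run before any request reaches the oblivious data structure, so that denied requests never generate observable accesses.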
April 27th, 2015, by Tim Finin, posted in NLP, Ontologies, OWL, RDF, Semantic Web
In this week’s ebiquity lab meeting, Ankur Padia will talk about ontology learning and the work he did for his MS thesis, at 10:00am in ITE 346 at UMBC.
10:00am Tuesday, Apr. 28, 2015, ITE 346
Ontology Learning has been the subject of intensive study for the past decade. Researchers in this field have been motivated by the possibility of automatically building a knowledge base from text documents to support reasoning-based knowledge extraction. While most work in this field has been primarily statistical (known as lightweight Ontology Learning), little has been done on axiomatic Ontology Learning (called Formal Ontology Learning) from natural language text documents. The presentation will focus on the relationship between Description Logic and natural language (limited to IS-A relations) for Formal Ontology Learning.
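To make the IS-A focus concrete, here is a toy sketch that turns definitional sentences of the form “A X is a Y.” into OWL SubClassOf axioms, written in functional-syntax style. The single regular expression is an assumption for illustration and is far simpler than what a real Formal Ontology Learning system would need.

```python
import re

# Toy pattern for definitional sentences such as "A cat is a mammal."
# The optional leading article must be followed by whitespace so that
# words like "Anteater" are not split.
ISA = re.compile(
    r"^(?:(?:an?|the)\s+)?([a-z ]+?)\s+is\s+(?:an?|the)\s+([a-z ]+?)\.?$",
    re.IGNORECASE,
)


def to_class_name(phrase):
    """'red fox' -> 'RedFox'."""
    return "".join(word.capitalize() for word in phrase.split())


def learn_subclass_axioms(sentences):
    axioms = []
    for sentence in sentences:
        m = ISA.match(sentence.strip())
        if m:
            sub, sup = (to_class_name(g) for g in m.groups())
            axioms.append(f"SubClassOf(:{sub} :{sup})")
    return axioms


print(learn_subclass_axioms(["A cat is a mammal.", "Dogs bark.", "A red fox is a canine."]))
```

A sentence like “Paris is the capital.” also matches and wrongly yields a subclass axiom; telling classes from individuals is exactly the kind of problem that makes the formal variant hard.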
April 25th, 2015, by Tim Finin, posted in AI, Ontologies, OWL, Semantic Web
Ph.D. Dissertation Defense
A Semantic Resolution Framework for Integrating Manufacturing Service Capability Data
10:00am Monday 27 April 2015, ITE 217b
Building flexible manufacturing supply chains requires the availability of interoperable and accurate manufacturing service capability (MSC) information for all supply chain participants. Today, MSC information, which is typically published either on the supplier’s web site or registered at an e-marketplace portal, has been shown to fall short of interoperability and accuracy requirements. The issue of interoperability can be addressed by annotating the MSC information using shared ontologies. However, this ontology-based approach faces three main challenges: (1) the lack of an effective way to automatically extract the large volume of MSC instance data, hidden in manufacturers’ web sites, that needs to be annotated; (2) difficulties in accurately identifying the semantics of the extracted data and resolving semantic heterogeneities among the individual sources while integrating them under shared formal ontologies; and (3) difficulties in the adoption of ontology-based approaches by supply chain managers and users who are unfamiliar with the syntax and semantics of formal ontology languages such as the Web Ontology Language (OWL).
The objective of our research is to address these challenges by developing an innovative approach that can extract MSC instances from a broad range of manufacturing web sites that present them in various ways, accurately annotate MSC instances with formally defined semantics on a large scale, and integrate the annotated instances into formal manufacturing domain ontologies to facilitate the formation of manufacturing supply chains. To achieve this objective, we propose a semantic resolution framework (SRF) that consists of three main components: an MSC instance extractor, an MSC instance annotator and a semantic resolution knowledge base. The instance extractor builds a local semantic model, which we call an instance description model (IDM), for each target manufacturer web site. The innovative aspect of the IDM is that it captures the intended structure of the target web site and associates each extracted MSC instance with a context that describes the possible semantics of that instance. The instance annotator starts the semantic resolution by identifying the most appropriate class from one or more manufacturing domain ontologies (MDO) to annotate each instance, based on the mappings established between the context of that instance and the vocabularies (i.e., classes and properties) defined in the MDO. The primary goal of the semantic resolution knowledge base (SR-KB) is to resolve semantic heterogeneity that may occur in the instance annotation process and thus improve the accuracy of the annotated MSC instances. The experimental results demonstrate that the instance extractor and the instance annotator can effectively discover and annotate MSC instances, while the SR-KB improves both the precision and recall of annotated instances and reduces human involvement as the knowledge base evolves.
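The framework’s actual mappings between instance contexts and MDO vocabularies are not spelled out in the abstract. As a simple stand-in, the sketch below picks the ontology class whose keyword description has the highest Jaccard token overlap with an instance’s context. The class names and descriptions are invented for illustration.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def annotate(instance_context, ontology_classes):
    """Return (best_class, score); (None, 0.0) if no class shares any token.

    ontology_classes maps a class name to a short keyword description.
    """
    ctx = tokens(instance_context)
    best, best_score = None, 0.0
    for cls, description in ontology_classes.items():
        score = jaccard(ctx, tokens(description))
        if score > best_score:
            best, best_score = cls, score
    return best, best_score


# Invented classes and descriptions, for illustration only.
MDO = {
    "CNCMachining": "cnc milling turning machining",
    "InjectionMolding": "plastic injection molding",
}
print(annotate("Services > Machining > CNC milling", MDO))
```

The context string mimics the kind of navigation path an IDM might associate with an instance; a knowledge base like the SR-KB would be needed when two classes score similarly or use different vocabularies for the same capability.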
Committee: Drs. Yun Peng (Chair), Tim Finin, Yaacov Yesha, Matthew Schmill and Boonserm Kulvatunyou
April 19th, 2015, by Tim Finin, posted in OWL, Privacy, RDF, Security, Semantic Web
In this week’s meeting (10-11am Tue, April 21), Ankur Padia will present work in progress on providing access control to an RDF triple store.
Triple store access control for a linked data fragments interface
Ankur Padia, UMBC
The maturation of Semantic Web standards and associated web-based data representations such as schema.org has made RDF a popular model for representing graph data and semi-structured knowledge. Triple stores are used to store and query RDF datasets and often expose a SPARQL endpoint service on the Web for public access. Most existing SPARQL endpoints support only very simple access control mechanisms, if any, which prevents their use in many applications where fine-grained privacy or data security is important. We describe new work on access control for a linked data fragments interface, i.e., one that accepts queries consisting of one or more triple patterns and responds with all matching triples that the authenticated querier can access.
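A linked data fragments interface answers one triple pattern at a time, which makes per-request filtering easy to picture. The sketch below assumes, hypothetically, that read permissions are attached to named graphs; the policy model in this work may differ. None plays the role of a variable in a pattern.

```python
def matches(pattern, triple):
    """None in a pattern position acts as a variable that matches anything."""
    return all(p is None or p == t for p, t in zip(pattern, triple))


def fragment(store, acl, user, pattern):
    """Answer one triple pattern, returning only triples the user may read.

    store: list of (graph, (s, p, o)) pairs
    acl:   graph name -> set of users allowed to read that graph
    """
    return [triple for graph, triple in store
            if user in acl.get(graph, set()) and matches(pattern, triple)]


store = [
    ("public", ("ex:alice", "foaf:name", "Alice")),
    ("medical", ("ex:alice", "ex:diagnosis", "ex:flu")),
    ("public", ("ex:bob", "foaf:name", "Bob")),
]
acl = {"public": {"alice", "bob"}, "medical": {"alice"}}

print(fragment(store, acl, "bob", ("ex:alice", None, None)))
print(fragment(store, acl, "alice", ("ex:alice", None, None)))
```

Because the server filters every fragment before returning it, a client that joins several patterns locally never sees triples it is not allowed to read.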
December 15th, 2014, by Tim Finin, posted in Mobile Computing, OWL, Policy, RDF, Semantic Web
Roberto Yus, Primal Pappachan, Prajit Das, Tim Finin, Anupam Joshi, and Eduardo Mena, Semantics for Privacy and Shared Context, Workshop on Society, Privacy and the Semantic Web-Policy and Technology, held at Int. Semantic Web Conf., Oct. 2014.
Capturing, maintaining, and using context information helps mobile applications provide better services and generates data useful for specifying information-sharing policies. Obtaining the full benefit of context information requires a rich and expressive representation that is grounded in shared semantic models. We summarize some of our past work on representing and using context models and briefly describe Triveni, a system for cross-device context discovery and enrichment. Triveni represents context in RDF and OWL and reasons over context models to infer additional information and to detect and resolve ambiguities and inconsistencies. A unique feature of Triveni is its ability to create and manage “contextual groups” of users in an environment, which lets their members share context information over wireless ad hoc networks. In this way, it enriches the information about a user’s context by creating mobile ad hoc knowledge networks.
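Triveni’s inference and inconsistency detection are described only at a high level here. The sketch below illustrates the flavor of it under stated assumptions: a hypothetical, hand-written part-of hierarchy (in OWL this would be a transitive property), location reports from several of one user’s devices, and a rule that two reports conflict when neither place contains the other.

```python
# Hypothetical containment facts: place -> directly enclosing place.
PART_OF = {"ITE346": "ITE", "ITE": "UMBC", "Library": "UMBC"}


def enclosing(place):
    """All places that transitively contain `place`, innermost first."""
    chain = []
    while place in PART_OF:
        place = PART_OF[place]
        chain.append(place)
    return chain


def merge_location(reports):
    """Combine location reports from a user's devices.

    Returns (most specific consistent place, places inferred to contain it,
    list of conflicting report pairs).
    """
    best, conflicts = reports[0], []
    for place in reports[1:]:
        if place == best or place in enclosing(best):
            continue                         # coarser report, already implied
        if best in enclosing(place):
            best = place                     # finer report refines the estimate
        else:
            conflicts.append((best, place))  # neither place contains the other
    return best, enclosing(best), conflicts


print(merge_location(["UMBC", "ITE346"]))
print(merge_location(["ITE346", "Library"]))
```

The first call refines a coarse phone report with a finer one and infers the enclosing building and campus; the second flags a conflict that a system like Triveni would then need to resolve.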
September 29th, 2014, by Tim Finin, posted in OWL, RDF, Semantic Web, Web, Wikipedia
In this week’s ebiquity meeting (10am Tue. Oct 1 in ITE346), Varish Mulwad will present Infoboxer, a prototype tool he developed with Roberto Yus that uses statistical and semantic knowledge from linked data sources to overcome the challenges of creating Wikipedia infoboxes.
Wikipedia infoboxes serve as input in the creation of knowledge bases such as DBpedia, Yago, and Freebase. Infoboxes are currently created manually, based on templates that are created and maintained collaboratively. However, these templates pose several challenges:
- Different communities use different infobox templates for articles in the same category
- Attribute names differ (e.g., date of birth vs. birthdate)
- Templates are restricted to a single category, making it harder to find a template for an article that belongs to multiple categories (e.g., actor and politician)
- Templates are free-form, and no integrity check verifies that the value entered by the user is of the appropriate type for the given attribute
Infoboxer creates dynamic and semantic templates by suggesting attributes common for similar articles and controlling the expected values semantically. We will give an overview of our approach and demonstrate how Infoboxer can be used to create infoboxes for new Wikipedia articles as well as update erroneous values in existing infoboxes. We will also discuss our proposed extensions to the project.
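A minimal sketch of the two ideas in the paragraph above, suggesting attributes that are common among similar articles and checking that a value has the expected type, follows. The frequency threshold, the attribute lists and the DBpedia-style type names are assumptions for illustration; Infoboxer’s own statistics and semantic checks are richer than this.

```python
from collections import Counter


def suggest_attributes(similar_infoboxes, min_share=0.5):
    """Suggest attributes used by at least `min_share` of similar articles."""
    counts = Counter(attr for box in similar_infoboxes for attr in set(box))
    n = len(similar_infoboxes)
    return sorted(attr for attr, c in counts.items() if c / n >= min_share)


def value_ok(attribute, value, expected_range, types_of):
    """True if `value` has the class expected for `attribute`, or if no range is known."""
    expected = expected_range.get(attribute)
    return expected is None or expected in types_of.get(value, set())


# Attribute lists from three hypothetical "similar" articles.
similar = [
    ["name", "birth_date", "occupation"],
    ["name", "birth_date", "spouse"],
    ["name", "occupation"],
]
print(suggest_attributes(similar))  # ['birth_date', 'name', 'occupation']

expected_range = {"spouse": "dbo:Person"}
types_of = {"dbr:Paris": {"dbo:Place"}}
print(value_ok("spouse", "dbr:Paris", expected_range, types_of))  # False
```

Suggesting attributes from similar articles rather than from a single-category template is what lets this approach handle articles that belong to several categories at once.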
Visit http://ebiq.org/p/668 for more information about Infoboxer.
September 19th, 2014, by Tim Finin, posted in Mobile Computing, OWL, RDF, Semantic Web, Wearable Computing
Primal Pappachan, Roberto Yus, Anupam Joshi and Tim Finin, Rafiki: A Semantic and Collaborative Approach to Community Health-Care in Underserved Areas, 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing, 22-25 October 2014, Miami.
Community Health Workers (CHWs) act as liaisons between health-care providers and patients in underserved or unserved areas. However, the lack of information sharing and training support impedes the effectiveness of CHWs and their ability to correctly diagnose patients. In this paper, we propose and describe Rafiki, a system for mobile and wearable computing devices that assists CHWs in decision making and facilitates collaboration among them. Rafiki can infer possible diseases and treatments by representing diseases, their symptoms, and patient context in OWL ontologies and reasoning over this model. Using a semantic representation of data makes it easier to share knowledge about diseases, symptoms, diagnosis guidelines, and patient demographics among the various personnel involved in health care (e.g., CHWs, patients, health-care providers). We describe the Rafiki system with the help of a motivating community health-care scenario and present an Android prototype for smartphones and Google Glass.
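Rafiki reasons over OWL ontologies; the sketch below is a much simpler stand-in and not its method. It assumes a hypothetical table linking placeholder diseases to symptom sets and ranks candidates by the fraction of each disease’s symptoms the patient shows. Patient context (age, region, and so on) is omitted.

```python
# Toy knowledge base with placeholder disease names; not medical guidance.
DISEASE_SYMPTOMS = {
    "DiseaseA": {"fever", "chills", "headache"},
    "DiseaseB": {"fever", "cough"},
    "DiseaseC": {"rash", "itching"},
}


def rank_diseases(observed, kb=DISEASE_SYMPTOMS):
    """Rank diseases by the fraction of their listed symptoms that were observed."""
    ranked = []
    for disease, symptoms in kb.items():
        coverage = len(symptoms & observed) / len(symptoms)
        if coverage > 0:
            ranked.append((disease, coverage))
    # Highest coverage first; ties broken alphabetically for stable output.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


print(rank_diseases({"fever", "cough"}))
```

An OWL version would instead define each disease as a class with symptom restrictions and let a reasoner classify the patient, which makes the same knowledge shareable across devices and users.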
May 23rd, 2013, by Tim Finin, posted in AI, Google, KR, NLP, OWL, Semantic Web
Top Charts is a new feature for Google Trends that identifies the popular searches within a category, e.g., books or actors. What’s interesting about it, from a technology standpoint, is that it uses Google’s Knowledge Graph to provide a universe of things and the categories to which they belong. This is a great example of “Things, not strings”, Google’s clever slogan to explain the importance of the Knowledge Graph.
Here’s how it’s explained in the Trends Top Charts FAQ.
“Top Charts relies on technology from the Knowledge Graph to identify when search queries seem to be about particular real-world people, places and things. The Knowledge Graph enables our technology to connect searches with real-world entities and their attributes. For example, if you search for ice ice baby, you’re probably searching for information about the musician Vanilla Ice or his music. Whereas if you search for vanilla ice cream recipe, you’re probably looking for information about the tasty dessert. Top Charts builds on work we’ve done so our systems do a better job finding you the information you’re actually looking for, whether tasty desserts or musicians.”
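The FAQ does not say how the aggregation is done. As a hypothetical sketch, once queries are mapped to entities, building a chart is just counting per category; the exact-match alias table below stands in for the much harder entity-recognition step discussed in this post.

```python
from collections import Counter, defaultdict

# Hypothetical alias table: normalized query -> (entity, category).
ALIASES = {
    "ice ice baby": ("Vanilla Ice", "Musicians"),
    "vanilla ice": ("Vanilla Ice", "Musicians"),
    "vanilla ice cream recipe": ("Vanilla ice cream", "Desserts"),
}


def top_charts(query_log, aliases=ALIASES, k=10):
    """Count queries per entity, grouped by category, keeping the top k."""
    charts = defaultdict(Counter)
    for query in query_log:
        hit = aliases.get(query.strip().lower())
        if hit:  # queries with no known entity are ignored
            entity, category = hit
            charts[category][entity] += 1
    return {category: counts.most_common(k) for category, counts in charts.items()}


log = ["ice ice baby", "Vanilla Ice", "vanilla ice cream recipe", "weather"]
print(top_charts(log))
```

Two different strings count toward the same musician while a string sharing most of its words goes to a dessert, which is the “things, not strings” point in miniature.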
One thing to note about the Knowledge Graph, which is said to have more than 18 billion facts about 570 million objects, is that its objects include more than the traditional named entities (e.g., people, places, things). For example, there is a top chart for Animals showing that dogs are the most popular animal in Google searches, followed by cats (no surprises here), with chickens at number three on the list (could their high rank be due to recipe searches?). In most knowledge representation schemes, the dog object would be modeled as a concept or class rather than an object or instance. In some representation systems, the same term (e.g., dog) can refer both to a class of instances (a class that includes Lassie) and to an instance (e.g., an instance of the class of animal types). Which sense of the term dog is meant (class vs. instance) is determined by the context. In the Semantic Web representation language OWL 2, the ability to use the same term to refer to a class or a related instance is called punning.
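Punning is easy to see in a handful of triples. The sketch below uses hypothetical ex: names and plain tuples rather than an RDF library, and simply finds names used both as a class and as an instance. In OWL 2 DL the two uses of a punned name are interpreted independently, so a statement about ex:Dog the individual says nothing about ex:Dog the class.

```python
# Triples in which "ex:Dog" is used both as a class and as an individual.
TRIPLES = [
    ("ex:Dog", "rdf:type", "owl:Class"),
    ("ex:AnimalType", "rdf:type", "owl:Class"),
    ("ex:Lassie", "rdf:type", "ex:Dog"),      # ex:Dog used as a class
    ("ex:Dog", "rdf:type", "ex:AnimalType"),  # ex:Dog used as an individual
]


def punned_terms(triples):
    """Names that appear both as a class and as an instance of some class."""
    types = [(s, o) for s, p, o in triples if p == "rdf:type"]
    classes = {o for s, o in types if o != "owl:Class"}
    classes |= {s for s, o in types if o == "owl:Class"}
    individuals = {s for s, o in types if o != "owl:Class"}
    return classes & individuals


print(punned_terms(TRIPLES))  # {'ex:Dog'}
```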
Of course, when doing this kind of mapping of terms to objects, we only want to consider concepts that commonly have words or short phrases used to denote them. Not all concepts do, such as animals that from a long way off look like flies.
A second observation is that once you have a nice knowledge base like the Knowledge Graph, you have a new problem: how can you recognize mentions of its instances in text? In the DBpedia knowledge base (derived from Wikipedia) there are nine individuals named Michael Jordan, and two of them were professional basketball players in the NBA. So, when you enter a search query like “When did Michael Jordan play for Penn”, we have to use information in the query, its context and what we know about the possible referents (e.g., those nine Michael Jordans) to decide (1) if this is likely to be a reference to any of the objects in our knowledge base, and (2) if so, to which one. This task, a fundamental one in language processing, is not trivial, but luckily, in applications like Top Charts, we don’t have to do it with perfect accuracy.
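A toy version of decisions (1) and (2) is sketched below: score each candidate referent by word overlap between the query and a short description (a simplified Lesk-style heuristic chosen for illustration, not Google’s or DBpedia’s method), and return nothing when no candidate clears a threshold. The three candidate IDs and their descriptions are hypothetical.

```python
import re

STOP = {"a", "at", "did", "for", "in", "of", "the", "when"}


def words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP}


def link(context, candidates, threshold=1):
    """Return the candidate whose description shares the most words with the
    context, or None if the best overlap is below `threshold`."""
    ctx = words(context)
    best, best_score = None, 0
    for cid, description in candidates.items():
        score = len(ctx & words(description))
        if score > best_score:
            best, best_score = cid, score
    return best if best_score >= threshold else None


# Toy descriptions written for this sketch, not taken from DBpedia.
CANDIDATES = {
    "ex:MJ_NBA": "basketball player for the Chicago Bulls in the NBA",
    "ex:MJ_Penn": "basketball player for Penn",
    "ex:MJ_Professor": "professor of machine learning at Berkeley",
}
print(link("When did Michael Jordan play for Penn", CANDIDATES))  # ex:MJ_Penn
print(link("Michael Jordan", CANDIDATES))                          # None
```

Even here, “play” fails to match “player” without stemming, a small hint of why real entity linking needs richer features and priors.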
Google’s Top Charts is a simple, but effective, example that demonstrates the potential usefulness of semantic technology to make our information systems better in the near future.
September 15th, 2011, by Tim Finin, posted in Google, KR, Ontologies, OWL, Semantic Web, Social media
The Wall Street Journal article Walked Into a Lamppost? Hurt While Crocheting? Help Is on the Way describes the International Classification of Diseases, 10th Revision (ICD-10), which is used to describe medical problems.
“Today, hospitals and doctors use a system of about 18,000 codes to describe medical services in bills they send to insurers. Apparently, that doesn’t allow for quite enough nuance. A new federally mandated version will expand the number to around 140,000—adding codes that describe precisely what bone was broken, or which artery is receiving a stent. It will also have a code for recording that a patient’s injury occurred in a chicken coop.”
We want to see the search engine companies develop and support a Microdata vocabulary for ICD-10. An ICD-10 OWL DL ontology has already been developed, but a Microdata version might add a lot of value. We could use it on our blogs and Facebook posts to catalog those annoying problems we encounter each day, like W59.22XA (Struck by turtle, initial encounter) or Y07.53 (Teacher or instructor, perpetrator of maltreatment and neglect).
Humor aside, a description logic representation (e.g., in OWL) makes the coding system seem less ridiculous. Instead of appearing as a catalog of 140K ground tags, it would emphasize that it is a collection of a much smaller number of classes that can be combined in productive ways to produce them or used to create general descriptions (e.g., bitten by an animal).
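To illustrate the point, the sketch below describes injuries as combinations of a mechanism and an agent drawn from a small class hierarchy, so that a general description like “bitten by an animal” matches any more specific event. The classes and the (mechanism, agent) decomposition are hypothetical and do not reproduce ICD-10’s actual structure; in OWL the hierarchy would be SubClassOf axioms and the matching would be done by a reasoner.

```python
# Tiny hypothetical class hierarchy: class -> parent class.
PARENT = {
    "Turtle": "Reptile", "Reptile": "Animal",
    "Dog": "Mammal", "Mammal": "Animal",
    "Lamppost": "StreetFixture",
}


def is_a(cls, ancestor):
    """True if `cls` equals `ancestor` or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def matches(event, description):
    """An event or description is a (mechanism, agent) pair; the description's
    agent may be a general class such as Animal."""
    mechanism, agent = event
    wanted_mechanism, wanted_agent = description
    return mechanism == wanted_mechanism and is_a(agent, wanted_agent)


events = [("StruckBy", "Turtle"), ("BittenBy", "Dog"), ("StruckBy", "Lamppost")]
print([e for e in events if matches(e, ("BittenBy", "Animal"))])  # [('BittenBy', 'Dog')]
print([e for e in events if matches(e, ("StruckBy", "Animal"))])  # [('StruckBy', 'Turtle')]
```

With m mechanisms and a hierarchy of n agents, the m × n specific descriptions never have to be listed one by one; they arise from combining the smaller sets of classes.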