UMBC ebiquity
Semantic Web

Archive for the 'Semantic Web' Category

Modeling and Extracting information about Cybersecurity Events from Text

May 15th, 2017, by Tim Finin, posted in cybersecurity, Machine Learning, NLP, OWL, Semantic Web

Ph.D. Dissertation Proposal

Modeling and Extracting information about Cybersecurity Events from Text

Taneeya Satyapanich

Tuesday, 16 May 2017, ITE 325, UMBC

People rely on the Internet to carry out much of the their daily activities such as banking, ordering food and socializing with their family and friends. The technology facilitates our lives, but also comes with many problems, including cybercrimes, stolen data and identity theft. With the large and increasing number of transaction done every day, the frequency of cybercrime events is also increasing. Since the number of security-related events is too high for manual review and monitoring, we need to train machines to be able to detect and gather data about potential cybersecurity threats. To support machines that can identify and understand threats, we need standard models to store the cybersecurity information and information extraction systems that can collect information to populate the models with data from text.

This dissertation will make two major contributions. The first is to extend our current cyber security ontologies with better models for relevant events, from atomic events like a login attempt, to an extended but related series of events that make up a campaign, to generalized events, such as an increase in denial-of-service attacks originating from a particular region of the world targeted at U.S. financial institutions. The second is the design and implementation of a event extraction system that can extract information about cybersecurity events from text and populated a knowledge graph using our cybersecurity event ontology. We will extend our previous work on event extraction that detected human activity events from news and discussion forums. A new set of features and learning algorithms will be introduced to improve the performance and adapt the system to cybersecurity domain. We believe that this dissertation will be useful for cybersecurity management in the future. It will quickly extract cybersecurity events from text and fill in the event ontology.

Committee: Drs. Tim Finin (chair), Anupam Joshi, Tim Oates and Karuna Joshi

new paper: Modeling the Evolution of Climate Change Assessment Research Using Dynamic Topic Models and Cross-Domain Divergence Maps

May 15th, 2017, by Tim Finin, posted in AI, Machine Learning, NLP, Paper, Semantic Web

Jennifer Sleeman, Milton Halem, Tim Finin, and Mark Cane, Modeling the Evolution of Climate Change Assessment Research Using Dynamic Topic Models and Cross-Domain Divergence Maps, AAAI Spring Symposium on AI for Social Good, AAAI Press, March, 2017.

Climate change is an important social issue and the subject of much research, both to understand the history of the Earth’s changing climate and to foresee what changes to expect in the future. Approximately every five years starting in 1990 the Intergovernmental Panel on Climate Change (IPCC) publishes a set of reports that cover the current state of climate change research, how this research will impact the world, risks, and approaches to mitigate the effects of climate change. Each report supports its findings with hundreds of thousands of citations to scientific journals and reviews by governmental policy makers. Analyzing trends in the cited documents over the past 30 years provides insights into both an evolving scientific field and the climate change phenomenon itself. Presented in this paper are results of dynamic topic modeling to model the evolution of these climate change reports and their supporting research citations over a 30 year time period. Using this technique shows how the research influences the assessment reports and how trends based on these influences can affect future assessment reports. This is done by calculating cross-domain divergences between the citation domain and the assessment report domain and by clustering documents between domains. This approach could be applied to other social problems with similar structure such as disaster recovery.

Fact checking the fact checkers fact check metadata

May 13th, 2017, by Tim Finin, posted in schema.org, Semantic Web

TL;DR: Some popular fact checking sites are saying that false is true and true is false in their embedded metadata 

I’m a fan of the schema.org claimReview tags for rendering fact checking results as metadata markup embedded in the html that can be easily understood by machines. Google gave a plug for this last Fall and more recently announced that it has broadened its use of the fact checking metadata tags.  It’s a great idea and could help limit the spread of false information on the Web.  But its adoption still has some problems.

Last week I checked to see if the Washington Post is using schema.org’s ClaimReview in their Fact Checker pieces. They are (that’s great!) but WaPo seems to have misunderstood the semantics of the markup by reversing the reviewRating scale, with the result that it assets the opposite of its findings.  For an example, look at this Fact Checker article reviewing claims made by HHS Secretary Tom Price on the AHCA which WaPo rates as being very false, but gives it a high reviewRating of 5 on their scale from 1 to 6.  According to the schema.org specification, this means it’s mostly true, rather than false. ??

WaPo’s Fact Check article ratings assign a checkmark for a claim they find true and from one to four ‘pinocchios‘ for claims they find to be partially (one) or totally (four) false. They also give no rating for claims they find unclear and a ‘flip-flop‘ rating for claims on which a person has been inconsistent. Their reviewRating metadata specifies a worstRating of 1 and a bestRating of 6. They apparently map a checkmark to 1 and ‘four pinocchios‘ to 5. That is, their mapping is {-1:’unclear’; 1:’check mark’, 2:’1 pinocchio’, …, 5:’4 pinocchios’, 6:’flip flop’}. It’s clear from the schema.org ClaimReview examples that that a higher rating number is better and it’s implicit that it is better for a claim to be true.  So I assume that the WaPo FactCheck should reverse its scale, with ‘flip-flop‘ getting a 1, ‘four pinocchios‘ mapped to a 2 and a checkmark assigned a 6.

WaPo is not the only fact checking site that has got this reversed. Aaron Bradley pointed out early in April that Politifact had it’s scale reversed also. I checked last week and confirmed that this was still the case, as this example shows. I sampled a number of Snope’s ClaimCheck ratings and found that all of them were -1 on a scale of -1..+1, as in this example.

It’s clear how this mistake can happen.  Many fact checking sites are motivated by identifying false facts, so have native scales that go from the mundane true statement to the brazen and outrageous completely false.  So a mistake of directly mapping this linear scale into the numeric one from low to high is not completely surprising.

While the fact checking sites that have made this mistake are run by dedicated and careful investigators, the same care has not yet been applied in implementing the semantic metadata embedded in their pages on for their sites.

New paper: A Question and Answering System for Management of Cloud Service Level Agreements

May 13th, 2017, by Tim Finin, posted in AI, KR, NLP, Paper, Semantic Web

Sudip Mittal, Aditi Gupta, Karuna Pande Joshi, Claudia Pearce and Anupam Joshi, A Question and Answering System for Management of Cloud Service Level Agreements,  IEEE International Conference on Cloud Computing, June 2017.

One of the key challenges faced by consumers is to efficiently manage and monitor the quality of cloud services. To manage service performance, consumers have to validate rules embedded in cloud legal contracts, such as Service Level Agreements (SLA) and Privacy Policies, that are available as text documents. Currently this analysis requires significant time and manual labor and is thus inefficient. We propose a cognitive assistant that can be used to manage cloud legal documents by automatically extracting knowledge (terms, rules, constraints) from them and reasoning over it to validate service performance. In this paper, we present this Question and Answering (Q&A) system that can be used to analyze and obtain information from the SLA documents. We have created a knowledgebase of Cloud SLAs from various providers which forms the underlying repository of our Q&A system. We utilized techniques from natural language processing and semantic web (RDF, SPARQL and Fuseki server) to build our framework. We also present sample queries on how a consumer can compute metrics such as service credit.

Google search now includes schema.org fact check data

April 8th, 2017, by Tim Finin, posted in RDF, schema.org, Semantic Web, Web

Google claims on their search blog that “Fact Check now available in Google Search and News”.  We’ve sampled searches on Google and found that some results did indeed include Fact Check data from schema.org’s ClaimReview markup.  So we are including the following markup on this page.

    
    <script type="application/ld+json">
    {
      "@context": "http://schema.org",
      "@type": "ClaimReview",
      "datePublished": "2016-04-08",
      "url": "http://ebiquity.umbc.edu/blogger/2017/04/08/google-search-now-
              including-schema-org-fact-check-data",
      "itemReviewed":
      {
        "@type": "CreativeWork",
        "author":
        {
          "@type": "Organization",
          "name": "Google"
        },
        "datePublished": "2016-04-07"
      },
      "claimReviewed": "Fact Check now available in Google search and news",
      "author":
      {
        "@type": "Organization",
        "Name": "UMBC Ebiquity Research Group",
        "url": "http://ebiquity.umbc.edu/"
      },
      "reviewRating":
      {
        "@type": "Rating",
        "ratingValue": "5",
        "bestRating": "5",
        "worstRating": "1",
        "alternateName" : "True"
      }
    }</script>

Google notes that

“Only publishers that are algorithmically determined to be an authoritative source of information will qualify for inclusion. Finally, the content must adhere to the general policies that apply to all structured data markup, the Google News Publisher criteria for fact checks, and the standards for accountability and transparency, readability or proper site representation as articulated in our Google News General Guidelines. If a publisher or fact check claim does not meet these standards or honor these policies, we may, at our discretion, ignore that site’s markup.”

and we hope that the algorithms will find us to be an authoritative source of information.

You can see the actual markup by viewing this page’s source or looking at the markup that Google’s structured data testing tool finds on it here by clicking on ClaimReview in the column on the right.

Update: We’ve been algorithmically determined to be an authoritative source of information!

SemTk: The Semantics Toolkit from GE Global Research, 4/4

March 17th, 2017, by Tim Finin, posted in AI, KR, NLP, NLP, Ontologies, OWL, RDF, Semantic Web

The Semantics Toolkit

Paul Cuddihy and Justin McHugh
GE Global Research Center, Niskayuna, NY

10:00-11:00 Tuesday, 4 April 2017, ITE 346, UMBC

SemTk (Semantics Toolkit) is an open source technology stack built by GE Scientists on top of W3C Semantic Web standards.  It was originally conceived for data exploration and simplified query generation, and later expanded to a more general semantics abstraction platform. SemTk is made up of a Java API and microservices along with Javascript front ends that cover drag-and-drop query generation, path finding, data ingestion and the beginnings of stored procedure support.   In this talk we will give a tour of SemTk, discussing its architecture and direction, and demonstrate it’s features using the SPARQLGraph front-end hosted at http://semtk.research.ge.com.

Paul Cuddihy is a senior computer scientist and software systems architect in AI and Learning Systems at the GE Global Research Center in Niskayuna, NY. He earned an M.S. in Computer Science from Rochester Institute of Technology. The focus of his twenty-year career at GE Research has ranged from machine learning for medical imaging equipment diagnostics, monitoring and diagnostic techniques for commercial aircraft engines, modeling techniques for monitoring seniors living independently in their own homes, to parallel execution of simulation and prediction tasks, and big data ontologies.  He is one of the creators of the open source software “Semantics Toolkit” (SemTk) which provides a simplified interface to the semantic tech stack, opening its use to a broader set of users by providing features such as drag-and-drop query generation and data ingestion.  Paul has holds over twenty U.S. patents.

Justin McHugh is computer scientist and software systems architect working in the AI and Learning Systems group at GE Global Research in Niskayuna, NY. Justin attended the State University of New York at Albany where he earned an M.S in computer science. He has worked as a systems architect and programmer for large scale reporting, before moving into the research sector. In the six years since, he has worked on complex system integration, Big Data systems and knowledge representation/querying systems. Justin is one of the architects and creators of SemTK (the Semantics Toolkit), a toolkit aimed at making the power of the semantic web stack available to programmers, automation and subject matter experts without their having to be deeply invested in the workings of the Semantic Web.

SADL: Semantic Application Design Language

March 4th, 2017, by Tim Finin, posted in KR, Ontologies, OWL, RDF, Semantic Web

SADL – Semantic Application Design Language

Dr. Andrew W. Crapo
GE Global Research

 10:00 Tuesday, 7 March 2017

The Web Ontology Language (OWL) has gained considerable acceptance over the past decade. Building on prior work in Description Logics, OWL has sufficient expressivity to be useful in many modeling applications. However, its various serializations do not seem intuitive to subject matter experts in many domains of interest to GE. Consequently, we have developed a controlled-English language and development environment that attempts to make OWL plus rules more accessible to those with knowledge to share but limited interest in studying formal representations. The result is the Semantic Application Design Language (SADL). This talk will review the foundational underpinnings of OWL and introduce the SADL constructs meant to capture, validate, and maintain semantic models over their lifecycle.

 

Dr. Crapo has been part of GE’s Global Research staff for over 35 years. As an Information Scientist he has built performance and diagnostic models of mechanical, chemical, and electrical systems, and has specialized in human-computer interfaces, decision support systems, machine reasoning and learning, and semantic representation and modeling. His work has included a graphical expert system language (GEN-X), a graphical environment for procedural programming (Fuselet Development Environment), and a semantic-model-driven user-interface for decision support systems (ACUITy). Most recently Andy has been active in developing the Semantic Application Design Language (SADL), enabling GE to leverage worldwide advances and emerging standards in semantic technology and bring them to bear on diverse problems from equipment maintenance optimization to information security.

PhD Proposal: Ankur Padia, Dealing with Dubious Facts in Knowledge Graphs

November 29th, 2016, by Tim Finin, posted in KR, Machine Learning, NLP, NLP, Semantic Web

the skeptic

Dissertation Proposal

Dealing with Dubious Facts
in Knowledge Graphs

Ankur Padia

1:00-3:00pm Wednesday, 30 November 2016, ITE 325b, UMBC

Knowledge graphs are structured representations of facts where nodes are real-world entities or events and edges are the associations among the pair of entities. Knowledge graphs can be constructed using automatic or manual techniques. Manual techniques construct high quality knowledge graphs but are expensive, time consuming and not scalable. Hence, automatic information extraction techniques are used to create scalable knowledge graphs but the extracted information can be of poor quality due to the presence of dubious facts.

An extracted fact is dubious if it is incorrect, inexact or correct but lacks evidence. A fact might be dubious because of the errors made by NLP extraction techniques, improper design consideration of the internal components of the system, choice of learning techniques (semi-supervised or unsupervised), relatively poor quality of heuristics or the syntactic complexity of underlying text. A preliminary analysis of several knowledge extraction systems (CMU’s NELL and JHU’s KELVIN) and observations from the literature suggest that dubious facts can be identified, diagnosed and managed. In this dissertation, I will explore approaches to identify and repair such dubious facts from a knowledge graph using several complementary approaches, including linguistic analysis, common sense reasoning, and entity linking.

Committee: Drs. Tim Finin (Chair), Anupam Joshi, Tim Oates, Paul McNamee (JHU), Partha Talukdar (IISc, India)

Managing Cloud Storage Obliviously

May 24th, 2016, by Tim Finin, posted in cloud computing, cybersecurity, Privacy, Security, Semantic Web

Vaishali Narkhede, Karuna Pande Joshi, Tim Finin, Seung Geol Choi, Adam Aviv and Daniel S. Roche, Managing Cloud Storage Obliviously, International Conference on Cloud Computing, IEEE Computer Society, June 2016.

Consumers want to ensure that their enterprise data is stored securely and obliviously on the cloud, such that the data objects or their access patterns are not revealed to anyone, including the cloud provider, in the public cloud environment. We have created a detailed ontology describing the oblivious cloud storage models and role based access controls that should be in place to manage this risk. We have developed an algorithm to store cloud data using oblivious data structure defined in this paper. We have also implemented the ObliviCloudManager application that allows users to manage their cloud data by validating it before storing it in an oblivious data structure. Our application uses role-based access control model and collection based document management to store and retrieve data efficiently. Cloud consumers can use our system to define policies for storing data obliviously and manage storage on untrusted cloud platforms even if they are unfamiliar with the underlying technology and concepts of oblivious data structures.

Streamlining Management of Multiple Cloud Services

May 22nd, 2016, by Tim Finin, posted in cloud computing, KR, Ontologies, Semantic Web

cloudhandshake

Aditi Gupta, Sudip Mittal, Karuna Pande Joshi, Claudia Pearce and Anupam Joshi, Streamlining Management of Multiple Cloud Services, IEEE International Conference on Cloud Computing, June 2016.

With the increase in the number of cloud services and service providers, manual analysis of Service Level Agreements (SLA), comparison between different service offerings and conformance regulation has become a difficult task for customers. Cloud SLAs are policy documents describing the legal agreement between cloud providers and customers. SLA specifies the commitment of availability, performance of services, penalties associated with violations and procedure for customers to receive compensations in case of service disruptions. The aim of our research is to develop technology solutions for automated cloud service management using Semantic Web and Text Mining techniques. In this paper we discuss in detail the challenges in automating cloud services management and present our preliminary work in extraction of knowledge from SLAs of different cloud services. We extracted two types of information from the SLA documents which can be useful for end users. First, the relationship between the service commitment and financial credit. We represented this information by enhancing the existing Cloud service ontology proposed by us in our previous research. Second, we extracted rules in the form of obligations and permissions from SLAs using modal and deontic logic formalizations. For our analysis, we considered six publicly available SLA documents from different cloud computing service providers.

chmod 000 Freebase

May 2nd, 2016, by Tim Finin, posted in KR, Ontologies, Semantic Web

rip freebase

He’s dead, Jim.

Google recently shut down the query interface to Freebase. All that is left of this innovative service is the ability to download a few final data dumps.

Freebase was launched nine years ago by Metaweb as an online source of structured data collected from Wikipedia and many other sources, including individual, user-submitted uploads and edits. Metaweb was acquired by Google in July  2010 and Freebase subsequently grew to have more than 2.4 billion facts about 44 million subjects. In December 2014, Google announced that it was closing Freebase and four months later it became read-only. Sometime this week the query interface was shut down.

I’ve enjoyed using Freebase in various projects in the past two years and found that it complemented DBpedia in many ways. Although its native semantics differed from that of RDF and OWL, it was close enough to allow all of Freebase to be exported as RDF.  Its schema was larger than DBpedia’s and the data tended to be a bit cleaner.

Google generously  decided to donate the data to the Wikidata project, which began migrating Freebase’s data to Wikidata in 2015.  The Freebase data also lives on as part of Google’s Knowledge Graph.  Google recently allowed very limited querying of its knowledge graph and my limited experimenting with it suggests that has Freebase data at its core.

Representing and Reasoning with Temporal Properties/Relations in OWL/RDF

May 1st, 2016, by Tim Finin, posted in KR, NLP, Ontologies, Semantic Web

Representing and Reasoning with Temporal
Properties/Relations in OWL/RDF

Clare Grasso

10:30-11:30 Monday, 2 May 2016, ITE346

OWL ontologies offer the means for modeling real-world domains by representing their high-level concepts, properties and interrelationships. These concepts and their properties are connected by means of binary relations. However, this assumes that the model of the domain is either a set of static objects and relationships that do not change over time, or a snapshot of these objects at a particular point in time. In general, relationships between objects that change over time (dynamic properties) are not binary relations, since they involve a temporal interval in addition to the object and the subject. Representing and querying information evolving in time requires careful consideration of how to use OWL constructs to model dynamic relationships and how the semantics and reasoning capabilities within that architecture are affected.

You are currently browsing the archives for the Semantic Web category.

  Home | Archive | Login | Feed