UMBC ebiquity
UMBC eBiquity Blog

New paper: Question and Answering System for Management of Cloud Service Level Agreements

Tim Finin, 10:18am 21 May 2017

Sudip Mittal, Aditi Gupta, Karuna Pande Joshi, Claudia Pearce and Anupam Joshi, A Question and Answering System for Management of Cloud Service Level Agreements, Proceedings of the IEEE International Conference on Cloud Computing, June 2017.

One of the key challenges faced by consumers is to efficiently manage and monitor the quality of cloud services. To manage service performance, consumers have to validate rules embedded in cloud legal contracts, such as Service Level Agreements (SLAs) and privacy policies, that are available as text documents. Currently this analysis requires significant time and manual labor and is thus inefficient. We propose a cognitive assistant that can be used to manage cloud legal documents by automatically extracting knowledge (terms, rules, constraints) from them and reasoning over it to validate service performance. In this paper, we present a Question and Answering (Q&A) system that can be used to analyze and obtain information from SLA documents. We have created a knowledge base of cloud SLAs from various providers, which forms the underlying repository of our Q&A system. We used techniques from natural language processing and the Semantic Web (RDF, SPARQL and the Fuseki server) to build our framework. We also present sample queries showing how a consumer can compute metrics such as service credit.
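To make the service-credit idea concrete, here is a small plain-Python sketch. The credit tiers and uptime figures below are invented for illustration and are not taken from any provider's actual SLA; the paper's system answers such queries over an RDF knowledge base with SPARQL rather than Python dictionaries.

```python
# Hypothetical sketch: once SLA terms (here, credit tiers) have been
# extracted from a provider's agreement into structured form, computing
# a metric like service credit reduces to a lookup over the tiers.

# Tiers as (minimum uptime %, credit % of monthly bill) -- illustrative only.
CREDIT_TIERS = [
    (99.99, 0),    # SLA met: no credit owed
    (99.0, 10),    # below 99.99% but at least 99%: 10% credit
    (95.0, 25),    # below 99% but at least 95%: 25% credit
    (0.0, 100),    # below 95%: full credit
]

def service_credit(observed_uptime_pct):
    """Return the service credit (percent of the monthly bill) owed
    for a given observed uptime percentage."""
    for min_uptime, credit in CREDIT_TIERS:
        if observed_uptime_pct >= min_uptime:
            return credit
    return 100

print(service_credit(99.999))  # -> 0
print(service_credit(98.5))    # -> 25
```

In the actual framework, the tier boundaries would come from text extracted from the SLA and the lookup would be expressed as a SPARQL query against the knowledge base.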


 

Modeling and Extracting information about Cybersecurity Events from Text

Tim Finin, 12:09pm 15 May 2017

Ph.D. Dissertation Proposal

Modeling and Extracting information about Cybersecurity Events from Text

Taneeya Satyapanich

Tuesday, 16 May 2017, ITE 325, UMBC

People rely on the Internet to carry out much of their daily activities, such as banking, ordering food and socializing with family and friends. The technology facilitates our lives, but also comes with many problems, including cybercrimes, stolen data and identity theft. With the large and increasing number of transactions done every day, the frequency of cybercrime events is also increasing. Since the number of security-related events is too high for manual review and monitoring, we need to train machines to detect and gather data about potential cybersecurity threats. To support machines that can identify and understand threats, we need standard models to store cybersecurity information and information extraction systems that can populate those models with data from text.

This dissertation will make two major contributions. The first is to extend our current cybersecurity ontologies with better models for relevant events, from atomic events like a login attempt, to an extended but related series of events that make up a campaign, to generalized events, such as an increase in denial-of-service attacks originating from a particular region of the world targeted at U.S. financial institutions. The second is the design and implementation of an event extraction system that can extract information about cybersecurity events from text and populate a knowledge graph using our cybersecurity event ontology. We will extend our previous work on event extraction, which detected human activity events in news and discussion forums. A new set of features and learning algorithms will be introduced to improve performance and adapt the system to the cybersecurity domain. We believe this dissertation will be useful for cybersecurity management in the future, quickly extracting cybersecurity events from text and populating the event ontology.

Committee: Drs. Tim Finin (chair), Anupam Joshi, Tim Oates and Karuna Joshi


 

new paper: Modeling the Evolution of Climate Change Assessment Research Using Dynamic Topic Models and Cross-Domain Divergence Maps

Tim Finin, 8:30am 15 May 2017

Jennifer Sleeman, Milton Halem, Tim Finin, and Mark Cane, Modeling the Evolution of Climate Change Assessment Research Using Dynamic Topic Models and Cross-Domain Divergence Maps, AAAI Spring Symposium on AI for Social Good, AAAI Press, March, 2017.

Climate change is an important social issue and the subject of much research, both to understand the history of the Earth’s changing climate and to foresee what changes to expect in the future. Approximately every five years since 1990, the Intergovernmental Panel on Climate Change (IPCC) has published a set of reports that cover the current state of climate change research, how this research will impact the world, the risks, and approaches to mitigate the effects of climate change. Each report supports its findings with hundreds of thousands of citations to scientific journals and reviews by governmental policy makers. Analyzing trends in the cited documents over the past 30 years provides insights into both an evolving scientific field and the climate change phenomenon itself. This paper presents results of applying dynamic topic modeling to model the evolution of these climate change reports and their supporting research citations over a 30-year period. The technique shows how the research influences the assessment reports and how trends based on these influences can affect future assessment reports. This is done by calculating cross-domain divergences between the citation domain and the assessment report domain and by clustering documents across the two domains. The approach could be applied to other social problems with a similar structure, such as disaster recovery.
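To illustrate what a cross-domain divergence computation looks like, here is a minimal plain-Python sketch. The topic distributions are made up, and Jensen–Shannon divergence is used as one common symmetric choice; the paper's own divergence measure and topic models are more elaborate than this.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Symmetric Jensen-Shannon divergence between two topic distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Made-up topic-word distributions standing in for a topic learned from
# the citation domain and one from the assessment-report domain:
citations_topic = [0.6, 0.3, 0.1]
report_topic = [0.2, 0.3, 0.5]

# A small divergence would suggest the report topic tracks the cited research.
print(round(js(citations_topic, report_topic), 4))
```

Computing such divergences between topics from the two domains at successive time slices is what lets one ask when, and how strongly, the cited research shows up in the assessment reports.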


 

Fact checking the fact checkers fact check metadata

Tim Finin, 10:26pm 13 May 2017

TL;DR: Some popular fact checking sites are saying that false is true and true is false in their embedded metadata 

I’m a fan of the schema.org claimReview tags for rendering fact checking results as metadata markup embedded in the html that can be easily understood by machines. Google gave a plug for this last Fall and more recently announced that it has broadened its use of the fact checking metadata tags.  It’s a great idea and could help limit the spread of false information on the Web.  But its adoption still has some problems.

Last week I checked to see if the Washington Post is using schema.org’s ClaimReview in their Fact Checker pieces. They are (that’s great!), but WaPo seems to have misunderstood the semantics of the markup by reversing the reviewRating scale, with the result that it asserts the opposite of its findings. For an example, look at this Fact Checker article reviewing claims made by HHS Secretary Tom Price on the AHCA, which WaPo rates as very false but gives a high reviewRating of 5 on its scale from 1 to 6. According to the schema.org specification, this means it’s mostly true, rather than false. ??

WaPo’s Fact Checker ratings assign a checkmark to a claim they find true and from one to four ‘pinocchios’ to claims they find partially (one) or totally (four) false. They give no rating to claims they find unclear and a ‘flip-flop’ rating to claims on which a person has been inconsistent. Their reviewRating metadata specifies a worstRating of 1 and a bestRating of 6, and they apparently map a checkmark to 1 and ‘four pinocchios’ to 5. That is, their mapping is {-1: ‘unclear’, 1: ‘checkmark’, 2: ‘1 pinocchio’, …, 5: ‘4 pinocchios’, 6: ‘flip-flop’}. It’s clear from the schema.org ClaimReview examples that a higher rating number is better, and it’s implicit that it is better for a claim to be true. So I assume the WaPo Fact Checker should reverse its scale, with ‘flip-flop’ getting a 1, ‘four pinocchios’ mapped to a 2, and a checkmark assigned a 6.
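In code, the fix is a one-line reflection of the ratingValue within its declared bounds. This sketch assumes WaPo's 1-to-6 scale as described above; the same function works for any ClaimReview scale that was emitted backwards.

```python
def reverse_rating(rating_value, worst=1, best=6):
    """Reflect a ratingValue within [worst, best], turning a scale that
    was emitted backwards (high number = most false) into schema.org's
    convention (high number = most true)."""
    return worst + best - rating_value

# WaPo's apparent (reversed) emitted values vs. the corrected ones:
print(reverse_rating(5))  # 'four pinocchios' emitted as 5 -> 2
print(reverse_rating(1))  # a checkmark (true) emitted as 1 -> 6
```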

WaPo is not the only fact checking site that has got this reversed. Aaron Bradley pointed out early in April that Politifact had its scale reversed as well. I checked last week and confirmed that this was still the case, as this example shows. I sampled a number of Snopes’ ClaimReview ratings and found that all of them were -1 on a scale of -1..+1, as in this example.

It’s clear how this mistake can happen. Many fact checking sites are motivated by identifying false claims, so their native scales run from the mundane true statement to the brazen and outrageous completely false. Directly mapping this linear scale onto a numeric one from low to high is thus not a completely surprising mistake.

While the fact checking sites that have made this mistake are run by dedicated and careful investigators, the same care has not yet been applied to implementing the semantic metadata embedded in their pages.


 


Google search now includes schema.org fact check data

Tim Finin, 9:39am 8 April 2017

Google claims on their search blog that “Fact Check now available in Google Search and News”.  We’ve sampled searches on Google and found that some results did indeed include Fact Check data from schema.org’s ClaimReview markup.  So we are including the following markup on this page.

    
    <script type="application/ld+json">
    {
      "@context": "http://schema.org",
      "@type": "ClaimReview",
      "datePublished": "2017-04-08",
      "url": "http://ebiquity.umbc.edu/blogger/2017/04/08/google-search-now-including-schema-org-fact-check-data",
      "itemReviewed":
      {
        "@type": "CreativeWork",
        "author":
        {
          "@type": "Organization",
          "name": "Google"
        },
        "datePublished": "2017-04-07"
      },
      "claimReviewed": "Fact Check now available in Google search and news",
      "author":
      {
        "@type": "Organization",
        "name": "UMBC Ebiquity Research Group",
        "url": "http://ebiquity.umbc.edu/"
      },
      "reviewRating":
      {
        "@type": "Rating",
        "ratingValue": "5",
        "bestRating": "5",
        "worstRating": "1",
        "alternateName" : "True"
      }
    }</script>

Google notes that

“Only publishers that are algorithmically determined to be an authoritative source of information will qualify for inclusion. Finally, the content must adhere to the general policies that apply to all structured data markup, the Google News Publisher criteria for fact checks, and the standards for accountability and transparency, readability or proper site representation as articulated in our Google News General Guidelines. If a publisher or fact check claim does not meet these standards or honor these policies, we may, at our discretion, ignore that site’s markup.”

and we hope that the algorithms will find us to be an authoritative source of information.

You can see the actual markup by viewing this page’s source or looking at the markup that Google’s structured data testing tool finds on it here by clicking on ClaimReview in the column on the right.

Update: We’ve been algorithmically determined to be an authoritative source of information!


 

A hands-on introduction to TensorFlow and machine learning, 10am 3/28

Tim Finin, 8:34am 18 March 2017

 

A Hands-on Introduction to TensorFlow and Machine Learning

Abhay Kashyap, UMBC ebiquity Lab

10:00-11:00am Tuesday, 28 March 2017, ITE 325b (changed from ITE 346)

As many of you know, TensorFlow is an open source machine learning library from Google which simplifies building and training deep neural networks that can take advantage of computers with GPUs. In this meeting, I will introduce some basic concepts of TensorFlow and machine learning in general. This will be a hands-on tutorial where we will sit and code up some basic examples in TensorFlow. Specifically, we will use TensorFlow to implement linear regression, softmax classifiers and feed-forward neural networks (MLPs). You can find the Python notebooks here. If time permits, we will go over the implementation of the popular word2vec algorithm and introduce LSTMs to build language models.
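As a taste of the first exercise, here is the same linear regression written in plain Python with hand-computed gradients; TensorFlow's job is to automate exactly this gradient computation. The toy data and learning rate are arbitrary choices for illustration, not taken from the tutorial notebooks.

```python
# Fit y = w*x + b by gradient descent on mean squared error.
# (In TensorFlow the gradients are derived automatically; here we
# write them out by hand to show what the library is doing.)

# Toy data generated from y = 2x + 1:
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b, lr = 0.0, 0.0, 0.01
n = len(xs)
for _ in range(5000):
    # Gradients of MSE = mean((w*x + b - y)^2) with respect to w and b:
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # converges close to 2.0 and 1.0
```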

What you need to know: Python and the basics of linear algebra and matrix operations. While it helps to know basics of machine learning, no prior knowledge will be assumed and there will be a gentle high level introduction to the algorithms we will implement.

What you need to bring: A laptop that has Python and pip installed. Having virtual environments set up on your computer is also a plus. (Warning: Windows-only users might be publicly shamed)


 

SemTk: The Semantics Toolkit from GE Global Research, 4/4

Tim Finin, 8:00am 17 March 2017

The Semantics Toolkit

Paul Cuddihy and Justin McHugh
GE Global Research Center, Niskayuna, NY

10:00-11:00 Tuesday, 4 April 2017, ITE 346, UMBC

SemTk (Semantics Toolkit) is an open source technology stack built by GE scientists on top of W3C Semantic Web standards. It was originally conceived for data exploration and simplified query generation, and later expanded into a more general semantics abstraction platform. SemTk is made up of a Java API and microservices along with JavaScript front ends that cover drag-and-drop query generation, path finding, data ingestion and the beginnings of stored procedure support. In this talk we will give a tour of SemTk, discussing its architecture and direction, and demonstrate its features using the SPARQLGraph front end hosted at http://semtk.research.ge.com.

Paul Cuddihy is a senior computer scientist and software systems architect in AI and Learning Systems at the GE Global Research Center in Niskayuna, NY. He earned an M.S. in Computer Science from the Rochester Institute of Technology. The focus of his twenty-year career at GE Research has ranged from machine learning for medical imaging equipment diagnostics, monitoring and diagnostic techniques for commercial aircraft engines, and modeling techniques for monitoring seniors living independently in their own homes, to parallel execution of simulation and prediction tasks and big data ontologies. He is one of the creators of the open source “Semantics Toolkit” (SemTk), which provides a simplified interface to the semantic tech stack, opening its use to a broader set of users through features such as drag-and-drop query generation and data ingestion. Paul holds over twenty U.S. patents.

Justin McHugh is a computer scientist and software systems architect in the AI and Learning Systems group at GE Global Research in Niskayuna, NY. He earned an M.S. in Computer Science from the State University of New York at Albany. He worked as a systems architect and programmer for large-scale reporting before moving into the research sector. In the six years since, he has worked on complex system integration, big data systems and knowledge representation/querying systems. Justin is one of the architects and creators of SemTk (the Semantics Toolkit), a toolkit aimed at making the power of the Semantic Web stack available to programmers, automation and subject matter experts without their having to be deeply invested in the workings of the Semantic Web.


 

new paper: App behavioral analysis using system calls

Tim Finin, 10:40am 14 March 2017

Prajit Kumar Das, Anupam Joshi and Tim Finin, App behavioral analysis using system calls, MobiSec: Security, Privacy, and Digital Forensics of Mobile Systems and Networks, IEEE Conference on Computer Communications Workshops, May 2017.

System calls provide an interface to the services made available by an operating system. As a result, any functionality provided by a software application eventually reduces to a set of fixed system calls. Since system calls have been used in the literature to analyze program behavior, we assumed that analyzing the patterns in calls made by a mobile application would provide insight into its behavior. In this paper, we present a preliminary study of 534 mobile applications and the system calls they make. Due to the rising trend of mobile applications providing multiple functionalities, our study concluded that mapping system calls to the functional behavior of a mobile application is not straightforward. Using the Weka tool with manually annotated application behavior classes and system call features, our experiments show that such features achieve at best a mediocre F1-measure for app behavior classification, leading to the conclusion that system calls alone are not sufficient features for this task.


 

SADL: Semantic Application Design Language

Tim Finin, 9:19am 4 March 2017

SADL – Semantic Application Design Language

Dr. Andrew W. Crapo
GE Global Research

 10:00 Tuesday, 7 March 2017

The Web Ontology Language (OWL) has gained considerable acceptance over the past decade. Building on prior work in Description Logics, OWL has sufficient expressivity to be useful in many modeling applications. However, its various serializations do not seem intuitive to subject matter experts in many domains of interest to GE. Consequently, we have developed a controlled-English language and development environment that attempts to make OWL plus rules more accessible to those with knowledge to share but limited interest in studying formal representations. The result is the Semantic Application Design Language (SADL). This talk will review the foundational underpinnings of OWL and introduce the SADL constructs meant to capture, validate, and maintain semantic models over their lifecycle.

 

Dr. Crapo has been part of GE’s Global Research staff for over 35 years. As an Information Scientist he has built performance and diagnostic models of mechanical, chemical, and electrical systems, and has specialized in human-computer interfaces, decision support systems, machine reasoning and learning, and semantic representation and modeling. His work has included a graphical expert system language (GEN-X), a graphical environment for procedural programming (Fuselet Development Environment), and a semantic-model-driven user-interface for decision support systems (ACUITy). Most recently Andy has been active in developing the Semantic Application Design Language (SADL), enabling GE to leverage worldwide advances and emerging standards in semantic technology and bring them to bear on diverse problems from equipment maintenance optimization to information security.


 

Context-Dependent Privacy and Security Management on Mobile Devices

Tim Finin, 11:05pm 27 February 2017

Mobile devices can provide better services if they can model, recognize and adapt to their users' context.

Context-Dependent Privacy and Security Management on Mobile Devices

Prajit Das, UMBC

10:00am Tuesday, 27 February, 2017

Security and privacy of mobile devices is a challenging research domain. One prominent line of research focuses on discovering software vulnerabilities in mobile operating systems and mobile apps; another focuses on user privacy, using feedback to generate privacy profiles for controlling data privacy. Profile-based or role-based security can be restrictive, as it requires prior definition of such roles or profiles. It is therefore better to use attribute-based access control and let the attributes define the granularity of policy definition. This problem may thus be framed as a security and privacy personalization problem. A critical issue in capturing personalized policy is creating a system that is adaptive and knows when the user’s preferences have been captured. This talk presents Mithril, a framework for capturing user access control policies that are fine-grained, context-sensitive and represented using Semantic Web technologies, and that thereby manages access control decisions for user data on mobile devices. A violation metric is used as a measure to determine system state. A hierarchical context ontology is used to define fine-grained access control policies and to simplify the process of policy modification for the user. A secondary goal of this research was to characterize the behavioral traits of mobile applications in order to detect outlier applications; some preliminary research on this topic will also be discussed.
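As a caricature of the attribute-based access control the talk describes, here is a tiny sketch. The attributes, context values and policy are invented for illustration; Mithril itself represents policies with Semantic Web technologies and a context ontology, not Python dicts.

```python
# Toy attribute-based access control: a policy names required attribute
# values, and a request is granted only if its context satisfies every
# one of them. Granularity comes from which attributes the policy names,
# not from predefined roles or profiles.

def allowed(policy, context):
    """Grant access iff the context matches every attribute in the policy."""
    return all(context.get(attr) == value for attr, value in policy.items())

# Hypothetical policy: share location with a fitness app only while the
# user is working out outdoors.
policy = {"app_category": "fitness",
          "activity": "workout",
          "location_type": "outdoors"}

print(allowed(policy, {"app_category": "fitness", "activity": "workout",
                       "location_type": "outdoors"}))  # -> True
print(allowed(policy, {"app_category": "fitness", "activity": "idle",
                       "location_type": "home"}))      # -> False
```

A context-sensitive system would recompute the context dict as the user's situation changes, so the same request can be granted in one situation and denied in another.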


 

Large Scale Cross Domain Temporal Topic Modeling for Climate Change Research

Tim Finin, 8:43am 23 December 2016

Jennifer Sleeman, Milton Halem, Tim Finin, Mark Cane, Advanced Large Scale Cross Domain Temporal Topic Modeling Algorithms to Infer the Influence of Recent Research on IPCC Assessment Reports (poster), American Geophysical Union Fall Meeting 2016, American Geophysical Union, December 2016.

One way of understanding the evolution of science within a particular scientific discipline is by studying the temporal influences that research publications had on that discipline. We provide a methodology for conducting such an analysis by employing cross-domain topic modeling and local cluster mappings of those publications with the historical texts to understand exactly when and how they influenced the discipline. We apply our method to the Intergovernmental Panel on Climate Change (IPCC) Assessment Reports and the citations therein. The IPCC reports were compiled by thousands of Earth scientists, and the assessments, issued approximately every five years over a 30-year span, include over 200,000 research papers cited by these scientists.