UMBC ebiquity
UMBC eBiquity Blog

Google search now includes schema.org fact check data

Tim Finin, 9:39am 8 April 2017

Google claims on their search blog that “Fact Check now available in Google Search and News”.  We’ve sampled searches on Google and found that some results did indeed include Fact Check data from schema.org’s ClaimReview markup.  So we are including the following markup on this page.

    
    <script type="application/ld+json">
    {
      "@context": "http://schema.org",
      "@type": "ClaimReview",
      "datePublished": "2017-04-08",
      "url": "http://ebiquity.umbc.edu/blogger/2017/04/08/google-search-now-including-schema-org-fact-check-data",
      "itemReviewed":
      {
        "@type": "CreativeWork",
        "author":
        {
          "@type": "Organization",
          "name": "Google"
        },
        "datePublished": "2017-04-07"
      },
      "claimReviewed": "Fact Check now available in Google search and news",
      "author":
      {
        "@type": "Organization",
        "name": "UMBC Ebiquity Research Group",
        "url": "http://ebiquity.umbc.edu/"
      },
      "reviewRating":
      {
        "@type": "Rating",
        "ratingValue": "5",
        "bestRating": "5",
        "worstRating": "1",
        "alternateName" : "True"
      }
    }</script>

Google notes that

“Only publishers that are algorithmically determined to be an authoritative source of information will qualify for inclusion. Finally, the content must adhere to the general policies that apply to all structured data markup, the Google News Publisher criteria for fact checks, and the standards for accountability and transparency, readability or proper site representation as articulated in our Google News General Guidelines. If a publisher or fact check claim does not meet these standards or honor these policies, we may, at our discretion, ignore that site’s markup.”

and we hope that the algorithms will find us to be an authoritative source of information.

You can see the actual markup by viewing this page’s source, or by looking at the markup that Google’s structured data testing tool finds on it here and clicking on ClaimReview in the column on the right.
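For readers who would rather inspect the markup programmatically than view the page source, here is a minimal sketch using only the Python standard library. The class name `JsonLdExtractor`, the helper `find_claim_reviews` and the `SAMPLE_HTML` string are made up for illustration. The sample is a shortened stand-in for a real page, not this page's actual source.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


def find_claim_reviews(html):
    """Return every top-level JSON-LD object whose @type is ClaimReview."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [
        item for item in parser.items
        if isinstance(item, dict) and item.get("@type") == "ClaimReview"
    ]


# Hypothetical, shortened page used only to exercise the sketch.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "claimReviewed": "Fact Check now available in Google search and news",
  "reviewRating": {"@type": "Rating", "ratingValue": "5", "alternateName": "True"}
}
</script>
</head><body></body></html>
"""

reviews = find_claim_reviews(SAMPLE_HTML)
for review in reviews:
    print(review["claimReviewed"], "->", review["reviewRating"]["alternateName"])
```

To try it on a live page, you would fetch the HTML yourself and pass the text to `find_claim_reviews`.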

Update: We’ve been algorithmically determined to be an authoritative source of information!


 

A hands-on introduction to TensorFlow and machine learning, 10am 3/28

Tim Finin, 8:34am 18 March 2017

 

A Hands-on Introduction to TensorFlow and Machine Learning

Abhay Kashyap, UMBC ebiquity Lab

10:00-11:00am Tuesday, 28 March 2017, ITE346 ITE325b

As many of you know, TensorFlow is an open source machine learning library by Google which simplifies building and training deep neural networks that can take advantage of computers with GPUs. In this meeting, I will introduce some basic concepts of TensorFlow and machine learning in general. This will be a hands-on tutorial where we will sit and code up some basic examples in TensorFlow. Specifically, we will use TensorFlow to implement linear regression, softmax classifiers and feedforward neural networks (MLP). You can find the Python notebooks here. If time permits, we will go over the implementation of the popular word2vec algorithm and introduce LSTMs to build language models.
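As a warm-up, here is a rough sketch of two of the ideas named above, linear regression and the softmax function, written in plain Python so that it runs without installing anything. It is not TensorFlow code and is not taken from the tutorial notebooks. The function names, learning rate, step count and toy data are all assumptions chosen for illustration.

```python
import math


def fit_linear_regression(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def softmax(logits):
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data generated from y = 2x + 1 (made up for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
```

A library like TensorFlow computes the gradients automatically and runs the same kind of update loop on much larger models.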

What you need to know: Python and the basics of linear algebra and matrix operations. While it helps to know basics of machine learning, no prior knowledge will be assumed and there will be a gentle high level introduction to the algorithms we will implement.

What you need to bring: A laptop that has Python and pip installed. Having virtual environments set up on your computer is also a plus. (Warning: Windows-only users might be publicly shamed)


 

SemTk: The Semantics Toolkit from GE Global Research, 4/4

Tim Finin, 8:00am 17 March 2017

The Semantics Toolkit

Paul Cuddihy and Justin McHugh
GE Global Research Center, Niskayuna, NY

10:00-11:00 Tuesday, 4 April 2017, ITE 346, UMBC

SemTk (Semantics Toolkit) is an open source technology stack built by GE scientists on top of W3C Semantic Web standards. It was originally conceived for data exploration and simplified query generation, and later expanded to a more general semantics abstraction platform. SemTk is made up of a Java API and microservices along with JavaScript front ends that cover drag-and-drop query generation, path finding, data ingestion and the beginnings of stored procedure support. In this talk we will give a tour of SemTk, discussing its architecture and direction, and demonstrate its features using the SPARQLGraph front-end hosted at http://semtk.research.ge.com.

Paul Cuddihy is a senior computer scientist and software systems architect in AI and Learning Systems at the GE Global Research Center in Niskayuna, NY. He earned an M.S. in Computer Science from Rochester Institute of Technology. The focus of his twenty-year career at GE Research has ranged from machine learning for medical imaging equipment diagnostics, monitoring and diagnostic techniques for commercial aircraft engines, modeling techniques for monitoring seniors living independently in their own homes, to parallel execution of simulation and prediction tasks, and big data ontologies. He is one of the creators of the open source software “Semantics Toolkit” (SemTk) which provides a simplified interface to the semantic tech stack, opening its use to a broader set of users by providing features such as drag-and-drop query generation and data ingestion. Paul holds over twenty U.S. patents.

Justin McHugh is a computer scientist and software systems architect working in the AI and Learning Systems group at GE Global Research in Niskayuna, NY. Justin attended the State University of New York at Albany where he earned an M.S. in computer science. He worked as a systems architect and programmer for large-scale reporting before moving into the research sector. In the six years since, he has worked on complex system integration, Big Data systems and knowledge representation/querying systems. Justin is one of the architects and creators of SemTk (the Semantics Toolkit), a toolkit aimed at making the power of the semantic web stack available to programmers, automation and subject matter experts without their having to be deeply invested in the workings of the Semantic Web.


 

new paper: App behavioral analysis using system calls

Tim Finin, 10:40am 14 March 2017

Prajit Kumar Das, Anupam Joshi and Tim Finin, App behavioral analysis using system calls, MobiSec: Security, Privacy, and Digital Forensics of Mobile Systems and Networks, IEEE Conference on Computer Communications Workshops, May 2017.

System calls provide an interface to the services made available by an operating system. As a result, any functionality provided by a software application eventually reduces to a set of fixed system calls. Since system calls have been used in the literature to analyze program behavior, we assumed that analyzing the patterns in calls made by a mobile application would provide insight into its behavior. In this paper, we present our preliminary study conducted with 534 mobile applications and the system calls made by them. Because mobile applications increasingly provide multiple functionalities, our study concluded that mapping system calls to the functional behavior of a mobile application was not straightforward. We use the Weka tool with manually annotated application behavior classes and system call features in our experiments to show that using such features achieves a mediocre F1-measure at best for app behavior classification. This leads to the conclusion that system calls were not sufficient features for app behavior classification.
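To make the terms in the abstract concrete, here is a small sketch of what "system call features" and the F1-measure could look like in code. It is an illustration only: the paper used Weka, and the vocabulary, trace and labels below are invented rather than taken from the study.

```python
from collections import Counter


def syscall_features(trace, vocabulary):
    """Count how often each system call in `vocabulary` appears in one app's trace."""
    counts = Counter(trace)
    return [counts[name] for name in vocabulary]


def f1_score(true_labels, predicted_labels, positive):
    """F1 is the harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical vocabulary and trace for a single app.
vocabulary = ["open", "read", "write", "sendto", "recvfrom"]
trace = ["open", "read", "read", "sendto", "read", "write"]
features = syscall_features(trace, vocabulary)
print(features)

# Hypothetical behavior classes and classifier output for five apps.
true_labels = ["game", "chat", "chat", "game", "chat"]
predicted = ["game", "chat", "game", "game", "game"]
score = f1_score(true_labels, predicted, positive="chat")
print(score)
```

An app that offers several functions at once produces a trace mixing the call patterns of each, which is one way to picture why such count vectors may separate behavior classes poorly.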


 

SADL: Semantic Application Design Language

Tim Finin, 9:19am 4 March 2017

SADL – Semantic Application Design Language

Dr. Andrew W. Crapo
GE Global Research

 10:00 Tuesday, 7 March 2017

The Web Ontology Language (OWL) has gained considerable acceptance over the past decade. Building on prior work in Description Logics, OWL has sufficient expressivity to be useful in many modeling applications. However, its various serializations do not seem intuitive to subject matter experts in many domains of interest to GE. Consequently, we have developed a controlled-English language and development environment that attempts to make OWL plus rules more accessible to those with knowledge to share but limited interest in studying formal representations. The result is the Semantic Application Design Language (SADL). This talk will review the foundational underpinnings of OWL and introduce the SADL constructs meant to capture, validate, and maintain semantic models over their lifecycle.

 

Dr. Crapo has been part of GE’s Global Research staff for over 35 years. As an Information Scientist he has built performance and diagnostic models of mechanical, chemical, and electrical systems, and has specialized in human-computer interfaces, decision support systems, machine reasoning and learning, and semantic representation and modeling. His work has included a graphical expert system language (GEN-X), a graphical environment for procedural programming (Fuselet Development Environment), and a semantic-model-driven user-interface for decision support systems (ACUITy). Most recently Andy has been active in developing the Semantic Application Design Language (SADL), enabling GE to leverage worldwide advances and emerging standards in semantic technology and bring them to bear on diverse problems from equipment maintenance optimization to information security.


 

Context-Dependent Privacy and Security Management on Mobile Devices

Tim Finin, 11:05pm 27 February 2017

Mobile devices can provide better services if they can model, recognize and adapt to their users' context.

Context-Dependent Privacy and Security Management on Mobile Devices

Prajit Das, UMBC

10:00am Tuesday, 27 February, 2017

Security and privacy of mobile devices is a challenging research domain. A prominent aspect of this research focuses on discovering software vulnerabilities in mobile operating systems and mobile apps. Another aspect focuses on user privacy and uses feedback to generate privacy profiles for controlling data privacy. Profile-based or role-based security can be restrictive, as it requires prior definition of such roles or profiles. As a result, it is better to use attribute-based access control and let the attributes define the granularity of policy definition. The problem may thus be framed as a security and privacy personalization problem. A critical issue in capturing personalized policy is creating a system that is adaptive and knows when a user’s preferences have been captured. This work presents Mithril, a framework for capturing user access control policies that are fine-grained, context-sensitive and represented using Semantic Web technologies, and that thereby manages access control decisions for user data on mobile devices. A violation metric is used as a measure to determine system state. A hierarchical context ontology is used to define fine-grained access control policies and to simplify the process of policy modification for a user. A secondary goal of this research was to determine behavioral traits of mobile applications in order to detect outlier applications. Some preliminary research on this topic will also be discussed.
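The idea of attribute-based rules over a hierarchical context can be sketched roughly as follows. This is not Mithril's implementation: the context hierarchy, the rule format and every name below are hypothetical, and a plain dictionary stands in for the ontology.

```python
# Hypothetical context hierarchy: each context maps to its more general parent.
CONTEXT_PARENT = {
    "lab": "campus",
    "classroom": "campus",
    "campus": "work",
    "home": "personal",
}


def generalizations(context):
    """Return the context followed by all of its ancestors in the hierarchy."""
    chain = [context]
    while chain[-1] in CONTEXT_PARENT:
        chain.append(CONTEXT_PARENT[chain[-1]])
    return chain


# Each rule ties (app category, data type, context) to a decision. All values are made up.
RULES = [
    {"app": "social", "data": "location", "context": "work", "allow": False},
    {"app": "navigation", "data": "location", "context": "campus", "allow": True},
]


def decide(app, data, context, default=False):
    """Apply the rule whose context is the closest ancestor of the current context."""
    for candidate in generalizations(context):
        for rule in RULES:
            if (rule["app"], rule["data"], rule["context"]) == (app, data, candidate):
                return rule["allow"]
    # No rule covers this request, so fall back to the default decision.
    return default


print(decide("navigation", "location", "lab"))
print(decide("social", "location", "classroom"))
print(decide("social", "location", "home"))
```

Because a rule written for a general context such as "work" also covers the more specific contexts beneath it, a user can change one rule instead of editing a separate rule per location.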


 

Large Scale Cross Domain Temporal Topic Modeling for Climate Change Research

Tim Finin, 8:43am 23 December 2016

Jennifer Sleeman, Milton Halem, Tim Finin, Mark Cane, Advanced Large Scale Cross Domain Temporal Topic Modeling Algorithms to Infer the Influence of Recent Research on IPCC Assessment Reports (poster), American Geophysical Union Fall Meeting 2016, American Geophysical Union, December 2016.

One way of understanding the evolution of science within a particular scientific discipline is by studying the temporal influences that research publications had on that discipline. We provide a methodology for conducting such an analysis by employing cross-domain topic modeling and local cluster mappings of those publications with the historical texts to understand exactly when and how they influenced the discipline. We apply our method to the Intergovernmental Panel on Climate Change (IPCC) Assessment Reports and the citations therein. The IPCC reports were compiled by thousands of Earth scientists, and the assessments were issued approximately every five years over a 30-year span and include over 200,000 research papers cited by these scientists.


 

PhD Proposal: Understanding the Logical and Semantic Structure of Large Documents

Tim Finin, 8:54am 9 December 2016

Dissertation Proposal

Understanding the Logical and Semantic
Structure of Large Documents 

Muhammad Mahbubur Rahman

11:00-1:00 Monday, 12 December 2016, ITE325b, UMBC

Up-to-the-minute language understanding approaches are mostly focused on small documents such as newswire articles, blog posts, product reviews and discussion forum entries. Understanding and extracting information from large documents such as legal documents, reports, business opportunities, proposals and technical manuals is still a challenging task. The reason behind this challenge is that the documents may be multi-themed, complex and cover diverse topics.

We aim to automatically identify and classify a document’s sections and subsections, infer their structure and annotate them with semantic labels to understand the semantic structure of a document. This understanding of a document’s structure will significantly benefit and inform a variety of applications such as information extraction and retrieval, document categorization and clustering, document summarization, fact and relation extraction, text analysis and question answering.

Committee: Drs. Tim Finin (Chair), Anupam Joshi, Tim Oates, Cynthia Matuszek, James Mayfield (JHU)


 

PhD Proposal: Ankur Padia, Dealing with Dubious Facts in Knowledge Graphs

Tim Finin, 9:25pm 29 November 2016

Dissertation Proposal

Dealing with Dubious Facts
in Knowledge Graphs

Ankur Padia

1:00-3:00pm Wednesday, 30 November 2016, ITE 325b, UMBC

Knowledge graphs are structured representations of facts where nodes are real-world entities or events and edges are the associations between pairs of entities. Knowledge graphs can be constructed using automatic or manual techniques. Manual techniques construct high quality knowledge graphs but are expensive, time consuming and not scalable. Hence, automatic information extraction techniques are used to create scalable knowledge graphs, but the extracted information can be of poor quality due to the presence of dubious facts.

An extracted fact is dubious if it is incorrect, inexact or correct but lacks evidence. A fact might be dubious because of the errors made by NLP extraction techniques, improper design consideration of the internal components of the system, choice of learning techniques (semi-supervised or unsupervised), relatively poor quality of heuristics or the syntactic complexity of underlying text. A preliminary analysis of several knowledge extraction systems (CMU’s NELL and JHU’s KELVIN) and observations from the literature suggest that dubious facts can be identified, diagnosed and managed. In this dissertation, I will explore how to identify and repair such dubious facts in a knowledge graph using several complementary approaches, including linguistic analysis, common sense reasoning, and entity linking.
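As a rough illustration of the data structure involved, the sketch below stores facts as subject-relation-object triples and applies two simple checks. The checks are stand-ins chosen for this example, not the methods proposed in the dissertation: a low extractor confidence is used as a proxy for "possibly incorrect", an empty evidence list for "lacks evidence", and the threshold, relation names and sample facts are all made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    confidence: float      # extractor's score in [0, 1] (assumed to be available)
    evidence: tuple = ()   # ids of the source documents supporting the fact


def is_dubious(fact, min_confidence=0.7):
    """Flag a fact that is low-confidence or has no supporting evidence."""
    return fact.confidence < min_confidence or len(fact.evidence) == 0


def conflicting(facts, functional_relations):
    """Find subjects that have more than one value for a single-valued relation."""
    seen = {}
    for fact in facts:
        if fact.relation in functional_relations:
            seen.setdefault((fact.subject, fact.relation), set()).add(fact.obj)
    return {key: values for key, values in seen.items() if len(values) > 1}


# Hypothetical extracted facts.
facts = [
    Fact("Baltimore", "locatedIn", "Maryland", 0.95, ("doc1", "doc7")),
    Fact("Baltimore", "locatedIn", "Ohio", 0.40, ("doc3",)),
    Fact("UMBC", "locatedIn", "Maryland", 0.90),
]

dubious = [f for f in facts if is_dubious(f)]
conflicts = conflicting(facts, functional_relations={"locatedIn"})
print([(f.subject, f.obj) for f in dubious])
print(conflicts)
```

Flagging is only the first step. Deciding which of two conflicting values to keep is where the linguistic analysis, reasoning and entity linking mentioned above would come in.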

Committee: Drs. Tim Finin (Chair), Anupam Joshi, Tim Oates, Paul McNamee (JHU), Partha Talukdar (IISc, India)


 

Understanding Large Documents

Tim Finin, 10:49am 28 November 2016

In this week’s ebiquity meeting, Muhammad Mahbubur Rahman will talk about his work on understanding large documents, such as business RFPs.

Large Document Understanding

Muhammad Mahbubur Rahman

Up-to-the-minute language understanding approaches are mostly focused on small documents such as newswire articles, blog posts, product reviews and discussion forum entries. Understanding and extracting information from large documents such as legal documents, reports, business opportunities, proposals and technical manuals is still a challenging task. The reason behind this challenge is that the documents may be multi-themed, complex and cover diverse topics.

We aim to automatically identify and classify a document’s sections and subsections, infer their structure and annotate them with semantic labels to understand the semantic structure of a document. This understanding of a document’s structure will significantly benefit and inform a variety of applications such as information extraction and retrieval, document categorization and clustering, document summarization, fact and relation extraction, text analysis and question answering.


 

PhD proposal: Sandeep Nair Narayanan, Cognitive Analytics Framework to Secure Internet of Things

Tim Finin, 12:47pm 26 November 2016

Dissertation Proposal

Cognitive Analytics Framework to Secure Internet of Things

Sandeep Nair Narayanan

1:00-3:30pm, Monday, 28 November 2016, ITE 325b

Recent years have seen the rapid growth and widespread adoption of the Internet of Things in a wide range of domains including smart homes, healthcare, automotive, smart farming and smart grids. The IoT ecosystem consists of devices like sensors, actuators and control systems connected over heterogeneous networks. The connected devices can be from different vendors with different capabilities in terms of power requirements, processing capabilities, etc. As such, many security features aren’t implemented on devices with lesser processing capabilities. The level of security practices followed during their development can also differ. The lack of over-the-air firmware updates also poses a very big security threat considering their long-term deployment requirements. Device malfunctioning is yet another threat which should be considered. Hence, it is imperative to have an external entity which monitors the ecosystem and detects attacks and anomalies.

In this thesis, we propose a security framework for IoTs using cognitive techniques. While anomaly detection has been employed in various domains, some challenges like online approach, resource constraints, heterogeneity, distributed data collection etc. are unique to IoTs and their predecessors like wireless sensor networks. Our framework will have an underlying knowledge base which has the domain-specific information, a hybrid context generation module which generates complex contexts and a fast reasoning engine which does logical reasoning to detect anomalous activities. When raw sensor data arrives, the hybrid context generation module queries the knowledge base and generates different simple local contexts using various statistical and machine learning models. The inferencing engine will then infer global complex contexts and detect anomalous activities using knowledge from streaming facts and domain-specific rules encoded in the ontology we will create. We will evaluate our techniques by realizing and validating them in the vehicular domain.
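The two-stage flow described above, local contexts from statistical models followed by rule-based inference, might be pictured with a sketch like the one below. It is an assumption-laden toy, not the proposed framework: a z-score stands in for the statistical and machine learning models, a list of dictionaries stands in for the ontology rules, and the sensor names, thresholds and readings are invented.

```python
from statistics import mean, stdev


def local_context(name, history, reading, threshold=3.0):
    """Label one sensor reading 'normal' or 'unusual' using a z-score against its history."""
    mu, sigma = mean(history), stdev(history)
    z = 0.0 if sigma == 0 else (reading - mu) / sigma
    return {"sensor": name, "label": "unusual" if abs(z) > threshold else "normal"}


# Hypothetical domain rule: fires when the first set of sensors looks unusual
# while the second set still looks normal.
RULES = [
    {"name": "possible_spoofed_speed", "if_unusual": {"speed"}, "if_normal": {"rpm"}},
]


def infer_anomalies(contexts):
    """Combine per-sensor local contexts and return the names of the rules that fire."""
    labels = {c["sensor"]: c["label"] for c in contexts}
    fired = []
    for rule in RULES:
        unusual_ok = all(labels.get(s) == "unusual" for s in rule["if_unusual"])
        normal_ok = all(labels.get(s) == "normal" for s in rule["if_normal"])
        if unusual_ok and normal_ok:
            fired.append(rule["name"])
    return fired


# Made-up vehicle readings: speed jumps while engine RPM stays in its usual range.
contexts = [
    local_context("speed", [50, 52, 49, 51, 50], 120),
    local_context("rpm", [2000, 2100, 1950, 2050, 2000], 2020),
]
alerts = infer_anomalies(contexts)
print(contexts)
print(alerts)
```

Keeping the per-sensor step cheap and local, and leaving cross-sensor reasoning to the rule stage, is one way to respect the resource constraints mentioned earlier.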

Committee: Drs. Anupam Joshi (Chair), Tim Finin, Nilanjan Banerjee, Yelena Yesha, Wenjia Li (NYIT), Filip Perich (Google)


 

Dealing with Dubious Facts in Knowledge Graphs

Tim Finin, 10:47am 22 November 2016

In this week’s meeting, Ankur Padia will talk about his work on the problem of identifying and managing ‘dubious facts’ extracted from text and added to a knowledge graph.

Dealing with Dubious Facts in Knowledge Graphs

Ankur Padia

Knowledge graphs are used to represent real-world facts and events with entities as nodes and relations as labeled edges. Generally, a knowledge graph is automatically constructed by extracting facts from a text corpus using information extraction (IE) techniques. Such IE techniques are scalable but often extract low quality (or dubious) facts due to errors caused by NLP libraries, internal components of an extraction system, choice of learning techniques, heuristics and syntactic complexity of underlying text. We wish to explore techniques to process such dubious facts and improve the quality of a knowledge graph.