UMBC ebiquity
Semantic Web

Archive for the 'Semantic Web' Category

Assessing credibility of content on Twitter using automated techniques

November 29th, 2015, by Tim Finin, posted in Machine Learning, Semantic Web, Social media, Web

Aditi Gupta

10:30am, Monday 30 November 2015, ITE 346

Online social media is a powerful platform for disseminating information during real-world events. Beyond the challenges of volume, variety and velocity of the content generated on online social media, veracity poses a much greater challenge to the effective use of this content by citizens, organizations, and authorities. Veracity of information refers to the trustworthiness, credibility, accuracy, and completeness of the content. This work addressed the challenge of veracity, or trustworthiness, of content posted on social media. We focused on Twitter, one of the most popular microblogging web services today. We provided an in-depth analysis of misinformation spread on Twitter during real-world events. We showed the effectiveness of automated techniques for detecting misinformation on Twitter using a combination of content, metadata, network, user profile, and temporal features. We developed and deployed TweetCred, a novel framework that indicates the trustworthiness or credibility of tweets posted during events. TweetCred, which was available as a browser plug-in, was installed and used by real Twitter users.
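A minimal sketch of how a single tweet might be turned into a feature vector covering the five feature families named above (content, metadata, network, user profile, temporal). The `Tweet` fields and the specific features are hypothetical illustrations, not TweetCred's actual feature set. A trained classifier, which is not shown here, would then score these vectors.

```python
import re
from dataclasses import dataclass

URL_RE = re.compile(r"https?://\S+")


@dataclass
class Tweet:
    # Hypothetical fields, one or more per feature family.
    text: str                   # content
    retweet_count: int          # metadata
    author_followers: int       # network
    author_verified: bool       # user profile
    author_age_days: int        # user profile
    minutes_since_event: float  # temporal


def credibility_features(t: Tweet) -> dict:
    """Flatten a tweet into a feature dict that a trained classifier could score."""
    return {
        "num_words": len(t.text.split()),
        "has_url": int(bool(URL_RE.search(t.text))),
        "num_exclamations": t.text.count("!"),
        "retweets": t.retweet_count,
        "followers": t.author_followers,
        "verified": int(t.author_verified),
        "account_age_days": t.author_age_days,
        "minutes_since_event": t.minutes_since_event,
    }


tweet = Tweet("Bridge collapsed!!! see http://example.com/photo", 120, 50, False, 3, 12.5)
print(credibility_features(tweet))
```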

Dr. Aditi Gupta is a research associate in the Computer Science and Electrical Engineering Department at UMBC. She received her Ph.D. from the Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi) in 2015 for her dissertation on designing and evaluating techniques to mitigate the spread of misinformation on microblogging web services.

Semantic Interpretation of Structured Log Files

November 21st, 2015, by Tim Finin, posted in Machine Learning, Semantic Web


Piyush Nimbalkar, Semantic Interpretation of Structured Log Files, M.S. thesis, University of Maryland, Baltimore County, August, 2015.

Log files comprise a record of events happening in applications, operating systems, and even network devices. Originally they were used to record information for diagnostic and debugging purposes. Nowadays, logs are also used to track events for auditing and forensics in case of malicious activity or system attacks. Software such as intrusion detection systems, web servers, anti-virus and anti-malware systems, firewalls, and network devices generates logs with useful information that can be used to protect against such attacks. Analyzing log files can help proactively avoid attacks against these systems. Existing tools do a good job when the format of a log file is known, but the challenge lies in cases where log files come from unknown devices and have unknown formats. We propose a framework that takes any log file and automatically produces a semantic interpretation as a set of RDF Linked Data triples. The framework splits a log file into columns using regular-expression-based or dictionary-based classifiers. Leveraging and modifying our existing work on inferring the semantics of tables, we identify every column in a log file and map it to concepts either from a general-purpose KB like DBpedia or from domain-specific ontologies such as IDS. We also identify relationships between the columns in such log files. Converting large and verbose log files into such semantic representations will enable better search, integration, and rich reasoning over the data.
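The column-splitting step can be illustrated with a small sketch. It labels each whitespace-separated column by a majority vote of regex classifiers and then emits one N-Triples line per cell. The patterns, labels, and the `http://example.org/log#` namespace are all hypothetical. The framework described above maps columns to DBpedia or IDS ontology concepts rather than to invented predicates.

```python
import re
from collections import Counter

# Hypothetical regex classifiers; order matters because the first full match wins.
COLUMN_PATTERNS = [
    ("IPAddress", re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")),
    ("Timestamp", re.compile(r"\d{2}:\d{2}:\d{2}")),
    ("HTTPMethod", re.compile(r"GET|POST|PUT|DELETE|HEAD")),
    ("StatusCode", re.compile(r"[1-5]\d{2}")),
]
BASE = "http://example.org/log#"  # hypothetical namespace


def classify_token(token: str) -> str:
    for label, pattern in COLUMN_PATTERNS:
        if pattern.fullmatch(token):
            return label
    return "Unknown"


def classify_columns(rows: list[list[str]]) -> list[str]:
    """Label each column by majority vote over its cells."""
    width = max(len(r) for r in rows)
    labels = []
    for i in range(width):
        votes = Counter(classify_token(r[i]) for r in rows if i < len(r))
        labels.append(votes.most_common(1)[0][0])
    return labels


def to_ntriples(rows: list[list[str]], labels: list[str]) -> list[str]:
    """One resource per log entry, one triple per cell (values assumed quote-free)."""
    out = []
    for n, row in enumerate(rows):
        for label, value in zip(labels, row):
            out.append(f'<{BASE}entry{n}> <{BASE}has{label}> "{value}" .')
    return out


log = """10:15:32 192.168.1.5 GET 200
10:15:40 10.0.0.7 POST 404"""
rows = [line.split() for line in log.splitlines() if line.strip()]
labels = classify_columns(rows)
for triple in to_ntriples(rows, labels):
    print(triple)
```

Voting over every row rather than trusting a single cell makes the column label robust to occasional odd values. This is one plausible reason to classify columns instead of individual tokens.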

Supporting Situationally Aware Cybersecurity Systems

November 8th, 2015, by Tim Finin, posted in cybersecurity, Ontologies, Semantic Web

Zareen Syed, Tim Finin, Ankur Padia and M. Lisa Mathews, Supporting Situationally Aware Cybersecurity Systems, Technical Report, Computer Science and Electrical Engineering, UMBC, 30 September 2015.

In this report, we describe the Unified Cyber Security ontology (UCO), which supports situational awareness in cyber security systems. The ontology is an effort to incorporate and integrate heterogeneous information available from different cyber security systems and from the most commonly used cyber security standards for information sharing and exchange. The ontology has also been mapped to a number of existing cyber security ontologies as well as to concepts in the Linked Open Data cloud. Just as DBpedia serves as the core of the Linked Open Data cloud, we envision UCO serving as the core of a specialized cyber security Linked Open Data cloud that would evolve and grow over time as additional cybersecurity data sets become available. We also present a prototype system and concrete use cases supported by the UCO ontology. To the best of our knowledge, this is the first cyber security ontology that has been mapped to general world ontologies to support broader and more diverse security use cases. We compare the resulting ontology with previous efforts, discuss its strengths and limitations, and describe potential directions for future work.

Extracting Structured Summaries from Text Documents

November 5th, 2015, by Tim Finin, posted in NLP, Ontologies, Semantic Web


Dr. Zareen Syed
Research Assistant Professor, UMBC

10:30am, Monday, 9 November 2015, ITE 346, UMBC

In this talk, Dr. Syed will present unsupervised approaches for automatically extracting structured summaries, composed of slots and fillers (attributes and values) and important facts, from articles. This reduces the time and effort humans spend gathering intelligence with traditional keyword-based search. The approach first extracts important concepts from text documents and links them to unique concepts in the Wikitology knowledge base. It then exploits the types associated with the linked concepts to discover candidate slots and fillers. Finally, it applies specialized approaches for ranking and filtering slots to select the most relevant ones to include in the structured summary.

Compared with the state of the art, Dr. Syed’s approach is unrestricted, i.e., it does not require a manually crafted catalogue of slots or relations of interest, which may vary across domains. Unlike Natural Language Processing (NLP) based approaches that require well-formed sentences, the approach can be applied to semi-structured text. Furthermore, NLP-based approaches to fact extraction extract lexical facts and sentences that need further processing to be disambiguated and linked to unique entities and concepts in a knowledge base. In Dr. Syed’s approach, by contrast, concept linking is the first step of the discovery process. Linking concepts to a knowledge base has the additional advantage that terms can be explicitly linked or mapped to semantic concepts in other ontologies, and are thus available for reasoning in more sophisticated language understanding systems.
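The slot discovery step (linked concept, then its types, then candidate slots) might look roughly like the sketch below. The `LINKED` dictionary is a hypothetical stand-in for the output of linking mentions to Wikitology. The ranking rule, which orders slots by number of fillers and breaks ties alphabetically, is an illustrative placeholder rather than the specialized ranking and filtering the talk describes.

```python
from collections import defaultdict

# Hypothetical output of the concept-linking step: mention -> (KB concept, KB types).
LINKED = {
    "Barack Obama": ("Barack_Obama", ["Person", "Politician"]),
    "Michelle Obama": ("Michelle_Obama", ["Person"]),
    "Honolulu": ("Honolulu", ["Place", "City"]),
    "Harvard Law School": ("Harvard_Law_School", ["Organisation", "University"]),
}


def candidate_slots(linked: dict) -> dict:
    """Each type of a linked concept becomes a candidate slot; the concept fills it."""
    slots = defaultdict(set)
    for concept, types in linked.values():
        for t in types:
            slots[t].add(concept)
    return slots


def rank_slots(slots: dict, top_k: int = 3) -> list:
    """Hypothetical ranking: more fillers first, ties broken alphabetically."""
    ranked = sorted(slots.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(slot, sorted(fillers)) for slot, fillers in ranked[:top_k]]


summary = rank_slots(candidate_slots(LINKED))
print(summary)
```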

The KELVIN Information Extraction System

October 30th, 2015, by Tim Finin, posted in NLP, Semantic Web

At this week’s ebiquity lab meeting (10:30am Monday Nov 2), Tim Finin will describe recent work on the Kelvin information extraction system and its performance on two tasks in the 2015 NIST Text Analysis Conference. Kelvin has been under development at the JHU Human Language Technology Center of Excellence for several years. Kelvin reads documents in several languages and extracts entities and the relations between them. This year it was used for the Cold Start Knowledge Base Population and Trilingual Entity Discovery and Linking tasks. Key components for these tasks are a system for cross-document coreference and another that links entities to entries in the Freebase knowledge base.
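To make "cross-document coreference" concrete, here is a deliberately naive sketch. It groups mentions from different documents when their normalized names and entity types match. This is not how Kelvin works: a real system uses much richer evidence such as context, aliases, and relations. The honorific list and the mention tuples are hypothetical.

```python
from collections import defaultdict

HONORIFICS = {"mr", "mrs", "ms", "dr", "prof"}  # hypothetical, deliberately tiny


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop honorifics."""
    cleaned = "".join(ch for ch in name if ch.isalnum() or ch.isspace())
    return " ".join(tok for tok in cleaned.lower().split() if tok not in HONORIFICS)


def cluster_mentions(mentions):
    """mentions: (doc_id, surface_form, entity_type) triples from many documents."""
    clusters = defaultdict(list)
    for doc_id, surface, etype in mentions:
        clusters[(normalize(surface), etype)].append((doc_id, surface))
    return dict(clusters)


mentions = [
    ("d1", "Dr. Jane Smith", "PER"),
    ("d2", "jane smith", "PER"),
    ("d3", "Jane Smith", "ORG"),  # same string, different type: kept apart
]
clusters = cluster_mentions(mentions)
print(clusters)
```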

Lyrics Augmented Multi-modal Music Recommendation

October 29th, 2015, by Tim Finin, posted in Machine Learning, NLP, RDF, Semantic Web


Abhay Kashyap

1:00pm Friday 30 October, ITE 325b

In an increasingly mobile and connected world, digital music consumption has rapidly increased. More recently, faster and cheaper mobile bandwidth has given the average mobile user the potential to access large troves of music through streaming services like Spotify and Google Music that boast catalogs with tens of millions of songs. At this scale, effective music recommendation is critical for music discovery and personalized user experience.

Recommenders that rely on collaborative information suffer from two major problems: the long tail problem, which is induced by popularity bias, and the cold start problem caused by new items with no data. In such cases, they fall back on content to compute similarity. For music, content based features can be divided into acoustic and textual domains. Acoustic features are extracted from the audio signal while textual features come from song metadata, lyrical content, collaborative tags and associated web text.
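The fallback described above can be sketched as follows. When both songs have enough listeners, similarity is the Jaccard overlap of their listener sets (a collaborative signal). Otherwise, it falls back to cosine similarity over content feature vectors. The listener threshold, the data, and the idea that vectors concatenate acoustic and lyrical features are hypothetical choices for illustration.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity(a, b, listeners, content, min_listeners=3):
    """Jaccard over listener sets when both songs have enough listeners,
    otherwise cosine over content feature vectors (the cold-start fallback)."""
    la, lb = listeners.get(a, set()), listeners.get(b, set())
    if len(la) >= min_listeners and len(lb) >= min_listeners:
        return "collaborative", len(la & lb) / len(la | lb)
    return "content", cosine(content[a], content[b])


listeners = {"s1": {"u1", "u2", "u3", "u4"}, "s2": {"u2", "u3", "u4", "u5"}}
# Hypothetical content vectors, e.g. concatenated acoustic and lyrical features.
content = {"s1": [1.0, 0.0, 1.0], "s2": [1.0, 1.0, 0.0], "s3": [1.0, 0.0, 1.0]}
print(similarity("s1", "s2", listeners, content))
print(similarity("s1", "s3", listeners, content))  # s3 is new: no listeners yet
```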

Research on content-based music similarity has largely focused on the acoustic domain, while text-based features have been limited to metadata, tags, and shallow methods for web text and lyrics. Song lyrics contain information about a song’s sentiment and topic that cannot easily be extracted from the audio. Past work has shown that even shallow lyrical features improved on audio-only features and, in some tasks like mood classification, outperformed them. In addition, lyrics are easily available, which makes them a valuable resource that warrants deeper analysis.

The goal of this research is to fill the lyrical gap in existing music recommender systems. The first step is to build algorithms to extract and represent the meaning and emotion contained in the song’s lyrics. The next step is to effectively combine lyrical features with acoustic and collaborative information to build a multi-modal recommendation engine.

For this work, the genre is restricted to Rap because it is a lyrics-centric genre and techniques built for Rap can be generalized to other genres. It was also the most streamed genre in 2014, accounting for 28.5% of all music streamed. Rap lyrics are scraped from dedicated lyrics websites like and while the semantic knowledge base, comprising artist, album and song metadata, comes from the MusicBrainz project. Acoustic features are taken directly from EchoNest, while collaborative information such as tags, plays, and co-plays comes from

Preliminary work involved extracting compositional style features, such as rhyme patterns and density, vocabulary size, and simile and profanity usage, from over 10,000 songs by over 150 artists. These features are available for users to browse and explore through interactive visualizations on Song semantics were represented using off-the-shelf neural language based vector models (doc2vec). Future work will involve building novel language models for lyrics, and latent representations for attributes that are driven by collaborative information, for multi-modal recommendation.
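A rough sketch of three of the compositional style features mentioned above: vocabulary size, a rhyme measure, and profanity usage. The "end-rhyme density" here is a hypothetical spelling-based proxy: the share of consecutive line endings that share their last two letters. Real rhyme detection would compare phonemes. The one-word profanity lexicon is a placeholder.

```python
import re

PROFANITY = {"damn"}  # placeholder lexicon; a real one would be far larger


def words(text: str) -> list:
    return re.findall(r"[a-z']+", text.lower())


def rhyme_key(word: str, n: int = 2) -> str:
    # Crude spelling-based proxy; real rhyme detection would compare phonemes.
    return word[-n:]


def style_features(lyrics: str) -> dict:
    line_words = [words(line) for line in lyrics.splitlines()]
    line_words = [w for w in line_words if w]  # drop empty lines
    tokens = [t for w in line_words for t in w]
    end_words = [w[-1] for w in line_words]
    rhymes = sum(rhyme_key(a) == rhyme_key(b) for a, b in zip(end_words, end_words[1:]))
    return {
        "vocabulary_size": len(set(tokens)),
        "end_rhyme_density": rhymes / max(len(end_words) - 1, 1),
        "profanity_rate": sum(t in PROFANITY for t in tokens) / max(len(tokens), 1),
    }


verse = """I keep it real in the night
Stay on the mic, hold it tight
Damn, the city shines bright
Then we go home"""
print(style_features(verse))
```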

Committee: Drs. Tim Finin (Chair), Anupam Joshi, Pranam Kolari (WalmartLabs), Cynthia Matuszek and Tim Oates

Beyond NER: Towards Semantics in Clinical Text

September 29th, 2015, by Tim Finin, posted in NLP, Ontologies, RDF, Semantic Web

Clare Grasso, Anupam Joshi and Eliot Siegel, Beyond NER: Towards Semantics in Clinical Text, Biomedical Data Mining, Modeling, and Semantic Integration (BDM2I); co-located with the 14th International Semantic Web Conference (ISWC 2015), Bethlehem, PA.

While clinical text NLP systems have become very effective in recognizing named entities in clinical text and mapping them to standardized terminologies in the normalization process, there remains a gap in the ability of extractors to combine entities together into a complete semantic representation of medical concepts that contain multiple attributes each of which has its own set of allowed named entities or values. Furthermore, additional domain knowledge may be required to determine the semantics of particular tokens in the text that take on special meanings in relation to this concept. This research proposes an approach that provides ontological mappings of the surface forms of medical concepts that are of the UMLS semantic class signs/symptoms. The mappings are used to extract and encode the constituent set of named entities into interoperable semantic structures that can be linked to other structured and unstructured data for reuse in research and analysis.
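To illustrate the gap between recognizing entities and building a structured representation, here is a toy sketch. It maps a sign/symptom surface form into a frame with several attributes. The tiny value sets and the `SignSymptom` frame are hypothetical stand-ins for ontology-backed structures. The handling of "radiating" shows how a piece of domain knowledge changes the meaning of the body sites that follow it.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical mini value sets standing in for ontology-backed attribute ranges.
SEVERITY = {"mild", "moderate", "severe"}
QUALITY = {"sharp", "dull", "burning", "throbbing"}
BODY_SITE = {"chest", "abdomen", "head", "back", "arm", "jaw"}
FINDING = {"pain", "ache", "tenderness"}


@dataclass
class SignSymptom:
    finding: Optional[str] = None
    severity: Optional[str] = None
    quality: Optional[str] = None
    sites: List[str] = field(default_factory=list)
    radiates_to: List[str] = field(default_factory=list)


def to_frame(phrase: str) -> SignSymptom:
    frame = SignSymptom()
    radiating = False
    for tok in phrase.lower().replace(",", " ").split():
        if tok == "radiating":
            # Domain knowledge: sites named after "radiating" are radiation
            # targets, not the primary location of the symptom.
            radiating = True
        elif tok in SEVERITY:
            frame.severity = tok
        elif tok in QUALITY:
            frame.quality = tok
        elif tok in FINDING:
            frame.finding = tok
        elif tok in BODY_SITE:
            (frame.radiates_to if radiating else frame.sites).append(tok)
    return frame


print(to_frame("Severe sharp chest pain radiating to left arm"))
```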

talk: Attribute-based Fine Grained Access Control for Triple Stores

September 12th, 2015, by Tim Finin, posted in Security, Semantic Web


At the 14 September 2015 ebiquity meeting, Ankur Padia will talk about his recent work on providing access control for RDF triple stores.

Attribute-based Fine Grained Access Control for Triple Stores

Ankur Padia, UMBC

The maturation of semantic web standards and associated web-based data representations like have made RDF a popular model for representing graph data and semi-structured knowledge. However, most existing SPARQL endpoints support only simple access control mechanisms, which prevents their use in many applications. To protect the data stored in RDF stores, we describe a framework that supports attribute-based fine-grained access control and explore its feasibility. We implemented a prototype of the system and used it to carry out an initial analysis of the relationship between access control policies, query execution time, and the size of the RDF dataset.

For more information, see: Ankur Padia, Tim Finin and Anupam Joshi, Attribute-based Fine Grained Access Control for Triple Stores, 3rd Society, Privacy and the Semantic Web – Policy and Technology workshop (PrivOn 2015), 14th Int. Semantic Web Conf., Oct. 2015.
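Attribute-based access control at the triple level can be sketched very simply. A rule protects a predicate and lists the user attributes required to see triples that use it. A user's view of the store is then the set of triples whose rules they satisfy. This is one simple default-allow strategy that filters data before querying. The `Rule` shape, the prefixed names, and the data are hypothetical, and the framework in the paper may enforce policies differently, for example by rewriting queries.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    predicate: str   # triples using this predicate are protected
    required: dict   # user attribute name -> value the user must have


def permitted(triple, user_attrs, rules):
    """Default-allow: deny a triple if any rule on its predicate is unmet."""
    _, predicate, _ = triple
    return all(
        all(user_attrs.get(k) == v for k, v in rule.required.items())
        for rule in rules
        if rule.predicate == predicate
    )


def visible_triples(triples, user_attrs, rules):
    return [t for t in triples if permitted(t, user_attrs, rules)]


triples = [
    (":alice", ":name", '"Alice"'),
    (":alice", ":salary", '"90000"'),
    (":bob", ":salary", '"85000"'),
]
rules = [Rule(":salary", {"role": "hr"})]
print(visible_triples(triples, {"role": "engineer"}, rules))
print(visible_triples(triples, {"role": "hr"}, rules))
```

Because every triple is checked against every applicable rule, the filtering cost grows with both the number of policies and the size of the dataset. This is the kind of trade-off the prototype's initial analysis examined.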

UMBC Schema Free Query system on ESWC Schema-agnostic Queries over Linked Data

June 7th, 2015, by Tim Finin, posted in Machine Learning, NLP, RDF, Semantic Web

This year’s ESWC Semantic Web Evaluation Challenge track had a task on Schema-agnostic Queries over Linked Data: SAQ-2015. The idea is to support a SPARQL-like query language that requires knowing neither the underlying graph schema nor the URIs to use for terms and individuals, as in the following examples.

 SELECT ?y {BillClinton hasDaughter ?x. ?x marriedTo ?y.}

 SELECT ?x {?x isA book. ?x by William_Goldman.
            ?x has_pages ?p. FILTER (?p > 300)}

We adapted our Schema Free Querying system to the task as described in the following paper.

Zareen Syed, Lushan Han, Muhammad Mahbubur Rahman, Tim Finin, James Kukla and Jeehye Yun, UMBC_Ebiquity-SFQ: Schema Free Querying System, ESWC Semantic Web Evaluation Challenge, Extended Semantic Web Conference, June 2015.

Users need better ways to explore large, complex linked data resources. Using SPARQL requires not only mastering its syntax and semantics but also understanding the RDF data model, the ontology, and the URIs for entities of interest. Natural language question answering systems could solve this problem, but they are still a subject of research. The schema-agnostic SPARQL query task defined in the SAQ-2015 challenge consists of schema-agnostic queries that follow the syntax of the SPARQL standard. The syntax and semantics of operators are maintained, while users are free to choose words, phrases and entity names irrespective of the underlying schema or ontology. This combination of a query skeleton with keywords helps to remove some of the ambiguity. We describe our framework for handling schema-agnostic, or schema-free, queries and discuss enhancements to handle the SAQ-2015 challenge queries. The key contributions are robust methods that combine statistical association and semantic similarity to map user terms to the most appropriate classes and properties in the underlying ontology, and type inference for user input concepts based on concept linking.
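The idea of combining a similarity score with a statistical association score when mapping a user term to an ontology property can be sketched as a weighted sum. In this sketch, character-trigram overlap is a crude lexical stand-in for a real semantic similarity measure. The association scores, which might in practice be derived from co-occurrence statistics in the data, and the weight `alpha` are hypothetical.

```python
def trigrams(term: str) -> set:
    padded = f"  {term.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def lexical_similarity(a: str, b: str) -> float:
    """Jaccard overlap of character trigrams (stand-in for semantic similarity)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def best_property(user_term, candidates, association, alpha=0.5):
    """Pick the candidate maximizing a weighted sum of similarity and association."""
    def score(prop):
        return alpha * lexical_similarity(user_term, prop) + (1 - alpha) * association.get(prop, 0.0)
    return max(candidates, key=score)


candidates = ["spouse", "marriage", "birthPlace"]
association = {"spouse": 0.9, "marriage": 0.1, "birthPlace": 0.05}
print(best_property("marriedTo", candidates, association))
print(best_property("marriedTo", candidates, association, alpha=1.0))
```

With surface similarity alone (`alpha=1.0`), "marriedTo" maps to `marriage`, which shares the most characters. Adding the association signal picks `spouse`, which shares no trigrams with the user's term. This illustrates why the two signals are combined.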

Querying RDF Data with Text Annotated Graphs

June 6th, 2015, by Tim Finin, posted in Big data, Database, Machine Learning, RDF, Semantic Web

New paper: Lushan Han, Tim Finin, Anupam Joshi and Doreen Cheng, Querying RDF Data with Text Annotated Graphs, 27th International Conference on Scientific and Statistical Database Management, San Diego, June 2015.

Scientists and casual users need better ways to query RDF databases or Linked Open Data. Using the SPARQL query language requires not only mastering its syntax and semantics but also understanding the RDF data model, the ontology used, and the URIs for entities of interest. Natural language query systems are a powerful approach, but current techniques are brittle in addressing the ambiguity and complexity of natural language and require expensive labor to supply the extensive domain knowledge they need. We introduce a compromise in which users give a graphical “skeleton” for a query and annotate it with freely chosen words, phrases and entity names. We describe a framework for interpreting these “schema-agnostic queries” over open-domain RDF data that automatically translates them into SPARQL queries. The framework uses semantic textual similarity to find mapping candidates and statistical approaches to learn domain knowledge for disambiguation, thus avoiding the expensive human effort required by natural language interface systems. We demonstrate the feasibility of the approach with an implementation that performs well in an evaluation on DBpedia data.
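Once each annotation in the skeleton has been mapped to a URI, the final translation into SPARQL is mechanical. The sketch below assumes that the mapping has already been produced by a hypothetical disambiguation step. It treats labels starting with `?` as variables and emits one triple pattern per skeleton edge.

```python
def to_sparql(edges, mapping, select):
    """edges: (subject, relation, object) labels; labels starting with '?' are variables."""
    def term(label):
        return label if label.startswith("?") else f"<{mapping[label]}>"

    patterns = [f"  {term(s)} {term(r)} {term(o)} ." for s, r, o in edges]
    return f"SELECT {select} WHERE {{\n" + "\n".join(patterns) + "\n}"


skeleton = [("BillClinton", "hasDaughter", "?x"), ("?x", "marriedTo", "?y")]
# Hypothetical mapping a disambiguation step might return for DBpedia.
mapping = {
    "BillClinton": "http://dbpedia.org/resource/Bill_Clinton",
    "hasDaughter": "http://dbpedia.org/ontology/child",
    "marriedTo": "http://dbpedia.org/ontology/spouse",
}
print(to_sparql(skeleton, mapping, "?y"))
```

Mapping "hasDaughter" to a generic child property drops the gender constraint. A fuller translation would need an extra triple pattern for it, which is the kind of domain knowledge the disambiguation step has to supply.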

Discovering and Querying Hybrid Linked Data

June 5th, 2015, by Tim Finin, posted in Big data, KR, Machine Learning, Semantic Web


New paper: Zareen Syed, Tim Finin, Muhammad Rahman, James Kukla and Jeehye Yun, Discovering and Querying Hybrid Linked Data, Third Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data, held in conjunction with the 12th Extended Semantic Web Conference, Portoroz Slovenia, June 2015.

In this paper, we present a unified framework for discovering and querying hybrid linked data. We describe our approach to developing a natural language query interface for a hybrid knowledge base Wikitology, and present that as a case study for accessing hybrid information sources with structured and unstructured data through natural language queries. We evaluate our system on a publicly available dataset and demonstrate improvements over a baseline system. We describe limitations of our approach and also discuss cases where our system can complement other structured data querying systems by retrieving additional answers not available in structured sources.

talk: Amit Sheth on Transforming Big data into Smart Data, 11a Tue 5/26

May 17th, 2015, by Tim Finin, posted in Big data, Semantic Web

Transforming big data into smart data: deriving value via harnessing volume, variety and velocity using semantics and semantic web

Professor Amit Sheth
Wright State University

11:00am Tuesday, 26 May 2015, ITE 325, UMBC

Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data (Volume, Variety, Velocity, and Veracity) and their applications to drive value for businesses. In this talk, I will describe Smart Data, which is realized by extracting value from Big Data to benefit not just large companies but each individual. If my child is an asthma patient, then of all the data relevant to her, with all four V-challenges, what I care about is simply: "How is her current health, and what is the risk of her having an asthma attack in her current situation (now and today), especially if that risk has changed?" As I will show, Smart Data that gives such personalized and actionable information will need to utilize multimodal data and their metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond the traditional reliance on Machine Learning and NLP. I will motivate the need for a synergistic combination of techniques, similar to the close interworking of the top brain and the bottom brain in cognitive models. I will present a couple of Smart Data applications in development at Kno.e.sis in the domains of personalized health, health informatics, social data for social good, energy, disaster response, and smart cities.

Amit Sheth is an educator, researcher and entrepreneur. He is the LexisNexis Ohio Eminent Scholar, an IEEE Fellow, and the executive director of Kno.e.sis, the Ohio Center of Excellence in Knowledge-enabled Computing at Wright State University. In World Wide Web (WWW) research, it is placed among the top ten universities in the world based on 10-year impact. Prof. Sheth is a well-cited computer scientist (h-index = 87, >30,000 citations) and appears among the top three authors in World Wide Web (Microsoft Academic Search). He has founded two companies, and several commercial products and deployed systems have resulted from his research. His students are exceptionally successful: ten of his 18 past PhD students have 1,000+ citations each.

Host: Yelena Yesha
