Archive for the 'Semantic Web' Category
April 18th, 2016, by Tim Finin, posted in IoT, Policy, Semantic Web
Prajit Kumar Das, Sandeep Nair, Nitin Kumar Sharma, Anupam Joshi, Karuna Pande Joshi, and Tim Finin, Context-Sensitive Policy Based Security in Internet of Things, 1st IEEE Workshop on Smart Service Systems, co-located with IEEE Int. Conf. on Smart Computing, St. Louis, 18 May 2016.
According to recent media reports, there has been a surge in the number of devices that are being connected to the Internet. The Internet of Things (IoT), also referred to as Cyber-Physical Systems, is a collection of physical entities with computational and communication capabilities. The storage and computing power of these devices is often limited and their designs currently focus on ensuring functionality and largely ignore other requirements, including security and privacy concerns. We present the design of a framework that allows IoT devices to capture, represent, reason with, and enforce information sharing policies. We use Semantic Web technologies to represent the policies, the information to be shared or protected, and the IoT device context. We discuss use-cases where our design will help in creating an “intelligent” IoT device and ensuring data security and privacy using context-sensitive information sharing policies.
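As a rough illustration of what a context-sensitive information sharing policy might look like, the sketch below checks a sharing request against a small policy list. It is only a stand-in: the paper represents policies, data, and context with Semantic Web technologies, whereas this example uses plain Python dictionaries, and the policy entries, role names, and the may_share function are all hypothetical.

```python
# Hypothetical policies: a data item may be shared with a requester role
# only when every listed context condition holds. All names are illustrative.
POLICIES = [
    {"data": "camera_feed", "requester_role": "owner", "context": {"location": "home"}},
    {"data": "temperature", "requester_role": "guest", "context": {}},
]


def may_share(data, requester_role, context):
    """Return True if some policy permits sharing `data` in this context."""
    for policy in POLICIES:
        if policy["data"] != data or policy["requester_role"] != requester_role:
            continue
        # Every context condition of the policy must match the current context.
        if all(context.get(key) == value for key, value in policy["context"].items()):
            return True
    return False  # default deny when no policy matches


print(may_share("camera_feed", "owner", {"location": "home"}))
print(may_share("camera_feed", "owner", {"location": "office"}))
```

The default-deny return is a design assumption of this sketch, chosen because a device with limited resources would likely prefer to withhold data when no policy applies.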
April 3rd, 2016, by Tim Finin, posted in cybersecurity, Ontologies, OWL, RDF, Security, Semantic Web
Policies For Oblivious Cloud Storage
Using Semantic Web Technologies
10:30am, Monday, 4 April 2016, ITE 346, UMBC
Consumers want to ensure that their enterprise data is stored securely and obliviously on the cloud, such that the data objects or their access patterns are not revealed to anyone, including the cloud provider, in the public cloud environment. We have created a detailed ontology describing the oblivious cloud storage models and role-based access controls that should be in place to manage this risk. We have also implemented the ObliviCloudManager application that allows users to manage their cloud data using oblivious data structures. This application uses a role-based access control model and collection-based document management to store and retrieve data efficiently. Cloud consumers can use our system to define policies for storing data obliviously and manage storage on untrusted cloud platforms, even if they are not familiar with the underlying technology and concepts of the oblivious data structure.
February 17th, 2016, by Tim Finin, posted in Ontologies, Security, Semantic Web
Botnet attacks turn susceptible victim computers into bots that perform various malicious activities while under the control of a botmaster. Some examples of the damage they cause include denial of service, click fraud, spamware, and phishing. These attacks can vary in the type of architecture and communication protocol used, which might be modified during the botnet lifespan. Intrusion detection and prevention systems (IDPSs) are one way to safeguard the cyber-physical systems we use, but they have difficulty detecting new or modified attacks, including botnets. Most of these systems can discover only known attacks whose signatures have been identified and stored in some form. Also, traditional IDPSs are point-based solutions incapable of utilizing information from multiple data sources and have difficulty discovering new or more complex attacks. To address these issues, we are developing a semantic approach to intrusion detection that uses a variety of sensors collaboratively. Leveraging information from these heterogeneous sources leads to a more robust, situationally aware IDPS that is better equipped to detect complicated attacks such as botnets.
December 16th, 2015, by Tim Finin, posted in cybersecurity, KR, Ontologies, Semantic Web
Zareen Syed, Ankur Padia, Tim Finin, Lisa Mathews and Anupam Joshi, UCO: Unified Cybersecurity Ontology, AAAI Workshop on Artificial Intelligence for Cyber Security (AICS), February 2016.
In this paper we describe the Unified Cybersecurity Ontology (UCO) that is intended to support information integration and cyber situational awareness in cybersecurity systems. The ontology incorporates and integrates heterogeneous data and knowledge schemas from different cybersecurity systems and the most commonly used cybersecurity standards for information sharing and exchange. The UCO ontology has also been mapped to a number of existing cybersecurity ontologies as well as concepts in the Linked Open Data cloud. Similar to DBpedia, which serves as the core for general knowledge in the Linked Open Data cloud, we envision UCO serving as the core for the cybersecurity domain, evolving and growing over time as additional cybersecurity data sets become available. We also present a prototype system and concrete use cases supported by the UCO ontology. To the best of our knowledge, this is the first cybersecurity ontology that has been mapped to general world ontologies to support broader and diverse security use cases. We compare the resulting ontology with previous efforts, discuss its strengths and limitations, and describe potential future work directions.
December 3rd, 2015, by Tim Finin, posted in AI, NLP, Semantic Web
“Alexa, get my coffee”:
Using the Amazon Echo in Research
10:30am Monday, 7 December 2015, ITE 346
The Amazon Echo is a remarkable example of language-controlled, user-centric technology, but also a great example of how far such devices have to go before they will fulfill the longstanding promise of intelligent assistance. In this talk, we will describe the Interactive Robotics and Language Lab's work with the Echo, with an emphasis on the practical aspects of getting it set up for development and adding new capabilities. We will demonstrate adding a simple new interaction, and then lead a brainstorming session on future research applications.
Megan Zimmerman is a UMBC undergrad majoring in computer science working on interpreting language about tasks at varying levels of abstraction, with a focus on interpreting abstract statements as possible task instructions in assistive technology.
November 29th, 2015, by Tim Finin, posted in Machine Learning, Semantic Web, Social media, Web
10:30am, Monday 30 November 2015, ITE 346
Online social media is a powerful platform for dissemination of information during real world events. Beyond the challenges of volume, variety and velocity of content generated on online social media, veracity poses a much greater challenge for effective utilization of this content by citizens, organizations, and authorities. Veracity of information refers to the trustworthiness / credibility / accuracy / completeness of the content. This work addressed the challenge of veracity or trustworthiness of content posted on social media. We focus our work on Twitter, which is one of the most popular microblogging web services today. We provided an in-depth analysis of misinformation spread on Twitter during real world events. We showed the effectiveness of automated techniques to detect misinformation on Twitter using a combination of content, meta-data, network, user profile and temporal features. We developed and deployed a novel framework, TweetCred, for providing an indication of the trustworthiness / credibility of tweets posted during events. TweetCred, which was available as a browser plug-in, was installed and used by real Twitter users.
Dr. Aditi Gupta is a research associate in the Computer Science and Electrical Engineering Department at UMBC. She received her Ph.D. from the Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi) in 2015 for her dissertation on designing and evaluating techniques to mitigate misinformation spread on microblogging web services.
November 21st, 2015, by Tim Finin, posted in Machine Learning, Semantic Web
Log files comprise a record of different events happening in various applications, operating systems and even in network devices. Originally they were used to record information for diagnostic and debugging purposes. Nowadays, logs are also used to track events which can be used in auditing and forensics in case of malicious activities or systems attacks. Various software systems such as intrusion detection systems, web servers, anti-virus and anti-malware systems, firewalls and network devices generate logs with useful information that can be used to protect against such system attacks. Analyzing log files can help in proactively avoiding attacks against the systems. While there are existing tools that do a good job when the format of log files is known, the challenge lies in cases where log files are from unknown devices and of unknown formats. We propose a framework that takes any log file and automatically gives out a semantic interpretation as a set of RDF Linked Data triples. The framework splits a log file into columns using regular expression-based or dictionary-based classifiers. Leveraging and modifying our existing work on inferring the semantics of tables, we identify every column from a log file and map it to concepts either from a general purpose KB like DBpedia or domain specific ontologies such as IDS. We also identify relationships between various columns in such log files. Converting large and verbose log files into such semantic representations will help in better search, integration and rich reasoning over the data.
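To make the regular expression-based column classification concrete, here is a minimal sketch that labels each whitespace-separated field of a log line and emits triple-like tuples. It is not the framework described above: the patterns, the field labels, the example.org namespace, and the function names are all assumptions made for illustration, and a real system would map columns to DBpedia or domain ontology concepts rather than to ad hoc labels.

```python
import re

# Hypothetical regex-based column classifiers; patterns and labels are illustrative.
COLUMN_PATTERNS = [
    ("ipAddress", re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")),
    ("timestamp", re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$")),
    ("httpStatus", re.compile(r"^[1-5]\d{2}$")),
]

EX = "http://example.org/log#"  # placeholder namespace, not a real ontology


def classify(value):
    """Return the label of the first pattern that matches the field value."""
    for label, pattern in COLUMN_PATTERNS:
        if pattern.match(value):
            return label
    return "unknownField"


def line_to_triples(line_id, line):
    """Turn one log line into (subject, predicate, object) string tuples."""
    subject = f"<{EX}entry{line_id}>"
    triples = []
    for value in line.split():
        predicate = f"<{EX}{classify(value)}>"
        triples.append((subject, predicate, f'"{value}"'))
    return triples


triples = line_to_triples(1, "192.168.0.7 2015-11-21T10:30:00 404")
for s, p, o in triples:
    print(s, p, o, ".")
```

The sketch does not escape quotes inside field values and treats every field independently, so it does not capture the relationships between columns that the framework also identifies.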
November 8th, 2015, by Tim Finin, posted in cybersecurity, Ontologies, Semantic Web
In this report, we describe the Unified Cyber Security ontology (UCO) to support situational awareness in cyber security systems. The ontology is an effort to incorporate and integrate heterogeneous information available from different cyber security systems and the most commonly used cyber security standards for information sharing and exchange. The ontology has also been mapped to a number of existing cyber security ontologies as well as concepts in the Linked Open Data cloud. Similar to DBpedia, which serves as the core for the Linked Open Data cloud, we envision UCO serving as the core for a specialized cyber security Linked Open Data cloud, evolving and growing over time as additional cybersecurity data sets become available. We also present a prototype system and concrete use-cases supported by the UCO ontology. To the best of our knowledge, this is the first cyber security ontology that has been mapped to general world ontologies to support broader and diverse security use-cases. We compare the resulting ontology with previous efforts, discuss its strengths and limitations, and describe potential future work directions.
November 5th, 2015, by Tim Finin, posted in NLP, Ontologies, Semantic Web
Extracting Structured Summaries
from Text Documents
Dr. Zareen Syed
Research Assistant Professor, UMBC
10:30am, Monday, 9 November 2015, ITE 346, UMBC
In this talk, Dr. Syed will present unsupervised approaches for automatically extracting structured summaries composed of slots and fillers (attributes and values) and important facts from articles, thus effectively reducing the amount of time and effort humans spend gathering intelligence with traditional keyword-based search approaches. The approach first extracts important concepts from text documents and links them to unique concepts in the Wikitology knowledge base. It then exploits the types associated with the linked concepts to discover candidate slots and fillers. Finally, it applies specialized approaches for ranking and filtering slots to select the most relevant slots to include in the structured summary.
Compared with the state of the art, Dr. Syed’s approach is unrestricted, i.e., it does not require a manually crafted catalogue of slots or relations of interest that may vary over different domains. Unlike Natural Language Processing (NLP) based approaches that require well-formed sentences, the approach can be applied to semi-structured text. Furthermore, NLP-based approaches for fact extraction extract lexical facts and sentences that require further processing for disambiguating and linking to unique entities and concepts in a knowledge base, whereas in Dr. Syed’s approach, concept linking is done as a first step in the discovery process. Linking concepts to a knowledge base provides the additional advantage that the terms can be explicitly linked or mapped to semantic concepts in other ontologies and are thus available for reasoning in more sophisticated language understanding systems.
October 30th, 2015, by Tim Finin, posted in NLP, Semantic Web
In this week’s ebiquity lab meeting (10:30am Monday Nov 2), Tim Finin will describe recent work on the Kelvin information extraction system and its performance in two tasks in the 2015 NIST Text Analysis Conference. Kelvin has been under development at the JHU Human Language Technology Center of Excellence for several years. Kelvin reads documents in several languages and extracts entities and relations between them. This year it was used for the Cold Start Knowledge Base Population and Trilingual Entity Discovery and Linking tasks. Key components in the tasks are a system for cross-document coreference and another that links entities to entries in the Freebase knowledge base.
October 29th, 2015, by Tim Finin, posted in Machine Learning, NLP, RDF, Semantic Web
Lyrics Augmented Multi-modal
1:00pm Friday 30 October, ITE 325b
In an increasingly mobile and connected world, digital music consumption has rapidly increased. More recently, faster and cheaper mobile bandwidth has given the average mobile user the potential to access large troves of music through streaming services like Spotify and Google Music that boast catalogs with tens of millions of songs. At this scale, effective music recommendation is critical for music discovery and personalized user experience.
Recommenders that rely on collaborative information suffer from two major problems: the long tail problem, which is induced by popularity bias, and the cold start problem caused by new items with no data. In such cases, they fall back on content to compute similarity. For music, content based features can be divided into acoustic and textual domains. Acoustic features are extracted from the audio signal while textual features come from song metadata, lyrical content, collaborative tags and associated web text.
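The fallback from collaborative to content-based similarity can be sketched as follows. This is an illustrative toy, not the recommender under study: the song dictionaries, the min_plays threshold, and the use of cosine similarity on both feature spaces are assumptions introduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity(song_a, song_b, min_plays=5):
    """Use collaborative vectors when both songs have enough plays, else content."""
    if song_a["plays"] >= min_plays and song_b["plays"] >= min_plays:
        return cosine(song_a["collab"], song_b["collab"])
    # Cold start or long tail: too little usage data, so compare content features.
    return cosine(song_a["content"], song_b["content"])


# Hypothetical songs: `collab` stands for co-play features, `content` for
# acoustic or textual features.
established = {"plays": 100, "collab": [1.0, 0.0], "content": [1.0, 1.0]}
brand_new = {"plays": 0, "collab": [0.0, 0.0], "content": [1.0, 1.0]}
other_hit = {"plays": 80, "collab": [0.0, 1.0], "content": [1.0, 1.0]}

print(similarity(established, brand_new))  # content-based path
print(similarity(established, other_hit))  # collaborative path
```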
Research in content based music similarity has largely been focused on the acoustic domain, while text based features have been limited to metadata, tags and shallow methods for web text and lyrics. Song lyrics house information about the sentiment and topic of a song that cannot be easily extracted from the audio. Past work has shown that even shallow lyrical features improved on audio-only features and, in some tasks like mood classification, outperformed them. In addition, lyrics are easily available, which makes them a valuable resource and warrants a deeper analysis.
The goal of this research is to fill the lyrical gap in existing music recommender systems. The first step is to build algorithms to extract and represent the meaning and emotion contained in the song’s lyrics. The next step is to effectively combine lyrical features with acoustic and collaborative information to build a multi-modal recommendation engine.
For this work, the genre is restricted to Rap because it is a lyrics-centric genre and techniques built for Rap can be generalized to other genres. It was also the highest streamed genre in 2014, accounting for 28.5% of all music streamed. Rap lyrics are scraped from dedicated lyrics websites like ohhla.com and genius.com, while the semantic knowledge base comprising artists, albums and song metadata comes from the MusicBrainz project. Acoustic features are taken directly from EchoNest, while collaborative information like tags, plays, co-plays etc. comes from Last.fm.
Preliminary work involved extraction of compositional style features like rhyme patterns and density, vocabulary size, simile and profanity usage from over 10,000 songs by over 150 artists. These features are available for users to browse and explore through interactive visualizations on Rapalytics.com. Song semantics were represented using off-the-shelf neural language based vector models (doc2vec). Future work will involve building novel language models for lyrics and latent representations for attributes that are driven by collaborative information for multi-modal recommendation.
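Two of the style features named above, vocabulary size and rhyme density, can be approximated with a few lines of code. The sketch below is a deliberately crude assumption-laden version: it treats two line-final words as rhyming when they share their last two letters, whereas a serious implementation would compare phonemes, and the function names and sample text are invented for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_size(lyrics):
    """Number of distinct word types in the lyrics."""
    return len(set(tokenize(lyrics)))


def end_rhyme_density(lyrics, suffix_len=2):
    """Fraction of adjacent line pairs whose final words share a suffix.

    Orthographic suffix matching is a rough stand-in for real rhyme detection.
    """
    finals = []
    for line in lyrics.splitlines():
        words = tokenize(line)
        if words:
            finals.append(words[-1])
    if len(finals) < 2:
        return 0.0
    pairs = list(zip(finals, finals[1:]))
    rhymes = sum(1 for a, b in pairs if a != b and a[-suffix_len:] == b[-suffix_len:])
    return rhymes / len(pairs)


sample = "the cat sat on the mat\nwearing a tiny hat\nthen it ran away\non a sunny day"
print(vocabulary_size(sample))
print(round(end_rhyme_density(sample), 3))
```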
Committee: Drs. Tim Finin (Chair), Anupam Joshi, Pranam Kolari (WalmartLabs), Cynthia Matuszek and Tim Oates
September 29th, 2015, by Tim Finin, posted in NLP, Ontologies, RDF, Semantic Web
Clare Grasso, Anupam Joshi and Eliot Siegel, Beyond NER: Towards Semantics in Clinical Text, Biomedical Data Mining, Modeling, and Semantic Integration (BDM2I); co-located with the 14th International Semantic Web Conference (ISWC 2015), Bethlehem, PA.
While clinical text NLP systems have become very effective in recognizing named entities in clinical text and mapping them to standardized terminologies in the normalization process, there remains a gap in the ability of extractors to combine entities together into a complete semantic representation of medical concepts that contain multiple attributes each of which has its own set of allowed named entities or values. Furthermore, additional domain knowledge may be required to determine the semantics of particular tokens in the text that take on special meanings in relation to this concept. This research proposes an approach that provides ontological mappings of the surface forms of medical concepts that are of the UMLS semantic class signs/symptoms. The mappings are used to extract and encode the constituent set of named entities into interoperable semantic structures that can be linked to other structured and unstructured data for reuse in research and analysis.