paper: Context Sensitive Access Control in Smart Home Environments

May 30th, 2020

Sofia Dutta, Sai Sree Laya Chukkapalli, Madhura Sulgekar, Swathi Krithivasan, Prajit Kumar Das, and Anupam Joshi, Context Sensitive Access Control in Smart Home Environments, 6th IEEE International Conference on Big Data Security on Cloud, May 2020

The rise in popularity of Internet of Things (IoT) devices has opened doors for privacy and security breaches in Cyber-Physical systems like smart homes, smart vehicles, and smart grids that affect our daily existence. IoT systems are also a source of big data that gets shared via the cloud. IoT systems in a smart home environment have sensitive access control issues since they are deployed in a personal space. The collected data can also be of a highly personal nature. Therefore, it is critical to build access control models that govern who, under what circumstances, can access which sensed data or actuate a physical system. Traditional access control mechanisms are not expressive enough to handle such complex access control needs, warranting the incorporation of new methodologies for privacy and security. In this paper, we propose the creation of the PALS system, which builds upon existing work on attribute-based access control models, captures physical context collected from sensed data (attributes), and performs dynamic reasoning over these attributes and context-driven policies using Semantic Web technologies to execute access control decisions. Reasoning over user context, details of the information collected by the cloud service provider, and device type, our mechanism generates access control decisions as a consequence. Our system’s access control decisions are supplemented by another sub-system that detects intrusions into smart home systems based on both network and behavioral data. The combined approach serves to determine indicators that a smart home system is under attack, as well as limit the data breaches such attacks can achieve.


paper: Automating GDPR Compliance using Policy Integrated Blockchain

May 30th, 2020

Automating GDPR Compliance using Policy Integrated Blockchain

Abhishek Mahindrakar and Karuna Pande Joshi, Automating GDPR Compliance using Policy Integrated Blockchain, 6th IEEE International Conference on Big Data Security on Cloud, May 2020.

Data protection regulations, like GDPR, mandate security controls to secure the personally identifiable information (PII) that users share with service providers. With the volume of shared data reaching exascale proportions, it is challenging to ensure GDPR compliance in real-time. We propose a novel approach that integrates a GDPR ontology with blockchain to facilitate real-time automated data compliance. Our framework ensures that a data operation is allowed only when validated by data privacy policies in compliance with the privacy rules in GDPR. When a valid transaction takes place, the PII data is automatically stored off-chain in a database. Our system, built using Semantic Web and Ethereum Blockchain, includes an access control system that enforces data privacy policy when data is shared with third parties.

paper: Temporal Understanding of Cybersecurity Threats

May 28th, 2020

Temporal Understanding of Cybersecurity Threats

Jennifer Sleeman, Tim Finin, and Milton Halem, Temporal Understanding of Cybersecurity Threats, IEEE International Conference on Big Data Security on Cloud, May 2020.

As cybersecurity-related threats continue to increase, understanding how the field is changing over time can give insight into combating new threats and understanding historical events. We show how to apply dynamic topic models to a set of cybersecurity documents to understand how the concepts found in them are changing over time. We correlate two different data sets: the first relates to specific exploits and the second relates to cybersecurity research. We use Wikipedia concepts to provide a basis for performing concept phrase extraction and show how using concepts to provide context improves the quality of the topic model. We represent the results of the dynamic topic model as a knowledge graph that could be used for inference or information discovery.

paper: Reinforcement Quantum Annealing: A Hybrid Quantum Learning Automata

May 24th, 2020

Reinforcement Quantum Annealing:
A Hybrid Quantum Learning Automata

Ramin Ayanzadeh, Milton Halem, and Tim Finin, Reinforcement Quantum Annealing: A Hybrid Quantum Learning Automata, Nature Scientific Reports, v10, n1, May 2020

We introduce the notion of a reinforcement quantum annealing (RQA) scheme in which an intelligent agent searches in the space of Hamiltonians and interacts with a quantum annealer that plays the role of the stochastic environment in learning automata. At each iteration of RQA, after analyzing results (samples) from the previous iteration, the agent adjusts the penalty of unsatisfied constraints and re-casts the given problem to a new Ising Hamiltonian. As a proof-of-concept, we propose a novel approach for casting the problem of Boolean satisfiability (SAT) to Ising Hamiltonians and show how to apply RQA to increase the probability of finding the global optimum. Our experimental results on two different benchmark SAT problems (namely factoring pseudo-prime numbers and random SAT with phase transitions), using a D-Wave 2000Q quantum processor, demonstrated that RQA finds notably better solutions with fewer samples, compared to the best-known techniques in the realm of quantum annealing.
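The penalty-adjustment loop described in the abstract can be illustrated with a purely classical toy. This is only a sketch under stated assumptions, not the paper's method: it does not build an Ising Hamiltonian, and the function `stand_in_annealer` is a hypothetical random sampler used in place of a quantum annealer. All names and parameter values here are invented for illustration.

```python
import random

# A clause is a list of non-zero ints: 3 means x3, -3 means NOT x3.

def unsatisfied_clauses(clauses, assignment):
    """Return indices of clauses violated by the assignment (dict var -> bool)."""
    return [
        i for i, clause in enumerate(clauses)
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause)
    ]

def weighted_energy(clauses, weights, assignment):
    """Sum of the penalties of all violated clauses."""
    return sum(weights[i] for i in unsatisfied_clauses(clauses, assignment))

def stand_in_annealer(clauses, weights, num_vars, num_reads, rng):
    """Hypothetical placeholder for an annealer: draw random assignments
    and return the one with the lowest weighted energy."""
    samples = [
        {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
        for _ in range(num_reads)
    ]
    return min(samples, key=lambda a: weighted_energy(clauses, weights, a))

def reinforcement_loop(clauses, num_vars, iterations=10, num_reads=50, seed=0):
    """Repeatedly sample, then raise the penalty of clauses left unsatisfied."""
    rng = random.Random(seed)
    weights = [1.0] * len(clauses)
    best = None
    for _ in range(iterations):
        candidate = stand_in_annealer(clauses, weights, num_vars, num_reads, rng)
        violated = unsatisfied_clauses(clauses, candidate)
        if best is None or len(violated) < len(unsatisfied_clauses(clauses, best)):
            best = candidate
        if not violated:
            break  # every clause is satisfied
        for i in violated:
            weights[i] += 1.0  # re-cast the problem with a higher penalty
    return best, weights

clauses = [[1, 2], [-1, 3], [-2, -3]]
best, weights = reinforcement_loop(clauses, num_vars=3)
```

In this toy, the weights play the role of the penalties that the agent adjusts between iterations; a real implementation would re-encode the weighted problem for the annealer at each step.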


Defense: Taneeya Satyapanich, Modeling and Extracting Information about Cybersecurity Events from Text

November 14th, 2019

Ph.D. Dissertation Defense

Modeling and Extracting Information about Cybersecurity Events from Text

Taneeya Satyapanich

9:30-11:30 Monday, 18 November 2019, ITE346

People now rely on the Internet to carry out many of their daily activities such as banking, ordering food, and socializing with their family and friends. The technology facilitates our lives, but also comes with many problems, including cybercrimes, stolen data, and identity theft. With the large and increasing number of transactions done every day, the frequency of cybercrime events is also growing. Since the number of security-related events is too high for manual review and monitoring, we need to train machines to be able to detect and gather data about potential cyber threats. To support machines that can identify and understand threats, we need standard models to store cybersecurity information and information extraction systems that can collect information from text to populate the models with data.

This dissertation makes two significant contributions. First, we defined a rich cybersecurity event schema and annotated a news corpus following the schema. Our schema consists of event type definitions, semantic roles, and event arguments. Second, we present CASIE, a cybersecurity event extraction system. CASIE can detect cybersecurity events and identify event participants and their roles, including specifying realis values. It also groups events that are coreferent. CASIE produces output in an easy-to-use format as a JSON object.

We believe that this dissertation will be useful for cybersecurity management in the future. CASIE can quickly extract cybersecurity event information from unstructured text and fill in the event frame, helping us keep up with the large number of cybersecurity events that happen every day.

Committee: Drs. Tim Finin (chair), Anupam Joshi, Tim Oates, Karuna Pande Joshi, Francis Ferraro

Why does Google think Raymond Chandler starred in Double Indemnity?

November 14th, 2019

In my knowledge graph class yesterday we talked about the SPARQL query language and I illustrated it with DBpedia queries, including an example getting data about the movie Double Indemnity. I had brought a Google Assistant device and used it to compare its answers to those from DBpedia. When I asked the Google Assistant “Who starred in the film Double Indemnity”, the first person it mentioned was Raymond Chandler. I knew this was wrong, since he was one of its screenwriters, not an actor, and shared an Academy Award nomination for the screenplay. DBpedia’s data was correct and did not list Chandler as one of the actors.

I did not feel too bad about this — we shouldn’t expect perfect accuracy in these huge, general purpose knowledge graphs and at least Chandler played an important role in making the film.

After class I looked at the Wikidata page for Double Indemnity (Q478209) and saw that it did list Chandler as an actor. I take this as evidence that Google’s Knowledge Graph got this incorrect fact from Wikidata, or perhaps from a precursor, Freebase.
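The same check can be made with a SPARQL query against Wikidata instead of browsing the page. The sketch below only builds the request URL and does not send it. It assumes the public Wikidata Query Service endpoint and uses P106, Wikidata's occupation property, alongside the identifiers mentioned in this post; the helper name `build_request_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint (assumed reachable; not contacted here).
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# List the cast members (P161) of Double Indemnity (Q478209) together with
# any occupations (P106) that Wikidata records for them.
QUERY = """
SELECT ?person ?personLabel ?occupationLabel WHERE {
  wd:Q478209 wdt:P161 ?person .
  OPTIONAL { ?person wdt:P106 ?occupation . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

def build_request_url(query, endpoint=WIKIDATA_SPARQL_ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})

url = build_request_url(QUERY)
```

Fetching `url` with any HTTP client should return one row per cast member and occupation pair, which makes entries with no acting-related occupation easy to spot.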

The good news 🙂 is that Wikidata had flagged the fact that Chandler (Q180377) was a cast member in Double Indemnity with a “potential issue”. Clicking on this revealed that the issue was that Chandler was not known to have an occupation that the “cast member” property (P161) expects, which includes twelve types, such as actor, opera singer, comedian, and ballet dancer. Wikidata lists Chandler’s occupations as screenwriter, novelist, writer, and poet.
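The idea behind this kind of constraint check can be sketched in a few lines. This is a toy illustration over hand-written data, not Wikidata's actual constraint machinery: the allowed set contains only the four occupation types named above (not all twelve), and "Example Actor" is a hypothetical placeholder.

```python
# Four of the twelve occupation types named in the post; the rest are omitted.
ALLOWED_CAST_OCCUPATIONS = {"actor", "opera singer", "comedian", "ballet dancer"}

# Toy data: Chandler's occupations as listed in the post, plus a made-up person.
OCCUPATIONS = {
    "Raymond Chandler": {"screenwriter", "novelist", "writer", "poet"},
    "Example Actor": {"actor"},
}

def flag_cast_members(cast, occupations, allowed):
    """Return cast members that have no occupation in the allowed set."""
    return [
        person for person in cast
        if not (occupations.get(person, set()) & allowed)
    ]

flagged = flag_cast_members(
    ["Raymond Chandler", "Example Actor"], OCCUPATIONS, ALLOWED_CAST_OCCUPATIONS
)
```

Here `flagged` contains only Raymond Chandler, mirroring the potential issue that Wikidata raised.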

More good news 😀 is that the Wikidata fact had provenance information in the form of a reference stating that it came from CSFD (Q3561957), a “Czech and Slovak web project providing a movie database”. Following the link Wikidata provided led me eventually to the resource, which allowed me to search for and find its Double Indemnity entry. Indeed, it lists Raymond Chandler as one of the movie’s Hrají. All that was left to do was to ask for a translation, which confirmed that Hrají means “starring”.

Case closed? Well, not quite. What remains is fixing the problem.

The final good news 🙂 is that it’s easy to edit or delete an incorrect fact in Wikidata. I plan to delete the incorrect fact in class next Monday. Over the weekend, I’ll look into options for adding an annotation so that the incorrect ČSFD source for Chandler being a cast member is ignored.

Some possible bad news 🙁 is that public knowledge graphs like Wikidata might be exploited by unscrupulous groups or individuals in the future to promote false or biased information. Wikipedia is reasonably resilient to this, but the problem may be harder to manage for public knowledge graphs, which get much of their data from other sources that could be manipulated.

TALK: Real-time knowledge extraction from short semi-structured documents

November 3rd, 2019

A semantically rich framework to enable real-time knowledge extraction from short length semi-structured documents

Lavana Elluri

10:30-11:30 Monday, 4 November 2019, ITE346

Knowledge is currently maintained as a large volume of unstructured text data in books, laws, regulations and policies, news and social media, academic and scientific reports, conversation and correspondence, etc. Most of these text documents are not machine-processable, so it is hard to find relevant information in them quickly. Extracting and categorizing knowledge from these numerous text stores requires significant manual effort and time. A critical open challenge that we propose to address is automated incremental text classification and identifying context from small documents. Our aim is to develop a semantically rich framework, including algorithms that will extract and classify the context of the text in real-time, to help users who update their policies regularly and organizations that are submitting proposals. We will use techniques from deep learning, semantic web, and natural language processing to build this framework. Our objectives include representing knowledge in cloud compliance / legal texts to create and populate a knowledge graph based on data protection regulations. Additionally, we will correlate rules implemented in the referencing document with the rules in original policies to determine context similarity.

TALK: Automated Data Augmentation via Wikidata Relationships

October 20th, 2019

Automated Data Augmentation via Wikidata Relationships

Oyesh Singh, UMBC
10:30-11:30 Monday, 21 October 2019, ITE 346

With the increase in complexity of machine learning models, there is more need for data than ever. To address the scarcity of annotated data, we look towards the ocean of free data present in Wikipedia and other Wikimedia resources. Wikipedia has an enormous amount of data in many languages along with the knowledge graph defined in Wikidata. In this presentation, I will explain how we utilized the Wikipedia/Wikidata data to boost the performance of BERT models for named entity recognition.

paper: Quantum Annealing Based Binary Compressive Sensing with Matrix Uncertainty

January 13th, 2019

Quantum Annealing Based Binary Compressive Sensing with Matrix Uncertainty

Ramin Ayanzadeh, Seyedahmad Mousavi, Milton Halem and Tim Finin, Quantum Annealing Based Binary Compressive Sensing with Matrix Uncertainty, arXiv:1901.00088 [cs.IT], 1 January 2019.

Compressive sensing is a novel approach that linearly samples sparse or compressible signals at a rate much below the Nyquist-Shannon sampling rate and outperforms traditional signal processing techniques in acquiring and reconstructing such signals. Compressive sensing with matrix uncertainty is an extension of the standard compressive sensing problem that appears in various applications including but not limited to cognitive radio sensing, calibration of the antenna, and deconvolution. The original problem of compressive sensing is NP-hard so the traditional techniques, such as convex and nonconvex relaxations and greedy algorithms, apply stringent constraints on the measurement matrix to indirectly handle this problem in the realm of classical computing.

We propose well-posed approaches for both binary compressive sensing and binary compressive sensing with matrix uncertainty problems that are tractable by quantum annealers. Our approach formulates an Ising model whose ground state represents a sparse solution for the binary compressive sensing problem and then employs an alternating minimization scheme to tackle the binary compressive sensing with matrix uncertainty problem. This setting only requires the solution uniqueness of the considered problem to have a successful recovery process, and therefore the required conditions on the measurement matrix are notably looser. As a proof of concept, we can demonstrate the applicability of the proposed approach on the D-Wave quantum annealers; however, we can adapt our method to employ other modern computing phenomena, such as adiabatic quantum computers (in general), CMOS annealers, optical parametric oscillators, and neuromorphic computing.

paper: DAbR: Dynamic Attribute-based Reputation scoring for Malicious IP Address Detection

October 9th, 2018

DAbR: Dynamic Attribute-based Reputation Scoring for Malicious IP Address Detection

Arya Renjan, Karuna Pande Joshi, Sandeep Nair Narayanan and Anupam Joshi, DAbR: Dynamic Attribute-based Reputation Scoring for Malicious IP Address Detection, IEEE Intelligence and Security Informatics, November 2018.


To effectively identify and filter out attacks from known sources like botnets, spammers, and virus-infected systems, organizations increasingly procure services that determine the reputation of IP addresses. Adoption of encryption techniques like TLS 1.2 and 1.3 increases this need, owing to the higher cost of decryption needed for examining traffic contents. Currently, most IP reputation services provide blacklists by analyzing malware and spam records. However, newer but similar IP addresses used by the same attackers need not be present in such lists, so attacks from them will get through. In this paper, we present Dynamic Attribute based Reputation (DAbR), a Euclidean distance-based technique, to generate reputation scores for IP addresses by assimilating meta-data from known bad IP addresses. This approach is based on our observation that many bad IPs share similar attributes and on the requirement for a lightweight technique for reputation scoring. DAbR generates reputation scores for IP addresses on a 0-10 scale that represent their trustworthiness based on known bad IP address attributes. The reputation scores, when used in conjunction with a policy enforcement module, can provide high-performance and non-privacy-invasive malicious traffic filtering. To evaluate DAbR, we calculated reputation scores on a dataset of 87k IP addresses and used them to classify IP addresses as good/bad based on a threshold. An F-1 score of 78% in this classification task demonstrates our technique’s performance.
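As a rough illustration of distance-based reputation scoring, the sketch below scores an IP address by its Euclidean distance to the centroid of known bad attribute vectors. It is not the DAbR algorithm from the paper: the attribute vectors, the centroid comparison, the direction of the scale (here 10 means closest to the known bad profile), and the threshold of 7.0 are all assumptions made for this example.

```python
import math

# Hypothetical attribute vectors for known bad IP addresses, with every
# feature already normalized to [0, 1]. The paper's real features differ.
KNOWN_BAD = [
    (0.9, 0.8, 0.1),
    (0.8, 0.9, 0.2),
    (0.7, 0.7, 0.0),
]

def centroid(vectors):
    """Component-wise mean of equally sized vectors."""
    count = len(vectors)
    return tuple(sum(column) / count for column in zip(*vectors))

def reputation_score(attributes, bad_vectors):
    """Score in [0, 10]; 10 means the attributes coincide with the bad centroid."""
    center = centroid(bad_vectors)
    distance = math.dist(attributes, center)
    max_distance = math.sqrt(len(center))  # diagonal of the unit hypercube
    return 10.0 * max(0.0, 1.0 - distance / max_distance)

def is_bad(attributes, bad_vectors, threshold=7.0):
    """Classify an address as bad when its score reaches the assumed threshold."""
    return reputation_score(attributes, bad_vectors) >= threshold

near_score = reputation_score((0.8, 0.8, 0.1), KNOWN_BAD)
far_score = reputation_score((0.0, 0.0, 1.0), KNOWN_BAD)
```

An address whose attributes sit near the known bad profile gets a score close to 10, while a dissimilar one scores much lower, so a single threshold is enough to split the two groups in this toy setting.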

paper: Early Detection of Cybersecurity Threats Using Collaborative Cognition

October 1st, 2018

The CCS Dashboard’s sections provide information on sources and targets of network events, file operations monitored and sub-events that are part of the APT kill chain. An alert is generated when a likely complete APT is detected after reasoning over events.


Early Detection of Cybersecurity Threats Using Collaborative Cognition

Sandeep Narayanan, Ashwinkumar Ganesan, Karuna Joshi, Tim Oates, Anupam Joshi and Tim Finin, Early detection of Cybersecurity Threats using Collaborative Cognition, 4th IEEE International Conference on Collaboration and Internet Computing, Philadelphia, October 2018.


The early detection of cybersecurity events such as attacks is challenging given the constantly evolving threat landscape. Even with advanced monitoring, sophisticated attackers can spend more than 100 days in a system before being detected. This paper describes a novel, collaborative framework that assists a security analyst by exploiting the power of semantically rich knowledge representation and reasoning integrated with different machine learning techniques. Our Cognitive Cybersecurity System ingests information from various textual sources and stores it in a common knowledge graph using terms from an extended version of the Unified Cybersecurity Ontology. The system then reasons over the knowledge graph that combines a variety of collaborative agents representing host and network-based sensors to derive improved actionable intelligence for security administrators, decreasing their cognitive load and increasing their confidence in the result. We describe a proof of concept framework for our approach and demonstrate its capabilities by testing it against a custom-built ransomware similar to WannaCry.

AAAI Symposium on Privacy-Enhancing AI and HLT Technologies

July 31st, 2018

PAL: Privacy-Enhancing AI and Language Technologies

AAAI Spring Symposium
25-27 March 2019, Stanford University

This symposium will bring together researchers in privacy and researchers in either artificial intelligence (AI) or human language technologies (HLTs), so that we may collectively assess the state of the art in this growing intersection of interests. Privacy remains an evolving and nuanced concern of computer users, as new technologies that use the web, smartphones, and the internet of things (IoT) collect a myriad of personal information. Rather than viewing AI and HLT as problems for privacy, the goal of this symposium is to “flip the script” and explore how AI and HLT can help meet users’ desires for privacy when interacting with computers.

It will focus on two loosely-defined research questions:

  • How can AI and HLT preserve or protect privacy in challenging situations?
  • How can AI and HLT help interested parties (e.g., computer users, companies, regulatory agencies) understand privacy in the status quo and what people want?

The symposium will consist of invited speakers, oral presentations of submitted papers, a poster session, and panel discussions. This event is a successor to Privacy and Language Technologies (“PLT”), a 2016 AAAI Fall Symposium. Submissions are due 2 November 2018.  For more information, see the symposium site.