Preventing Poisoning Attacks on Threat Intelligence Systems

April 22nd, 2018

Nitika Khurana, Graduate Student, UMBC

11:00-12:00 Monday, 23 April 2018, ITE346, UMBC

As AI systems become more ubiquitous, securing them has become an emerging challenge. Over the years, with the surge in online social media use and in the data available for analysis, AI systems have been built to extract, represent and use this information. The credibility of information extracted from such open sources, however, is often questionable. Malicious or incorrect information can cause losses of money, reputation and resources and, in certain situations, pose a threat to human life. In this paper, we determine the credibility of Reddit posts by estimating their reputation scores, to ensure the validity of information ingested by AI systems. We also maintain the provenance of the generated output to ensure information and source reliability and to identify the background data that caused an attack. We demonstrate our approach in the cybersecurity domain, where security analysts use these systems to identify possible threats by analyzing data scattered across social media websites, forums, blogs, etc.
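As a rough illustration of gating ingestion on a reputation score while keeping provenance, the sketch below combines a few Reddit-style signals into a score in [0, 1]. The signals (upvote ratio, author karma, account age), the weights and the 0.5 threshold are hypothetical assumptions, not the scoring model used in this work.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Post:
    post_id: str
    author: str
    text: str
    upvote_ratio: float     # fraction of votes that are upvotes, 0..1
    author_karma: int       # author's accumulated karma
    account_age_days: int   # age of the author's account


@dataclass
class IngestedFact:
    text: str
    score: float
    provenance: dict = field(default_factory=dict)


# Hypothetical weights; a real system would learn or tune these.
WEIGHTS = {"upvotes": 0.5, "karma": 0.3, "age": 0.2}


def reputation_score(post: Post) -> float:
    """Weighted sum of signals, each scaled to [0, 1], so the score is in [0, 1]."""
    # Log-scale karma so a handful of very large accounts do not dominate;
    # 10**5 karma or more saturates the term at 1.0.
    karma_term = min(math.log10(1 + max(post.author_karma, 0)) / 5, 1.0)
    # Accounts a year old or older get the full age term.
    age_term = min(max(post.account_age_days, 0) / 365, 1.0)
    return (WEIGHTS["upvotes"] * post.upvote_ratio
            + WEIGHTS["karma"] * karma_term
            + WEIGHTS["age"] * age_term)


def ingest(posts, threshold=0.5):
    """Keep posts at or above the threshold, recording where each one came from."""
    accepted = []
    for post in posts:
        score = reputation_score(post)
        if score >= threshold:
            accepted.append(IngestedFact(
                text=post.text,
                score=round(score, 3),
                provenance={"source": "reddit", "post_id": post.post_id,
                            "author": post.author},
            ))
    return accepted


# Hypothetical posts: an established account and a brand-new one.
posts = [
    Post("p1", "veteran_analyst", "New phishing kit targets bank logins.", 0.95, 100_000, 1_000),
    Post("p2", "new_account", "Ignore that advisory, the file is safe.", 0.40, 3, 2),
]
facts = ingest(posts)
for fact in facts:
    print(fact.score, fact.provenance)
```

Keeping the provenance dictionary on every accepted item is what lets a later analysis trace a bad conclusion back to the post that introduced it.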


UMBC at SemEval-2018 Task 8: Understanding Text about Malware

April 21st, 2018

Ankur Padia, Arpita Roy, Taneeya Satyapanich, Francis Ferraro, Shimei Pan, Anupam Joshi and Tim Finin, UMBC at SemEval-2018 Task 8: Understanding Text about Malware, Int. Workshop on Semantic Evaluation (co-located with NAACL-HLT), New Orleans, LA, June 2018.

We describe the systems developed by the UMBC team for 2018 SemEval Task 8, SecureNLP (Semantic Extraction from CybersecUrity REports using Natural Language Processing). We participated in three of the sub-tasks: (1) classifying sentences as relevant or irrelevant to malware, (2) predicting token labels for sentences, and (4) predicting attribute labels from the Malware Attribute Enumeration and Characterization (MAEC) vocabulary for defining malware characteristics. We achieved F1 scores of 50.34/18.0 (dev/test), 22.23 (test data), and 31.98 (test data) for Task 1, Task 2 and Task 4, respectively. We also make our cybersecurity embeddings publicly available at https://bit.ly/cybr2vec.
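The reported F1 scores are percentages. For a binary task such as sub-task 1, F1 is the harmonic mean of precision and recall on the positive (relevant) class. The sketch below is a minimal illustration, assuming 1 marks a relevant sentence; the labels are made up, and the official SemEval scorer may differ in details such as how scores are averaged.

```python
def precision_recall_f1(gold, predicted, positive=1):
    """Precision, recall and F1 for one positive class (e.g. 1 = relevant to malware)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy labels for six sentences: 1 = relevant, 0 = irrelevant.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 1, 0, 1, 0, 0]
p, r, f1 = precision_recall_f1(gold, pred)
print(f"P={p:.2f} R={r:.2f} F1={100 * f1:.2f}")  # F1 shown as a percentage
```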


Cognitively Rich Framework to Automate Extraction & Representation of Legal Knowledge

April 15th, 2018

Cognitively Rich Framework to Automate Extraction and Representation of Legal Knowledge

Srishty Saha, UMBC
11:00-12:00 Monday, 16 April 2018, ITE 346

With the explosive growth in cloud-based services, businesses are increasingly maintaining large datasets containing information about their consumers to provide a seamless user experience. To ensure the privacy and security of these datasets, regulatory bodies have specified rules and compliance policies that organizations must adhere to. These regulatory policies are currently available as text documents that are not machine-processable, so continuously monitoring them to ensure data compliance requires extensive manual effort. We have developed a cognitive framework to automatically parse and extract knowledge from legal documents and represent it using an ontology. The legal ontology captures key entities and their relations, the provenance of legal policies, and semantically similar, cross-referenced legal facts and rules. We have applied this framework to the United States government’s Code of Federal Regulations (CFR), which includes facts and rules for individuals and organizations seeking to do business with the US Federal government.
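To make the idea of capturing deontic rules with provenance concrete, here is a toy sketch that tags regulation sentences as obligations, prohibitions or permissions from modal keywords and emits ontology-style triples. The keyword lists, the predicate names (`hasDeonticType`, `hasText`, `definedIn`) and the section id `sec-1` are hypothetical; the framework described above uses a much richer NLP pipeline and ontology.

```python
import re

# Hypothetical keyword patterns. Order matters: "shall not" must be checked
# before "shall", and "may not" before "may".
DEONTIC_PATTERNS = [
    ("prohibition", re.compile(r"\b(?:shall not|must not|may not)\b", re.IGNORECASE)),
    ("obligation", re.compile(r"\b(?:shall|must|is required to)\b", re.IGNORECASE)),
    ("permission", re.compile(r"\b(?:may|is permitted to)\b", re.IGNORECASE)),
]


def classify_deontic(sentence):
    """Return the first matching deontic type, or None if no modal cue is found."""
    for label, pattern in DEONTIC_PATTERNS:
        if pattern.search(sentence):
            return label
    return None


def to_triples(section_id, sentences):
    """Emit (subject, predicate, object) triples; 'definedIn' keeps provenance."""
    triples = []
    for i, sentence in enumerate(sentences):
        label = classify_deontic(sentence)
        if label is None:
            continue
        rule_id = f"{section_id}#rule{i}"
        triples.append((rule_id, "hasDeonticType", label))
        triples.append((rule_id, "hasText", sentence))
        triples.append((rule_id, "definedIn", section_id))
    return triples


sentences = [
    "The contractor shall submit an invoice.",
    "The contractor shall not disclose records.",
    "The officer may extend the deadline.",
    "This section defines terms.",
]
triples = to_triples("sec-1", sentences)
for triple in triples:
    print(triple)
```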


UMBC/ICMA Survey of Local Government Cybersecurity Practices

April 14th, 2018

In 2016, the International City/County Management Association (ICMA), in partnership with the University of Maryland, Baltimore County (UMBC), conducted a survey to better understand local government cybersecurity practices. The results of this survey provide insights into the cybersecurity issues faced by U.S. local governments, including what their capacities are, what kind of barriers they face, and what type of support they have to implement cybersecurity programs.

The survey was sent on paper via postal mail to the chief information officers of 3,423 U.S. local governments with populations of 25,000 or greater. An online submission option was also made available to survey recipients. Responses were received from 411 of the governments surveyed, yielding a response rate of 12%.
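The reported response rate follows directly from the two counts:

```latex
\text{response rate} = \frac{411}{3{,}423} \approx 0.120 \approx 12\%
```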

A summary of the results, written by ICMA staff, is available from ICMA.


2018 Mid-Atlantic Student Colloquium on Speech, Language and Learning

April 11th, 2018

The 2018 Mid-Atlantic Student Colloquium on Speech, Language and Learning (MASC-SLL) is a student-run, one-day event on speech, language & machine learning research, to be held at the University of Maryland, Baltimore County (UMBC) from 10:00am to 6:00pm on Saturday, May 12. There is no registration charge, and lunch and refreshments will be provided. Students, postdocs, faculty and researchers from universities & industry are invited to participate and network with other researchers working in related fields.

Students and postdocs are encouraged to submit abstracts describing ongoing, planned, or completed research projects, including previously published results and negative results. Research in any field applying computational methods to any aspect of human language, including speech and learning, from all areas of computer science, linguistics, engineering, neuroscience, information science, and related fields is welcome. Submissions and presentations must be made by students or postdocs. Accepted submissions will be presented as either posters or talks.

Important dates:

  • Registration opens: April 10
  • Submission deadline (abstracts): April 16
  • Decisions announced: April 21
  • Registration closes: May 6
  • Colloquium: May 12


Link Before You Share: Managing Privacy Policies through Blockchain

March 30th, 2018

Agniva Banerjee, UMBC
11:00-12:00 Monday, 2 April 2018

With the growing number of cloud-based content providers, utilities and applications, each with its own privacy policy and associated overhead, it is becoming increasingly difficult for concerned users to manage and track the confidential information that they share with the providers. Users consent to let providers gather and share their Personally Identifiable Information (PII). We have developed a novel framework to ingest a text-based privacy policy document, intelligently parse it, extract relevant terms to populate a privacy policy ontology, and thereafter automatically track details about how a user’s PII is stored, used and shared by the provider. We have integrated this Data Privacy ontology with the properties of blockchain to develop an automated access-control and audit mechanism that enforces users’ data privacy policies when sharing their data across third parties.

Agniva Banerjee and Karuna Pande Joshi, Link Before You Share: Managing Privacy Policies through Blockchain, 4th International Workshop on Privacy and Security of Big Data (PSBD 2017), in conjunction with 2017 IEEE International Conference on Big Data, 4 December 2017.
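As a rough illustration of the first step (parsing a text-based privacy policy into structured, ontology-like terms), the sketch below uses keyword and regular-expression matching. The vocabularies, the `PolicyRecord` fields and the sample policy are hypothetical stand-ins for the privacy policy ontology and the intelligent parsing described above, not the authors' implementation.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical vocabularies standing in for classes of a privacy policy ontology.
PII_TERMS = ["name", "email address", "phone number", "location"]
RECIPIENTS = ["third parties", "advertisers", "affiliates"]
RETENTION = re.compile(r"\bretain\w*\b.*?\bfor (\d+) days\b", re.IGNORECASE)


@dataclass
class PolicyRecord:
    collected: set = field(default_factory=set)     # PII types the provider collects
    shared_with: set = field(default_factory=set)   # parties the PII may be shared with
    retention_days: Optional[int] = None            # how long data is kept, if stated


def mentions(term, text):
    """Whole-word/phrase match, so 'name' does not match 'username'."""
    return re.search(rf"\b{re.escape(term)}\b", text) is not None


def parse_policy(text):
    record = PolicyRecord()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lower = sentence.lower()
        if "collect" in lower:
            record.collected.update(t for t in PII_TERMS if mentions(t, lower))
        if "share" in lower:
            record.shared_with.update(r for r in RECIPIENTS if mentions(r, lower))
        match = RETENTION.search(sentence)
        if match:
            record.retention_days = int(match.group(1))
    return record


policy = ("We collect your name and email address. "
          "We may share your email address with advertisers and affiliates. "
          "We retain your data for 90 days.")
record = parse_policy(policy)
print(record)
```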


AI for Cybersecurity: Intrusion Detection Using Neural Networks

March 25th, 2018

Sowmya Ramapatruni, UMBC

11:00-12:00 Monday, 26 March 2018, ITE346, UMBC

The constant growth in the use of computer networks has raised concerns about security and privacy. Intrusions are among the most common attacks on computer networks today. Intrusion detection systems are considered essential for maintaining network security and have therefore been widely adopted by network administrators. One drawback is that such systems are usually signature-based, which makes them strongly dependent on an up-to-date signature database and consequently ineffective against novel (unknown) attacks. In this study, we analyze the use of machine learning in the development of intrusion detection systems.
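The dependence of signature-based detection on known patterns can be seen in a few lines. In this sketch the two signatures and the payloads are hypothetical; real IDS rule languages are far more expressive, but the failure mode on a reworded or encoded attack is the same in spirit.

```python
# Hypothetical byte-pattern signatures standing in for an IDS rule database.
SIGNATURES = {
    "sql-injection": b"' OR '1'='1",
    "path-traversal": b"../../etc/passwd",
}


def signature_match(payload: bytes) -> list:
    """Return the names of all signatures whose exact pattern occurs in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


known = b"GET /login?user=admin' OR '1'='1 HTTP/1.1"
variant = b"GET /login?user=admin' OR 2>1-- HTTP/1.1"        # same idea, new form
encoded = b"GET /files?f=..%2F..%2Fetc%2Fpasswd HTTP/1.1"    # URL-encoded traversal

for label, payload in [("known", known), ("variant", variant), ("encoded", encoded)]:
    print(label, signature_match(payload))
```

The exact-match rule catches the known payload but misses both the reworded injection and the URL-encoded traversal, which is the gap that learned classifiers aim to close.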

The focus of this presentation is to analyze the various machine learning algorithms that can be used to classify network attacks. We will also examine the common techniques used to build and fine-tune artificial neural networks for network attack classification, and discuss the drawbacks of these systems. In addition, we will analyze the data sets and the information that is critical for classification. Understanding network packet data is essential for feature engineering, a necessary precursor to building any machine learning system. Finally, we summarize the limitations of existing machine learning systems and outline possible directions for further study in this area.
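Feature engineering for network data typically turns each connection or packet record into a fixed-length numeric vector before any neural network sees it. The sketch below assumes hypothetical flow-level fields (duration, byte counts, protocol, SYN count), one-hot encodes the protocol and min-max scales every column to [0, 1], a common preprocessing step before neural network training; it is not the feature set used in this study.

```python
from dataclasses import dataclass

PROTOCOLS = ["tcp", "udp", "icmp"]  # hypothetical protocol vocabulary


@dataclass
class FlowRecord:
    duration_s: float   # connection duration in seconds
    src_bytes: int      # bytes sent by the source
    dst_bytes: int      # bytes sent by the destination
    protocol: str
    syn_count: int      # number of TCP SYN packets seen


def raw_features(record: FlowRecord) -> list:
    """Numeric fields followed by a one-hot encoding of the protocol."""
    one_hot = [1.0 if record.protocol == p else 0.0 for p in PROTOCOLS]
    numeric = [record.duration_s, float(record.src_bytes),
               float(record.dst_bytes), float(record.syn_count)]
    return numeric + one_hot


def min_max_scale(rows: list) -> list:
    """Scale each column to [0, 1]; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


records = [
    FlowRecord(0.5, 200, 1000, "tcp", 1),   # ordinary request/response
    FlowRecord(2.0, 50, 0, "udp", 0),
    FlowRecord(0.0, 0, 0, "tcp", 40),       # many SYNs, no payload: flood-like
]
X = min_max_scale([raw_features(r) for r in records])
for row in X:
    print([round(v, 3) for v in row])
```

In practice the minimum and maximum would be computed on training data only and reused for test traffic, so that the scaling does not leak information from the evaluation set.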


paper: Cleaning Noisy Knowledge Graphs

January 27th, 2018

Cleaning Noisy Knowledge Graphs

Ankur Padia, Cleaning Noisy Knowledge Graphs, Proceedings of the Doctoral Consortium at the 16th International Semantic Web Conference, October 2017.

My dissertation research is developing an approach to identify and explain errors in a knowledge graph constructed by extracting entities and relations from text. Information extraction systems can automatically construct knowledge graphs from a large collection of documents, which might be drawn from news articles, Web pages, social media posts or discussion forums. The language understanding task is challenging, and current extraction systems introduce many kinds of errors. Previous work on improving the quality of knowledge graphs uses additional evidence from background knowledge bases or Web searches. Such approaches are difficult to apply when emerging entities are present and/or only one knowledge graph is available. To address this problem, I am using multiple complementary techniques, including entity linking, commonsense reasoning, and linguistic analysis.
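One simple form of the commonsense reasoning mentioned above is checking extracted triples against the types a relation expects. The toy sketch below assumes hypothetical relation signatures and entity types (for example, produced by entity linking) and returns each flagged triple with a short explanation; it illustrates the idea only and is not the dissertation's method.

```python
# Hypothetical relation signatures: relation -> (expected subject type, expected object type).
RELATION_SIGNATURES = {
    "bornIn": ("Person", "Place"),
    "employedBy": ("Person", "Organization"),
}


def find_type_violations(triples, entity_types):
    """Flag triples whose subject or object lacks the type the relation expects.

    Returns (triple, explanation) pairs so each suspected error comes with a reason.
    """
    violations = []
    for subj, rel, obj in triples:
        if rel not in RELATION_SIGNATURES:
            continue  # no constraint known for this relation
        dom, rng = RELATION_SIGNATURES[rel]
        subj_ok = dom in entity_types.get(subj, set())
        obj_ok = rng in entity_types.get(obj, set())
        if not (subj_ok and obj_ok):
            violations.append(((subj, rel, obj), f"{rel} expects {dom} -> {rng}"))
    return violations


entity_types = {"Alice": {"Person"}, "Baltimore": {"Place"}, "Acme": {"Organization"}}
extracted = [
    ("Alice", "bornIn", "Baltimore"),
    ("Alice", "employedBy", "Acme"),
    ("Baltimore", "employedBy", "Alice"),  # subject and object swapped by the extractor
]
violations = find_type_violations(extracted, entity_types)
for triple, why in violations:
    print(triple, "->", why)
```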


Videos of ISWC 2017 talks

December 16th, 2017

Videos of almost all of the talks from the 16th International Semantic Web Conference (ISWC) held in Vienna in 2017 are online at videolectures.net. They include 89 research presentations, two keynote talks, the one-minute madness event and the opening and closing ceremonies.


Jennifer Sleeman receives AI for Earth grant from Microsoft

December 12th, 2017

Visiting Assistant Professor Jennifer Sleeman (Ph.D. ’17) has been awarded a grant from Microsoft as part of its ‘AI for Earth’ program. Dr. Sleeman will use the grant to continue her research on developing algorithms that model how scientific disciplines such as climate change evolve and that predict future trends by analyzing the text of articles and reports and the papers they cite.

AI for Earth is a Microsoft program aimed at empowering people and organizations to solve global environmental challenges by increasing access to AI tools and educational opportunities, while accelerating innovation. Via the Azure for Research AI for Earth award program, Microsoft provides selected researchers and organizations access to its cloud and AI computing resources to accelerate, improve and expand work on climate change, agriculture, biodiversity and/or water challenges.

UMBC is among the first grant recipients of AI for Earth, which launched in July 2017. The grant process was competitive and selective, and the award recognizes the potential of the work and the power of AI to accelerate progress.

As part of her dissertation research, Dr. Sleeman developed algorithms using dynamic topic modeling to understand influence and predict future trends in a scientific discipline. She applied this to the field of climate change, using the assessment reports of the Intergovernmental Panel on Climate Change (IPCC) and the papers they cite. Since 1990, the IPCC has published an assessment report roughly every five to seven years; each report includes four separate volumes, each of which has many chapters. Each report cites tens of thousands of research papers, which together form a correlated dataset of temporally grounded documents. Her custom dynamic topic modeling algorithm identified topics in both datasets and applied cross-domain analytics to identify the correlations between the IPCC chapters and their cited documents. The approach reveals both the influence of the cited research on the reports and how previous research citations have evolved over time.
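Given topic proportions for an IPCC chapter and for groups of the papers it cites, one simple way to quantify how closely they correspond is the Jensen-Shannon divergence. This sketch assumes the proportions have already been produced by a topic model (the vectors here are made up); it is not Dr. Sleeman’s cross-domain analytics, only an illustration of comparing topic distributions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Symmetric divergence in [0, 1] (base 2); 0 means identical distributions."""
    # The mixture m is positive wherever p or q is, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Made-up proportions over three topics for one chapter and two groups of cited papers.
chapter = [0.6, 0.3, 0.1]
cited_older = [0.2, 0.2, 0.6]
cited_recent = [0.5, 0.35, 0.15]

print("older :", round(jensen_shannon(chapter, cited_older), 3))
print("recent:", round(jensen_shannon(chapter, cited_recent), 3))
```

A lower value for one group of cited papers suggests that the chapter’s topics track that group more closely; repeating the comparison across report cycles is one way to look at how influence shifts over time.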

Dr. Sleeman’s award is part of an inaugural set of 35 grants in more than ten countries for access to Microsoft Azure and AI technology platforms, services and training. In a post on Monday, “AI for Earth can be a game-changer for our planet,” Microsoft announced its intent to invest $50 million over five years in the program, making grant-making and educational training possible at a much larger scale.

More information about AI for Earth can be found on the Microsoft AI for Earth website.


Link Before You Share: Managing Privacy Policies through Blockchain

December 4th, 2017

Agniva Banerjee and Karuna Pande Joshi, Link Before You Share: Managing Privacy Policies through Blockchain, 4th International Workshop on Privacy and Security of Big Data (PSBD 2017), in conjunction with 2017 IEEE International Conference on Big Data, 4 December 2017.

With the advent of numerous online content providers, utilities and applications, each with its own version of a privacy policy and the associated overhead, it is becoming increasingly difficult for concerned users to manage and track the confidential information that they share with the providers. We have developed a novel framework to automatically track details about how a user’s personally identifiable information (PII) is stored, used and shared by the provider. We have integrated our data privacy ontology with the properties of blockchain to develop an automated access-control and audit mechanism that enforces users’ data privacy policies when sharing their data across third parties. We have also validated this framework by implementing a working system, LinkShare. In this paper, we describe our framework in detail along with the LinkShare system. Our approach can be adopted by big data users to automatically apply their privacy policies to data operations and track the flow of that data across various stakeholders.
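The tamper evidence that a blockchain contributes to such an audit mechanism can be sketched with a hash chain, in which each log entry’s SHA-256 hash covers the previous entry’s hash. This minimal, single-process sketch assumes a hypothetical policy format (PII type mapped to allowed recipients) and omits the distribution and consensus of a real blockchain; it is not the LinkShare implementation.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event, prev_hash):
    # Canonical JSON so the same content always hashes the same way.
    body = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class AuditChain:
    """Append-only log; each entry's hash covers the previous entry's hash."""

    def __init__(self):
        self.entries = []

    def append(self, event):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"event": event, "prev_hash": prev_hash,
                 "hash": _digest(event, prev_hash)}
        self.entries.append(entry)
        return entry

    def verify(self):
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _digest(entry["event"], prev_hash):
                return False
            prev_hash = entry["hash"]
        return True


def share(policy, chain, user, pii_type, recipient):
    """Allow sharing only if the user's policy lists the recipient; log either way."""
    allowed = recipient in policy.get(pii_type, set())
    chain.append({"user": user, "pii": pii_type,
                  "recipient": recipient, "allowed": allowed})
    return allowed


policy = {"email": {"payment-processor"}}   # hypothetical user policy
chain = AuditChain()
print(share(policy, chain, "u42", "email", "payment-processor"))
print(share(policy, chain, "u42", "email", "ad-network"))
print(chain.verify())
```

Editing any earlier entry, for example flipping `allowed` in the second entry, makes `verify()` return `False`, because the recomputed hash no longer matches the stored one.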


paper: Automated Knowledge Extraction from the Federal Acquisition Regulations System

November 28th, 2017

Automated Knowledge Extraction from the Federal Acquisition Regulations System (FARS)

Srishty Saha and Karuna Pande Joshi, Automated Knowledge Extraction from the Federal Acquisition Regulations System (FARS), 2nd International Workshop on Enterprise Big Data Semantic and Analytics Modeling, IEEE Big Data Conference, December 2017.

With increasing regulation of Big Data, it is becoming essential for organizations to ensure compliance with various data protection standards. The Federal Acquisition Regulations System (FARS) within the Code of Federal Regulations (CFR) includes facts and rules for individuals and organizations seeking to do business with the US Federal government. Parsing and gathering knowledge from such lengthy regulation documents is currently done manually and is time- and labor-intensive. Hence, developing a cognitive assistant for automated analysis of such legal documents has become a necessity. We have developed a semantically rich approach to automate the analysis of legal documents and have implemented a system to capture various facts and rules contributing towards building an efficient legal knowledge base that contains details of the relationships between various legal elements, semantically similar terminologies, deontic expressions and cross-referenced legal facts and rules. In this paper, we describe our framework along with the results of automating knowledge extraction from the FARS document (Title 48, CFR). Our approach can be used by Big Data users to automate knowledge extraction from large legal documents.
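Capturing cross-referenced rules amounts, at its simplest, to finding section citations in each section’s text and recording them as edges. The sketch below assumes a hypothetical regular expression for FAR-style section numbers (such as 52.212-4), and the section texts are invented; a real extractor would also need to handle subparts, ranges and references to other CFR titles.

```python
import re
from collections import defaultdict

# Hypothetical pattern for FAR-style section numbers such as 9.104-1 or 52.212-4.
SECTION_REF = re.compile(r"\b\d{1,2}\.\d{3}(?:-\d+)?\b")


def cross_reference_graph(sections):
    """Map each section id to the set of other section ids its text cites."""
    graph = defaultdict(set)
    for sec_id, text in sections.items():
        for ref in SECTION_REF.findall(text):
            if ref != sec_id:  # ignore self-references
                graph[sec_id].add(ref)
    return dict(graph)


# Invented section texts; only the numbering style imitates the FAR.
sections = {
    "1.101": "Requirements in 1.102-2 and 52.212-4 apply to all contracts.",
    "52.212-4": "As prescribed in 1.101, insert the following clause.",
}
graph = cross_reference_graph(sections)
for sec, refs in graph.items():
    print(sec, "->", sorted(refs))
```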