Open Information Extraction for Code-Mixed Hindi-English Social Media Data
1:00pm Monday, 2 July 2018, ITE 325b, UMBC
Open domain relation extraction (Angeli, Premkumar, & Manning 2015) is a process of finding relation triples. While there are a number of available systems for open information extraction (Open IE) for a single language, traditional Open IE systems are not well suited to content that contains multiple languages in a single utterance. In this thesis, we have extended a existing code mix corpus (Das, Jamatia, & Gambck 2015) by finding and annotating relation triples in Open IE fashion. Using this newly annotated corpus, we have experimented with seq2seq neural network (Zhang, Duh, & Durme 2017) for finding the relationship triples. As prerequisite for relationship extraction pipeline, we have developed part-of-speech tagger and named entity and predicate recognizer for code-mix content. We have experimented with various approaches such as Conditional Random Fields (CRF), Average Perceptron and deep neural networks. According to our knowledge, this relationship extraction system is first ever contribution for any codemix natural language. We have achieved promising results for all of the components and it could be improved in future with more codemix data.
Committee: Drs. Frank Ferraro (Chair), Tim Finin, Hamed Pirsiavash, Bryan Wilkinson
Medical organizations find it challenging to adopt cloud-based electronic medical records services, due to the risk of data breaches and the resulting compromise of patient data. Existing authorization models follow a patient centric approach for EHR management where the responsibility of authorizing data access is handled at the patients’ end. This however creates a significant overhead for the patient who has to authorize every access of their health record. This is not practical given the multiple personnel involved in providing care and that at times the patient may not be in a state to provide this authorization. Hence there is a need of developing a proper authorization delegation mechanism for safe, secure and easy cloud-based EHR management. We have developed a novel, centralized, attribute based authorization mechanism that uses Attribute Based Encryption (ABE) and allows for delegated secure access of patient records. This mechanism transfers the service management overhead from the patient to the medical organization and allows easy delegation of cloud-based EHR’s access authority to the medical providers. In this paper, we describe this novel ABE approach as well as the prototype system that we have created to illustrate it.
Understanding the Logical and Semantic Structure of Large Documents
Muhammad Mahbubur Rahman
11:00am Wednesday, 30 May 2018, ITE 325b
Understanding and extracting of information from large documents, such as business opportunities, academic articles, medical documents and technical reports poses challenges not present in short documents. The reasons behind this challenge are that large documents may be multi-themed, complex, noisy and cover diverse topics. This dissertation describes a framework that can analyze large documents, and help people and computer systems locate desired information in them. It aims to automatically identify and classify different sections of documents and understand their purpose within the document. A key contribution of this research is modeling and extracting the logical and semantic structure of electronic documents using deep learning techniques. The effectiveness and robustness of ?the framework is evaluated through extensive experiments on arXiv and requests for proposals datasets.
Committee Members: Drs. Tim Finin (Chair), Anupam Joshi, Tim Oates, Cynthia Matuszek, James Mayfield (JHU)
Russians hack home internet connections, here is how to protect yourself
Sandeep Nair Narayanan, Anupam Joshi and Sudip Mittal
In late April, the top federal cybersecurity agency, US-CERT, announced that Russian hackers had attacked internet-connected devices throughout the U.S., including network routers in private homes. Most people set them up – or had their internet service provider set them up – and haven’t thought much about them since. But it’s the gateway to the internet for every device on your home network, including Wi-Fi connected ones. That makes it a potential target for anyone who wants to attack you, or, more likely, use your internet connection to attack someone else.
As graduatestudents and faculty doing research in cybersecurity, we know that hackers can take control of many routers, because manufacturers haven’t set them up securely. Router administrative passwords often are preset at the factory to default values that are widely known, like “admin” or “password.” By scanning the internet for older routers and guessing their passwords with specialized software, hackers can take control of routers and other devices. Then they can install malicious programs or modify the existing software running the device.
Once an attacker takes control
There’s a wide range of damage that a hacker can do once your router has been hijacked. Even though most people browse the web using securely encrypted communications, the directions themselves that let one computer connect to another are often not secure. When you want to connect to, say, theconversation.com, your computer sends a request to a domain name server – a sort of internet traffic director – for instructions on how to connect to that website. That request goes to the router, which either responds directly or passes it to another domain name server outside your home. That request, and the response, are not usually encrypted.
A hacker could take advantage of that and intercept your computer’s request, to track the sites you visit. An attacker could also attempt to alter the reply, redirecting your computer to a fake website designed to steal your login information or even gain access to your financial data, online photos, videos, chats and browsing history.
In addition, a hacker can use your router and other internet devices in your home to send out large amounts of nuisance internet traffic as part of what are called distributed denial of service attacks, like the October 2016 attack that affected major internet sites like Quora, Twitter, Netflix and Visa.
Has your router been hacked?
An expert with complex technical tools may be able to discover whether your router has been hacked, but it’s not something a regular person is likely to be able to figure out. Fortunately, you don’t need to know that to kick out unauthorized users and make your network safe.
The first step is to try to connect to your home router. If you bought the router, check the manual for the web address to enter into your browser and the default login and password information. If your internet provider supplied the router, contact their support department to find out what to do.
If you’re not able to login, then consider resetting your router – though be sure to check with your internet provider to find out any settings you’ll need to configure to reconnect after you reset it. When your reset router restarts, connect to it and set a strong administrative password. The next step US-CERT suggests is to disable older types of internet communications, protocols like telnet, SNMP, TFTP and SMI that are often unencrypted or have other security flaws. Your router’s manual or online instructions should detail how to do that.
After securing your router, it’s important to keep it protected. Hackers are very persistent and are always looking to find more flaws in routers and other systems. Hardware manufacturers know this and regularly issue updates to plug security holes. So you should check regularly and install any updates that come out. Some manufacturers have smartphone apps that can manage their routers, which can make updating easier, or even automate the process.
Local governments’ cybersecurity crisis in eight charts
Donald Norris, Anupam Joshi, Laura Mateczun and Tim Finin
Within the past few weeks, two large American cities learned that their information systems were hacked. First, Atlanta revealed that it had been the victim of a ransomware attack that took many of the city’s services offline for nearly a week, forcing police to revert to taking written case notes, hampering the Atlanta’s court system and preventing residents from paying water bills online. Then, Baltimore’s 311 and 911 dispatch systems were taken offline for more than 17 hours, forcing dispatchers to log and process requests manually. Both attacks could have been prevented. And they are more evidence of the poor, if not appalling, state of local government cybersecurity in the United States.
Preventing Poisoning Attacks on Threat Intelligence Systems
Nitika Khurana, Graduate Student, UMBC
11:00-12:00 Monday, 23 April 2018, ITE346, UMBC
As AI systems become more ubiquitous, securing them becomes an emerging challenge. Over the years, with the surge in online social media use and the data available for analysis, AI systems have been built to extract, represent and use this information. The credibility of this information extracted from open sources, however, can often be questionable. Malicious or incorrect information can cause a loss of money, reputation, and resources; and in certain situations, pose a threat to human life. In this paper, we determine the credibility of Reddit posts by estimating their reputation score to ensure the validity of information ingested by AI systems. We also maintain the provenance of the output generated to ensure information and source reliability and identify the background data that caused an attack. We demonstrate our approach in the cybersecurity domain, where security analysts utilize these systems to determine possible threats by analyzing the data scattered on social media websites, forums, blogs, etc.
We describe the systems developed by the UMBC team for 2018 SemEval Task 8, SecureNLP (Semantic Extraction from CybersecUrity REports using Natural Language Processing). We participated in three of the sub-tasks: (1) classifying sentences as being relevant or irrelevant to malware, (2) predicting token labels for sentences, and (4) predicting attribute labels from the Malware Attribute Enumeration and Characterization vocabulary for defining malware characteristics. We achieved F1 scores of 50.34/18.0 (dev/test), 22.23 (test-data), and 31.98 (test-data) for Task1, Task2 and Task2 respectively. We also make our cybersecurity embeddings publicly available at https://bit.ly/cybr2vec.
Cognitively Rich Framework to Automate Extraction and Representation of Legal Knowledge
Srishty Saha, UMBC
11-12 Monday, 16 April 2018, ITE 346
With the explosive growth in cloud-based services, businesses are increasingly maintaining large datasets containing information about their consumers to provide a seamless user experience. To ensure privacy and security of these datasets, regulatory bodies have specified rules and compliance policies that must be adhered to by organizations. These regulatory policies are currently available as text documents that are not machine processable and so require extensive manual effort to monitor them continuously to ensure data compliance. We have developed a cognitive framework to automatically parse and extract knowledge from legal documents and represent it using an Ontology. The legal ontology captures key-entities and their relations, the provenance of legal-policy and cross-referenced semantically similar legal facts and rules. We have applied this framework to the United States government’s Code of Federal Regulations (CFR) which includes facts and rules for individuals and organizations seeking to do business with the US Federal government.
UMBC/ICMA Survey of Local Government Cybersecurity Practices
In 2016, the International City/County Management Association (ICMA), in partnership with the University of Maryland, Baltimore County (UMBC), conducted a survey to better understand local government cybersecurity practices. The results of this survey provide insights into the cybersecurity issues faced by U.S. local governments, including what their capacities are, what kind of barriers they face, and what type of support they have to implement cybersecurity programs.
The survey was sent on paper via postal mail to the chief information officers of 3,423 U.S. local governments with populations of 25,000 or greater. An online submission option was also made available to survey recipients. Responses were received from 411 of the governments surveyed, yielding a response rate of 12%.
A summary of the results written by ICMA staff is available here.
2018 Mid-Atlantic Student Colloquium on Speech, Language and Learning
The 2018 Mid-Atlantic Student Colloquium on Speech, Language and Learning (MASC-SLL) is a student-run, one-day event on speech, language & machine learning research to be held at the University of Maryland, Baltimore County (UMBC) from 10:00am to 6:00pm on Saturday May 12. There is no registration charge and lunch and refreshments will be provided. Students, postdocs, faculty and researchers from universities & industry are invited to participate and network with other researchers working in related fields.
Students and postdocs are encouraged to submit abstracts describing ongoing, planned, or completed research projects, including previously published results and negative results. Research in any field applying computational methods to any aspect of human language, including speech and learning, from all areas of computer science, linguistics, engineering, neuroscience, information science, and related fields is welcome. Submissions and presentations must be made by students or postdocs. Accepted submissions will be presented as either posters or talks.
Link Before You Share: Managing Privacy Policies through Blockchain
Agniva Banerjee, UMBC 11:00-12:00 Monday, 2 April 2018
AI for Cybersecurity: Intrusion Detection Using Neural Networks
Sowmya Ramapatruni, UMBC
11:00-12:00 Monday 26 March, 2018, ITE346, UMBC
The constant growth in the use of computer networks raised concerns about security and privacy. Intrusion attacks on computer networks is a very common attack on internet today. Intrusion detection systems have been considered essential in keeping network security and therefore have been commonly adopted by network administrators. A possible disadvantage is the fact that such systems are usually based on signature systems, which make them strongly dependent on updated database and consequently inefficient against novel attacks (unknown attacks). In this study we analyze the use of machine learning in the development of intrusion detection system.
The focus of this presentation is to analyze the various machine learning algorithms that can be used to perform classification of network attacks. We will also analyze the common techniques used to build and fine tune artificial neural networks for network attack classification and address the drawbacks in these systems. We will also analyze the data sets and the information that is critical for the classification. The understanding of network packet data is essential for the feature engineering, which is an essential precursor activity for any machine learning systems. Finally, we study the drawbacks of existing machine learning systems and walk through the further study possible in this area.