UMBC eBiquity Blog
Tim Finin, 2:43pm 17 May 2015
Transforming big data into smart data:
deriving value via harnessing volume, variety
and velocity using semantics and semantic web
Professor Amit Sheth
Wright State University
11:00am Tuesday, 26 May 2015, ITE 325, UMBC
Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications to drive value for businesses. In this talk, I will describe Smart Data that is realized by extracting value from Big Data, to benefit not just large companies but each individual. If my child is an asthma patient, for all the data relevant to my child with the four V-challenges, what I care about is simply, "How is her current health, and what is the risk of having an asthma attack in her current situation (now and today), especially if that risk has changed?" As I will show, Smart Data that gives such personalized and actionable information will need to utilize multimodal data and their metadata, use domain specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on Machine Learning and NLP. I will motivate the need for a synergistic combination of techniques similar to the close interworking of the top brain and the bottom brain in the cognitive models. I will present a couple of Smart Data applications in development at Kno.e.sis from the domains of personalized health, health informatics, social data for social good, energy, disaster response, and smart city.
Amit Sheth is an Educator, Researcher and Entrepreneur. He is the LexisNexis Ohio Eminent Scholar, an IEEE Fellow, and the executive director of Kno.e.sis, the Ohio Center of Excellence in Knowledge-enabled Computing at Wright State University. In World Wide Web (WWW), it is placed among the top ten universities in the world based on 10-year impact. Prof. Sheth is a well-cited computer scientist (h-index = 87, >30,000 citations), and appears among the top 1-3 authors in World Wide Web (Microsoft Academic Search). He has founded two companies, and several commercial products and deployed systems have resulted from his research. His students are exceptionally successful; ten out of 18 past PhD students have 1,000+ citations each.
Host: Yelena Yesha, yeyesha2umbc.edu
Tim Finin, 8:08pm 11 May 2015
Information Extraction from Dirty Notes
for Clinical Decision Support
10:00am Tuesday, 12 May 2015, ITE346
The term clinical decision support (CDS) refers broadly to providing clinicians or patients with computer-generated clinical knowledge and patient-related information, intelligently filtered or presented at appropriate times, to enhance patient care. It is estimated that at least 50% of the clinical information describing a patient’s current condition and stage of therapy resides in the free-form text portions of the Electronic Health Record (EHR). Both linguistic and statistical natural language processing (NLP) models assume the presence of a formal underlying grammar in the text. Yet clinical notes are often filled with overloaded and nonstandard abbreviations, sentence fragments, and creative punctuation that make it difficult for grammar-based NLP systems to work effectively. This research focuses on investigating scalable machine learning and semantic techniques that do not rely on an underlying grammar to extract medical concepts in the text in order to apply them in CDS on commodity hardware and software systems. Additionally, by packaging the extracted data within a semantic knowledge representation, the facts can be combined with other semantically encoded facts and reasoned over to help inform clinicians in their decision making.
Tim Finin, 11:21pm 27 April 2015
In this week’s ebiquity lab meeting, Ankur Padia will talk about ontology learning and the work he did for his MS thesis at 10:00am in ITE 346 at UMBC.
10:00am Tuesday, Apr. 28, 2015, ITE 346
Ontology Learning has been the subject of intensive study for the past decade. Researchers in this field have been motivated by the possibility of automatically building a knowledge base on top of text documents so as to support reasoning-based knowledge extraction. While most work in this field has been primarily statistical (known as light-weight Ontology Learning), few attempts have been made at axiomatic Ontology Learning (called Formal Ontology Learning) from natural language text documents. The presentation will focus on the relationship between Description Logic and Natural Language (limited to IS-A) for Formal Ontology Learning.
Tim Finin, 12:57pm 25 April 2015
Ph.D. Dissertation Defense
A Semantic Resolution Framework for Integrating
Manufacturing Service Capability Data
10:00am Monday 27 April 2015, ITE 217b
Building flexible manufacturing supply chains requires availability of interoperable and accurate manufacturing service capability (MSC) information of all supply chain participants. Today, MSC information, which is typically published either on the supplier’s web site or registered at an e-marketplace portal, has been shown to fall short of interoperability and accuracy requirements. The issue of interoperability can be addressed by annotating the MSC information using shared ontologies. However, this ontology-based approach faces three main challenges: (1) lack of an effective way to automatically extract a large volume of MSC instance data hidden in the web sites of manufacturers that need to be annotated; (2) difficulties in accurately identifying semantics of these extracted data and resolving semantic heterogeneities among individual sources of these data while integrating them under shared formal ontologies; (3) difficulties in the adoption of ontology-based approaches by the supply chain managers and users because of their unfamiliarity with the syntax and semantics of formal ontology languages such as the web ontology language (OWL).
The objective of our research is to address the main challenges of ontology-based approaches by developing an innovative approach that is able to extract MSC instances from a broad range of manufacturing web sites that may present MSC instances in various ways, accurately annotate MSC instances with formally defined semantics on a large scale, and integrate these annotated MSC instances into formal manufacturing domain ontologies to facilitate the formation of supply chains of manufacturers. To achieve this objective, we propose a semantic resolution framework (SRF) that consists of three main components: an MSC instance extractor, an MSC instance annotator and a semantic resolution knowledge base. The instance extractor builds a local semantic model that we call an instance description model (IDM) for each target manufacturer web site. The innovative aspect of the IDM is that it captures the intended structure of the target web site and associates each extracted MSC instance with a context that describes possible semantics of that instance. The instance annotator starts the semantic resolution by identifying the most appropriate class from one or more manufacturing domain ontologies (MDOs) to annotate each instance, based on the mappings established between the context of that instance and the vocabularies (i.e., classes and properties) defined in the MDO. The primary goal of the semantic resolution knowledge base (SR-KB) is to resolve semantic heterogeneity that may occur in the instance annotation process and thus improve the accuracy of the annotated MSC instances. The experimental results demonstrate that the instance extractor and the instance annotator can effectively discover and annotate MSC instances, while the SR-KB is able to improve both precision and recall of annotated instances and reduce human involvement as the knowledge base evolves.
Committee: Drs. Yun Peng (Chair), Tim Finin, Yaacov Yesha, Matthew Schmill and Boonserm Kulvatunyou
Tim Finin, 10:03pm 19 April 2015
In this week’s meeting (10-11am Tue, April 21), Ankur Padia will present work in progress on providing access control to an RDF triple store.
Triple store access control for a linked data fragments interface
Ankur Padia, UMBC
The maturation of Semantic Web standards and associated web-based data representations such as schema.org have made RDF a popular model for representing graph data and semi-structured knowledge. Triple stores are used to store and query an RDF dataset and often expose a SPARQL endpoint service on the Web for public access. Most existing SPARQL endpoints support very simple access control mechanisms, if any at all, preventing their use for many applications where fine-grained privacy or data security is important. We describe new work on access control for a linked data fragments interface, i.e. one that accepts queries consisting of one or more triple patterns and responds with all matching triples that the authenticated querier can access.
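To make the idea concrete, here is a minimal sketch (not the speaker's implementation) of answering a single triple pattern while filtering the result by what the querier is allowed to read. The toy dataset, the per-predicate `ACL` policy, and every name below are hypothetical and chosen only for illustration; a real system could attach policies at a different granularity.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    s: str
    p: str
    o: str


# Toy dataset; all identifiers are made up for illustration.
STORE = [
    Triple("ex:alice", "foaf:name", "Alice"),
    Triple("ex:alice", "ex:salary", "90000"),
    Triple("ex:bob", "foaf:name", "Bob"),
]

# Hypothetical policy: which users may read triples with a given predicate.
# "*" means any authenticated user.
ACL = {
    "foaf:name": {"*"},
    "ex:salary": {"hr_admin"},
}


def can_read(user: str, triple: Triple) -> bool:
    """Return True if the policy lets this user see the triple."""
    allowed = ACL.get(triple.p, set())  # unknown predicates are denied
    return "*" in allowed or user in allowed


def matches(pattern, triple: Triple) -> bool:
    """A pattern position of None acts as a variable and matches anything."""
    return all(want is None or want == got for want, got in zip(pattern, triple))


def query(user: str,
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Answer one triple pattern, returning only triples the user can access."""
    pattern = (s, p, o)
    return [t for t in STORE if matches(pattern, t) and can_read(user, t)]
```

With this policy, `query("guest", s="ex:alice")` returns only the name triple, while `query("hr_admin", s="ex:alice")` also returns the salary triple.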
Tim Finin, 6:32pm 6 April 2015
In this week’s meeting, Sandeep Nair will talk about his work on ‘Preventing SQLIA and OJVMWCU, a web service utility for Oracle RDBMS’ at 10:00am Tuesday, 7 April 2015 in ITE 346.
SQL injection attacks (SQLIA) have a long history dating back to 1999, but OWASP still ranks injection attacks, which include SQLIA, as the top-rated vulnerability because they are simple to perform and can have a high impact. SIAP is a project that attempts to automatically secure web applications built with ASP.NET and C#. The second tool, OjvmWCU, is released with Oracle RDBMS 12.1 and allows users to call SOAP-based web services using PL/SQL.
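As a small illustration of why injection is so easy to perform, and of the standard defense, the sketch below contrasts a query built by string concatenation with a parameterized one. It is not SIAP and does not use ASP.NET or Oracle; it assumes Python's built-in `sqlite3` module and a made-up `users` table purely to keep the example self-contained.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def lookup_unsafe(name):
    # Vulnerable: user input is spliced directly into the SQL text,
    # so quotes in the input can change the structure of the query.
    sql = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(sql)]


def lookup_safe(name):
    # Parameterized: the driver passes the value separately from the SQL,
    # so the input is always treated as data, never as SQL syntax.
    sql = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(sql, (name,))]


# Classic tautology payload: turns the WHERE clause into an always-true test.
payload = "nobody' OR '1'='1"
```

Calling `lookup_unsafe(payload)` leaks every row in the table, while `lookup_safe(payload)` returns nothing because no user has that literal name.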
Tim Finin, 12:08am 25 February 2015
Ph.D. Dissertation Proposal
User Identification in Wireless Networks
9:00-11:00pm Friday, 27 February 2015, ITE 325B
Wireless communication using the 802.11 specifications is almost ubiquitous in daily life through an increasing variety of platforms. Traditional identification and authentication mechanisms employed for wireless communication commonly mimic physically connected devices and do not account for the broadcast nature of the medium. Both stationary and mobile devices that users interact with are regularly authenticated using a passphrase, pre-shared key, or an authentication server. Current research requires unfettered access to the user’s platform or information that is not normally volunteered.
We propose a mechanism to verify and validate the identity of 802.11 device users by applying machine learning algorithms. Existing work substantiates the application of machine learning for device identification using Commercial Off-The-Shelf (COTS) hardware and algorithms. This research seeks to investigate and refine features relevant to identifying users. The approach is segmented into three main areas: a data ingest platform, processing, and classification.
Initial research showed that we can properly classify target devices with high precision, recall, and ROC using a sufficiently large real-world data set and a limited set of features. The primary contribution of this work is exploring the development of user identification through data observation. The objective is a combination of identifying new features, creating an online system, and limiting user interaction. We will create a prototype system and test how effectively and accurately it identifies users.
Committee: Drs. Joshi (Chair/Advisor), Nicholas, Younis, Finin, Pearce, Banerjee
Tim Finin, 9:29pm 15 February 2015
ACM Tech Talk
Studying Internet Latency via TCP Queries to DNS
Dr. Yannis Labrou
Principal Data Architect, Verisign
1:30-2:30pm Friday, 27 February 2015, ITE 456, UMBC
Every day Verisign processes upwards of 100 billion authoritative DNS requests for .COM and .NET from all corners of the earth. The vast majority of these requests are via the UDP protocol. Because UDP is connectionless, it is impossible to passively estimate the latency of the UDP-based requests. A very small percentage of these requests though, are over TCP, thus providing the means to estimate the latency of specific requests and paths for a subset of the hosts that interact with Verisign’s network infrastructure.
In this work, we combine this relatively small number of datapoints from TCP (on the order of a few hundred million per day) with the much larger dataset of all DNS requests. Our focus is the process of data analysis of real world, imperfect data at very large scale with the goals of understanding network latency at an unprecedented magnitude, identifying large volume, high latency clients and improving their latency. We discuss the techniques we used for data selection and analysis and we present the results of a variety of analyses, such as deriving regional and country patterns, estimations for query latency for different countries and network locations, and techniques for identifying high latency clients.
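The abstract does not say how latency is derived from the TCP requests. One common passive approach, assumed here purely for illustration, is to measure on the server side the time between sending the SYN-ACK and receiving the client's ACK during the three-way handshake, then aggregate the samples per country. The record layout and sample values below are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical handshake records observed at the server:
# (client country, time SYN-ACK was sent, time client ACK arrived), in seconds.
HANDSHAKES = [
    ("US", 0.000, 0.030),
    ("US", 1.000, 1.050),
    ("US", 2.000, 2.040),
    ("BR", 0.500, 0.700),
    ("BR", 1.500, 1.680),
]


def rtt_ms(synack_sent, ack_received):
    """Round-trip time estimate in milliseconds for one handshake."""
    return (ack_received - synack_sent) * 1000.0


def median_rtt_by_country(records):
    """Group RTT samples by country and return the median of each group."""
    samples = defaultdict(list)
    for country, sent, received in records:
        samples[country].append(rtt_ms(sent, received))
    return {country: median(values) for country, values in samples.items()}
```

The median is used rather than the mean because passive measurements of this kind tend to include a long tail of slow or congested clients.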
It is important to note that latency results we will report are based on passive measurements from, essentially, the entire Internet. For this experiment we do not have control over the client side — where they are, which software, their configuration, their network congestion. This is significantly different from latency studied in any active measurement infrastructure such as Planet Lab, RIPE Atlas, Thousand Eyes, Catchpoint, etc.
Dr. Yannis Labrou is Principal Data Architect at Verisign Labs where he leads efforts to create value from the wealth of data that Verisign’s operations generate every day. He brings to Verisign 20 years of experience in conceiving, creating and bringing to fruition innovations; combining thinking big with laboring through the pains of materializing ideas. He has done so in an academic environment, at a startup company, while conducting government and DoD/DARPA sponsored research and for a global Fortune 200 company.
Before joining Verisign, Dr. Labrou was a Senior Researcher at Fujitsu Laboratories of America, Director of Technology and member of the executive staff of PowerMarket, an enterprise application software start-up company and a Research Assistant Professor at UMBC. He received his Ph.D. in Computer Science from UMBC, where his research focused on software agents, and a Diploma in Physics from the University of Athens, Greece. He has authored more than 40 peer-reviewed publications, with almost 4000 citations and he has been awarded 14 patents from the USPTO. His current research focus is data through the entire lifecycle from generation to monetization.
— more information and directions: http://bit.ly/UMBCtalks —
Prajit Kumar Das, 12:33am 27 January 2015
In this post we will talk about certain User Interface (UI) technological advances that we are observing at the moment. One such development was revealed in a recent media event conducted by Microsoft, where they announced the Microsoft HoloLens, a computing platform which achieves seamless connection between the digital and the physical world, quite similar to the experience referred to in certain movies in the past.
It is interesting to note that the design of the HoloLens device looks so similar to something we have seen before.
Even the vision of holographic computing and users interacting with such interfaces isn’t a new one. The 2002 movie “The first $20 million is always the hardest” was possibly the first time we saw what such a futuristic technology might look like.
How did we reach here? A brief discussion on UIs…
User interfaces have always been an important aspect of computers. In their early days, computers had a monochromatic screen (or at most a duo-chromatic screen). A user would type commands at the screen and the computer would execute them. Since the commands would be entered in a single line or a series of lines, this interface was called the Command-Line Interface (CLI).
Command Line based UI
Such an interface was not particularly intuitive, as you had to know the list of commands that would fulfill a certain task, although a certain group of individuals, i.e. geeks and some computer programmers like me, prefer such an interface owing to its clean and distraction-free nature. However, owing to the learning curve of CLIs, researchers at Stanford Research Institute and the Xerox PARC research center invented a new user interface called the Graphical User Interface (GUI). There were a few variations of the GUI, for example the point-and-click type, also known as the WIMP (windows, icons, menus, pointer) UI, created at the Xerox PARC research center and made popular by Apple through its Macintosh operating systems.
Apple’s Macintosh UI
It was also adopted by Microsoft in its Windows operating systems.
Microsoft’s Windows UI
Some early versions even included a textual user interface with programs that had menus that could be navigated using a keyboard instead of a mouse.
Early textual menu based UI
Eventually new avenues were created for UI research, continuing onwards from textual interfaces to WIMP interfaces to the World Wide Web, where objects on the web became entities accessible through a Uniform Resource Identifier (URI). Such an entity could possibly have semantics associated with it too (as defined by Web 2.0). However, with the advent of mobile smartphones we saw a completely different class of user interfaces: touch-based user interfaces and their more evolved cousin, multi-touch systems, which allowed gesture-based interactions.
Touch and gesture based UI
This was the first time in computing history that humans were able to directly interact with an object on their device with their hands instead of using an input device. The experience was immersive, yet these objects had not entered the real world. We were on the precipice of a revolution in computing.
This revolution was the mainstream launch of Wearable Technology, Virtual/Augmented Reality and Optical Head Mounted Display devices, with the creation of devices like the Oculus Rift, Google Glass and EyeTap among others. These devices allowed voice inputs and created a virtual or an augmented reality world for their users. Microsoft too was working on gesture-based interactions with the Kinect device and research in the Natural User Interface (NUI) field. A couple of interesting works from this revolution that are worth a look are listed below.
This talk by John Underkoffler demos a UI that we saw in the movie Minority Report. He talks about the spatial aspect of how humans interact with their world and how computers might be able to help us better if we could do the same with our computers.
Here Pranav Mistry, currently the Head of the Think Tank Team and Director of Research of Samsung Research America, speaks of SixthSense, a new paradigm in computing that allows interaction between the real world and the digital world. All these works were knocking on the door of the kind of computer we saw in the 2002 movie mentioned earlier: a real-life holographic computer. Enter Microsoft HoloLens!
What is Microsoft HoloLens?
Microsoft HoloLens is an augmented reality computing platform. As per the review from Forbes.com, this device has taken a step beyond current work by adding virtual holograms to the world around its user, rather than putting the user in a completely virtual environment. This device has launched a new platform for software development, i.e. holographic apps. The device has also created scope for hardware research and development, as it requires new components like the Holographic Processing Unit, or HPU. Visualization and sharing of ideas and interaction with the real world can now be done as envisioned in the TED talk by Pranav Mistry. A more natural way of interacting with digital content, as envisioned in the works above, is a reality now. The device tracks its user’s movements in an environment. It detects what a person is looking at and transforms the visual field by overlaying 3D objects on top of it.
What kind of applications can we expect to be developed for HoloLens?
When the touch UI became a reality, developers had to change the way they worked on software. Direct object interactions as shown above had to be programmed into their applications. Apps for HoloLens would similarly need to handle use cases of interactions involving voice commands and gesture recognition. The common ideas and their corresponding research implications that come to mind include:
- Looking up a grocery list when you enter the grocery store (context aware)
HoloLens Environment overlaid with lists
- Recording important events automatically (context aware computing)
- Recognizing people in a party (social media and privacy)
- Taking down notes, writing emails using voice commands (natural language understanding)
- Searching for “stuff” around us (nlp, data analytics, semantic web, context aware computing)
- Playing 3D games (animation and graphics)
HoloLens Environment overlaid with 3D Games
- Making sure your battery doesn’t run out (systems, hardware)
- Virtual work environments (systems)
Virtual Work Environments through HoloLens
- Teaching virtual classrooms (systems)
Why or how could it fail?
Are there any obvious pitfalls that we are not thinking about? We can rest assured that researchers are already looking at ways this venture can fail, and for Microsoft’s own good we can be certain they have a list of ways they think this might go wrong and that they are working on fixing any flaws. However, as researchers in the mobile field with a bit of experience with the Google Glass, we can try to list some of the possible pitfalls of an AR/VR device. The HoloLens, being a tetherless Augmented Virtual Reality (AVR) device, could possibly suffer from some of these pitfalls too. The reader should understand that we are not claiming any of the following to be scientifically proven; these are merely anecdotal observations from our own experience.
- The first thing that worried us while using the Google Glass was that it would sometimes cause us headaches after using it for a couple of hours. We have not studied the effects of the device on anyone else, so this is an observation from personal experience. Therefore one concern is the health impact of prolonged use of an AVR device.
- The second thing we noticed with the Google Glass was how fast the device heated up. We know from experience that computers get hot, for example when we play a game or run a lot of complex computations. An AVR device that is used for playing games will most probably get hot too. At least the Google Glass did after recording a video. Here we are concerned about heat dissipation and its health impact on the user.
- The third observation we made was that the Google Glass showed significant sluggishness when it tried to accomplish computation-heavy tasks. Will the HoloLens device be able to keep up with all the computations needed for, say, playing a 3D game?
- The fourth concern is regarding battery capacity. The HoloLens is advertised as a device with no wires, cords or tethers. Anyone who has ever used a smartphone knows the problem of the battery running out within a day or even half a day. Will the HoloLens be able to hold a charge for long or will it require constant charging?
- The fifth concern that we had was regarding privacy. The Google Glass has faced quite a few privacy concerns because it can readily take pictures using a simple voice command or even a non-verbal command like a ‘wink’. We have worked on this issue as part of our research product FaceBlock. Will the HoloLens create such concerns, since this device too has front-facing cameras that capture a user’s environment and project an augmented virtual world to the user?
The above lists of possible issues and probable application areas are not exhaustive in any way. There will be numerous other scenarios and ways we can work on this new computing platform. There will probably be a multitude of issues with such a new and revolutionary platform. However, the hybrid of augmented and virtual reality has just started taking small steps. With the invention of devices like the Microsoft HoloLens, Google Glass, Oculus Rift, EyeTap, etc., we can look forward to an exciting period in the future of computing for Augmented Virtual Reality.
Tim Finin, 12:32pm 25 January 2015
The fourth Mid-Atlantic Student Colloquium on Speech, Language and Learning (MASC-SLL) will be held at JHU this coming Friday, January 30. It’s a good opportunity to sample current research on language technology and machine learning, including the work of a number of UMBC students. The program for the one-day colloquium includes oral presentations, poster sessions, a panel and three breakout sessions.
The event is free and open to all, but registration is requested by Tuesday, January 27. Note that the location has been moved to the Glass Pavilion on the JHU Homewood Campus.
Tim Finin, 11:40pm 20 January 2015
UMBC CSEE alumni Don Miner and Brandon Wilson have started a Meetup group for Hadoop users in and around the Baltimore area to discuss Hadoop technology and use cases.
Apache Hadoop is one of the most popular open-source tools used to harness clusters of computers to process, analyze or learn from massive amounts of data. Whether you are new to Hadoop or an experienced user, this is a great opportunity to improve your knowledge and network with others in the Baltimore computing technology community.
The first meeting will be held from 7:00pm to 9:30pm on Thursday, 19 February 2015 at AOL/Advertising.com at 1020 Hull St #100, Baltimore, MD (map). Join the group here.
Tim Finin, 11:43am 17 January 2015
Facebook’s AI Research (FAIR) group has released open-source, optimized deep-learning modules for their open sourced Torch development environment for numerics, machine learning, and computer vision, with a particular emphasis on deep learning and convolutional nets.
The release includes GPU-optimized modules for large convolutional nets and networks with sparse activations that are commonly used in NLP applications.
See fbcunn for installation instructions, documentation and examples to train classifiers and iTorch for an IPython Kernel for Torch.