UMBC eBiquity Blog

Rapalytics! Where Rap Meets Data Science

Tim Finin, 4:34pm 14 September 2014

UMBC Ebiquity Research Meeting

Rapalytics! Where Rap Meets Data Science

Abhay Kashyap

10:00am Wednesday, Sept. 17, 2014, ITE 346

For the Hip-Hop Fans: Remember the times when you had those long arguments with your friends about who the better rapper is? Remember how it always ended up in a stalemate because there was no evidence to back your argument? Well, look no further! Rapalytics is a one-stop site dedicated to extracting and presenting all the important analytics from Rap lyrics that separate a good rapper from a great one!

For the Data Science Nerds: Remember how indestructible your trained NLP tools were? Want to see how they act under pressure from text they have never seen before? Come take a look at how traditional NLP tools fare against text as complex as Rap and explore opportunities to design and build systems that handle much more than well-formed English text.
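For a flavor of what a lyric analytic can look like, here is a minimal sketch that scores vocabulary richness as a type-token ratio. The metric and code are illustrative only, not taken from Rapalytics:

    # Illustrative sketch: vocabulary richness of a lyric as the ratio of
    # distinct words to total words. Not an actual Rapalytics metric.
    import re

    def vocabulary_richness(lyrics: str) -> float:
        tokens = re.findall(r"[a-z']+", lyrics.lower())
        return len(set(tokens)) / len(tokens) if tokens else 0.0

    print(vocabulary_richness("I came, I saw, I conquered the game"))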


 

Kelvin: Extracting Knowledge from Large Text Collections

Tim Finin, 8:59pm 8 September 2014

Preprint: James Mayfield, Paul McNamee, Craig Harman, Tim Finin and Dawn Lawrie, KELVIN: Extracting Knowledge from Large Text Collections, AAAI Fall Symposium on Natural Language Access to Big Data, 2014.

We describe the KELVIN system for extracting entities and relations from large text collections and its use in the TAC Knowledge Base Population Cold Start task run by the U.S. National Institute of Standards and Technology. The Cold Start task begins with an empty knowledge base defined by an ontology of entity types, properties and relations. Evaluations in 2012 and 2013 were done using collections of local Web and news text, chosen to de-emphasize linking entities to a background knowledge base such as Wikipedia. Interesting features of KELVIN include a cross-document entity coreference module based on entity mentions, removal of suspect intra-document coreference chains, a slot value consolidator for entities, the application of inference rules to expand the number of asserted facts, and a set of analysis and browsing tools supporting development.
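To make the inference-rule feature concrete, here is a toy sketch of rule-based fact expansion over subject-predicate-object triples. The rules and relation names are hypothetical illustrations, not KELVIN's actual rule set:

    # Toy sketch of expanding asserted facts with inference rules.
    # The relations (spouse_of, works_for, located_in) are hypothetical.
    facts = {
        ("per:Alice", "spouse_of", "per:Bob"),
        ("per:Alice", "works_for", "org:Acme"),
        ("org:Acme", "located_in", "loc:Baltimore"),
    }

    def apply_rules(kb):
        derived = set()
        for s, p, o in kb:
            if p == "spouse_of":          # symmetric relation rule
                derived.add((o, "spouse_of", s))
        for s, p, o in kb:
            if p == "works_for":          # chain rule: employer's location
                for s2, p2, o2 in kb:
                    if s2 == o and p2 == "located_in":
                        derived.add((s, "lives_near", o2))
        return kb | derived

    print(len(apply_rules(facts)))   # 5 facts after one round of inference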


 

Preprint: Interpreting Medical Tables as Linked Data to Generate Meta-Analysis Reports

Tim Finin, 5:38am 17 July 2014


Varish Mulwad, Tim Finin and Anupam Joshi, Interpreting Medical Tables as Linked Data to Generate Meta-Analysis Reports, 15th IEEE Int. Conf. on Information Reuse and Integration, Aug 2014.

Evidence-based medicine is the application of current medical evidence to patient care and typically uses quantitative data from research studies. It is increasingly driven by data on the efficacy of drug dosages and the correlations between various medical factors that are assembled and integrated through meta-analyses (i.e., systematic reviews) of data in tables from publications and clinical trial studies. We describe an important component of a system to automatically produce evidence reports that performs two key functions: (i) understanding the meaning of data in medical tables and (ii) identifying and retrieving relevant tables given an input query. We present modifications to our existing framework for inferring the semantics of tables and an ontology developed to model and represent medical tables in RDF. Representing medical tables as RDF facilitates the automatic extraction, integration and reuse of data from multiple studies, which is essential for generating meta-analysis reports. We show how relevant tables can be identified by querying over their RDF representations and describe two evaluation experiments: one on mapping medical tables to linked data and another on identifying tables relevant to a retrieval query.
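As a rough illustration of the two functions, the sketch below represents a single table cell in RDF and retrieves it with a SPARQL query using rdflib. The ex: vocabulary is made up for the example; it is not the ontology the paper describes:

    # Sketch: one medical table cell as RDF, retrieved via SPARQL.
    # The ex: vocabulary here is hypothetical.
    from rdflib import Graph, Literal, Namespace, RDF

    EX = Namespace("http://example.org/medtable#")
    g = Graph()
    g.bind("ex", EX)

    # One cell: row "metformin 500mg", column "HbA1c reduction (%)", value 1.1
    g.add((EX.cell_1, RDF.type, EX.Cell))
    g.add((EX.cell_1, EX.rowHeader, Literal("metformin 500mg")))
    g.add((EX.cell_1, EX.columnHeader, Literal("HbA1c reduction (%)")))
    g.add((EX.cell_1, EX.value, Literal(1.1)))

    # Find cells whose row header mentions metformin
    q = """SELECT ?cell ?val WHERE {
             ?cell ex:rowHeader ?row ; ex:value ?val .
             FILTER(CONTAINS(LCASE(?row), "metformin"))
           }"""
    for row in g.query(q, initNs={"ex": EX}):
        print(row.cell, row.val)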


 

:BaseKB offered as a better Freebase version

Tim Finin, 2:49pm 15 July 2014


In The trouble with DBpedia, Paul Houle talks about the problems he sees in DBpedia, Freebase and Wikidata and offers up :BaseKB as a better “generic database” that models concepts that are in people’s shared consciousness.

:BaseKB is a purified version of Freebase which is compatible with industry-standard RDF tools. By removing hundreds of millions of duplicate, invalid, or unnecessary facts, :BaseKB users speed up their development cycles dramatically when compared to the source Freebase dumps.

:BaseKB is available for commercial and academic use under a CC-BY license. Weekly versions (:BaseKB Now) can be downloaded from Amazon S3 on a “requester-paid basis”, estimated at $3.00 US per download. There are also :BaseKB Gold releases, which are periodic snapshots of :BaseKB Now. These can be downloaded free via BitTorrent or purchased as a Blu-ray disc.
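For anyone who wants to poke at a dump, a reasonable first pass is to stream the N-Triples file and tally predicates rather than load the whole graph. A minimal sketch, with a hypothetical file name:

    # Sketch: tally the most common predicates in an N-Triples dump.
    # The file name is hypothetical; dumps are large, so stream line by line.
    def top_predicates(path, limit=1_000_000):
        counts = {}
        with open(path, encoding="utf-8") as f:
            for i, line in enumerate(f):
                if i >= limit:                 # sample the first million lines
                    break
                parts = line.split(None, 2)    # subject, predicate, rest
                if len(parts) == 3:
                    counts[parts[1]] = counts.get(parts[1], 0) + 1
        return sorted(counts.items(), key=lambda kv: -kv[1])[:10]

    # print(top_predicates("basekb-now.nt"))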

It looks like it’s worth checking out!


 

TISA: Topic Independence Scoring Algorithm

Tim Finin, 10:11am 23 June 2014

Justin Martineau, Doreen Cheng and Tim Finin, TISA: topic independence scoring algorithm. In Proc. 9th Int. Conf. on Machine Learning and Data Mining (MLDM’13), pp. 555-570, July 2013, Springer-Verlag.

Textual analysis using machine learning is in high demand for a wide range of applications including recommender systems, business intelligence tools, and electronic personal assistants. Some of these applications need to operate over a wide and unpredictable array of topic areas, but current in-domain, domain adaptation, and multi-domain approaches cannot adequately support this need, due to their low accuracy on topic areas that they are not trained for, slow adaptation speed, or high implementation and maintenance costs.

To create a truly domain-independent solution, we introduce the Topic Independence Scoring Algorithm (TISA) and demonstrate how to build a domain-independent bag-of-words model for sentiment analysis. This model is the best-performing sentiment model published on the popular 25-category Amazon product reviews dataset. The model is on average 89.6% accurate as measured on 20 held-out test topic areas. This compares very favorably with the 82.28% average accuracy of the 20 baseline in-domain models. Moreover, the TISA model's accuracy is highly uniform, with a variance of 5 percentage points, which provides strong assurance that the model will be just as accurate on new topic areas. Consequently, TISA's models are truly domain independent. In other words, they require no changes or human intervention to accurately classify documents in never-before-seen topic areas.
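To illustrate the intuition (though not the published algorithm), the toy sketch below scores a word's topic independence as the mean and variance of its smoothed sentiment log-odds across topics; a strong mean with low variance suggests the word will transfer to unseen topics:

    # Toy illustration of topic-independence scoring; not the actual TISA
    # algorithm. (topic, word) -> (count in positive docs, count in negative).
    import math

    counts = {
        ("books", "great"): (40, 5), ("kitchen", "great"): (35, 6),
        ("books", "plot"):  (20, 18), ("kitchen", "plot"): (1, 1),
    }

    def topic_independence(word, topics):
        odds = []
        for t in topics:
            pos, neg = counts.get((t, word), (0, 0))
            odds.append(math.log((pos + 1) / (neg + 1)))   # smoothed log-odds
        mean = sum(odds) / len(odds)
        var = sum((o - mean) ** 2 for o in odds) / len(odds)
        return mean, var   # strong mean + low variance => topic-independent

    for w in ("great", "plot"):
        print(w, topic_independence(w, ["books", "kitchen"]))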


 

Ebiquity alumna Lalana Kagal featured for privacy work

Tim Finin, 12:27pm 15 June 2014

Congratulations to ebiquity alumna Lalana Kagal (Ph.D. 2004) on being featured on MIT’s home page for her recent work with Ph.D. student Oshani Seneviratne on enabling people to track how their private data is used online. You can read more about their work in this MIT news item and in their paper Enabling Privacy Through Transparency, which will be presented next month at the 2014 IEEE Privacy, Security and Trust conference.


 

Do not be a Gl***hole, use Face-Block.me!

Prajit Kumar Das, 1:13pm 27 March 2014

If you are a Google Glass user, you might have been greeted with concerned looks or raised eyebrows in public places. There has been a lot of chatter on the “interweb” regarding the loss of privacy that results from people taking your picture with Glass without notice. Google Glass has simplified photography but, as happens with revolutionary technology, people are worried about its potential misuse.

FaceBlock helps protect the privacy of people around you by allowing them to specify whether or not they should be included in your pictures. This new application, developed through a collaboration between researchers from the Ebiquity Research Group at the University of Maryland, Baltimore County and the Distributed Information Systems (DIS) group at the University of Zaragoza (Spain), selectively obscures the faces of people in pictures taken by Google Glass.

Comfort at the cost of Privacy?

As the saying goes, “The best camera is the one that’s with you.” Google Glass fits this description, as it is always available and can take a picture with a simple voice command (“Okay Glass, take a picture”). This lets users capture spontaneous life moments effortlessly. On the flip side, it raises significant privacy concerns, as pictures can be taken without one’s consent. A user who does not use the device responsibly risks being labelled a “Glasshole”. Quite recently, a Google Glass user was assaulted by patrons who objected to her wearing the device inside a bar. The list of establishments that have banned Google Glass from their premises grows day by day. The dos and don’ts for Glass users released by Google are a good first step, but they don’t solve the problem of privacy violation.


Privacy-Aware pictures to the rescue

FaceBlock takes regular pictures from your smartphone or Google Glass as input and converts them into privacy-aware pictures using a combination of face detection and face recognition algorithms. With FaceBlock, a user can take a picture of herself and specify a policy or rule regarding pictures taken by others (in this case, “obscure my face in pictures from strangers”). The application automatically generates a face identifier from this picture; the identifier is a mathematical representation of the image. To learn more about how FaceBlock works, watch the following video.

Using Bluetooth, FaceBlock can automatically detect and share this policy with Glass users nearby. After receiving a face identifier from a nearby user, Glass post-processes new pictures: faces detected in a picture are checked against the identifier, and matching faces are blurred.
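A rough sketch of this post-processing step using OpenCV is below. It is an illustration under assumptions, not FaceBlock's actual Glass implementation: the matching function is a placeholder for the comparison against the eigenface-style identifiers exchanged over Bluetooth, and it assumes the opencv-python package:

    # Sketch: detect faces and blur those matching a received identifier.
    # matches_policy is a placeholder; the real system compares shared
    # face identifiers.
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def matches_policy(face_img) -> bool:
        return True   # placeholder: compare against shared face identifiers

    def make_privacy_aware(path_in, path_out):
        img = cv2.imread(path_in)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
            if matches_policy(img[y:y+h, x:x+w]):
                img[y:y+h, x:x+w] = cv2.GaussianBlur(
                    img[y:y+h, x:x+w], (51, 51), 0)   # obscure the face
        cv2.imwrite(path_out, img)

    # make_privacy_aware("glass_photo.jpg", "privacy_aware.jpg")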


What promises does it hold?

FaceBlock is a proof-of-concept implementation of a system that can create privacy-aware pictures using smart devices. The spread of privacy-aware pictures could be a step in the right direction towards balancing privacy needs with the comfort afforded by technology. Thus, we can get the best out of wearable technology without being oblivious to the privacy of those around us.

FaceBlock is part of the efforts of Ebiquity and DIS in building systems that preserve user privacy on mobile devices. For more details, visit http://face-block.me


 

Google MOOC: Making Sense of Data

Tim Finin, 11:18pm 26 February 2014

Google is offering a free, online MOOC-style course on ‘Making Sense of Data’ from March 18 to April 4, taught by Amit Deutsch (Google) and Joe Hellerstein (Berkeley).

Interestingly, it doesn’t require programming or database skills: “Basic familiarity with spreadsheets and comfort using a web browser is recommended. Knowledge of statistics and experience with programming are not required.” The course will use Google’s Fusion Tables service for managing and visualizing data.


 

Stardog unleashed: MD Semantic Web Meetup, 6pm Thu 2/27

Tim Finin, 1:12pm 26 February 2014

The next Central MD Semantic Web Meetup will be held at 6:00pm on Thursday, February 27, 2014 at Inovex Information Systems (7240 Parkway Dr., Suite 140, Hanover MD). Michael Grove, the Chief Software Architect at Clark & Parsia, will talk on their Stardog triple store technology. The meetup is a good way to meet and network with others working on or with semantic technologies in Maryland.

“Stardog Unleashed will provide some background on the motivation for building Stardog, as well as a short review of its history and unique feature set. We will also provide an overview and demo of Stardog Web, a JavaScript framework for building web applications backed by semantic technologies.

Our speaker, Michael Grove, is the Chief Software Architect at Clark & Parsia, where he also serves as the lead developer of Stardog, the leader in RDF databases featuring fast query performance and unmatched OWL & SWRL support.

A Computer Science graduate of the University of Maryland, College Park, Michael first got started with semantic technologies in 2002 as a research assistant under Dr. Jim Hendler in the MINDSWAP group at the University of Maryland. Before joining the team at Clark & Parsia, he worked at Fujitsu Research Labs as the lead developer for the Task Computing project, an effort to bring the semantic web to pervasive computing environments.

Michael is also active in open source: he is a contributor to Pellet, the leading OWL DL reasoner, and maintains Empire, an implementation of JPA backed by semantic technologies. Additionally, he is a contributor to the Sesame project and active on the Jena development list.”
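For readers who want to try Stardog before the meetup, a database can be queried over its HTTP SPARQL endpoint. Here is a minimal sketch in Python; the endpoint path, database name, and credentials below assume a default local install, so check your own configuration:

    # Sketch: SPARQL query against a local Stardog database over HTTP.
    # Endpoint path, database name ("mydb") and credentials are assumptions.
    import requests

    resp = requests.get(
        "http://localhost:5820/mydb/query",
        params={"query": "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"},
        headers={"Accept": "application/sparql-results+json"},
        auth=("admin", "admin"),
    )
    for binding in resp.json()["results"]["bindings"]:
        print(binding["s"]["value"])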


 

Tracking Provenance and Reproducibility of Big Data Experiments

Tim Finin, 11:53am 8 February 2014

In the first Ebiquity meeting of the semester, Vlad Korolev will talk about his work on using RDF to capture, represent and use provenance information for big data experiments.

PROB: A tool for Tracking Provenance and Reproducibility of Big Data Experiments

10-11:30am, ITE346, UMBC

Reproducibility of computations and data provenance are important goals to achieve in order to improve the quality of one’s research. Unfortunately, despite some efforts made in the past, it is still very hard to reproduce computational experiments with a high degree of certainty. The Big Data phenomenon of recent years makes this goal even harder to achieve. In this work, we propose a tool that helps researchers improve the reproducibility of their experiments through automated keeping of provenance records.
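As a taste of what automated provenance records can look like, the sketch below writes one experiment step as W3C PROV-O triples using rdflib. The entity and activity names are hypothetical, and PROB's actual record structure may differ:

    # Sketch: one experiment step recorded with the W3C PROV-O vocabulary.
    # Entity/activity names are hypothetical examples.
    from rdflib import Graph, Namespace, RDF, Literal
    from rdflib.namespace import PROV, XSD

    EX = Namespace("http://example.org/experiment#")
    g = Graph()
    g.bind("prov", PROV)

    g.add((EX.cleaned_data, RDF.type, PROV.Entity))
    g.add((EX.clean_step, RDF.type, PROV.Activity))
    g.add((EX.cleaned_data, PROV.wasGeneratedBy, EX.clean_step))
    g.add((EX.clean_step, PROV.used, EX.raw_data))
    g.add((EX.clean_step, PROV.startedAtTime,
           Literal("2014-02-08T10:00:00", datatype=XSD.dateTime)))

    print(g.serialize(format="turtle"))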


 

Jan 30 Ontology Summit: Tools, Services, and Techniques

Tim Finin, 10:34am 30 January 2014

Today’s online meeting (Jan 30, 12:30-2:30 EST) in the 2014 Ontology Summit series is part of the Tools, Services, and Techniques track and features presentations by

  • Dr. Chris Welty (IBM Research) on “Inside the Mind of Watson – a Natural Language Question Answering Service Powered by the Web of Data and Ontologies”
  • Prof. Alan Rector (U. Manchester) on “Axioms & Templates: Distinctions and Transformations amongst Ontologies, Frames, & Information Models”
  • Prof. Till Mossakowski (U. Magdeburg) on “Challenges in Scaling Tools for Ontologies to the Semantic Web: Experiences with Hets and OntoHub”

Audio via phone (206-402-0100) or Skype. See the session page for details and access to slides.


 

Ontology Summit: Use and Reuse of Semantic Content

Tim Finin, 8:48am 23 January 2014

The first online session of the 2014 Ontology Summit on “Big Data and Semantic Web Meet Applied Ontology” takes place today (Thursday, January 23) from 12:30pm to 2:30pm (EST, UTC-5) with the topic Common Reusable Semantic Content — The Problems and Efforts to Address Them. The session will include four presentations followed by discussion.

Audio connection is via phone (206-402-0100, 141184#) or Skype with a shared screen and participant chatroom. See the session page for more details.