UMBC ebiquity
UMBC eBiquity Blog

Infoboxer: using statistical semantic knowledge to help create Wikipedia infoboxes

Tim Finin, 7:56pm 29 September 2014


In this week’s ebiquity meeting (10am Tue. Oct 1 in ITE346), Varish Mulwad will present Infoboxer, a prototype tool he developed with Roberto Yus that uses statistical and semantic knowledge from linked data sources to ease the process of creating Wikipedia infoboxes.

Wikipedia infoboxes serve as input in the creation of knowledge bases such as DBpedia, Yago, and Freebase. Infoboxes are currently created manually from templates that are themselves created and maintained collaboratively. However, these templates pose several challenges:

  • Different communities use different infobox templates for articles in the same category
  • Attribute names differ (e.g., date of birth vs. birthdate)
  • Templates are restricted to a single category, making it harder to find a template for an article that belongs to multiple categories (e.g., actor and politician)
  • Templates are free-form, and no integrity check verifies that a value entered by the user has the appropriate type for the given attribute

Infoboxer creates dynamic and semantic templates by suggesting attributes common to similar articles and by semantically constraining the expected values. We will give an overview of our approach and demonstrate how Infoboxer can be used to create infoboxes for new Wikipedia articles as well as to correct erroneous values in existing infoboxes. We will also discuss our proposed extensions to the project.
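To make the idea concrete, here is a minimal sketch of attribute suggestion and value checking. It is not Infoboxer's actual implementation: the toy data, the function names, the 50% popularity threshold and the date-only type check are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date

# Toy data (made up): attribute sets of existing infoboxes, grouped by category.
INFOBOXES = {
    "actor": [
        {"birthDate", "occupation", "yearsActive"},
        {"birthDate", "yearsActive"},
        {"birthDate", "spouse"},
    ],
    "politician": [
        {"birthDate", "party", "office"},
        {"birthDate", "party"},
    ],
}

def suggest_attributes(infoboxes, categories, min_share=0.5):
    """Suggest attributes used by at least min_share of the articles
    in any of the requested categories, most popular first."""
    suggestions = {}
    for category in categories:
        boxes = infoboxes.get(category, [])
        if not boxes:
            continue
        counts = Counter(attr for box in boxes for attr in set(box))
        for attr, n in counts.items():
            share = n / len(boxes)
            if share >= min_share:
                suggestions[attr] = max(share, suggestions.get(attr, 0.0))
    return sorted(suggestions.items(), key=lambda kv: (-kv[1], kv[0]))

def is_valid_value(attribute, value, expected_types):
    """Check a string value against the expected type of its attribute.
    Only an ISO date check is sketched; unknown attributes are accepted."""
    if expected_types.get(attribute) == "date":
        try:
            date.fromisoformat(value)
        except ValueError:
            return False
    return True
```

Calling `suggest_attributes(INFOBOXES, ["actor", "politician"])` merges suggestions across both categories, which is the multi-category case that a single static template handles poorly.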

Visit http://ebiq.org/p/668 for more information about Infoboxer. A demo can be found here.


 

Rafiki: A Semantic and Collaborative Approach to Community Health-Care in Underserved Areas

Tim Finin, 7:26am 19 September 2014

Primal Pappachan, Roberto Yus, Anupam Joshi and Tim Finin, Rafiki: A Semantic and Collaborative Approach to Community Health-Care in Underserved Areas, 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing, 22-25 October 2014, Miami.

Community Health Workers (CHWs) act as liaisons between health-care providers and patients in underserved or unserved areas. However, the lack of information sharing and training support impedes the effectiveness of CHWs and their ability to correctly diagnose patients. In this paper, we propose and describe a system for mobile and wearable computing devices called Rafiki, which assists CHWs in decision making and facilitates collaboration among them. Rafiki can infer possible diseases and treatments by representing the diseases, their symptoms, and patient context in OWL ontologies and by reasoning over this model. The semantic representation of data makes it easier to share knowledge about diseases, symptoms, diagnosis guidelines, and patient demography among the various people involved in health-care (e.g., CHWs, patients, health-care providers). We describe the Rafiki system with the help of a motivating community health-care scenario and present an Android prototype for smartphones and Google Glass.
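As a rough illustration of inferring candidate diseases from observed symptoms, the sketch below replaces OWL reasoning with plain set overlap. It is a stand-in, not Rafiki's method: the symptom table is toy data for illustration only (not medical guidance), and the names and the scoring rule are assumptions.

```python
# Toy knowledge base (illustrative only): disease -> typical symptoms.
DISEASE_SYMPTOMS = {
    "malaria": {"fever", "chills", "headache"},
    "common cold": {"cough", "runny nose", "sore throat"},
    "influenza": {"fever", "cough", "headache", "muscle ache"},
}

def rank_diseases(observed, knowledge=DISEASE_SYMPTOMS):
    """Rank diseases by the fraction of their typical symptoms observed.
    Diseases with no matching symptom are left out."""
    observed = set(observed)
    scored = []
    for disease, symptoms in knowledge.items():
        matched = observed & symptoms
        if matched:
            scored.append((disease, len(matched) / len(symptoms)))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

An ontology-based reasoner can go further than this overlap score, for example by taking patient context into account, which is what the paper describes.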


 

Taming Wild Big Data

Tim Finin, 8:36pm 17 September 2014

Jennifer Sleeman and Tim Finin, Taming Wild Big Data, AAAI Fall Symposium on Natural Language Access to Big Data, Nov. 2014.

Wild Big Data is data that is hard to extract, understand, and use due to its heterogeneous nature and volume. It typically comes without a schema, is obtained from multiple sources, and provides a challenge for information extraction and integration. We describe a way of subduing Wild Big Data that uses techniques and resources that are popular for processing natural language text. The approach is applicable to data that is presented as a graph of objects and relations between them and to tabular data that can be transformed into such a graph. We start by applying topic models to contextualize the data and then use the results to identify the potential types of the graph’s nodes by mapping them to known types found in large open ontologies such as Freebase and DBpedia. The results allow us to assemble coarse clusters of objects that can then be used to interpret the links and perform entity disambiguation and record linking.


 

Rapalytics! Where Rap Meets Data Science

Tim Finin, 4:34pm 14 September 2014

UMBC Ebiquity Research Meeting

Abhay Kashyap

10:00am Wednesday, Sept. 17, 2014, ITE 346

For the Hip-Hop Fans: Remember the times when you had those long arguments with your friends about who the better rapper is? Remember how it always ended up in a stalemate because there was no evidence to back your argument? Well, look no further! Rapalytics is a one-stop site dedicated to extracting and presenting all the important analytics from Rap lyrics that separate a good rapper from a great one!

For the Data Science Nerds: Remember how indestructible your trained NLP tools were? Want to see how they act under pressure from text they have never seen before? Come take a look at how traditional NLP tools fare against text as complex as Rap and explore opportunities to design and build systems that handle much more than well-formed English text.


 

Kelvin: Extracting Knowledge from Large Text Collections

Tim Finin, 8:59pm 8 September 2014

Preprint: James Mayfield, Paul McNamee, Craig Harman, Tim Finin and Dawn Lawrie, KELVIN: Extracting Knowledge from Large Text Collections, AAAI Fall Symposium on Natural Language Access to Big Data, 2014.

We describe the KELVIN system for extracting entities and relations from large text collections and its use in the TAC Knowledge Base Population Cold Start task run by the U.S. National Institute of Standards and Technology. The Cold Start task starts with an empty knowledge base defined by an ontology of entity types, properties and relations. Evaluations in 2012 and 2013 were done using a collection of text from local Web and news sources to de-emphasize linking entities to a background knowledge base such as Wikipedia. Interesting features of KELVIN include a cross-document entity coreference module based on entity mentions, removal of suspect intra-document coreference chains, a slot value consolidator for entities, the application of inference rules to expand the number of asserted facts, and a set of analysis and browsing tools supporting development.


 

Preprint: Interpreting Medical Tables as Linked Data to Generate Meta-Analysis Reports

Tim Finin, 5:38am 17 July 2014

Varish Mulwad, Tim Finin and Anupam Joshi, Interpreting Medical Tables as Linked Data to Generate Meta-Analysis Reports, 15th IEEE Int. Conf. on Information Reuse and Integration, Aug 2014.

Evidence-based medicine is the application of current medical evidence to patient care and typically uses quantitative data from research studies. It is increasingly driven by data on the efficacy of drug dosages and the correlations between various medical factors that are assembled and integrated through meta-analyses (i.e., systematic reviews) of data in tables from publications and clinical trial studies. We describe an important component of a system to automatically produce evidence reports that performs two key functions: (i) understanding the meaning of data in medical tables and (ii) identifying and retrieving relevant tables given an input query. We present modifications to our existing framework for inferring the semantics of tables and an ontology developed to model and represent medical tables in RDF. Representing medical tables as RDF makes it easier to automatically extract, integrate and reuse data from multiple studies, which is essential for generating meta-analysis reports. We show how relevant tables can be identified by querying over their RDF representations and describe two evaluation experiments: one on mapping medical tables to linked data and another on identifying tables relevant to a retrieval query.
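The following sketch gives a feel for representing a table as subject-predicate-object triples and retrieving tables by a query term. It uses plain Python tuples rather than a real RDF store or SPARQL, and the namespace, predicate names and sample tables are hypothetical. It does not reproduce the ontology described in the paper.

```python
EX = "http://example.org/medtable#"  # hypothetical namespace, for illustration

def table_to_triples(table_id, caption, header, rows):
    """Flatten a table into (subject, predicate, object) triples."""
    table = EX + table_id
    triples = [(table, EX + "caption", caption)]
    for column in header:
        triples.append((table, EX + "hasColumn", column))
    for i, row in enumerate(rows):
        row_node = f"{table}/row{i}"
        triples.append((table, EX + "hasRow", row_node))
        for column, value in zip(header, row):
            triples.append((row_node, EX + "cell/" + column, value))
    return triples

def tables_mentioning(triples, term):
    """Return the tables whose caption or column names contain the term."""
    term = term.lower()
    wanted = (EX + "caption", EX + "hasColumn")
    return sorted({s for s, p, o in triples
                   if p in wanted and term in str(o).lower()})

# Made-up sample tables.
TRIPLES = (
    table_to_triples("t1", "Efficacy of drug A by dosage",
                     ["dosage_mg", "response_rate"], [[10, 0.4], [20, 0.55]])
    + table_to_triples("t2", "Patient demographics", ["age", "sex"], [[54, "F"]])
)
```

With this data, `tables_mentioning(TRIPLES, "dosage")` selects only the first table. A real system would match against ontology terms rather than raw strings.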


 

:BaseKB offered as a better Freebase version

Tim Finin, 2:49pm 15 July 2014

In The trouble with DBpedia, Paul Houle talks about the problems he sees in DBpedia, Freebase and Wikidata and offers up :BaseKB as a better “generic database” that models concepts that are in people’s shared consciousness.

:BaseKB is a purified version of Freebase which is compatible with industry-standard RDF tools. By removing hundreds of millions of duplicate, invalid, or unnecessary facts, :BaseKB users speed up their development cycles dramatically when compared to the source Freebase dumps.

:BaseKB is available for commercial and academic use under a CC-BY license. Weekly versions (:BaseKB Now) can be downloaded from Amazon S3 on a “requester-paid basis”, estimated at US$3.00 per download. There are also :BaseKB Gold releases, which are periodic :BaseKB Now snapshots. These can be downloaded free via BitTorrent or purchased as a Blu-ray disc.

It looks like it’s worth checking out!


 

TISA: Topic Independence Scoring Algorithm

Tim Finin, 10:11am 23 June 2014

Justin Martineau, Doreen Cheng and Tim Finin, TISA: topic independence scoring algorithm. In Proc. 9th Int. Conf. on Machine Learning and Data Mining (MLDM’13), pp. 555-570, July 2013, Springer-Verlag.

Textual analysis using machine learning is in high demand for a wide range of applications including recommender systems, business intelligence tools, and electronic personal assistants. Some of these applications need to operate over a wide and unpredictable array of topic areas, but current in-domain, domain adaptation, and multi-domain approaches cannot adequately support this need, due to their low accuracy on topic areas that they are not trained for, slow adaptation speed, or high implementation and maintenance costs.

To create a true domain-independent solution, we introduce the Topic Independence Scoring Algorithm (TISA) and demonstrate how to build a domain-independent bag-of-words model for sentiment analysis. This model is the best performing sentiment model published on the popular 25 category Amazon product reviews dataset. The model is on average 89.6% accurate as measured on 20 held-out test topic areas. This compares very favorably with the 82.28% average accuracy of the 20 baseline in-domain models. Moreover, the TISA model is highly uniformly accurate, with a variance of 5 percentage points, which provides strong assurance that the model will be just as accurate on new topic areas. Consequently, TISA models are truly domain independent. In other words, they require no changes or human intervention to accurately classify documents in never-before-seen topic areas.


 

Ebiquity alumna Lalana Kagal featured for privacy work

Tim Finin, 12:27pm 15 June 2014

Congratulations to ebiquity alumna Lalana Kagal (Ph.D. 2004) for being featured on MIT’s home page for recent work with Ph.D. student Oshani Seneviratne on enabling people to track how their private data is used online. You can read more about their work via this MIT news item and in their paper Enabling Privacy Through Transparency, which will be presented next month at the 2014 IEEE Privacy Security and Trust conference.


 

Do not be a Gl***hole, use Face-Block.me!

Prajit Kumar Das, 1:13pm 27 March 2014

If you are a Google Glass user, you might have been greeted with concerned looks or raised eyebrows in public places. There has been a lot of chatter on the “interweb” regarding the loss of privacy that results from people taking your picture with Glass without notice. Google Glass has simplified photography but, as often happens with revolutionary technology, people are worried about its potential misuse.

FaceBlock helps to protect the privacy of people around you by allowing them to specify whether or not to be included in your pictures. This new application, developed jointly by researchers from the Ebiquity Research Group at University of Maryland, Baltimore County and Distributed Information Systems (DIS) at University of Zaragoza (Spain), selectively obscures the faces of people in pictures taken by Google Glass.

Comfort at the cost of Privacy?

As the saying goes, “The best camera is the one that’s with you”. Google Glass suits this description as it is always available and can take a picture with a simple voice command (“Okay Glass, take a picture”). This allows users to capture spontaneous life moments effortlessly. On the flip side, this raises significant privacy concerns as pictures can be taken without one’s consent. If one does not use this device responsibly, one risks being labelled a “Glasshole”. Quite recently, a Google Glass user was assaulted by patrons who objected to her wearing the device inside a bar. The list of establishments that have banned Google Glass from their premises is growing day by day. The dos and don’ts for Glass users released by Google are a good first step, but they do not solve the problem of privacy violation.

Privacy-Aware pictures to the rescue

FaceBlock takes regular pictures taken by your smartphone or Google Glass as input and converts them into privacy-aware pictures. This output is generated by using a combination of Face Detection and Face Recognition algorithms. By using FaceBlock, a user can take a picture of herself and specify her policy/rule regarding pictures taken by others (in this case ‘obscure my face in pictures from strangers’). The application then automatically generates a face identifier for this picture. The identifier is a mathematical representation of the image. To learn more about how FaceBlock works, watch the following video.

Using Bluetooth, FaceBlock can automatically detect and share this policy with Glass users nearby. After receiving this face identifier from a nearby user, the following post-processing steps happen on Glass as shown in the images.
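The matching step can be pictured with the short sketch below. It assumes, purely for illustration, that a face identifier is a small numeric vector, that two faces match when their Euclidean distance falls under a fixed threshold, and that a policy is a simple obscure/allow flag. None of these details are taken from the FaceBlock implementation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def faces_to_obscure(detected, policies, threshold=0.6):
    """Return the indexes of detected faces that match a received
    identifier whose owner asked to be obscured.

    detected: list of face vectors found in the new picture
    policies: list of (identifier_vector, obscure_flag) pairs
    threshold: hypothetical maximum distance for a match
    """
    hits = []
    for index, face in enumerate(detected):
        for identifier, obscure in policies:
            if obscure and euclidean(face, identifier) <= threshold:
                hits.append(index)
                break
    return hits
```

For example, `faces_to_obscure([(0.1, 0.2), (0.9, 0.9)], [((0.12, 0.21), True)])` flags only the first face, which would then be blurred before the picture is saved.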

What promises does it hold?

FaceBlock is a proof-of-concept implementation of a system that can create privacy-aware pictures using smart devices. The pervasiveness of privacy-aware pictures could be a step in the right direction towards balancing privacy needs and the comfort afforded by technology. Thus, we can get the best out of wearable technology without being oblivious to the privacy of those around us.

FaceBlock is part of the efforts of Ebiquity and SID in building systems for preserving user privacy on mobile devices. For more details, visit http://face-block.me


 

Google MOOC: Making Sense of Data

Tim Finin, 11:18pm 26 February 2014

Google is offering a free, online MOOC-style course on ‘Making Sense of Data’ from March 18 to April 4 taught by Amit Deutsch (Google) and Joe Hellerstein (Berkeley).

Interestingly, it doesn’t require programming or database skills: “Basic familiarity with spreadsheets and comfort using a web browser is recommended. Knowledge of statistics and experience with programming are not required.” The course will use Google’s Fusion Tables service for managing and visualizing data.


 

Stardog unleashed: MD Semantic Web Meetup, 6pm Thu 2/27

Tim Finin, 1:12pm 26 February 2014

The next Central MD Semantic Web Meetup will be held at 6:00pm on Thursday, February 27, 2014 at Inovex Information Systems (7240 Parkway Dr., Suite 140, Hanover MD). Michael Grove, the Chief Software Architect at Clark & Parsia, will talk on their Stardog triple store technology. The meetup is a good way to meet and network with others working on or with semantic technologies in Maryland.

“Stardog Unleashed will provide some background on the motivation for building Stardog, as well as a short review of its history and unique feature set. We will also provide an overview and demo of Stardog Web, a JavaScript framework for building web applications backed by semantic technologies.

Our speaker, Michael Grove, is the Chief Software Architect at Clark & Parsia, where he also serves as the lead developer of Stardog, the leader in RDF databases featuring fast query performance and unmatched OWL & SWRL support.

A graduate in Computer Science at the University of Maryland, College Park, Michael first got started with semantic technologies in 2002 as a research assistant under Dr. Jim Hendler at the University of Maryland with the MINDSWAP group. Before joining the team at Clark & Parsia, he worked at Fujitsu Research Labs as the lead developer for the Task Computing project, an effort to bring the semantic web to pervasive computing environments.

Michael is also active in open source, where he is a contributor to Pellet, the leading OWL DL reasoner, and maintains Empire, an implementation of JPA backed by semantic technologies. Additionally, he is a contributor to the Sesame project and active on the Jena development list.”