The KELVIN Information Extraction System

October 30th, 2015

In this week’s ebiquity lab meeting (10:30am Monday Nov 2), Tim Finin will describe recent work on the Kelvin information extraction system and its performance in two tasks in the 2015 NIST Text Analysis Conference. Kelvin has been under development at the JHU Human Language Center of Excellence for several years. Kelvin reads documents in several languages and extracts entities and relations between them. This year it was used for the Coldstart Knowledge Base Population and Trilingual Entity Discovery and Linking tasks. Key components in the tasks are a system for cross-document coreference and another that links entities to entries in the Freebase knowledge base.

Demystifying Word2Vec: A Hands-on Tutorial

October 16th, 2015

Demystifying Word2Vec – A Hands-on Tutorial

Abhay Kashyap

10:30am Monday, 19 October 2015 **ITE 456**

In the world of NLP, Word2Vec is one of the coolest kids in town! But what exactly is it and how does it work? More importantly, how is it used/useful?

For the first 10-15 minutes, we will go over distributional an distributed representation of words and the neural language model behind Word2Vec. We will also briefly look at doc2vec, the extension of Word2Vec for longer pieces of text.

For the remainder of the time (45-60 minutes), we will get our feet wet by running Word2Vec on a dataset which will then be followed by discussions about potential ways it can be useful for your own work.

What to bring – Any computing machine with Python installed, lots of curiosity and some delicious snacks for me maybe? We will use the excellent gensim package for python to run Word2Vec along with cython to speed things up. If you aren’t familiar with Python or don’t like it, no worries! It’s really just 5-6 lines of code! The training dataset will be provided. If you wish to bring your own, that’s cool too.

NOTE: We will hold this week’s Ebiquity meeting in ITE 456.

Hot Stuff at ColdStart

June 8th, 2015

Cold Start

Coldstart is a task in the NIST Text Analysis Conference’s Knowledge Base Population suite that combines entity linking and slot filling to populate an empty knowledge base using a predefined ontology for the facts and relations. This paper describes a system developed by the Human Language Technology Center of Excellence at Johns Hopkins University for the 2014 Coldstart task.

Tim Finin, Paul McNamee, Dawn Lawrie, James Mayfield and Craig Harman, Hot Stuff at Cold Start: HLTCOE participation at TAC 2014, 7th Text Analysis Conference, National Institute of Standards and Technology, Nov. 2014.

The JHU HLTCOE participated in the Cold Start task in this year’s Text Analysis Conference Knowledge Base Population evaluation. This is our third year of participation in the task, and we continued our research with the KELVIN system. We submitted experimental variants that explore use of forward-chaining inference, slightly more aggressive entity clustering, refined multiple within-document conference, and prioritization of relations extracted from news sources.

Interactive Knowledge Base Population

June 6th, 2015

Travis Wolfe, Mark Dredze, James Mayfield, Paul McNamee, Craig Harman, Tim Finin and Benjamin Van Durme, Interactive Knowledge Base Population, arXiv:1506.00301 [cs.AI], May 2015.

Most work on building knowledge bases has focused on collecting entities and facts from as large a collection of documents as possible. We argue for and describe a new paradigm where the focus is on a high-recall extraction over a small collection of documents under the supervision of a human expert, that we call Interactive Knowledge Base Population (IKBP).