Proceedings of the 33rd International FLAIRS Conference
Gazetteer Generation for Neural Named Entity Recognition
May 17, 2020
We present a method for generating gazetteers from the Wikidata knowledge graph and using the resulting lists to improve a neural NER system by adding an input feature that indicates whether a word is part of a name in the gazetteer. We show empirically that the approach yields performance gains in two distinct high-resource languages: English, a word-based language, and Chinese, a character-based language. We also apply the approach to a low-resource language, Russian, using a new annotated Russian NER corpus from Reddit tagged with four core and eleven extended types, and report a baseline score.
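As a rough illustration of the gazetteer input feature described above, the following Python sketch marks each token with one binary flag per entity type when the token falls inside a span that matches a gazetteer name. The function name `gazetteer_features`, the lowercase exact-match strategy, the `max_len` window, and the toy gazetteer entries are all assumptions made for illustration; the abstract does not specify how the paper matches or encodes names.

```python
from typing import Dict, List, Set, Tuple


def gazetteer_features(
    tokens: List[str],
    gazetteers: Dict[str, Set[Tuple[str, ...]]],
    max_len: int = 5,
) -> Tuple[List[str], List[List[int]]]:
    """Return (entity types, per-token binary flags).

    feats[i][j] is 1 when token i lies inside a span of at most max_len
    tokens that exactly matches (case-insensitively) a name in the
    gazetteer for types[j]. This is a hypothetical encoding, not
    necessarily the one used in the paper.
    """
    types = sorted(gazetteers)
    feats = [[0] * len(types) for _ in tokens]
    lowered = [t.lower() for t in tokens]

    for col, etype in enumerate(types):
        names = gazetteers[etype]
        for start in range(len(tokens)):
            for length in range(1, max_len + 1):
                end = start + length
                if end > len(tokens):
                    break
                # Flag every token of a span that matches a gazetteer name.
                if tuple(lowered[start:end]) in names:
                    for i in range(start, end):
                        feats[i][col] = 1
    return types, feats


# Toy gazetteers; real lists would be generated from Wikidata.
toy_gazetteers = {
    "PER": {("barack", "obama")},
    "LOC": {("new", "york")},
}
types, feats = gazetteer_features(
    ["Barack", "Obama", "visited", "New", "York"], toy_gazetteers
)
```

In a neural tagger, each row of `feats` would typically be concatenated with the token's embedding before the sequence encoder, so the model can learn how much to trust a gazetteer match.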
A longer version of this paper is available as: Chan Hee Song, Dawn Lawrie, Tim Finin, and James Mayfield, Improving Neural Named Entity Recognition with Gazetteers, arXiv:2003.03072, March 2020.