| UMBC ebiquity |
Why You Should Use N-grams for Multilingual Information Retrieval

Speaker: Paul McNamee
Start: Wednesday, October 18, 2006, 11:00AM
End: Wednesday, October 18, 2006, 12:00PM
Location: 2120 A.V. Williams Building, UMCP

Abstract: While generally accepted for languages such as Chinese and Japanese, the use of character n-gram tokenization has not been widely adopted for information retrieval in alphabetic languages. However, n-grams are a simple representation for text that is surprisingly effective in diverse languages. In this talk I present empirical results in twelve European languages that have been studied in the Cross Language Evaluation Forum (CLEF) competitions. These results demonstrate that:
|