| Building intelligent systems in open, heterogeneous, dynamic, distributed environments | 16 May 2008, 08:24:31 EDT |
Why You Should Use N-grams for Multilingual Information Retrieval

Speaker: Paul McNamee
Start Date: Wednesday, October 18, 2006, 11:00AM
End Date: Wednesday, October 18, 2006, 12:00PM
Location: 2120 A.V. Williams Building, UMCP

Abstract: While generally accepted for languages such as Chinese and Japanese, the use of character n-gram tokenization has not been widely adopted for information retrieval in alphabetic languages. However, n-grams are a simple representation for text that is surprisingly effective in diverse languages. In this talk I present empirical results in twelve European languages that have been studied in the Cross Language Evaluation Forum (CLEF) competitions. These results demonstrate that:
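For readers unfamiliar with the term used in the abstract, character n-gram tokenization can be illustrated with a minimal sketch. This is not the speaker's implementation: the function name `char_ngrams`, the default `n=4`, the lowercasing, and the underscore used to mark word boundaries are all assumptions chosen for illustration.

```python
def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Return overlapping character n-grams of `text`.

    Illustrative assumptions (not taken from the talk): text is lowercased,
    runs of whitespace become a single "_", and the string is padded with
    "_" so that word beginnings and endings appear inside n-grams.
    """
    normalized = "_" + "_".join(text.lower().split()) + "_"
    if len(normalized) < n:
        # Text shorter than n: keep it as a single token.
        return [normalized]
    # Slide a window of width n across the string, one character at a time.
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


if __name__ == "__main__":
    print(char_ngrams("Information retrieval", n=5)[:4])
    # ['_info', 'infor', 'nform', 'forma']
```

Because the window slides over raw characters, this kind of tokenizer needs no language-specific stemmer or word segmenter, which is one reason the approach is attractive when many languages must be handled with the same pipeline.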
| Copyright © 1999-2008 UMBC ebiquity research group. |