Amir Karami on a fuzzy approach to topic models for medical corpora

December 2nd, 2014

In this week’s Ebiquity meeting (10am Wed 12/3 in ITE346), Amir Karami will talk about “Fuzzy Approach Topic Models for Medical Corpus”.

Abstract: Finding ways to automatically retrieve the enormous amount of available medical knowledge has always been an intriguing topic. The massive flow of medical documents, including scholarly publications and clinical notes, has benefited experts by providing easy access to a huge amount of text data. However, the sheer volume of data makes it increasingly difficult for medical experts to locate information of interest and to find relevant documents. Effective text mining systems should be able to extract and exploit not only explicitly stated information but also implied and inferred data. A bag-of-words representation leads to a sparse, high-dimensional problem that yields low performance and a higher computational cost. Dimension reduction techniques, especially topic models, are among the useful techniques for overcoming the problems of bag-of-words. This research proposes a novel approach to topic modeling using fuzzy clustering. To evaluate our model, we experiment with two text datasets of medical documents. Evaluations through document classification, document modeling, and document clustering show that our approach performs better than LDA, the most-cited topic model on Google Scholar, indicating that fuzzy set theory can improve the performance of topic models in the medical domain. Our approach solves the redundancy issue in the medical domain and can discover the relations between topics in a document. In addition, previous research on fuzzy clustering can help to solve challenges of topic modeling such as defining the number of topics.
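
To make the idea of fuzzy clustering over bag-of-words vectors concrete, here is a minimal sketch of standard fuzzy c-means applied to a toy document-term matrix. It is not the model presented in the talk, whose details the abstract does not give. The function names (bag_of_words, fuzzy_c_means), the toy documents, the fuzzifier m = 2, and the use of Euclidean distance are all assumptions made for illustration. Reading each cluster as a "topic" and each membership row as a document's soft topic weights is likewise only an analogy.

```python
import math
import random


def bag_of_words(docs):
    """Build a vocabulary and one term-count vector per document."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for w in d.lower().split():
            row[index[w]] += 1.0
        rows.append(row)
    return vocab, rows


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50, seed=0):
    """Standard fuzzy c-means; returns (memberships, centers).

    memberships[i][k] is the degree (0..1) to which point i belongs to
    cluster k; each row sums to 1. m > 1 is the fuzzifier (assumed 2.0).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Start from random memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        r = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(r)
        u.append([x / total for x in r])

    centers = []
    for _ in range(n_iter):
        # Update centers: membership-weighted mean of the points.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][j] for i in range(n)) / total
                 for j in range(dim)]
            )
        # Update memberships from distances to the new centers.
        for i in range(n):
            dists = [max(math.dist(points[i], c), 1e-12) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            u[i] = [x / total for x in inv]
    return u, centers


# Toy corpus (hypothetical, for illustration only).
docs = [
    "patient fever cough fever",
    "cough fever patient",
    "gene protein expression",
    "protein gene gene expression",
]
vocab, matrix = bag_of_words(docs)
memberships, centers = fuzzy_c_means(matrix, n_clusters=2)
```

Unlike a hard clustering, every document keeps a graded membership in every cluster, which is the property that lets a fuzzy approach express a document as a mixture rather than a single label.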