Web/Data Mining and Personalization
September 1, 1999 - May 1, 2001
The evolution of the Internet into the Global Information Infrastructure, coupled with the immense popularity of the Web, has enabled the ordinary citizen to become not just a consumer of information, but also its disseminator. The Web, then, is becoming the proverbial Vox Populi. Given this vast and ever-growing amount of information, how does the average user quickly find what he or she is looking for? Present-day search engines do not seem to help much with this task.
One possible approach is to personalize the web space -- to create a system that responds to user queries by potentially aggregating information from several sources in a manner that depends on who the user is.
Existing commercial systems do some minimal personalization based on declarative information provided directly by the user, such as a zip code, keywords describing their interests, specific URLs, or even particular pieces of information they are interested in (e.g., the price of a particular stock). Our research aims at creating systems that (semi-)automatically tailor the content delivered to the user from a web site. We do so by mining the web -- both its contents and the users' interactions with it.
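As an illustration of what mining the users' interaction can start from, the sketch below groups web server log hits into per-user sessions and encodes each session as a binary vector over the site's URLs. It is a minimal sketch and not the project's actual pipeline: the `(user, timestamp, url)` input format, the function names `sessionize` and `session_vector`, and the 30-minute timeout are all assumptions made for the example.

```python
from collections import defaultdict


def sessionize(hits, timeout=1800):
    """Group (user, timestamp_seconds, url) hits into per-user sessions.

    A new session starts when the gap between two consecutive hits from
    the same user exceeds `timeout` seconds (assumed: 30 minutes).
    Returns a list of (user, [urls in visit order]) pairs.
    """
    by_user = defaultdict(list)
    for user, ts, url in hits:
        by_user[user].append((ts, url))

    sessions = []
    for user, visits in by_user.items():
        visits.sort()  # order each user's hits by timestamp
        last_ts, first_url = visits[0]
        current = [first_url]
        for ts, url in visits[1:]:
            if ts - last_ts > timeout:
                # Gap too long: close the current session, start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((user, current))
    return sessions


def session_vector(urls, site_urls):
    """Encode a session as a 0/1 vector over a fixed list of site URLs."""
    visited = set(urls)
    return [1 if u in visited else 0 for u in site_urls]
```

Vectors of this kind (or pairwise dissimilarities between sessions) are one plausible input to a clustering step that looks for groups of users with similar browsing behavior.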
Web Mining and Personalization require modeling an unknown number of overlapping sets in the presence of significant noise and outliers (i.e., bad exemplars). Moreover, the data sets in Web Mining are extremely large. The aim of our research is to develop scalable, robust fuzzy techniques to model noisy data sets containing an unknown number of overlapping categories. Specifically, in this work we are:
- Developing new scalable robust fuzzy clustering techniques for modeling data
- Exploring new techniques to handle linguistic and textual features
- Validating our techniques by creating prototype web mining and personalization systems
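To make the idea of robust fuzzy clustering concrete, here is a minimal sketch of one well-known variant: fuzzy c-means extended with a noise cluster that sits at a fixed distance `delta` from every point, so that outliers receive low membership in all real clusters. This is a generic textbook-style technique chosen for illustration, not the algorithms developed in this project; the function names, the default parameters, and the fixed number of clusters are assumptions of the example (the project itself targets an unknown number of categories).

```python
def _memberships(points, centers, delta, m):
    """Fuzzy memberships of each point in each cluster, with a noise cluster."""
    eps = 1e-12
    power = 1.0 / (m - 1.0)
    noise = (1.0 / delta ** 2) ** power  # pull of the noise cluster
    result = []
    for x in points:
        inv = []
        for c in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(x, c)) + eps
            inv.append((1.0 / d2) ** power)
        total = sum(inv) + noise
        # Each row sums to less than 1; the remainder belongs to noise.
        result.append([v / total for v in inv])
    return result


def noise_fuzzy_cmeans(points, centers, delta=2.0, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed iteration count.

    points  : list of equal-length tuples of floats
    centers : initial cluster centers, same format (assumed to be given)
    Returns (centers, memberships); memberships[k][i] is the degree to
    which point k belongs to cluster i.
    """
    centers = [tuple(c) for c in centers]
    dim = len(centers[0])
    for _ in range(iters):
        u = _memberships(points, centers, delta, m)
        new_centers = []
        for i, old in enumerate(centers):
            weights = [row[i] ** m for row in u]
            wsum = sum(weights)
            if wsum == 0.0:
                new_centers.append(old)  # no support: keep the old center
                continue
            new_centers.append(tuple(
                sum(w * x[d] for w, x in zip(weights, points)) / wsum
                for d in range(dim)
            ))
        centers = new_centers
    # Recompute memberships against the final centers.
    return centers, _memberships(points, centers, delta, m)
```

With two tight groups of points and one distant outlier, the outlier's memberships in both real clusters stay close to zero, so it barely shifts either center. The choice of `delta` controls how far a point may lie from a center before it is treated mostly as noise.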
- T. Kamdar and A. Joshi, "On Creating Adaptive Web Servers Using Weblog Mining", Technical Report, University of Maryland, Baltimore County, November 2000.
- Z. Jiang, A. Joshi, R. Krishnapuram, and L. Yi, "Retriever: Improving Web Search Engine Results Using Clustering", Technical Report, University of Maryland, Baltimore County, October 2000.
- A. Joshi, K. P. Joshi, and R. Krishnapuram, "On Mining Web Access Logs", Technical Report, University of Maryland, Baltimore County, October 1999.
- O. Nasraoui, R. Krishnapuram, and A. Joshi, "Relational Clustering Based on a New Robust Estimator with Application to Web Mining", In Proceedings, North American Fuzzy Information Processing Society (NAFIPS 99), October 1999.