Semantic Discovery: Discovering Complex Relationships in Semantic Web
October 1, 2003 - October 1, 2008
Research in search techniques was a critical component of the first generation of the Web, and has gone from academe to mainstream. A second-generation Semantic Web will be built by adding semantic annotations that software can understand and from which humans can benefit. Modeling, discovering and reasoning about complex relationships on the Semantic Web will enable this vision and transform the hunt for documents into a more automated analysis enabled by semantic technology. The beginnings of this shift from search to analysis can be observed in research and industry as users look beyond finding relevant documents based on keywords to finding actionable information leading to decision making and insights. Large-scale semantic annotation of data (domain-independent and domain-specific) is now possible because of an accumulation of advances in entity identification, automatic classification, taxonomy and ontology development, and metadata extraction. The next frontier, which fundamentally changes the way we acquire and use knowledge, is to automatically identify complex relationships between entities in this semantically annotated data. Instead of a search engine that returns documents containing terms of interest, we envision a system that returns actionable information (with the associated sources and supporting evidence) to a user or application. The user interacts with the information universe through a hypothesis-driven approach that combines search and inferencing, enabling more complex analysis and deeper insight. The examples in our narrative show that such a capability also greatly enhances the capacity of intelligence analysts to obtain timely information leading to a more secure homeland and world.
Our research will focus on the design, prototyping and evaluation of a system called SemDIS (Semantic Discovery), which supports indexing and querying of complex semantic relationships and is driven by notions of information trust and provenance and by models of the hypotheses and arguments under investigation.
From a scientific perspective, we face the challenges of formally defining and representing meaningful and interesting relationships (which we call semantic associations), and of defining a notion of result quality analogous to the familiar metrics of precision, recall and document ranking. Another challenge is the (semi-)automatic construction of argument structures built on these relationships to validate or refute a given hypothesis. Additional scientific and engineering challenges include the scale of storage and complex query processing for large metadata sets, with correspondingly more complex data structures to represent entities and relationships; the need to use context to select relevant subsets of metadata to process; and new techniques that use information provenance and trust to improve the ranking of relationships. These challenges call for a fresh look at indexing, query processing and ranking, as well as tractable and scalable graph algorithms that exploit heuristics. Our work proposes to address these challenges by building on our preliminary results in semantic metadata extraction, practical domain-specific ontology creation, defining semantic associations, main-memory query processing, using distributed trust to enforce security policies, and knowledge representation and reasoning on the Semantic Web. Scientific results from SemDIS will involve detailed scenarios and an evaluation testbed, and will be measured in terms of novel techniques as well as measures of quality, scalability and performance for computing complex semantic relationships. Corresponding to the breadth and depth of the topics involved in the challenge undertaken, ours is a collaborative proposal involving researchers at UGA and UMBC, covering the areas of information modeling and knowledge representation, storage and database management, information retrieval and artificial intelligence.
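As a rough illustration of the idea of a semantic association, the sketch below treats annotated data as a small labeled graph, enumerates the bounded-length paths that connect two entities, and ranks them by trust. Everything in it is an assumption made for illustration: the toy triples, the function names `build_graph` and `find_associations`, and the scoring rule (the product of per-statement trust values) are hypothetical and are not the SemDIS data model or ranking method.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object, trust in the statement's source).
TRIPLES = [
    ("alice", "coauthorOf", "bob", 0.9),
    ("bob", "reviewerFor", "paper42", 0.8),
    ("alice", "memberOf", "labX", 0.6),
    ("labX", "funds", "paper42", 0.5),
    ("carol", "authorOf", "paper42", 0.9),
]


def build_graph(triples):
    """Index triples as a labeled adjacency list, adding inverse edges."""
    graph = defaultdict(list)
    for subj, pred, obj, trust in triples:
        graph[subj].append((pred, obj, trust))
        # Inverse edge so an association can traverse a statement in either direction.
        graph[obj].append((pred + "^-1", subj, trust))
    return graph


def find_associations(graph, start, goal, max_hops=3):
    """Return (score, path) pairs for simple paths of at most max_hops edges.

    The score is the product of edge trust values (an assumed ranking rule);
    results are sorted from most to least trusted.
    """
    results = []

    def dfs(node, path, visited, score):
        if node == goal and path:
            results.append((score, list(path)))
            return
        if len(path) == max_hops:
            return
        for pred, nxt, trust in graph[node]:
            if nxt in visited:
                continue  # keep paths simple (no repeated entities)
            visited.add(nxt)
            path.append((node, pred, nxt))
            dfs(nxt, path, visited, score * trust)
            path.pop()
            visited.remove(nxt)

    dfs(start, [], {start}, 1.0)
    results.sort(key=lambda item: item[0], reverse=True)
    return results


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    for score, path in find_associations(graph, "alice", "paper42"):
        print(round(score, 2), path)
```

With the toy data, the query connects `alice` to `paper42` through two different chains of statements, and the chain backed by more trusted sources is listed first. Exhaustive path enumeration like this grows quickly with graph size, which is one reason the abstract calls for scalable, heuristic graph algorithms.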
Our effort will have broader impacts beyond the education and training of graduate students, and the publication of research findings. Results from our research will be integrated with courses we teach, both existing and new. We will use institutional mechanisms in place to seek participation of students from underrepresented groups. Datasets used for testbed evaluations and some of the targeted tools will be made public or open source, and new measures for relevance and ranking of semantic associations will provide input to future work on comparing various approaches and techniques. Our work will also gain from several academic-industry collaborations of the investigators. We will have the opportunity to leverage commercial infrastructure and raw metadata provided by Semagix and IBM and, when appropriate, technology licensing will be encouraged. The researchers will collaborate with industry, and the students will be encouraged to intern at collaborating industrial labs. Within a broader social context, emerging knowledge-centric technologies raise legitimate privacy and civil liberties concerns. Building upon past policy-making experience, we will comment on potential implications of our scientific progress.
This research is supported in part by NSF award ITR 0325172, and is a collaborative effort with colleagues at the University of Georgia and Wright State University. Some demos associated with our efforts to find trust using DBLP and FOAF data can be found at the project's web page. We've also developed related software systems, supported by this and other awards, such as the Semantic Web search engine Swoogle and community, sentiment and trust detection systems such as Feeds That Matter and PolVox, among others.
- A. Java, A. Joshi, and T. Finin, "Approximating the Community Structure of the Long Tail", InProceedings, Proceedings of the Second International Conference on Weblogs and Social Media (ICWSM 2008), March 2008, 4638 downloads, 4 citations.
- A. Karandikar, A. Java, A. Joshi, T. Finin, Y. Yesha, and Y. Yesha, "Second Space: A Generative Model For The Blogosphere", InProceedings, Proceedings of the International Conference on Weblogs and Social Media (ICWSM 2008), March 2008, 3989 downloads, 1 citation.
- B. Aleman-Meza, M. Nagarajan, L. Ding, A. Sheth, B. Arpinar, A. Joshi, and T. Finin, "Scalable semantic analytics on social networks for addressing the problem of conflict of interest detection", Article, ACM Transactions on the Web, February 2008, 2512 downloads, 14 citations.
- P. Kolari, "Detecting Spam Blogs: An Adaptive Online Approach", PhdThesis, Ph.D. Dissertation, December 2007, 7319 downloads, 3 citations.
- A. Java, S. Nirenburg, M. McShane, T. Finin, J. English, and A. Joshi, "Using a Natural Language Understanding System to Generate Semantic Web Content", Article, International Journal on Semantic Web and Information Systems, November 2007, 2901 downloads, 13 citations.
- A. Joshi, T. Finin, A. Java, A. Kale, and P. Kolari, "Web 2.0 Mining: Analyzing Social Media", InProceedings, Proceedings of the NSF Symposium on Next Generation of Data Mining and Cyber-Enabled Discovery for Innovation, October 2007, 4851 downloads, 5 citations.
- O. Walavalkar, "Streaming Knowledge Bases", MastersThesis, University of Maryland, Baltimore County, August 2007, 4 citations.
- A. Karandikar, "Generative Model To Construct Blog and Post Networks In Blogosphere", MastersThesis, University of Maryland, Baltimore County, May 2007, 7662 downloads, 3 citations.
- A. Java, P. Kolari, T. Finin, A. Joshi, and T. Oates, "Feeds That Matter: A Study of Bloglines Subscriptions", InProceedings, Proceedings of the International Conference on Weblogs and Social Media (ICWSM 2007), March 2007, 5828 downloads, 17 citations.
- P. Kolari, A. Java, and A. Joshi, "Spam in Blogs and Social Media, Tutorial", InProceedings, ICWSM 2007, March 2007, 2864 downloads, 3 citations.
- A. Java, P. Kolari, T. Finin, J. Mayfield, A. Joshi, and J. Martineau, "BlogVox: Separating Blog Wheat from Blog Chaff", InProceedings, Proceedings of the Workshop on Analytics for Noisy Unstructured Text Data, 20th International Joint Conference on Artificial Intelligence (IJCAI-2007), January 2007, 3087 downloads, 8 citations.
- P. Kolari, A. Java, T. Finin, J. Mayfield, A. Joshi, and J. Martineau, "Blog Track Open Task: Spam Blog Classification", InCollection, TREC 2006 Blog Track Notebook, November 2006, 7488 downloads, 11 citations.
- P. Kolari, A. Java, T. Finin, T. Oates, and A. Joshi, "Detecting Spam Blogs: A Machine Learning Approach", InProceedings, Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006), July 2006, 10308 downloads, 75 citations.
- P. Kolari, A. Java, and T. Finin, "Characterizing the Splogosphere", InProceedings, Proceedings of the 3rd Annual Workshop on Weblogging Ecosystem: Aggregation, Analysis and Dynamics, 15th World Wide Web Conference, May 2006, 6687 downloads, 84 citations.
- B. Aleman-Meza, M. Nagarajan, C. Ramakrishnan, A. Sheth, B. Arpinar, L. Ding, P. Kolari, A. Joshi, and T. Finin, "Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection", InProceedings, Proceedings of the 15th International World Wide Web Conference, May 2006, 3222 downloads, 106 citations.
- T. Finin and L. Ding, "Search Engines for Semantic Web Knowledge", InProceedings, Proceedings of XTech 2006: Building Web 2.0, May 2006, 24347 downloads, 14 citations.
- L. Ding, "Enhancing Semantic Web Data Access", PhdThesis, University of Maryland, Baltimore County, April 2006, 6014 downloads, 8 citations.
- P. Kolari, T. Finin, and A. Joshi, "SVMs for the Blogosphere: Blog Identification and Splog Detection", InProceedings, AAAI Spring Symposium on Computational Approaches to Analysing Weblogs, March 2006, 11956 downloads, 112 citations.
- P. Kolari and T. Finin, "Memeta: A Framework for Multi-Relational Analytics on the Blogosphere", InProceedings, AAAI 2006 Student Abstract Program, February 2006, 4049 downloads.
- A. Java, T. Finin, and S. Nirenburg, "SemNews: A Semantic News Framework", InProceedings, Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), February 2006, 4897 downloads, 15 citations.
- A. Java, T. Finin, and S. Nirenburg, "Text understanding agents and the Semantic Web", InProceedings, Proceedings of the 39th Hawaii International Conference on System Sciences, January 2006, 6820 downloads, 19 citations.
- T. Finin, L. Ding, L. Zhou, and A. Joshi, "Social Networking on the Semantic Web", Article, The Learning Organization, December 2005, 7660 downloads, 2 citations.
- L. Ding, R. Pan, T. Finin, A. Joshi, Y. Peng, and P. Kolari, "Finding and Ranking Knowledge on the Semantic Web", InProceedings, Proceedings of the 4th International Semantic Web Conference, November 2005, 12429 downloads, 130 citations.
- L. Ding, T. Finin, A. Joshi, Y. Peng, P. Pinheiro da Silva, and D. L. McGuinness, "Tracking RDF Graph Provenance using RDF Molecules", InProceedings, Proceedings of the 4th International Semantic Web Conference, November 2005, 5680 downloads, 47 citations.
- L. Ding, T. Finin, A. Joshi, Y. Peng, R. Pan, and P. Reddivari, "Search on the Semantic Web", Article, IEEE Computer, October 2005, 3428 downloads, 66 citations.
- T. Finin, L. Ding, R. Pan, A. Joshi, P. Kolari, A. Java, and Y. Peng, "Swoogle: Searching for knowledge on the Semantic Web", InProceedings, AAAI 05 (intelligent systems demo), July 2005, 5092 downloads, 35 citations.
- P. Kolari, L. Ding, S. Ganjugunte, L. Kagal, A. Joshi, and T. Finin, "Enhancing Web Privacy Protection through Declarative Policies", InProceedings, Proceedings of the IEEE Workshop on Policy for Distributed Systems and Networks (POLICY 2005), June 2005, 6883 downloads, 27 citations.
- L. Ding, P. Kolari, T. Finin, A. Joshi, Y. Peng, and Y. Yesha, "On Homeland Security and the Semantic Web: A Provenance and Trust Aware Inference Framework", InProceedings, Proceedings of the AAAI Spring Symposium on AI Technologies for Homeland Security, March 2005, 3736 downloads, 32 citations.
- L. Ding, L. Zhou, T. Finin, and A. Joshi, "How the Semantic Web is Being Used: An Analysis of FOAF Documents", InProceedings, Proceedings of the 38th International Conference on System Sciences, January 2005, 12828 downloads, 116 citations.
- L. Ding, T. Finin, and A. Joshi, "Analyzing Social Networks on the Semantic Web", Article, IEEE Intelligent Systems, January 2005, 8089 downloads, 20 citations.
- L. Ding, T. Finin, A. Joshi, R. Pan, R. S. Cost, Y. Peng, P. Reddivari, V. C. Doshi, and J. Sachs, "Swoogle: A Search and Metadata Engine for the Semantic Web", InProceedings, Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management, November 2004, 27270 downloads, 467 citations.
- L. Ding, P. Kolari, S. Ganjugunte, T. XXXXX, and A. Joshi, "Modeling and Evaluating Trust Network Inference", InProceedings, Seventh International Workshop on Trust in Agent Societies at AAMAS 2004, July 2004, 4226 downloads, 31 citations.
- L. Ding, "Weaving the Web of Belief into the Semantic Web", Misc, submitted to WWW2004, May 2004, 6660 downloads.
- L. Ding, L. Zhou, and T. Finin, "Trust Based Knowledge Outsourcing for Semantic Web Agents", InProceedings, Proceedings of the 2003 IEEE/WIC International Conference on Web Intelligence, October 2003, 2999 downloads, 47 citations.
- P. Kolari and A. Java, "Spam Blogs: Ping Servers and Adversaries", TechReport, March 2007.
- A. Java, S. Nirenburg, M. McShane, J. English, and A. Joshi, "Using a Natural Language Understanding System to Generate Semantic Web Content", TechReport, October 2006, 2408 downloads.
- L. Ding, Y. Peng, P. Pinheiro da Silva, and D. L. McGuinness, "Tracking RDF Graph Provenance using RDF Molecules", TechReport, TR-CS-05-06, April 2005, 8253 downloads.