Knowledge discovery in networks
by David Jensen
Monday, November 14, 2005, 3:00pm
325b ITE
Networks are an increasingly common method of representing the
relationships among sets of interacting entities. This basic
data structure is reflected in how we analyze and understand
social networks, networks of scholarly citations and web
pages, and networks of computers and communications
devices. Over the past five years, my students and I have
developed a number of methods for learning statistical models
of networks that can make accurate predictions about the
attributes of nodes in the network and also provide insight
into the broad structure of statistical dependencies among
different types of nodes. These models build on methods
developed previously in statistics, machine learning, and
knowledge discovery, including Bayesian networks and
probability estimation trees.
We have applied these techniques to a wide variety of problems, including citation analysis and fraud detection. Most recently, we have used them to detect fraud among stockbrokers, in a joint project with the National Association of Securities Dealers. We have also developed an open-source software environment incorporating our tools for statistical modeling and ad hoc querying of relational data.