The empirical use of continuous variables in Bayesian belief networks for exploratory data analysis
May 1, 1998
Exploratory data analysis of large medical databases needs a general method that does not require model parameters to be selected in advance. Bayesian Belief Networks (BBNs), a knowledge discovery technique, can be used to model the conditional dependencies that may exist among study variables. BBNs express these dependencies graphically as directed arcs from one node, which represents a variable, to another node.
Most algorithms for BBN construction require all variables to have discrete values. Since much of the data from medical studies has continuous values, e.g., weight, reliable methods are needed to convert these variables into discrete ones. This is done by partitioning the range of continuous values into a relatively small number of intervals, with a discrete value representing each interval.
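As a minimal illustration of this kind of conversion, the sketch below partitions a continuous variable into equal-width intervals and replaces each value with the index of its interval. Equal-width binning is an assumption chosen for simplicity, not the procedure developed in this thesis, and the function names and sample weights are hypothetical.

```python
from bisect import bisect_right


def equal_width_cut_points(values, n_intervals):
    """Return the interior cut points that split [min, max] into equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    return [lo + i * width for i in range(1, n_intervals)]


def discretize(values, cut_points):
    """Map each continuous value to the index of the interval it falls in."""
    # bisect_right counts how many cut points are <= v, which is the interval index.
    return [bisect_right(cut_points, v) for v in values]


# Hypothetical sample: body weights in kilograms.
weights_kg = [52.0, 58.5, 61.0, 70.2, 75.0, 88.4, 95.0, 112.0]
cuts = equal_width_cut_points(weights_kg, 3)
labels = discretize(weights_kg, cuts)
print(cuts)
print(labels)
```

The discrete labels can then stand in for the original variable in any BBN construction algorithm that expects discrete values; where the cut points fall is exactly the choice that the procedures in this work aim to make in a principled way.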
A new procedure, based on a continuous variable's information content, is presented here; it yields an optimal partitioning of the range of values. The partitioning is optimal in that both the information loss from the conversion and the number of intervals are minimized. A second new procedure dynamically repartitions the values of continuous variables during BBN construction, according to a Minimum Description Length (MDL) measure.
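The trade-off between information loss and the number of intervals can be illustrated with a short sketch. The abstract does not specify the information measure, so the code assumes Shannon entropy of the discretized variable as a stand-in and uses equal-width intervals; the function names and data are hypothetical, and this is not the optimal procedure described above.

```python
from collections import Counter
from math import log2


def entropy_bits(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def equal_width_labels(values, k):
    """Assign each value to one of k equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # min() keeps the maximum value inside the last interval.
    return [min(int((v - lo) / width), k - 1) for v in values]


# Hypothetical sample: body weights in kilograms.
values = [52.0, 58.5, 61.0, 70.2, 75.0, 88.4, 95.0, 112.0]

# Entropy retained by the discretized variable for several interval counts.
retained = {k: entropy_bits(equal_width_labels(values, k)) for k in (1, 2, 4)}
for k, h in retained.items():
    print(f"{k} interval(s): {h:.3f} bits")
```

With a single interval the discretized variable carries no information, and each refinement of the partition retains more. A partitioning criterion of the kind described here has to balance that gain against the cost of additional intervals.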
These conversion procedures are used in conjunction with regression models, which are derived from continuous variables selected automatically by initial BBN models. The results of the regression models are used in subsequent BBNs to enhance the accuracy of the BBN model.
All of these procedures were tested on data from two medical studies. The resulting models, which are strongly influenced by the partitioning of the continuous variables, depict some of the original study findings, as well as some unexpected relationships between variables.
PhD Thesis
University of Maryland, Baltimore County
Computer Science and Electrical Engineering