Intrusion Detection: Modeling System State to Detect and Classify Aberrant Behavior
February 17, 2004
We present a dual-phase, host-based intrusion detection process and have demonstrated, through experimental validation, that it improves on current intrusion detection capabilities. The first phase uses cluster analysis to compare samples of low-level operating system data against an established model of normalcy. The second phase takes the non-conforming data identified in phase 1, maps that data to instances of our target-centric ontology, and reasons over those instances. The reasoning process serves two purposes. Its primary purpose is to classify the anomalous data as a specific type, or class, of attack. Its secondary purpose is to provide an orthogonal test that differentiates true positives from false positives. We developed a novel metric, self-distance, to quantify the streams of system calls generated by a process. We also constructed a feature set from the low-level operating system data, which serves as input to the clustering process. We experimented with different clustering algorithms (Fuzzy c-Medoid, k-Means, and Principal Direction Divisive Partitioning) and different distance measures (Euclidean and Mahalanobis). We also tested the effect of z-normalizing the data set. Our experiments indicated that the Fuzzy c-Medoid algorithm with the Mahalanobis distance performed best, yielding an F-Measure of .9822. The F-Measure is a common measure of accuracy that combines precision and recall.
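The abstract does not define the self-distance metric or the feature set, so this sketch makes no attempt to reproduce them. It is a minimal, hypothetical Python illustration of three generic ingredients named above:

- z-normalizing each feature column;
- measuring how far a sample lies from a model of normal behavior, using the Euclidean and Mahalanobis distances;
- computing the F-Measure.

The F-Measure code assumes the common balanced form (F1, the harmonic mean of precision and recall). All function names and inputs are illustrative assumptions, not code from the thesis.

```python
import math
import statistics


def z_normalize(rows):
    """Rescale each feature column to zero mean and unit sample standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    # Guard against constant columns, whose standard deviation is zero.
    stdevs = [statistics.stdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stdevs)] for row in rows]


def covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a must be non-singular)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T cov^-1 (x - mean)), using a linear solve instead of an inverse."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    y = solve(cov, diff)
    return math.sqrt(sum(di * yi for di, yi in zip(diff, y)))


def euclidean(x, mean):
    """Plain Euclidean distance, which ignores feature scale and correlation."""
    return math.dist(x, mean)


def f_measure(tp, fp, fn):
    """Balanced F-Measure (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy numbers, purely illustrative.
    print(z_normalize([[1, 10], [2, 20], [3, 30]]))
    print(mahalanobis([2, 0], [0, 0], [[4, 0], [0, 1]]), euclidean([2, 0], [0, 0]))
    print(f_measure(tp=90, fp=10, fn=10))
```

The Mahalanobis distance rescales each direction by the covariance of the normal model, so a deviation along a high-variance feature counts for less. With `cov = [[4, 0], [0, 1]]`, the point `[2, 0]` is at Mahalanobis distance 1.0 from the origin, while its Euclidean distance is 2.0.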
We experimentally demonstrated the case for migrating from taxonomic classification systems and their syntactic representation languages to ontologies and semantically rich ontology specification languages. We created a data model of the relationships that hold between the low-level data and instances of attacks and intrusions. We used the DARPA Agent Markup Language + Ontology Inference Layer (DAML+OIL) to specify the data model as an ontology. We used the Java Theorem Prover, a sound and complete first-order logic theorem prover, to reason over and classify the instances of data that were deemed anomalous in the first phase of our process. Our classification mechanism achieved an F-Measure of .9776. The overall F-Measure of our dual-phase process was .9718.
Ignoring the characteristics of the data population, in particular the base rate of intrusions, is a classic mistake made when evaluating intrusion detection systems; it is known as the base-rate fallacy. When we evaluated the posterior probability of our process (the probability of an intrusion given an alarm), we achieved a score of .998.
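To make the base-rate point concrete, this minimal Python sketch applies Bayes' theorem to compute the Bayesian detection rate P(I | A). This is the probability that an alarm corresponds to a real intrusion:

P(I | A) = P(A | I) P(I) / (P(A | I) P(I) + P(A | ¬I) P(¬I))

The function name and the input figures are hypothetical and chosen only for illustration. They are not measurements from the thesis.

```python
def bayesian_detection_rate(detection_rate, false_alarm_rate, base_rate):
    """P(I | A) via Bayes' theorem.

    detection_rate   = P(A | I), probability of an alarm given an intrusion
    false_alarm_rate = P(A | not I), probability of an alarm given normal activity
    base_rate        = P(I), prior fraction of events that are intrusions
    """
    true_alarms = detection_rate * base_rate
    false_alarms = false_alarm_rate * (1.0 - base_rate)
    return true_alarms / (true_alarms + false_alarms)


# Hypothetical figures: a 99% detection rate and a 1% false-alarm rate,
# applied where only 1 event in 10,000 is an intrusion.
print(round(bayesian_detection_rate(0.99, 0.01, 1e-4), 4))  # prints 0.0098
```

Even with a 99% detection rate and a 1% false-alarm rate, fewer than 1% of alarms correspond to real intrusions when intrusions are this rare. The false alarms raised on the much larger volume of normal events dominate.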
We also present two novel mechanisms to detect and mitigate aberrant behavior in Mobile Ad Hoc Networks and Wireless Sensor Networks. Both types of network consist of resource-constrained devices. Accordingly, we present our intrusion detection mechanisms as protocols that monitor network state rather than system state.
PhD Thesis
University of Maryland, Baltimore County
Department of Computer Science and Electrical Engineering