Semantic Interpretation of Structured Log Files
August 1, 2015
Log files record events occurring in applications, operating systems, and even network devices. Originally they were used to record information for diagnostic and debugging purposes. Nowadays, logs are also used to track events that support auditing and forensics after malicious activity or attacks on systems. Software such as intrusion detection systems, web servers, anti-virus and anti-malware systems, firewalls, and network devices generates logs containing information that can be used to protect against such attacks. Analyzing log files can help in proactively avoiding attacks against systems. While existing tools do a good job when the format of a log file is known, the challenge lies in cases where log files come from unknown devices and are of unknown formats. We propose a framework that takes any log file and automatically produces a semantic interpretation as a set of RDF Linked Data triples. The framework splits a log file into columns using regular expression-based or dictionary-based classifiers. Leveraging and modifying our existing work on inferring the semantics of tables, we identify every column in a log file and map it to concepts from either a general-purpose knowledge base such as DBpedia or domain-specific ontologies such as IDS. We also identify relationships between the columns in such log files. Converting large and verbose log files into such semantic representations will support better search, integration, and rich reasoning over the data.
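The column-classification and triple-generation steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the whitespace-delimited input, the regular expressions, the dictionaries, the majority-vote rule, and the `http://example.org/log/` namespace are all assumptions chosen for illustration, and plain string literals stand in for the mapping to DBpedia or domain-ontology concepts.

```python
import re
from collections import Counter

# Hypothetical regex-based classifiers (label -> pattern); illustrative only.
REGEX_CLASSIFIERS = {
    "IPAddress": re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"),
    "Timestamp": re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$"),
    "Port": re.compile(r"^\d{1,5}$"),
}

# Hypothetical dictionary-based classifiers (label -> known vocabulary).
DICT_CLASSIFIERS = {
    "HTTPMethod": {"GET", "POST", "PUT", "DELETE", "HEAD"},
    "Protocol": {"TCP", "UDP", "ICMP"},
}

# Assumed sample input: whitespace-delimited fields, one event per line.
SAMPLE_LINES = [
    "2015-08-01T10:00:00 192.168.1.5 TCP 443",
    "2015-08-01T10:00:02 10.0.0.7 UDP 53",
]


def classify_value(value):
    """Label one field: dictionaries are tried first, then regexes."""
    for label, vocab in DICT_CLASSIFIERS.items():
        if value.upper() in vocab:
            return label
    for label, pattern in REGEX_CLASSIFIERS.items():
        if pattern.match(value):
            return label
    return "Unknown"


def split_rows(lines):
    """Split non-empty lines into fields on whitespace (an assumption)."""
    return [line.split() for line in lines if line.strip()]


def classify_columns(lines):
    """Assign each column the label most of its values received."""
    rows = split_rows(lines)
    if not rows:
        return []
    width = min(len(row) for row in rows)
    labels = []
    for i in range(width):
        votes = Counter(classify_value(row[i]) for row in rows)
        labels.append(votes.most_common(1)[0][0])
    return labels


def to_triples(lines, base="http://example.org/log/"):
    """Emit N-Triples-style strings; the namespace is a placeholder."""
    labels = classify_columns(lines)
    triples = []
    for n, row in enumerate(split_rows(lines)):
        subject = f"<{base}entry/{n}>"
        for label, value in zip(labels, row):
            if label == "Unknown":
                continue  # skip columns no classifier recognized
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{base}vocab#{label}> "{literal}" .')
    return triples
```

Calling `to_triples(SAMPLE_LINES)` yields one triple per recognized field of each log entry. A fuller system would replace the placeholder predicates with links to knowledge-base concepts and would also infer relationships between columns, which this sketch does not attempt.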
Master's Thesis
University of Maryland, Baltimore County