Semantic Interpretation of Structured Log Files

Log files record events occurring in applications, operating systems, and even network devices. Originally they were used to record information for diagnostic and debugging purposes. Nowadays, logs are also used to track events for auditing and forensics in case of malicious activity or system attacks. Software such as intrusion detection systems, web servers, anti-virus and anti-malware systems, firewalls, and network devices generates logs with useful information that can be used to protect against such attacks. Analyzing log files can help in proactively avoiding attacks against systems. While existing tools do a good job when the format of a log file is known, the challenge lies in cases where log files come from unknown devices and are of unknown formats. We propose a framework that takes any log file and automatically produces a semantic interpretation as a set of RDF Linked Data triples. The framework splits a log file into columns using regular expression-based or dictionary-based classifiers. Leveraging and modifying our existing work on inferring the semantics of tables, we identify every column of a log file and map it to concepts from either a general-purpose knowledge base (KB) such as DBpedia or a domain-specific ontology such as IDS. We also identify relationships between the columns in such log files. Converting large and verbose log files into such semantic representations will enable better search, integration, and rich reasoning over the data.
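
As a rough illustration of the regular expression-based column classification and triple generation described above, the following Python sketch labels whitespace-separated columns by majority vote and emits N-Triples-style strings. It is not the thesis implementation: the patterns, the labels (IPAddress, HTTPStatus, Integer), the helper names, and the http://example.org/log# namespace are all assumptions made for this example, and a real system would map columns to DBpedia or domain ontology concepts rather than to placeholder predicates.

```python
import re
from collections import Counter

# Hypothetical classifiers: labels and patterns are illustrative assumptions,
# checked in order so that more specific patterns win over generic ones.
CLASSIFIERS = [
    ("IPAddress", re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")),
    ("HTTPStatus", re.compile(r"^[1-5]\d{2}$")),
    ("Integer", re.compile(r"^\d+$")),
]

# Placeholder namespace, not a real ontology.
EX = "http://example.org/log#"


def classify_value(value):
    """Return the label of the first classifier whose pattern matches."""
    for label, pattern in CLASSIFIERS:
        if pattern.match(value):
            return label
    return "Unknown"


def classify_columns(lines):
    """Split lines on whitespace and label each column by majority vote."""
    rows = [line.split() for line in lines]
    width = min(len(row) for row in rows)
    labels = []
    for i in range(width):
        votes = Counter(classify_value(row[i]) for row in rows)
        labels.append(votes.most_common(1)[0][0])
    return labels


def to_triples(lines):
    """Emit one N-Triples-style statement per recognized cell."""
    labels = classify_columns(lines)
    triples = []
    for row_idx, line in enumerate(lines):
        subject = f"<{EX}entry{row_idx}>"
        for label, value in zip(labels, line.split()):
            if label != "Unknown":
                triples.append(f'{subject} <{EX}has{label}> "{value}" .')
    return triples


if __name__ == "__main__":
    sample = [
        "192.168.1.10 GET 200 5120",
        "10.0.0.7 POST 404 1024",
    ]
    print(classify_columns(sample))
    for triple in to_triples(sample):
        print(triple)
```

Majority voting per column keeps a single malformed row from changing a column's label; a dictionary-based classifier could be added as another entry that checks membership in a word list instead of matching a pattern.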


  • 224202 bytes

cybersecurity, lod, logs, owl, rdf, semantic web

MastersThesis

University of Maryland, Baltimore County

Downloads: 1000

UMBC ebiquity