A Semantically Rich Cognitive Search Assistant For Clinical Notes
April 29, 2017
Many use cases in the medical industry and in research require extracting clinical information from the narrative notes in electronic medical records. Significant advances have been made in recent years by clinical text processing systems that rely heavily on the natural language processing pipeline of sentence segmentation, tokenization, part-of-speech tagging, and dependency parsing. This approach performs well only when the text conforms to the rules of grammar. However, text entered by the clinician at the point of care, where time efficiency is paramount, is written in a heavily abbreviated shorthand that tends to ignore the rules of grammar, punctuation, and white space. In a corpus of 1,200 notes from the US Veterans Administration, grammatically clean text constitutes only 5% of the total text, leaving 95% of the text not amenable to the approaches proposed in the extant literature. This is especially significant because the Veterans Administration is the largest healthcare provider in the US. This research describes an approach that is robust to grammatically deficient text because it relies not on grammatical structure but on the phrasal patterns that are prevalent in the medical domain. It uses techniques that incorporate micro-contexts by taking into account the scope, proximity, and location of multiple interdependent matched patterns in order to extract the relevant attributes of medical concepts. Patterns that rely on dictionaries of terms and on the results of other extraction algorithms are also accommodated. In addition, for the medical concept extraction to be useful in real clinical decision support systems, it has been optimized for run-time efficiency to achieve near real-time performance.
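The idea of combining interdependent phrasal patterns by scope and proximity can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the term lists, the 25-character window, and the names `nearest` and `extract_pain_events` are all assumptions chosen for illustration.

```python
import re

# Hypothetical mini-dictionaries; the actual term lists are not given in the abstract.
SYMPTOM = re.compile(r"\b(pain|ache|soreness)\b", re.I)
LOCATION = re.compile(r"\b(arm|elbow|wrist|hand|finger|knee|back|chest)\b", re.I)
SEVERITY = re.compile(r"\b(mild|moderate|severe)\b", re.I)
NEGATION = re.compile(r"\b(no|denies|without)\b", re.I)

WINDOW = 25  # assumed proximity limit, in characters


def nearest(pattern, text, anchor, before_only=False):
    """Return the closest match of `pattern` within WINDOW characters of `anchor`.

    With before_only=True, only matches that end before the anchor starts are
    considered, which models a left-to-right scope such as negation.
    """
    best, best_dist = None, WINDOW + 1
    for m in pattern.finditer(text):
        if m.end() <= anchor.start():
            dist = anchor.start() - m.end()
        elif m.start() >= anchor.end():
            if before_only:
                continue
            dist = m.start() - anchor.end()
        else:
            dist = 0
        if dist <= WINDOW and dist < best_dist:
            best, best_dist = m.group(0).lower(), dist
    return best


def extract_pain_events(note):
    """Build one structured record per symptom mention, with no parsing step."""
    events = []
    for m in SYMPTOM.finditer(note):
        events.append({
            "symptom": m.group(0).lower(),
            "location": nearest(LOCATION, note, m),
            "severity": nearest(SEVERITY, note, m),
            "negated": nearest(NEGATION, note, m, before_only=True) is not None,
        })
    return events


# Shorthand text with no usable sentence structure.
events = extract_pain_events("c/o severe L knee pain x3d, denies chest pain")
```

In this toy note, each mention of "pain" is paired with the closest location term, and "denies" negates only the mention that follows it. A real system would need far richer dictionaries and scope rules than this sketch assumes.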
The validity of this approach was established by employing it to create a semantically rich cognitive search assistant that runs in near real-time over the corpus of clinical notes from the Veterans Administration. The system is able to extract medical concepts that are signs and symptoms along with their contextual attributes, including location, severity, onset, duration, and certainty. Pain was used as the initial use case because of its prevalence within clinical notes. It is also the most difficult of all the symptoms because of the wide variety of ways in which pain can be expressed. The cognitive search assistant was able to extract semantically structured representations of occurrences of pain events in the text with a positive precision of 87% and a positive recall of 93%, at a rate of 0.31 seconds per note. The semantic representation of the results also permits a reasoning system to be incorporated to perform cognitively rich searches when used in conjunction with predefined medical ontologies. This allows, for instance, a search for arm pain to include results that involve elbows, wrists, hands, or fingers using the part-of relation defined in the ontology. The result is a semantically rich cognitive search assistant capable of near real-time structured search over clinical text that can be used in interactive applications such as clinical decision support.
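The arm pain example can be sketched as a transitive expansion over a part-of relation. The tiny ontology below, the record layout, and the names `expand_with_parts` and `search_by_location` are hypothetical; the abstract does not say which ontology or reasoning system was actually used.

```python
from collections import defaultdict, deque

# Hypothetical part-of facts, written as (part, whole) pairs.
PART_OF = [
    ("elbow", "arm"),
    ("wrist", "arm"),
    ("hand", "arm"),
    ("finger", "hand"),
    ("knee", "leg"),
]


def expand_with_parts(term, part_of=PART_OF):
    """Return `term` plus every body part that is transitively part of it."""
    parts = defaultdict(list)
    for part, whole in part_of:
        parts[whole].append(part)
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for part in parts[current]:
            if part not in seen:
                seen.add(part)
                queue.append(part)
    return seen


def search_by_location(events, location):
    """Keep the events whose location is the queried term or one of its parts."""
    wanted = expand_with_parts(location)
    return [e for e in events if e["location"] in wanted]


# Assumed structured records, as an extraction step might produce them.
events = [
    {"note": 1, "symptom": "pain", "location": "finger"},
    {"note": 2, "symptom": "pain", "location": "knee"},
    {"note": 3, "symptom": "pain", "location": "elbow"},
]
arm_hits = search_by_location(events, "arm")
```

Here a query for "arm" matches the finger and elbow records but not the knee record, because finger reaches arm through hand in the assumed relation.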
PhdThesis
UMBC
Downloads: 1196