An Investigation of Linguistic Information for Speech Recognition


Monday, October 20, 2008, 11:00am - 1:00pm

325 ITE

After several decades of effort, speech recognition technologies have made significant progress, and a variety of speech-based applications have been developed. However, current speech recognition systems still produce erroneous output, which hinders the wide adoption of speech applications. Given that error-free speech recognition cannot be achieved in the near future, mechanisms for automatically detecting, and even correcting, recognition errors are needed to compensate for imperfect recognizers. This dissertation research focuses on the automatic detection of speech recognition errors in monologue applications, especially dictation.

Because of computational complexity and efficiency constraints, only limited linguistic information is embedded in speech recognition systems. Humans, by contrast, routinely apply linguistic knowledge when identifying recognition errors. This dissertation therefore investigates the effect of linguistic information on automatic error detection by applying two levels of linguistic analysis, syntactic and semantic, to post-process speech recognition output. Experiments were conducted on two dictation corpora that differ in both topic and style (daily office communication by students and Wall Street Journal news by journalists).

To capture grammatical abnormalities possibly caused by recognition errors, two sets of syntactic features, linkage information and word associations based on syntactic dependency, are extracted for each word from the output of two lexicalized, robust syntactic parsers. When these syntactic features are combined with confidence-score-related features in a confidence measure trained with a Support Vector Machine, they yield consistent performance improvement in one or more respects over confidence-score-related features alone.
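As a rough illustration of this per-word classification setup, the sketch below combines a recognizer confidence score with binary syntactic features into a feature vector per word and trains a linear classifier to label words as correctly recognized or erroneous. A simple perceptron stands in here for the SVM used in the dissertation, and all feature values and labels are invented for illustration.

```python
# Hedged sketch: each recognized word gets a feature vector combining the
# recognizer's confidence score with binary syntactic features (e.g. "word
# has a complete linkage", "word takes part in a dependency association").
# A perceptron stands in for the SVM described in the abstract; the data
# below is invented for illustration.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_perceptron(X, y, epochs=100):
    """Train a linear classifier; +1 = correctly recognized word, -1 = error."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, yi in zip(X, y):
            if yi * (dot(w, x) + b) <= 0:      # misclassified: nudge boundary
                w = [wi + yi * xi for wi, xi in zip(w, x)]
                b += yi
                mistakes += 1
        if mistakes == 0:                      # training data separated; stop
            break
    return w, b

def predict(w, b, x):
    return 1 if dot(w, x) + b > 0 else -1

# Feature vectors: [confidence score, linkage complete?, dependency found?]
X = [[0.95, 1, 1], [0.90, 1, 1], [0.30, 0, 0],
     [0.25, 0, 1], [0.85, 1, 0], [0.20, 0, 0]]
y = [1, 1, -1, -1, 1, -1]

w, b = train_perceptron(X, y)
print(predict(w, b, [0.92, 1, 1]))   # high confidence, clean syntax -> 1
print(predict(w, b, [0.15, 0, 0]))   # low confidence, broken syntax -> -1
```

The point of the combined feature vector is that a low confidence score and a syntactic anomaly reinforce each other, which is how the syntactic features improve on confidence scores alone.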

Semantic abnormalities possibly caused by recognition errors are detected by analyzing the semantic relatedness of a word to its context. Two methods are used to integrate semantic analysis with syntactic analysis. The first extracts features for each word from its relations to other words; various WordNet-based measures and different context lengths are examined, and the addition of these semantic features yields a small but consistent further improvement in error detection performance. The second expands lexical cohesion analysis by taking both reiteration and collocation relationships into consideration and by augmenting words with prediction probabilities from the syntactic analysis. Two WordNet-based measures and a measure based on Latent Semantic Analysis are used to instantiate lexical cohesion relationships, and various word probability thresholds and cosine similarity thresholds are examined. Incorporating lexical cohesion analysis outperforms using syntactic analysis alone.
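The core of the semantic-relatedness check can be sketched as follows: represent each word as a vector, score each word by its average cosine similarity to the rest of the utterance, and flag words that fall below a threshold as possible recognition errors. The hand-made vectors and the threshold below are invented for this sketch; the dissertation used WordNet-based measures and Latent Semantic Analysis rather than toy vectors.

```python
import math

# Toy vectors standing in for WordNet- or LSA-derived representations.
# "banana" is a deliberately unrelated intruder, mimicking a recognition
# error inside a financial-news utterance.
VECTORS = {
    "stock":  [0.9, 0.1, 0.0],
    "market": [0.8, 0.2, 0.1],
    "shares": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.1, 0.95],
}

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def flag_errors(words, threshold=0.5):
    """Flag words weakly related (on average) to the rest of the utterance."""
    flagged = []
    for i, w in enumerate(words):
        context = [VECTORS[c] for j, c in enumerate(words) if j != i]
        avg = sum(cosine(VECTORS[w], v) for v in context) / len(context)
        if avg < threshold:
            flagged.append(w)
    return flagged

print(flag_errors(["stock", "market", "banana", "shares"]))  # ['banana']
```

The cosine similarity threshold plays the same role as the thresholds examined in the lexical cohesion experiments: raising it flags more words as suspicious, trading precision for recall.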

In summary, linguistic information, including syntactic and semantic information, can provide positive impact on automatic detection of speech recognition errors.


UMBC ebiquity