| UMBC ebiquity |
An Investigation of Linguistic Information for Speech Recognition
Speaker: Yongmei Shi
Start: Monday, October 20, 2008, 11:00AM
End: Monday, October 20, 2008, 01:00PM
Location: 325 ITE
Abstract: After several decades of effort, speech recognition technologies have made
significant progress and various speech based applications have been
developed. However, current speech recognition systems still generate
erroneous output, which hinders the wide adoption of speech applications.
Given that speech recognition systems cannot achieve error-free output
in the near future, mechanisms for automatically detecting and
even correcting speech recognition errors are needed to compensate for
imperfect speech recognition systems. This dissertation research focuses on the
automatic detection of speech recognition errors for monologue
applications, especially dictation applications.
Because of computational complexity and efficiency constraints, only limited
linguistic information is embedded in speech recognition systems. In addition,
when identifying speech recognition errors, humans routinely apply their
linguistic knowledge to finish the task. This dissertation therefore
investigates the effect of linguistic information on automatic error
detection by applying two levels of linguistic analysis, specifically
syntactic analysis and semantic analysis, to post-process speech
recognition output. Experiments were conducted on two dictation corpora
which differ in both topic and style (daily office communication by
students and Wall Street Journal news by journalists).
To catch the grammatical abnormalities possibly caused by speech
recognition errors, two sets of syntactic features, linkage information
and word associations based on syntactic dependency, are extracted for
each word from the output of two lexicalized robust syntactic parsers
respectively. When the syntactic features are combined with confidence
score-related features for confidence measurement using a Support Vector
Machine, they yield consistent performance improvement in one or more
aspects over that obtained by using confidence score-related features
alone.
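
The feature combination described above can be sketched roughly as follows. This is an illustration only, not the dissertation's implementation: the feature names, toy values, and labels are hypothetical, and a linear SVM trained by plain subgradient descent on the hinge loss is used here simply to keep the sketch free of dependencies (the abstract does not state the kernel or training procedure actually used).

```python
# Toy sketch: feature names, values, and labels are hypothetical.

def combine_features(confidence_feats, syntactic_feats):
    """Concatenate confidence-score features with syntactic features for one word."""
    return list(confidence_feats) + list(syntactic_feats)

def score(w, b, x):
    """Signed distance-like score of feature vector x under the linear model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularized hinge loss.

    labels: +1 for a correctly recognized word, -1 for a recognition error.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * score(w, b, x) < 1:
                # Margin violated: step along the hinge-loss subgradient.
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Margin satisfied: only the regularizer shrinks the weights.
                w = [wi - lr * reg * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Return +1 (word judged correct) or -1 (word judged a recognition error)."""
    return 1 if score(w, b, x) >= 0 else -1

# Each word: ([confidence score], [1 if the parser linked the word, else 0]).
words = [([0.9], [1]), ([0.8], [0]), ([0.3], [0]), ([0.7], [1])]
X = [combine_features(c, s) for c, s in words]
y = [1, -1, -1, 1]

w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

In this toy data the second word has a high confidence score but no syntactic link, which is the kind of case where a syntactic feature could help a classifier that would otherwise rely on confidence scores alone.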
The semantic abnormalities possibly caused by speech recognition errors
are caught by the analysis of the semantic relatedness of a word to its
context. Two different methods are used to integrate the semantic
analysis with syntactic analysis. One approach addresses the problem by
extracting features for each word from its relations to other words. To
this end, various WordNet-based measures and different context lengths are
examined. The addition of the semantic features yields a further small
but consistent improvement in error detection performance. The other
approach expands the lexical cohesion analysis by taking both reiteration
and collocation relationships into consideration and by augmenting words
with prediction probability from syntactic analysis. Two WordNet-based
measures and a measure based on Latent Semantic Analysis are used to
instantiate lexical cohesion relationships. Additionally, various word
probability thresholds and cosine similarity thresholds are examined. The
incorporation of lexical cohesion analysis is superior to using syntactic
analysis alone.
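
As a rough illustration of checking a word's semantic relatedness to its context with a cosine similarity threshold, consider the minimal sketch below. Everything in it is an assumption made for illustration: the two-dimensional vectors are invented stand-ins for LSA-style word vectors, the function names are hypothetical, and taking the maximum similarity over a fixed window is just one possible way to aggregate. It does not model the reiteration and collocation relationships or the word probability thresholds mentioned above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def relatedness_to_context(words, vectors, index, window):
    """Highest cosine similarity between words[index] and its neighbours
    within `window` positions on either side."""
    lo = max(0, index - window)
    hi = min(len(words), index + window + 1)
    sims = [cosine(vectors[words[index]], vectors[words[j]])
            for j in range(lo, hi) if j != index]
    return max(sims) if sims else 0.0

def flag_suspects(words, vectors, window=2, threshold=0.5):
    """Words whose relatedness to their context falls below the threshold."""
    return [w for i, w in enumerate(words)
            if relatedness_to_context(words, vectors, i, window) < threshold]

# Invented toy vectors; a real system would derive them from a corpus.
toy_vectors = {
    "bank": (1.0, 0.1),
    "loan": (0.9, 0.2),
    "interest": (0.8, 0.3),
    "giraffe": (0.1, 1.0),
}
content_words = ["bank", "loan", "interest", "giraffe"]

print(flag_suspects(content_words, toy_vectors, window=2, threshold=0.5))
```

Varying `window` and `threshold` in a sketch like this corresponds loosely to examining different context lengths and similarity thresholds.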
In summary, linguistic information, including syntactic and semantic
information, can have a positive impact on the automatic detection of speech
recognition errors. |