The HMM-LR method is a speech recognition method that combines hidden Markov models (HMMs) with generalized LR (GLR) parsing. HMM-LR has two main advantages. First, it achieves accurate recognition by predicting plausible hypotheses using the lookaheads of the GLR parsing algorithm. Second, an HMM-LR system outputs not only word sequences but also parse trees as recognition results.
In this research, we propose a new language model named PGLR+ for HMM-LR systems. PGLR+ extends the PGLR language model with a GLR parsing algorithm that handles multi-level connection constraints (GLR+). The PGLR model is regarded as the most sophisticated GLR-based probabilistic model from both a theoretical and an empirical point of view. GLR+ makes it possible to incorporate not only allophone-level connection constraints but also morphological-category-level ones into a single LR table simultaneously. As a result, PGLR+ is a more precise language model than N-gram models, because it exploits both syntactic constraints and multi-level connection constraints.
There are two problems in constructing the PGLR+ language model. The first is that the traditional GLR parsing algorithm cannot handle multi-level connection constraints, because its lookahead is restricted to a terminal symbol. We therefore incorporate a shift-reduce parsing algorithm for context-sensitive grammars into the GLR parsing algorithm. This modification allows an arbitrary symbol to serve as a lookahead, making it possible to check connection constraints between two nonterminal symbols. The second problem is how to assign probabilities to the modified LR table of GLR+. We show that a GLR+ parser can determine, from the previously executed action, which nonterminal symbols a lookahead belongs to, and that this knowledge can be used to assign probabilities to the LR table of GLR+.
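To make the idea of nonterminal lookaheads concrete, the following is a minimal illustrative sketch (not the paper's implementation) of pruning predictions with a connection-constraint table: a lookahead here is a morphological category (a nonterminal), and a shift is allowed only if the category pair is connectable. The category names and allowed pairs are hypothetical.

```python
# Hypothetical adjacency table at the morphological-category level.
# In GLR+ such constraints are compiled into the LR table itself;
# here they are a plain set for illustration.
CATEGORY_CONNECTIONS = {
    ("noun", "case_particle"),
    ("case_particle", "verb"),
    ("verb", "aux_verb"),
}

def can_shift(prev_category: str, lookahead_category: str) -> bool:
    """Return True if the lookahead category may follow the previous one."""
    return (prev_category, lookahead_category) in CATEGORY_CONNECTIONS

def filter_hypotheses(prev_category: str, candidates: list) -> list:
    """Keep only lookahead categories satisfying the connection constraint,
    mimicking how a parser with nonterminal lookaheads prunes hypotheses."""
    return [c for c in candidates if can_shift(prev_category, c)]

# After a "noun", only "case_particle" survives under the table above.
print(filter_hypotheses("noun", ["case_particle", "aux_verb", "verb"]))
```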
Experiments on a Japanese dialogue corpus showed that PGLR+ was more effective than the trigram model in terms of test-set perplexity, which is the average number of hypotheses at a decision point. Finally, we compared our experiments with past studies using the same corpus. Although our experiments used a larger number of test-set sentences and lexical rules than those studies, PGLR+ obtained the best results.