NLP-based class LM
A particular problem for LMs is that some words occur only a few times, yet their context should not be estimated as if they were equally rare. Take the number 12537: it may occur in the training text only once, but its context (the kind of context that surrounds numbers) is quite solid. This motivates the class LM.
In class LMs, words that share the same context are grouped into a class, and the context is estimated by replacing every word in the class with the class label. Within a class, words might be selected at random or with some probability. This idea is a bit similar to decision trees (by the way, can we introduce a tree LM?).
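A minimal sketch of the standard class-based factorization P(w_i | w_{i-1}) = P(c(w_i) | c(w_{i-1})) * P(w_i | c(w_i)), as in Brown et al. (1992); all the counts below are toy values, not estimates from any real corpus:

```python
# Class-based bigram: P(w | prev_w) = P(class(w) | class(prev_w)) * P(w | class(w)).
# A rare word like 12537 borrows the well-estimated context of its class
# instead of relying on its single occurrence.

word2class = {"12537": "NUM", "42": "NUM", "miles": "UNIT"}

# P(class | previous class), toy estimates
class_bigram = {("UNIT", "NUM"): 0.3, ("NUM", "UNIT"): 0.5}

# P(word | class), toy estimates
word_given_class = {"12537": 0.001, "42": 0.01, "miles": 0.6}

def class_lm_prob(prev_word: str, word: str) -> float:
    """P(word | prev_word) under the class bigram factorization."""
    prev_c, c = word2class[prev_word], word2class[word]
    return class_bigram.get((prev_c, c), 0.0) * word_given_class[word]

print(class_lm_prob("42", "miles"))  # P(UNIT | NUM) * P(miles | UNIT) = 0.3
```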
A class should (1) share the same linguistic context, and (2) be large, possibly infinite (open), so that token-based context estimation would be unreliable.
There are at least two such classes: numbers and named entities. Numbers are relatively simple, while named entities are not trivial. An interesting research direction is to apply NLP approaches to identify named entities first, and then group them into one or a few classes where possible, for example address, name, city...
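A sketch of how an off-the-shelf NER tagger could map named entities (and numbers) to class tokens before LM counts are collected. This assumes spaCy with its en_core_web_sm model installed, but any tagger that outputs entity spans and labels would do:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def class_tokens(text: str) -> list[str]:
    """Replace each named entity and each number with a class token."""
    out = []
    for tok in nlp(text):
        if tok.ent_type_:            # part of a named entity (PERSON, GPE, ...)
            if tok.ent_iob_ == "B":  # emit one class token per entity span
                out.append(f"<{tok.ent_type_}>")
        elif tok.like_num:           # fallback for numbers the tagger misses
            out.append("<NUM>")
        else:
            out.append(tok.text)
    return out

print(class_tokens("John Smith moved to Boston."))
# e.g. ['<PERSON>', 'moved', 'to', '<GPE>', '.'], depending on the tagger
```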
This also motivates two more ideas:
1. Can we use higher-level knowledge, such as parsing, to improve ASR? Some people have done this with shallow parsing, but we may want some stochastic way to integrate it. FSTs? CRFs? Re-scoring lattices? (See the sketch after this list.)
2. Can we use knowledge from the real world, e.g., the semantic web, to increase ASR accuracy? Suppose we are given just the phone sequence of a named entity...
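For idea 1, a minimal sketch of re-scoring an n-best list (used here as a stand-in for the lattice) by interpolating the ASR score with a score from a higher-level knowledge source. `parser_score` is a hypothetical placeholder for any parsing-, CRF-, or semantic-web-based plausibility model:

```python
def parser_score(hypothesis: str) -> float:
    """Hypothetical plausibility score from higher-level knowledge."""
    return 1.0 if "Boston" in hypothesis else 0.0  # toy stand-in

def rescore(nbest: list[tuple[str, float]], weight: float = 0.3):
    """Re-rank hypotheses by ASR score + weight * knowledge score."""
    return sorted(nbest,
                  key=lambda h: h[1] + weight * parser_score(h[0]),
                  reverse=True)

nbest = [("he moved to Austin", -12.1),   # (hypothesis, ASR log score)
         ("he moved to Boston", -12.3)]
print(rescore(nbest)[0][0])  # the knowledge score flips the ranking
```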