DNN training

Environment setting

Model	CE	MPE1	MPE2	MPE3	MPE4
4k states	23.27/22.85	21.35/18.87	21.18/18.76	21.07/18.54	20.93/18.32
8k states	22.16/22.22	20.55/18.03	20.36/17.94	20.32/17.78	20.29/17.80
8k states + IT	-	20.04/17.38	20.01/17.32	20.07/17.44	19.94/17.65

Code is ready for direct adaptation, insertion adaptation, and KL-regularized adaptation
50 sentences for adaptation, 834 sentences for testing
WER reduced from 14.56 to 11.13 (about 23.6% relative)
Hidden layer adaptation is better than input and output adaptation
Before-linear adaptation is better than after-linear adaptation
Results are here
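
The notes above do not say how the KL-regularized adaptation is formulated. A minimal sketch, assuming the common formulation in which the KL regularizer toward the speaker-independent (SI) model is equivalent to cross-entropy training against interpolated targets. The function names and the weight `rho` are hypothetical and not taken from the actual code.

```python
import math


def kl_regularized_targets(one_hot, si_posteriors, rho):
    """Interpolate hard labels with SI-model posteriors (assumed formulation).

    rho = 0 gives plain cross-entropy adaptation; rho = 1 keeps the SI model's
    own output distribution as the target.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    return [(1.0 - rho) * y + rho * p for y, p in zip(one_hot, si_posteriors)]


def cross_entropy(targets, posteriors, eps=1e-12):
    """Cross-entropy between a target distribution and the adapted model's posteriors."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(targets, posteriors))


# Hypothetical frame with three states; the reference label is state 1.
targets = kl_regularized_targets([0.0, 1.0, 0.0], [0.2, 0.5, 0.3], rho=0.5)
loss = cross_entropy(targets, [0.1, 0.8, 0.1])
```

A larger `rho` keeps the adapted model closer to the SI model, which is the usual motivation when only a small amount of adaptation data (such as 50 sentences) is available.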

CLG decoder uses less memory in decoding
HCLG is faster and more accurate than CLG, and more amenable to beam control here
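
For readers unfamiliar with beam control, a minimal sketch of beam pruning, assuming costs where lower is better. The function name and the token representation are hypothetical and do not reflect the decoder's actual data structures.

```python
def prune_to_beam(token_costs, beam):
    """Keep only tokens whose cost is within `beam` of the best (lowest) cost."""
    if not token_costs:
        return {}
    best = min(token_costs.values())
    return {state: cost for state, cost in token_costs.items() if cost <= best + beam}


# Hypothetical active tokens after one frame: state -> accumulated cost.
active = prune_to_beam({"a": 1.0, "b": 3.5, "c": 12.0}, beam=5.0)
```

A narrower beam keeps fewer tokens, so decoding is faster but more likely to prune the correct path.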

Faster decoder
std::exp/std::log resulted in very slow computation on train203. Solved the problem by replacing them with the standard exp() and log().
The real-time factor (RT) of the latest decoder on train203 is 0.25
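
Assuming RT here denotes the real-time factor (decoding time divided by audio duration), the figure can be computed as in the sketch below. The function name and the example durations are hypothetical.

```python
def real_time_factor(decode_seconds, audio_seconds):
    """Real-time factor: values below 1.0 mean decoding runs faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds


# Hypothetical example: 25 s of decoding for 100 s of audio.
rtf = real_time_factor(25.0, 100.0)
```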