<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://index.cslt.org/mediawiki/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="zh-cn">
		<id>http://index.cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Sinovoice-2014-02-17</id>
	<title>Sinovoice-2014-02-17 - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://index.cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Sinovoice-2014-02-17"/>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php?title=Sinovoice-2014-02-17&amp;action=history"/>
		<updated>2026-04-15T03:50:12Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.23.3</generator>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php?title=Sinovoice-2014-02-17&amp;diff=9180&amp;oldid=prev</id>
	<title>06:59, 17 February 2014 Cslt</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php?title=Sinovoice-2014-02-17&amp;diff=9180&amp;oldid=prev"/>
				<updated>2014-02-17T06:59:41Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;tr style='vertical-align: top;'&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan='2' style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;Revision as of 06:59, 17 February 2014&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 66:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 66:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Cross-entropy regularization with P=0.3 works reasonably well.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Cross-entropy regularization with P=0.3 works reasonably well.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Results are [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&amp;amp;cvssid=158 here]&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Results are [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&amp;amp;cvssid=158 here]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;==Auto Transcription==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* PICC development set decoding obtained 45% WER.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* PICC training-set decoding is done (200h); confidence scores have been generated.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* Setting the confidence threshold to 0.9 reduces the training data from 230k sentences to 40k.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* Do discriminative training with the filtered 40k sentences and test on the development set&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=DNN Decoder=&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=DNN Decoder=&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Cslt</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php?title=Sinovoice-2014-02-17&amp;diff=9179&amp;oldid=prev</id>
		<title>Cslt: Created page with "=DNN training=  ==Environment setting==  * 2nd GPU machine is ready. 3T * 4 RAID-0 is fast enough.   * The new machine has been added into the SGE env.  ==Corpora== * B..."</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php?title=Sinovoice-2014-02-17&amp;diff=9179&amp;oldid=prev"/>
				<updated>2014-02-17T06:51:09Z</updated>
		
		<summary type="html">&lt;p&gt;以内容“=DNN training=  ==Environment setting==  * 2nd GPU machine is ready. 3T * 4 RAID-0 is fast enough.   * The new machine has been added into the SGE env.  ==Corpora== * B...”创建新页面&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;=DNN training=&lt;br /&gt;
&lt;br /&gt;
==Environment setting==&lt;br /&gt;
&lt;br /&gt;
* The 2nd GPU machine is ready; its 3T * 4 RAID-0 storage is fast enough.&lt;br /&gt;
* The new machine has been added to the SGE environment.&lt;br /&gt;
&lt;br /&gt;
==Corpora==&lt;br /&gt;
* The 120h of Beijing Mobile speech data are ready.&lt;br /&gt;
* PICC data (200h) are being labeled and will be ready in two weeks.&lt;br /&gt;
* In total, 1100h of telephone speech will be available soon.&lt;br /&gt;
&lt;br /&gt;
==470 hour 8k training==&lt;br /&gt;
&lt;br /&gt;
* 470 + 300h + Beijing mobile 120h training&lt;br /&gt;
:* Re-train all the models, including GMM and DNN, with the noise model involved.&lt;br /&gt;
:* Train the noise model by treating noise as a special phone.&lt;br /&gt;
:* Noise should be handled explicitly when constructing the lexicon FST (L).&lt;br /&gt;
:* At 7.2h per iteration, the xEnt training should finish in about one week.&lt;br /&gt;
:* Run incremental DT training on the CSLT cluster, by mapping noise to the silence phone.&lt;br /&gt;
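The noise-handling steps above can be sketched as follows. This is a minimal illustration, not the recipe's actual code; the symbols "[NOISE]", "NSN", and "SIL" are hypothetical stand-ins for whatever identifiers the lexicon really uses.

```python
# Sketch: give noise its own word and phone in the lexicon (used when
# building L), then fold the noise phone back into silence for the
# incremental DT pass. All symbol names here are illustrative.

def add_noise_entry(lexicon):
    """Append a noise word that maps to a dedicated noise phone."""
    lexicon = dict(lexicon)
    lexicon["[NOISE]"] = ["NSN"]  # NSN: a special phone with its own HMM
    return lexicon

def phone_map_for_dt(phones):
    """For DT training, map the noise phone to the silence phone."""
    return {p: ("SIL" if p == "NSN" else p) for p in phones}

lex = add_noise_entry({"hello": ["h", "e", "l", "o"]})
pmap = phone_map_for_dt(["SIL", "NSN", "h"])
```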
&lt;br /&gt;
&lt;br /&gt;
==6000 hour 16k training==&lt;br /&gt;
&lt;br /&gt;
* Ran CE DNN training to iteration 8 (8400 states, 80000 pdfs).&lt;br /&gt;
* The test WER has come down to 12.69% (Sinovoice's result: 10.70%).&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Model !! WER (%) !! RT&lt;br /&gt;
|-&lt;br /&gt;
|small LM, it 4,  -5/-9  ||15.80 || 1.18&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 4, -5/-9   || 15.30 || 1.50&lt;br /&gt;
|-&lt;br /&gt;
|large LM,  it 4, -6/-9   || 15.36 || 1.30&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 4, -7/-9    || 15.25 || 1.30&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 5, -5/-9    || 14.17 || 1.10&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 5,  -5/-10 || 13.77 || 1.29&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 6, -5/-9    || 13.64 || -&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 6,  -5/-10 || 13.25 || -&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 7, -5/-9    || 13.29 || -&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 7,  -5/-10 || 12.87 || -&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 8, -5/-9    || 13.09 || -&lt;br /&gt;
|-&lt;br /&gt;
|large LM, it 8,  -5/-10 || 12.69 || -&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
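For reference, the WER figures in the table are presumably computed as word-level edit distance against the reference transcription; a minimal self-contained sketch (the function name is illustrative):

```python
def wer(ref, hyp):
    """Word error rate: (sub + ins + del) / len(ref), via edit distance."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(r)][len(h)] / len(r)
```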
&lt;br /&gt;
* A new round of training, with shared trees for tone variations, has been kicked off and has reached the DNN training stage again.&lt;br /&gt;
* Need to test the new GMM model and compare it with Xiaoming's original settings.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Adaptation==&lt;br /&gt;
&lt;br /&gt;
* Adaptation with 10, 20, and 30 sentences was conducted.&lt;br /&gt;
* 30 sentences reach reasonable performance (WER from 14.6% to 11.2%).&lt;br /&gt;
* Hidden-layer adaptation is better than input- or output-layer adaptation.&lt;br /&gt;
* Cross-entropy regularization with P=0.3 works reasonably well.&lt;br /&gt;
* Results are [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&amp;amp;cvssid=158 here]&lt;br /&gt;
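Assuming "P=0.3" denotes the interpolation weight of the cross-entropy regularizer (i.e. the adaptation targets interpolate the hard alignment label with the speaker-independent model's posteriors, as in KL-regularized adaptation), a minimal sketch of the target construction:

```python
# Sketch of cross-entropy-regularized adaptation targets, assuming
# P=0.3 is the interpolation weight between the hard alignment label
# and the speaker-independent (SI) model's posteriors.
def regularized_targets(hard_label, si_posteriors, p=0.3):
    """Return (1 - p) * one_hot(hard_label) + p * si_posteriors."""
    n = len(si_posteriors)
    one_hot = [1.0 if k == hard_label else 0.0 for k in range(n)]
    return [(1.0 - p) * one_hot[k] + p * si_posteriors[k] for k in range(n)]

# Toy example over 3 tied states, hard label = state 1.
targets = regularized_targets(1, [0.2, 0.5, 0.3])
```

With valid SI posteriors the interpolated targets still sum to one, so they can be used directly as soft labels in the cross-entropy objective.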
&lt;br /&gt;
=DNN Decoder=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Faster decoder&lt;br /&gt;
:* The new RT is reported [http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?step=view_request&amp;amp;cvssid=160 here]&lt;br /&gt;
:* The RT of the latest decoder on train203 is 0.144 (HCLG) and 0.148 (CLG).&lt;br /&gt;
&lt;br /&gt;
* Online decoder&lt;br /&gt;
:* Interface design completed&lt;br /&gt;
:* The CMN strategy is clear: (1) train a global CMN model first; (2) apply the model directly in decoding; (3) the DNN model may need slight adaptation with the resulting features.&lt;/div&gt;</summary>
		<author><name>Cslt</name></author>	</entry>

	</feed>