<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://index.cslt.org/mediawiki/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="zh-cn">
		<id>http://index.cslt.org/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Yuanb</id>
		<title>cslt Wiki - User contributions [zh-cn]</title>
		<link rel="self" type="application/atom+xml" href="http://index.cslt.org/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Yuanb"/>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E7%89%B9%E6%AE%8A:%E7%94%A8%E6%88%B7%E8%B4%A1%E7%8C%AE/Yuanb"/>
		<updated>2026-04-07T01:41:32Z</updated>
		<subtitle>User contributions</subtitle>
		<generator>MediaWiki 1.23.3</generator>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2015-spring-time-table</id>
		<title>2015-spring-time-table</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2015-spring-time-table"/>
				<updated>2015-02-05T06:40:23Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb:&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;2015 Spring Festival schedule&lt;br /&gt;
&lt;br /&gt;
{|class='wikitable'&lt;br /&gt;
!Person !! Spring Festival departure-return dates !! Days&lt;br /&gt;
|-&lt;br /&gt;
|张之勇  || Feb 13 - Feb 26  || 14&lt;br /&gt;
|-&lt;br /&gt;
|赵梦原  || Feb 14 - Feb 27  || 14&lt;br /&gt;
|-&lt;br /&gt;
|张雪薇  || Feb 18 - Feb 28  || 11&lt;br /&gt;
|-&lt;br /&gt;
|王晓曦  || Feb 15 - Feb 28  || 14&lt;br /&gt;
|-&lt;br /&gt;
|刘荣    || Feb 14 - Feb 26  || 13&lt;br /&gt;
|-&lt;br /&gt;
|骆天一  || Feb 12 - Feb 25  || 14&lt;br /&gt;
|-&lt;br /&gt;
|刘超    || Feb 6 - Feb 11, Spring Festival dates undecided  || ?&lt;br /&gt;
|-&lt;br /&gt;
|殷实    || Feb 12 - Mar 1  || 18&lt;br /&gt;
|-&lt;br /&gt;
|林一叶  || Feb 14 - undecided  || ?&lt;br /&gt;
|-&lt;br /&gt;
|王冕    || Feb 10 - Feb 28  || 19&lt;br /&gt;
|-&lt;br /&gt;
|邢超    || Feb 9 - Feb 27  || 19&lt;br /&gt;
|-&lt;br /&gt;
|袁彬    || Feb 10 - Feb 28  || 19&lt;br /&gt;
|-&lt;br /&gt;
|张东旭  || Feb 10 - Feb 28  || 19&lt;br /&gt;
|-&lt;br /&gt;
|马习    || Feb 12 - Feb 26  || 15&lt;br /&gt;
|-&lt;br /&gt;
|曹立    || - Mar 3  || ?&lt;br /&gt;
|-&lt;br /&gt;
|曾翔宇  || Feb 11 - Feb 28  || 18&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: standard holiday length: engineers 14 days, students 17 days&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Bin_Yuan_2015-02-03</id>
		<title>Bin Yuan 2015-02-03</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Bin_Yuan_2015-02-03"/>
				<updated>2015-02-03T03:41:36Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: created the page with content "last week done:     1. construct the large-scale training data for knowledge vector     2. prepare to submit the MTAP paper     3. finish the tag lm toolkit  plan fo..."&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;last week done:&lt;br /&gt;
    1. construct the large-scale training data for knowledge vector&lt;br /&gt;
    2. prepare to submit the MTAP paper&lt;br /&gt;
    3. finish the tag lm toolkit&lt;br /&gt;
&lt;br /&gt;
plan for this week:&lt;br /&gt;
    1. conduct paragraph vector experiment&lt;br /&gt;
    2. review papers for knowledge vector&lt;br /&gt;
    3. start knowledge vector draft&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2015-02-02</id>
		<title>2015-02-02</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2015-02-02"/>
				<updated>2015-02-03T03:35:13Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Mengyuan Zhao 2015-02-02]]&lt;br /&gt;
&lt;br /&gt;
[[Dongxu Zhang 2014-02-02]]&lt;br /&gt;
&lt;br /&gt;
[[Chao Liu 2014-02-02]]&lt;br /&gt;
&lt;br /&gt;
[[Miao Fan 2014-02-02]]&lt;br /&gt;
&lt;br /&gt;
[[Fanhu Bie 2014-02-03]]&lt;br /&gt;
&lt;br /&gt;
[[Lantian Li 2015-02-03]]&lt;br /&gt;
&lt;br /&gt;
[[Yiye Lin 2015-02-03]]&lt;br /&gt;
&lt;br /&gt;
[[Tianyi Luo 2015-02-03]]&lt;br /&gt;
&lt;br /&gt;
[[Xiangyu Zeng 2015-02-03]]&lt;br /&gt;
&lt;br /&gt;
[[Xuewei Zhang 2015-02-03]]&lt;br /&gt;
&lt;br /&gt;
[[Xiaoxi Wang 2015-02-03]]&lt;br /&gt;
&lt;br /&gt;
[[Bin Yuan 2015-02-03]]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Bin_Yuan_2015-01-26</id>
		<title>Bin Yuan 2015-01-26</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Bin_Yuan_2015-01-26"/>
				<updated>2015-01-26T07:20:23Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: created the page with content "=== Accomplished this week === * try the hinge loss function for tree-based knowledge vector learning, and improve the correlation score from 0.5 to 0.8 * try the si..."&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* try the hinge loss function for tree-based knowledge vector learning, and improve the correlation score from 0.5 to 0.8&lt;br /&gt;
* try the sigmoid of the inner product as the similarity function; the result is comparable with the plain inner product&lt;br /&gt;
* finish the taglm paper&lt;br /&gt;
=== Plan for next week ===&lt;br /&gt;
* make a larger test set&lt;br /&gt;
* use gensim toolkit to implement paragraph vector as a baseline&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2015-01-26</id>
		<title>2015-01-26</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2015-01-26"/>
				<updated>2015-01-26T07:06:38Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Lantian Li 2015-01-26]]&lt;br /&gt;
&lt;br /&gt;
[[Rong Liu 2015-01-26]]&lt;br /&gt;
&lt;br /&gt;
[[Zhongda Xie 2015-01-26]]&lt;br /&gt;
&lt;br /&gt;
[[Miao Fan 2015-01-26]]&lt;br /&gt;
&lt;br /&gt;
[[Xiaoxi Wang 2015-01-26]]&lt;br /&gt;
&lt;br /&gt;
[[Bin Yuan 2015-01-26]]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Bin_Yuan_2015-01-19</id>
		<title>Bin Yuan 2015-01-19</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Bin_Yuan_2015-01-19"/>
				<updated>2015-01-19T02:21:57Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: created the page with content "=== Accomplished this week === * replace wikipedia's taxonomy with Yago's taxonomy. === Plan for next week === * alter the objective function."&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* replace wikipedia's taxonomy with Yago's taxonomy.&lt;br /&gt;
=== Plan for next week ===&lt;br /&gt;
* alter the objective function.&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2015-01-19</id>
		<title>2015-01-19</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2015-01-19"/>
				<updated>2015-01-19T02:15:08Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Miao Fan 2015-01-17]]&lt;br /&gt;
&lt;br /&gt;
[[2015-01-19 Rong Liu]]&lt;br /&gt;
&lt;br /&gt;
[[Bin Yuan 2015-01-19]]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Bin_Yuan_2015-01-12</id>
		<title>Bin Yuan 2015-01-12</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Bin_Yuan_2015-01-12"/>
				<updated>2015-01-12T06:20:18Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: created the page with content "=== Accomplished this week === * implement multi-thread training for large wiki graph === Plan for next week === * continue knowledge vector."&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* implement multi-thread training for large wiki graph&lt;br /&gt;
=== Plan for next week ===&lt;br /&gt;
* continue knowledge vector.&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2015-01-12</id>
		<title>2015-01-12</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2015-01-12"/>
				<updated>2015-01-12T06:15:26Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Miao Fan 2015-01-12]]&lt;br /&gt;
&lt;br /&gt;
[[Xiaoxi Wang 2015-01-12]]&lt;br /&gt;
&lt;br /&gt;
[[Rong Liu 2015-01-12]]&lt;br /&gt;
&lt;br /&gt;
[[Mengyuan Zhao 2015-01-12]]&lt;br /&gt;
&lt;br /&gt;
[[Bin Yuan 2015-01-12]]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Bi-monthly-2015-01</id>
		<title>Bi-monthly-2015-01</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Bi-monthly-2015-01"/>
				<updated>2015-01-05T06:04:59Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[bi-monthly-result-2015-01|Result]]&lt;br /&gt;
&lt;br /&gt;
[[媒体文件:Bi-Monthly report.pdf|Dong Wang]]&lt;br /&gt;
&lt;br /&gt;
[[媒体文件:季度总结-刘荣.pdf|Rong Liu]]&lt;br /&gt;
&lt;br /&gt;
[[媒体文件:个人双月总结.pdf|Shi Yin]]&lt;br /&gt;
&lt;br /&gt;
[[媒体文件:张东旭10-11月总结.pdf|Dongxu Zhang]]&lt;br /&gt;
&lt;br /&gt;
[[媒体文件:曾翔宇双月总结.pdf|Xiangyu Zeng]]&lt;br /&gt;
&lt;br /&gt;
[[媒体文件:Bi-month report yuanbin pdf.pdf|Bin Yuan]]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Bi-month_report_yuanbin_pdf.pdf</id>
		<title>文件:Bi-month report yuanbin pdf.pdf</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Bi-month_report_yuanbin_pdf.pdf"/>
				<updated>2015-01-05T06:02:41Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Bin_Yuan_2014-12-28</id>
		<title>Bin Yuan 2014-12-28</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Bin_Yuan_2014-12-28"/>
				<updated>2014-12-29T01:09:30Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: created the page with content "=== Accomplished this week === * separate different types of link, and assign different weight for training. * use entry abstract words for training. === Plan for ne..."&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* separate different types of links, and assign different weights for training.&lt;br /&gt;
* use entry abstract words for training.&lt;br /&gt;
=== Plan for next week ===&lt;br /&gt;
* continue knowledge vector, try more reasonable objective function.&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2014-12-28</id>
		<title>2014-12-28</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2014-12-28"/>
				<updated>2014-12-29T00:59:24Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Miao Fan 2014-12-28]]&lt;br /&gt;
&lt;br /&gt;
[[Mengyuan Zhao 2014-12-28]]&lt;br /&gt;
&lt;br /&gt;
[[Rong Liu 2014-12-28]]&lt;br /&gt;
&lt;br /&gt;
[[Bin Yuan 2014-12-28]]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Bin_Yuan_14-12-21</id>
		<title>Bin Yuan 14-12-21</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Bin_Yuan_14-12-21"/>
				<updated>2014-12-22T01:32:24Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: created the page with content "=== Accomplished this week === * change tag lm merge method to add, and conduct a experiment about hclg merge, result see http://cslt.riit.tsinghua.edu.cn/cgi-bin/..."&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* change the tag lm merge method to "add", and conduct an experiment on hclg merge; results: [[http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=yuanb&amp;amp;step=view_request&amp;amp;cvssid=304]]&lt;br /&gt;
* help zhiyong fix the merge bug.&lt;br /&gt;
* knowledge vector baseline done.&lt;br /&gt;
=== Plan for next week ===&lt;br /&gt;
* continue knowledge vector, use more information to train.&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2014-12-21</id>
		<title>2014-12-21</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2014-12-21"/>
				<updated>2014-12-22T01:25:39Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[2014-12-19 Miao Fan]]&lt;br /&gt;
&lt;br /&gt;
[[Dongxu Zhang 14-12-21]]&lt;br /&gt;
&lt;br /&gt;
[[Rong Liu 14-12-21]]&lt;br /&gt;
&lt;br /&gt;
[[Xiaoxi Wang 14-12-21]]&lt;br /&gt;
&lt;br /&gt;
[[Bin Yuan 14-12-21]]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Taglm_draft.pdf</id>
		<title>文件:Taglm draft.pdf</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Taglm_draft.pdf"/>
				<updated>2014-12-22T01:24:07Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: Yuanb uploaded a new version of "文件:Taglm draft.pdf"&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;tag lm draft&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Tag_lm_report.pdf</id>
		<title>文件:Tag lm report.pdf</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Tag_lm_report.pdf"/>
				<updated>2014-12-15T01:18:34Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: Yuanb uploaded a new version of "文件:Tag lm report.pdf"&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Bin_Yuan_14-12-14</id>
		<title>Bin Yuan 14-12-14</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Bin_Yuan_14-12-14"/>
				<updated>2014-12-14T19:36:54Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：/* Accomplished this week */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* finish the tag lm technical report, see [[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/d/d6/Taglm_draft.pdf]].&lt;br /&gt;
* find a bug in our code for merging the grammar with the lm; the cause is clear and a fix is pending.&lt;br /&gt;
* knowledge vector baseline setup mostly done; still need a task to evaluate the result.&lt;br /&gt;
&lt;br /&gt;
=== Plan for next week ===&lt;br /&gt;
* continue knowledge vector.&lt;br /&gt;
* fix the bug mentioned above.&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Bin_Yuan_14-12-14</id>
		<title>Bin Yuan 14-12-14</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Bin_Yuan_14-12-14"/>
				<updated>2014-12-14T19:35:47Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: created the page with content "=== Accomplished this week === * finish the tag lm technical report. * find a bug in our code for merging the grammar with lm, the reason is clear and to be fixed. *..."&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* finish the tag lm technical report.&lt;br /&gt;
* find a bug in our code for merging the grammar with the lm; the cause is clear and a fix is pending.&lt;br /&gt;
* knowledge vector baseline setup mostly done; still need a task to evaluate the result.&lt;br /&gt;
=== Plan for next week ===&lt;br /&gt;
* continue knowledge vector.&lt;br /&gt;
* fix the bug mentioned above.&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2014-12-14</id>
		<title>2014-12-14</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2014-12-14"/>
				<updated>2014-12-14T19:26:01Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Fanhu bie 14-12-14]]&lt;br /&gt;
&lt;br /&gt;
[[Rong Liu 14-12-14]]&lt;br /&gt;
&lt;br /&gt;
[[Bin Yuan 14-12-14]]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Taglm_draft.pdf</id>
		<title>文件:Taglm draft.pdf</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Taglm_draft.pdf"/>
				<updated>2014-12-14T19:24:09Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：tag lm draft&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;tag lm draft&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Tag_lm_report.pdf</id>
		<title>文件:Tag lm report.pdf</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Tag_lm_report.pdf"/>
				<updated>2014-12-14T19:21:12Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: Yuanb uploaded a new version of "文件:Tag lm report.pdf"&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Tag_lm_report.pdf</id>
		<title>文件:Tag lm report.pdf</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Tag_lm_report.pdf"/>
				<updated>2014-12-14T19:19:35Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: Yuanb uploaded a new version of "文件:Tag lm report.pdf"&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Tag_lm_report.pdf</id>
		<title>文件:Tag lm report.pdf</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Tag_lm_report.pdf"/>
				<updated>2014-12-08T00:41:19Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: Yuanb uploaded a new version of "文件:Tag lm report.pdf"&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Bin_Yuan_14-12-07</id>
		<title>Bin Yuan 14-12-07</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Bin_Yuan_14-12-07"/>
				<updated>2014-12-08T00:37:06Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: created the page with content "=== Accomplished this week === * knowledge vector build graph done, extract one specific domain entries(such as Category:Animals), now we can extract a subgraph give..."&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* knowledge vector graph building done; extracted the entries of one specific domain (such as Category:Animals); we can now extract a subgraph given a root entry, and find all paths from the root entry to a given leaf entry.&lt;br /&gt;
* update tag language model technical report, see [[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/f5/Tag_lm_report.pdf]]&lt;br /&gt;
* read one paper about merging both FSG and N-gram into a single decoding graph, see [[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/3/3c/CustomizedASR_ICSR2012_CameraReady.pdf]]&lt;br /&gt;
=== Plan for next week ===&lt;br /&gt;
* integrate the experiment part with Xiaoxi's part.&lt;br /&gt;
* finish knowledge vector baseline set up.&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:CustomizedASR_ICSR2012_CameraReady.pdf</id>
		<title>文件:CustomizedASR ICSR2012 CameraReady.pdf</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:CustomizedASR_ICSR2012_CameraReady.pdf"/>
				<updated>2014-12-08T00:36:27Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: tag-based language model using WFST&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;tag-based language model using WFST&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2014-12-07</id>
		<title>2014-12-07</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2014-12-07"/>
				<updated>2014-12-08T00:22:59Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Xiaoxi Wang 14-12-07]]&lt;br /&gt;
&lt;br /&gt;
[[Dongxu Zhang 14-12-07]]&lt;br /&gt;
&lt;br /&gt;
[[Xiangyu Zeng 14-12-07]]&lt;br /&gt;
&lt;br /&gt;
[[Miao Fan 14-12-07]]&lt;br /&gt;
&lt;br /&gt;
[[Bin Yuan 14-12-07]]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Tag_lm_report.pdf</id>
		<title>文件:Tag lm report.pdf</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Tag_lm_report.pdf"/>
				<updated>2014-12-01T01:31:34Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: Yuanb uploaded a new version of "文件:Tag lm report.pdf"&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Bin_Yuan_14-12-01</id>
		<title>Bin Yuan 14-12-01</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Bin_Yuan_14-12-01"/>
				<updated>2014-12-01T01:21:43Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：/* Accomplished this week */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* write the tag language model technical report, almost done; result: [[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/f5/Tag_lm_report.pdf]]&lt;br /&gt;
* knowledge vector tree building algorithm done.&lt;br /&gt;
&lt;br /&gt;
=== Planned for next week ===&lt;br /&gt;
* write the experiment part of tag language model draft.&lt;br /&gt;
* set up knowledge vector baseline.&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Tag_lm_report.pdf</id>
		<title>文件:Tag lm report.pdf</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Tag_lm_report.pdf"/>
				<updated>2014-12-01T01:20:55Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Bin_Yuan_14-12-01</id>
		<title>Bin Yuan 14-12-01</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Bin_Yuan_14-12-01"/>
				<updated>2014-12-01T00:49:40Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: created the page with content "=== Accomplished this week === * write tag language model technical report,almost done, result see [[]] * knowledge vector tree building algorithm done. === Planned..."&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* write the tag language model technical report, almost done; result: [[]]&lt;br /&gt;
* knowledge vector tree building algorithm done.&lt;br /&gt;
=== Planned for next week ===&lt;br /&gt;
* write the experiment part of tag language model draft.&lt;br /&gt;
* set up knowledge vector baseline.&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2014-12-01</id>
		<title>2014-12-01</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2014-12-01"/>
				<updated>2014-12-01T00:45:05Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Miao Fan 14-12-01]]&lt;br /&gt;
&lt;br /&gt;
[[Bin Yuan 14-12-01]]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Bin_Yuan_14-11-24</id>
		<title>Bin Yuan 14-11-24</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Bin_Yuan_14-11-24"/>
				<updated>2014-11-23T14:56:47Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: created the page with content "=== Accomplished this week === * do some experiments and find the relationship between optimal merge weight and jsgf address number. result see http://cslt.riit.ts..."&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* do some experiments and find the relationship between the optimal merge weight and the number of jsgf addresses; results: [[http://cslt.riit.tsinghua.edu.cn/cgi-bin/cvss/cvss_request.pl?account=yuanb&amp;amp;step=view_request&amp;amp;cvssid=304]]&lt;br /&gt;
* read some papers about wiki related information extraction and make a report.&lt;br /&gt;
* walk Zhenlong Han through the tag lm workflow.&lt;br /&gt;
=== Planned for next week ===&lt;br /&gt;
* make a summary about tag-lm.&lt;br /&gt;
* read some paper about knowledge vector.&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/11-16_Bin_Yuan</id>
		<title>11-16 Bin Yuan</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/11-16_Bin_Yuan"/>
				<updated>2014-11-23T14:51:20Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: replaced the content with "=== Accomplished this week === * build a new jsgf file * construct a test set for address tag language model * conduct a new experiment, result is as below  === Planned for n..."&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* build a new jsgf file&lt;br /&gt;
* construct a test set for address tag language model&lt;br /&gt;
* conduct a new experiment; the result is shown below&lt;br /&gt;
&lt;br /&gt;
=== Planned for next week ===&lt;br /&gt;
* check the relation between the merge weight and the size of the dict.&lt;br /&gt;
* short terms should be penalized.&lt;br /&gt;
* make a summary about tag-lm.&lt;br /&gt;
* read some paper about knowledge vector.&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2014-11-24</id>
		<title>2014-11-24</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2014-11-24"/>
				<updated>2014-11-23T14:49:10Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Miao Fan 14-11-24]]&lt;br /&gt;
&lt;br /&gt;
[[Jun Wang 14-11-24]]&lt;br /&gt;
&lt;br /&gt;
[[Rong Liu 14-11-24]]&lt;br /&gt;
&lt;br /&gt;
[[Fanhu Bie 14-11-23]]&lt;br /&gt;
&lt;br /&gt;
[[Bin Yuan 14-11-24]]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Text-2014-11-19</id>
		<title>Text-2014-11-19</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Text-2014-11-19"/>
				<updated>2014-11-20T04:03:30Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Abstract==&lt;br /&gt;
==Resources==&lt;br /&gt;
*ppt&lt;br /&gt;
:*about the knowledge vector discussion [[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/c/c7/Knowledge_vector.ppt]]&lt;br /&gt;
:*about information extraction using Wikipedia [[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/8/8c/Open_Information_Extraction_using_Wikipedia.pptx]]&lt;br /&gt;
*related paper&lt;br /&gt;
:*Open Information Extraction using Wikipedia [[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/6/6e/Open_information_extraction_using_Wikipedia.pdf]]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Text-2014-11-19</id>
		<title>Text-2014-11-19</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Text-2014-11-19"/>
				<updated>2014-11-20T03:59:55Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: created the page with content "*ppt1http://cslt.riit.tsinghua.edu.cn/mediawiki/images/c/c7/Knowledge_vector.ppt *ppt2http://cslt.riit.tsinghua.edu.cn/mediawiki/images/8/8c/Open_Information_E..."&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;*ppt1[[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/c/c7/Knowledge_vector.ppt]]&lt;br /&gt;
*ppt2[[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/8/8c/Open_Information_Extraction_using_Wikipedia.pptx]]&lt;br /&gt;
*paper[[http://cslt.riit.tsinghua.edu.cn/mediawiki/images/6/6e/Open_information_extraction_using_Wikipedia.pdf]]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Open_information_extraction_using_Wikipedia.pdf</id>
		<title>文件:Open information extraction using Wikipedia.pdf</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Open_information_extraction_using_Wikipedia.pdf"/>
				<updated>2014-11-20T03:57:58Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Knowledge_vector.ppt</id>
		<title>文件:Knowledge vector.ppt</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Knowledge_vector.ppt"/>
				<updated>2014-11-20T03:53:08Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: knowledge vector discussion&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;knowledge vector discussion&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Open_Information_Extraction_using_Wikipedia.pptx</id>
		<title>文件:Open Information Extraction using Wikipedia.pptx</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E6%96%87%E4%BB%B6:Open_Information_Extraction_using_Wikipedia.pptx"/>
				<updated>2014-11-20T03:50:04Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb: PPT for open IE using Wikipedia&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;PPT for open IE using Wikipedia&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Meeting_minutes</id>
		<title>Meeting minutes</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Meeting_minutes"/>
				<updated>2014-11-20T02:42:26Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[rules of report]]&lt;br /&gt;
&lt;br /&gt;
[[text-2014-08-19|2014-08-19]]&lt;br /&gt;
&lt;br /&gt;
[[text-2014-08-21|2014-08-21]]&lt;br /&gt;
&lt;br /&gt;
[[text-2014-08-22|2014-08-22]]&lt;br /&gt;
&lt;br /&gt;
[[text-2014-08-28|2014-08-28]]&lt;br /&gt;
&lt;br /&gt;
[[text-2014-09-25|2014-09-25]]&lt;br /&gt;
&lt;br /&gt;
[[text-2014-10-09|2014-10-09]]&lt;br /&gt;
&lt;br /&gt;
[[text-2014-10-18|2014-10-18]]&lt;br /&gt;
&lt;br /&gt;
[[text-2014-10-22|2014-10-22]]&lt;br /&gt;
&lt;br /&gt;
[[text-2014-11-19|2014-11-19]]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/11-16_Bin_Yuan</id>
		<title>11-16 Bin Yuan</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/11-16_Bin_Yuan"/>
				<updated>2014-11-18T05:16:58Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：/* Result */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* build a new jsgf file&lt;br /&gt;
* construct a test set for the address-tag language model&lt;br /&gt;
* conduct a new experiment; the results are below&lt;br /&gt;
&lt;br /&gt;
=== Planned for next week ===&lt;br /&gt;
* check the relation between the merge weight and the size of the dict&lt;br /&gt;
* add a penalty for short terms&lt;br /&gt;
* write a summary of the tag-LM approach&lt;br /&gt;
* read some papers about knowledge vectors&lt;br /&gt;
&lt;br /&gt;
=== Result ===&lt;br /&gt;
1. experiment 1&lt;br /&gt;
&lt;br /&gt;
  1.1 baseline&lt;br /&gt;
    corpus: BJYD2.txt, gxdx500h.txt, huawei_126h.txt, huawei_new.txt&lt;br /&gt;
    am: mdl_v3.0.S&lt;br /&gt;
    test set: test_BJYD&lt;br /&gt;
    result:&lt;br /&gt;
      %WER 56.58 [ 8541 / 15096, 288 ins, 5075 del, 3178 sub ]&lt;br /&gt;
      %SER 93.20 [ 1096 / 1176 ]&lt;br /&gt;
      北京: 6 / 10 (the test_BJYD transcripts contain 10 occurrences of &amp;quot;北京&amp;quot;; 6 of the 10 were decoded)&lt;br /&gt;
&lt;br /&gt;
  1.2 use address tag:&lt;br /&gt;
    jsgf: the 500 most frequent addresses (including &amp;quot;北京&amp;quot;) extracted from the corpus&lt;br /&gt;
    corpus: BJYD2.txt, gxdx500h.txt, huawei_126h.txt, huawei_new.txt, with sentences containing &amp;quot;北京&amp;quot; removed &lt;br /&gt;
      and tagged sentences added (e.g., if &amp;quot;清华大学&amp;quot; is in the jsgf and the corpus contains &amp;quot;我 在 清华大学 上课&amp;quot;, &lt;br /&gt;
      then the sentence &amp;quot;我 在 &amp;lt;address&amp;gt; 上课&amp;quot; is added to the corpus)&lt;br /&gt;
    am: mdl_v3.0.S&lt;br /&gt;
    test set: test_BJYD&lt;br /&gt;
&lt;br /&gt;
    Results for different merge weights:&lt;br /&gt;
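The corpus-tagging rule above can be sketched in a few lines of Python. This is a minimal illustration, not the actual preprocessing script: the longest-match ordering, the whitespace-tokenized sentences, and the function name are assumptions. The tag string is built from unicode escapes ("\u003c" and "\u003e" are the angle brackets) so the example stays plain text here.

```python
# Sketch of the corpus-tagging step: for each corpus sentence that
# contains a jsgf address, emit an extra copy with the address
# replaced by the tag (a hypothetical helper, not the real script).
TAG = "\u003caddress\u003e"  # decodes to the address tag used in the report

def tag_corpus(sentences, addresses):
    """Return tagged variants of sentences that contain a known address."""
    # Try longer addresses first, so a multi-word address wins over its prefix.
    addrs = sorted(addresses, key=len, reverse=True)
    tagged = []
    for sent in sentences:
        for addr in addrs:
            if addr in sent:
                tagged.append(sent.replace(addr, TAG))
                break  # one tagged copy per sentence is enough for this sketch
    return tagged

print(tag_corpus(["我 在 清华大学 上课", "今天 天气 不错"], ["清华大学", "北京"]))
```

In the real pipeline the tagged sentences are appended to the LM training text before estimation, alongside the untagged originals.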
      weight: 0.1 &lt;br /&gt;
        %WER 69.49 [ 10490 / 15096, 196 ins, 6016 del, 4278 sub ]&lt;br /&gt;
        %SER 94.98 [ 1117 / 1176 ]&lt;br /&gt;
        北京: 4 / 10&lt;br /&gt;
&lt;br /&gt;
      weight: 0.5&lt;br /&gt;
        %WER 62.23 [ 9394 / 15096, 190 ins, 5870 del, 3334 sub ]&lt;br /&gt;
        %SER 93.88 [ 1104 / 1176 ]&lt;br /&gt;
        北京: 4 / 10&lt;br /&gt;
&lt;br /&gt;
      weight: 1&lt;br /&gt;
        %WER 58.03 [ 8760 / 15096, 243 ins, 5294 del, 3223 sub ]&lt;br /&gt;
        %SER 93.28 [ 1097 / 1176 ]&lt;br /&gt;
        北京: 2 / 10&lt;br /&gt;
&lt;br /&gt;
      weight: 2&lt;br /&gt;
        %WER 56.90 [ 8589 / 15096, 344 ins, 4558 del, 3687 sub ]&lt;br /&gt;
        %SER 93.71 [ 1102 / 1176 ]&lt;br /&gt;
        北京: 1 / 10&lt;br /&gt;
&lt;br /&gt;
      weight: 3&lt;br /&gt;
        can't decode &amp;quot;北京&amp;quot;&lt;br /&gt;
&lt;br /&gt;
-------------------------------------------------------------------------------&lt;br /&gt;
This weekend I found two mistakes in experiment 1:&lt;br /&gt;
    1. run_decode.sh was used incorrectly: I copied the script from xiaoxi's directory to my own directory &lt;br /&gt;
      and ran it there, which led to a higher WER.&lt;br /&gt;
    2. One step of building the merged lexicon FST was wrong (in experiment 1.2). Merging grammar_G.fst and lm_G.fst &lt;br /&gt;
      generates a new sym.txt and a new lexicon; the new sym.txt contains a &amp;quot;#0&amp;quot; at the end of the file, &lt;br /&gt;
      and format_lm.sh uses this sym.txt to generate words.txt and appends another &amp;quot;#0&amp;quot; to it,&lt;br /&gt;
      so words.txt contains two &amp;quot;#0&amp;quot; entries, which leads to wrong results. Under this condition, whenever &lt;br /&gt;
      a decoding result contained the TAG it was truncated, which explains why the deletion error is&lt;br /&gt;
      so high when the merge weight is small in experiment 1.2.&lt;br /&gt;
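The duplicated-&quot;#0&quot; mistake can be caught with a quick sanity check on the symbol table before decoding. A minimal sketch, assuming the standard Kaldi words.txt layout of one &quot;symbol id&quot; pair per line; the helper name and example data are made up:

```python
# Sanity check for mistake 2: a Kaldi words.txt must map each symbol to
# exactly one id, so a duplicated "#0" line means the merge step went wrong.
from collections import Counter

def duplicated_symbols(lines):
    """Return symbols that appear more than once in a words.txt listing."""
    counts = Counter(line.split()[0] for line in lines if line.strip())
    return [sym for sym, n in counts.items() if n > 1]

# Example: a broken table with "#0" listed twice, as described above.
broken = ["我 1", "在 2", "#0 3", "#0 4"]
print(duplicated_symbols(broken))
# prints ['#0']
```

Running a check like this on the merged words.txt right after format_lm.sh would have flagged the problem before any decoding run.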
&lt;br /&gt;
2. experiment 2&lt;br /&gt;
  2.1 pre-work:&lt;br /&gt;
    2.1.1 build the jsgf file&lt;br /&gt;
      Extract an address list from the corpus, sort and count it, then uniformly sample 490 addresses &lt;br /&gt;
      from those that appear no more than 10 times in the corpus, and finally add 10 addresses that do not&lt;br /&gt;
      appear in the corpus.&lt;br /&gt;
&lt;br /&gt;
      some samples of the 490 addresses:&lt;br /&gt;
        黑龙江省、宿迁市、安定门、吉林省 吉林市、芙蓉 西街、南三环 中路、朝阳 北路 大悦城、石门县&lt;br /&gt;
      some samples of the 10 addresses:&lt;br /&gt;
        上海市 浦东新区 陆家嘴、布鲁塞尔、阿姆斯特丹、圣马力诺、北京市 海淀区 清华大学、明斯克、摩纳哥&lt;br /&gt;
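The 2.1.1 selection procedure, counting addresses and sampling the rare ones at equal intervals along the frequency-sorted list, can be sketched as follows. The function name and example counts are hypothetical; only the procedure itself comes from the report:

```python
# Sketch of the jsgf address selection in 2.1.1: keep addresses that occur
# no more than 10 times in the corpus, sort them by count, and pick k of
# them at equal intervals along the sorted list.

def sample_rare_addresses(address_counts, k):
    """Pick k addresses, equally spaced over the count-sorted rare ones."""
    # "rare" here means: occurs no more than 10 times in the corpus
    rare = sorted((a for a, n in address_counts.items() if not n > 10),
                  key=lambda a: address_counts[a])
    if k >= len(rare):
        return rare
    step = len(rare) / k
    return [rare[int(i * step)] for i in range(k)]

counts = {"黑龙江省": 9, "宿迁市": 3, "安定门": 7, "石门县": 1, "北京": 500}
print(sample_rare_addresses(counts, 2))
# prints ['石门县', '安定门']
```

The 10 out-of-corpus addresses are then appended by hand, since by definition they cannot be sampled from the corpus counts.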
&lt;br /&gt;
    2.1.2 construct a new test set named &amp;quot;test_address_tag&amp;quot;; its composition is as follows:&lt;br /&gt;
      The place names in the 120 test-set sentences fall into three categories:&lt;br /&gt;
        place names frequent in the training corpus (more than 10 occurrences) that are not in the jsgf (30 sentences, sampled at equal intervals by occurrence count in the training corpus)&lt;br /&gt;
        the first kind of jsgf place names: fewer than 10 occurrences in the training corpus (40 sentences, sampled at equal intervals by occurrence count in the training corpus)&lt;br /&gt;
        the second kind of jsgf place names: not present in the training corpus (50 sentences, 5 test samples per place name)&lt;br /&gt;
      Each of the 120 sentences was recorded twice, by different speakers, for 240 recordings in total: 12 speakers, 20 recordings each&lt;br /&gt;
&lt;br /&gt;
  2.2 baseline&lt;br /&gt;
    corpus: BJYD2.txt, gxdx500h.txt, huawei_126h.txt, huawei_new.txt&lt;br /&gt;
    am: mdl_1400&lt;br /&gt;
    test set: test_address_tag&lt;br /&gt;
    result:&lt;br /&gt;
    %WER 20.66 [ 848 / 4104, 189 ins, 354 del, 305 sub ]&lt;br /&gt;
    %SER 73.33 [ 176 / 240 ]&lt;br /&gt;
    %ADD_ER[ 6 / 30, 16 / 40, 32 / 50 ]&lt;br /&gt;
&lt;br /&gt;
  2.3 address tag&lt;br /&gt;
    corpus: BJYD2.txt, gxdx500h.txt, huawei_126h.txt, huawei_new.txt, with tagged sentences added&lt;br /&gt;
    am: mdl_1400&lt;br /&gt;
    test set: test_address_tag&lt;br /&gt;
    weight: 1&lt;br /&gt;
      %WER 15.98 [ 656 / 4104, 169 ins, 291 del, 196 sub ]&lt;br /&gt;
      %SER 69.17 [ 166 / 240 ]&lt;br /&gt;
      %ADD_ER[ 6 / 30, 8 / 40, 12 / 50 ]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>





	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/11-16_Bin_Yuan</id>
		<title>11-16 Bin Yuan</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/11-16_Bin_Yuan"/>
				<updated>2014-11-17T01:19:48Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：/* Result */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* build a new jsgf file&lt;br /&gt;
* construct a test set for address tag language model&lt;br /&gt;
* conduct a new experiment, result in &lt;br /&gt;
&lt;br /&gt;
=== Planned for next week ===&lt;br /&gt;
&lt;br /&gt;
=== Result===&lt;br /&gt;
1. experiment 1&lt;br /&gt;
&lt;br /&gt;
  1.1 baseline&lt;br /&gt;
    corpus：BJYD2.txt, gxdx500h.txt, huawei_126h.txt, huawei_new.txt&lt;br /&gt;
    am: mdl_v3.0.S&lt;br /&gt;
    test set: test_BJYD&lt;br /&gt;
    result:&lt;br /&gt;
      %WER 56.58 [ 8541 / 15096, 288 ins, 5075 del, 3178 sub ]&lt;br /&gt;
      %SER 93.20 [ 1096 / 1176 ]&lt;br /&gt;
      BeiJing: 6 / 10 (BJYD test set's text contains 10 &amp;quot;BeiJing&amp;quot;, decode 6 of 10)&lt;br /&gt;
&lt;br /&gt;
  1.2 use address tag:&lt;br /&gt;
    jsgf: extract top 500 frequent address(include &amp;quot;BeiJing&amp;quot;) from corpus&lt;br /&gt;
    corpus: BJYD2.txt, gxdx500h.txt, huawei_126h.txt, huawei_new.txt，remove sentences containing &amp;quot;BeiJing&amp;quot;, &lt;br /&gt;
      add tag to corpus(e.g. if &amp;quot;清华大学&amp;quot; is in jsgf and a sentence in corpus is &amp;quot;我 在 清华大学 上课&amp;quot;, &lt;br /&gt;
      then add a sentence &amp;quot;我 在 &amp;lt;address&amp;gt; 上课&amp;quot; to corpus) &lt;br /&gt;
    am: mdl_v3.0.S&lt;br /&gt;
    test set: test_BJYD&lt;br /&gt;
&lt;br /&gt;
    try different merge weight, the result is as follow:&lt;br /&gt;
      weight: 0.1 &lt;br /&gt;
        %WER 69.49 [ 10490 / 15096, 196 ins, 6016 del, 4278 sub ]&lt;br /&gt;
        %SER 94.98 [ 1117 / 1176 ]&lt;br /&gt;
        BeiJing: 4 / 10&lt;br /&gt;
&lt;br /&gt;
      weight: 0.5&lt;br /&gt;
        %WER 62.23 [ 9394 / 15096, 190 ins, 5870 del, 3334 sub ]&lt;br /&gt;
        %SER 93.88 [ 1104 / 1176 ]&lt;br /&gt;
        BeiJing: 4 / 10&lt;br /&gt;
&lt;br /&gt;
      weight: 1&lt;br /&gt;
        %WER 58.03 [ 8760 / 15096, 243 ins, 5294 del, 3223 sub ]&lt;br /&gt;
        %SER 93.28 [ 1097 / 1176 ]&lt;br /&gt;
        BeiJing: 2 / 10&lt;br /&gt;
&lt;br /&gt;
      weight: 2&lt;br /&gt;
        %WER 56.90 [ 8589 / 15096, 344 ins, 4558 del, 3687 sub ]&lt;br /&gt;
        %SER 93.71 [ 1102 / 1176 ]&lt;br /&gt;
        BeiJing: 1 / 10&lt;br /&gt;
&lt;br /&gt;
      weight: 3&lt;br /&gt;
        can't decode &amp;quot;BeiJing&amp;quot;&lt;br /&gt;
&lt;br /&gt;
-------------------------------------------------------------------------------&lt;br /&gt;
This weekend I find two mistakes in experiment 1:&lt;br /&gt;
    1. use run_decode.sh incorrectly. I copy this script from xiaoxi's directory to my own directory &lt;br /&gt;
      and run this script under my directory, leading to higher WER.&lt;br /&gt;
    2. one step of making merged lexicon fst is wrong(in experiment 1.2). Merging grammar_G.fst and lm_G.fst &lt;br /&gt;
      generates a new sym.txt and a new lexicon, the new sym.txt contains a &amp;quot;#0&amp;quot; at the end of the file, &lt;br /&gt;
      and format_lm.sh will use this sym.txt to generate a words.txt and add another &amp;quot;#0&amp;quot; to the end of words.txt,&lt;br /&gt;
      so there are two &amp;quot;#0&amp;quot; in words.txt, leading to wrong result. Under this condition, I find out when &lt;br /&gt;
      the decode result contains TAG, it would always be truncated. This explains why the deletion error is&lt;br /&gt;
      high when merge weight is small in experiment 1.2.&lt;br /&gt;
&lt;br /&gt;
2. experiment 2&lt;br /&gt;
  2.1 pre-work:&lt;br /&gt;
    2.1.1 build jsgf file&lt;br /&gt;
      extracted an address list from the corpus, sorted and counted it, and uniformly sampled 490 addresses &lt;br /&gt;
      from those that appear no more than 10 times in the corpus; finally, added 10 addresses that do not&lt;br /&gt;
      appear in the corpus at all.&lt;br /&gt;
&lt;br /&gt;
      some samples of the 490 addresses:&lt;br /&gt;
        黑龙江省、宿迁市、安定门、吉林省 吉林市、芙蓉 西街、南三环 中路、朝阳 北路 大悦城、石门县&lt;br /&gt;
      some samples of the 10 addresses:&lt;br /&gt;
        上海市 浦东新区 陆家嘴、布鲁塞尔、阿姆斯特丹、圣马力诺、BeiJing市 海淀区 清华大学、明斯克、摩纳哥&lt;br /&gt;
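The equal-interval sampling of rare addresses described in 2.1.1 can be sketched as below; the counts map is a hypothetical toy frequency table, not the real corpus counts:&lt;br /&gt;

```python
def sample_addresses(counts, k, max_count=10):
    """Sort the rare addresses (corpus count at most max_count) and pick k
    of them at equal intervals, mirroring the 490-address selection."""
    rare = sorted(a for a, c in counts.items() if not c > max_count)
    step = len(rare) / float(k)
    return [rare[int(i * step)] for i in range(k)]

# toy counts; "朝阳" is too frequent and is excluded from the pool
demo = {"安定门": 3, "宿迁市": 7, "石门县": 1, "黑龙江省": 9, "朝阳": 40}
print(sample_addresses(demo, 2))
```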
&lt;br /&gt;
    2.1.2 constructed a new test set named &amp;quot;test_address_tag&amp;quot;; its composition is as follows:&lt;br /&gt;
      the 120 test texts cover three kinds of place names:&lt;br /&gt;
        place names frequent in the training corpus (more than 10 occurrences) but absent from the jsgf (30 texts, sampled at equal intervals by corpus count)&lt;br /&gt;
        the first kind of jsgf place names: fewer than 10 occurrences in the training corpus (40 texts, sampled at equal intervals by corpus count)&lt;br /&gt;
        the second kind of jsgf place names: never seen in the training corpus (50 texts, 5 test samples per place name)&lt;br /&gt;
      each of the 120 texts was recorded twice by two different speakers, for 240 recordings in total; 12 speakers recorded 20 utterances each&lt;br /&gt;
&lt;br /&gt;
  2.2 baseline&lt;br /&gt;
    corpus: BJYD2.txt, gxdx500h.txt, huawei_126h.txt, huawei_new.txt&lt;br /&gt;
    am: mdl_1400&lt;br /&gt;
    test set: test_address_tag&lt;br /&gt;
    result:&lt;br /&gt;
      %WER 20.66 [ 848 / 4104, 189 ins, 354 del, 305 sub ]&lt;br /&gt;
      %SER 73.33 [ 176 / 240 ]&lt;br /&gt;
&lt;br /&gt;
  2.3 address tag&lt;br /&gt;
    corpus: BJYD2.txt, gxdx500h.txt, huawei_126h.txt, huawei_new.txt, with tagged copies added to the corpus&lt;br /&gt;
    am: mdl_1400&lt;br /&gt;
    test set: test_address_tag&lt;br /&gt;
    weight: 1&lt;br /&gt;
      %WER 15.98 [ 656 / 4104, 169 ins, 291 del, 196 sub ]&lt;br /&gt;
      %SER 69.17 [ 166 / 240 ]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/11-16_Bin_Yuan</id>
		<title>11-16 Bin Yuan</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/11-16_Bin_Yuan"/>
				<updated>2014-11-17T01:17:43Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* built a new jsgf file&lt;br /&gt;
* constructed a test set for the address-tag language model&lt;br /&gt;
* conducted a new experiment; results are reported below&lt;br /&gt;
&lt;br /&gt;
=== Planned for next week ===&lt;br /&gt;
&lt;br /&gt;
=== Result ===&lt;br /&gt;
1. experiment 1&lt;br /&gt;
&lt;br /&gt;
  1.1 baseline&lt;br /&gt;
    corpus: BJYD2.txt, gxdx500h.txt, huawei_126h.txt, huawei_new.txt&lt;br /&gt;
    am: mdl_v3.0.S&lt;br /&gt;
    test set: test_BJYD&lt;br /&gt;
    result:&lt;br /&gt;
      %WER 56.58 [ 8541 / 15096, 288 ins, 5075 del, 3178 sub ]&lt;br /&gt;
      %SER 93.20 [ 1096 / 1176 ]&lt;br /&gt;
      BeiJing: 6 / 10 (the BJYD test transcripts contain 10 instances of &amp;quot;BeiJing&amp;quot;; 6 of the 10 were decoded correctly)&lt;br /&gt;
&lt;br /&gt;
  1.2 use address tag:&lt;br /&gt;
    jsgf: extracted the top 500 most frequent addresses (including &amp;quot;BeiJing&amp;quot;) from the corpus&lt;br /&gt;
    corpus: BJYD2.txt, gxdx500h.txt, huawei_126h.txt, huawei_new.txt; removed sentences containing &amp;quot;BeiJing&amp;quot; &lt;br /&gt;
      and added tagged copies to the corpus (e.g. if &amp;quot;清华大学&amp;quot; is in the jsgf and a corpus sentence is &amp;quot;我 在 清华大学 上课&amp;quot;, &lt;br /&gt;
      then the sentence &amp;quot;我 在 &amp;lt;address&amp;gt; 上课&amp;quot; is added to the corpus)&lt;br /&gt;
    am: mdl_v3.0.S&lt;br /&gt;
    test set: test_BJYD&lt;br /&gt;
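The tagging rule described for the corpus above can be sketched as follows; this minimal version only matches single-token addresses, so multi-word entries such as 吉林省 吉林市 would need phrase matching, and the tag string is the literal &amp;lt;address&amp;gt; symbol:&lt;br /&gt;

```python
def add_tagged_copies(sentences, jsgf_addresses, tag="<address>"):
    """For every sentence containing a jsgf address as a whitespace token,
    append a copy with that token replaced by the tag symbol."""
    addr = set(jsgf_addresses)
    extra = []
    for sent in sentences:
        toks = sent.split()
        if addr.intersection(toks):
            extra.append(" ".join(tag if t in addr else t for t in toks))
    return sentences + extra

corpus = ["我 在 清华大学 上课", "今天 天气 不错"]
print(add_tagged_copies(corpus, ["清华大学"]))
```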
&lt;br /&gt;
    tried different merge weights; the results are as follows:&lt;br /&gt;
      weight: 0.1 &lt;br /&gt;
        %WER 69.49 [ 10490 / 15096, 196 ins, 6016 del, 4278 sub ]&lt;br /&gt;
        %SER 94.98 [ 1117 / 1176 ]&lt;br /&gt;
        BeiJing: 4 / 10&lt;br /&gt;
&lt;br /&gt;
      weight: 0.5&lt;br /&gt;
        %WER 62.23 [ 9394 / 15096, 190 ins, 5870 del, 3334 sub ]&lt;br /&gt;
        %SER 93.88 [ 1104 / 1176 ]&lt;br /&gt;
        BeiJing: 4 / 10&lt;br /&gt;
&lt;br /&gt;
      weight: 1&lt;br /&gt;
        %WER 58.03 [ 8760 / 15096, 243 ins, 5294 del, 3223 sub ]&lt;br /&gt;
        %SER 93.28 [ 1097 / 1176 ]&lt;br /&gt;
        BeiJing: 2 / 10&lt;br /&gt;
&lt;br /&gt;
      weight: 2&lt;br /&gt;
        %WER 56.90 [ 8589 / 15096, 344 ins, 4558 del, 3687 sub ]&lt;br /&gt;
        %SER 93.71 [ 1102 / 1176 ]&lt;br /&gt;
        BeiJing: 1 / 10&lt;br /&gt;
&lt;br /&gt;
      weight: 3&lt;br /&gt;
        can't decode &amp;quot;BeiJing&amp;quot;&lt;br /&gt;
&lt;br /&gt;
-------------------------------------------------------------------------------&lt;br /&gt;
This weekend I found two mistakes in experiment 1:&lt;br /&gt;
    1. run_decode.sh was used incorrectly: I copied the script from xiaoxi's directory to my own &lt;br /&gt;
      and ran it there, which led to a higher WER.&lt;br /&gt;
    2. one step of building the merged lexicon fst was wrong (in experiment 1.2). Merging grammar_G.fst and lm_G.fst &lt;br /&gt;
      generates a new sym.txt and a new lexicon; the new sym.txt already has a &amp;quot;#0&amp;quot; at the end of the file, &lt;br /&gt;
      and format_lm.sh then uses this sym.txt to generate words.txt and appends another &amp;quot;#0&amp;quot;, &lt;br /&gt;
      so words.txt ends up with two &amp;quot;#0&amp;quot; entries, which corrupts the result. Under this condition, whenever &lt;br /&gt;
      the decoding output contained the TAG, it was truncated, which explains why the deletion errors are&lt;br /&gt;
      so high at small merge weights in experiment 1.2.&lt;br /&gt;
&lt;br /&gt;
2. experiment 2&lt;br /&gt;
  2.1 pre-work:&lt;br /&gt;
    2.1.1 build jsgf file&lt;br /&gt;
      extracted an address list from the corpus, sorted and counted it, and uniformly sampled 490 addresses &lt;br /&gt;
      from those that appear no more than 10 times in the corpus; finally, added 10 addresses that do not&lt;br /&gt;
      appear in the corpus at all.&lt;br /&gt;
&lt;br /&gt;
      some samples of the 490 addresses:&lt;br /&gt;
        黑龙江省、宿迁市、安定门、吉林省 吉林市、芙蓉 西街、南三环 中路、朝阳 北路 大悦城、石门县&lt;br /&gt;
      some samples of the 10 addresses:&lt;br /&gt;
        上海市 浦东新区 陆家嘴、布鲁塞尔、阿姆斯特丹、圣马力诺、BeiJing市 海淀区 清华大学、明斯克、摩纳哥&lt;br /&gt;
&lt;br /&gt;
    2.1.2 constructed a new test set named &amp;quot;test_address_tag&amp;quot;; its composition is as follows:&lt;br /&gt;
      the 120 test texts cover three kinds of place names:&lt;br /&gt;
        place names frequent in the training corpus (more than 10 occurrences) but absent from the jsgf (30 texts, sampled at equal intervals by corpus count)&lt;br /&gt;
        the first kind of jsgf place names: fewer than 10 occurrences in the training corpus (40 texts, sampled at equal intervals by corpus count)&lt;br /&gt;
        the second kind of jsgf place names: never seen in the training corpus (50 texts, 5 test samples per place name)&lt;br /&gt;
      each of the 120 texts was recorded twice by two different speakers, for 240 recordings in total; 12 speakers recorded 20 utterances each&lt;br /&gt;
&lt;br /&gt;
  2.2 baseline&lt;br /&gt;
    corpus: BJYD2.txt, gxdx500h.txt, huawei_126h.txt, huawei_new.txt&lt;br /&gt;
    am: mdl_1400&lt;br /&gt;
    test set: test_address_tag&lt;br /&gt;
    result:&lt;br /&gt;
      %WER 20.66 [ 848 / 4104, 189 ins, 354 del, 305 sub ]&lt;br /&gt;
      %SER 73.33 [ 176 / 240 ]&lt;br /&gt;
&lt;br /&gt;
  2.3 address tag&lt;br /&gt;
    corpus: BJYD2.txt, gxdx500h.txt, huawei_126h.txt, huawei_new.txt, with tagged copies added to the corpus&lt;br /&gt;
    am: mdl_1400&lt;br /&gt;
    test set: test_address_tag&lt;br /&gt;
    weight: 1&lt;br /&gt;
      %WER 15.98 [ 656 / 4104, 169 ins, 291 del, 196 sub ]&lt;br /&gt;
      %SER 69.17 [ 166 / 240 ]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2014-11-16</id>
		<title>2014-11-16</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2014-11-16"/>
				<updated>2014-11-17T00:56:30Z</updated>
		
		<summary type="html">&lt;p&gt;Yuanb：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[11-16 Zhiyong Zhang]]&lt;br /&gt;
&lt;br /&gt;
[[11-16 Xiangyu Zeng]]&lt;br /&gt;
&lt;br /&gt;
[[11-16 Fanhu Bie]]&lt;br /&gt;
&lt;br /&gt;
[[11-16 Guoyu Tang]]&lt;br /&gt;
&lt;br /&gt;
[[11-16 Yiye Lin]]&lt;br /&gt;
&lt;br /&gt;
[[11-16 Dongxu Zhang]]&lt;br /&gt;
&lt;br /&gt;
[[11-16 Miao Fan]]&lt;br /&gt;
&lt;br /&gt;
[[11-16 Bin Yuan]]&lt;/div&gt;</summary>
		<author><name>Yuanb</name></author>	</entry>

	</feed>