<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://index.cslt.org/mediawiki/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="zh-cn">
		<id>http://index.cslt.org/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Caoli</id>
		<title>cslt Wiki - User contributions [zh-cn]</title>
		<link rel="self" type="application/atom+xml" href="http://index.cslt.org/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Caoli"/>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/%E7%89%B9%E6%AE%8A:%E7%94%A8%E6%88%B7%E8%B4%A1%E7%8C%AE/Caoli"/>
		<updated>2026-04-08T08:57:55Z</updated>
		<subtitle>User contributions</subtitle>
		<generator>MediaWiki 1.23.3</generator>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Li_Cao_14-12-14</id>
		<title>Li Cao 14-12-14</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Li_Cao_14-12-14"/>
				<updated>2014-12-15T01:09:59Z</updated>
		
		<summary type="html">&lt;p&gt;Caoli：Created page with “=== Accomplished this week === *use the MERT method trained all kinds of argument .improved the test scores in lucene  *Test the 'COORD' value set the rate of accura...”&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
*Used the MERT method to train all the arguments, which improved the test scores in Lucene&lt;br /&gt;
*Tested how the 'COORD' value setting affects the accuracy on the test set&lt;br /&gt;
=== Plan for next week ===&lt;br /&gt;
*Add 'SPELL CHECK' to the system&lt;br /&gt;
*Learn the 'queryAnalysis' function&lt;/div&gt;</summary>
		<author><name>Caoli</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2014-12-14</id>
		<title>2014-12-14</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2014-12-14"/>
				<updated>2014-12-15T00:57:20Z</updated>
		
		<summary type="html">&lt;p&gt;Caoli：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Fanhu bie 14-12-14]]&lt;br /&gt;
&lt;br /&gt;
[[Rong Liu 14-12-14]]&lt;br /&gt;
&lt;br /&gt;
[[Bin Yuan 14-12-14]]&lt;br /&gt;
&lt;br /&gt;
[[Li Cao 14-12-14]]&lt;/div&gt;</summary>
		<author><name>Caoli</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Z-MERT</id>
		<title>Z-MERT</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Z-MERT"/>
				<updated>2014-12-09T10:17:04Z</updated>
		
		<summary type="html">&lt;p&gt;Caoli：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;*Test conditions&lt;br /&gt;
:*(../res/corpus/20141016凉山州/3文本/testJ.txt) about 1596 questions.&lt;br /&gt;
:*Only Lucene&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Test result=&lt;br /&gt;
{| border=&amp;quot;2px&amp;quot;&lt;br /&gt;
|+ Results of different methods in Lucene&lt;br /&gt;
|-&lt;br /&gt;
! method !! baseline !! new_index_template(1.0 1.0) !! by Z-MERT(1.8411, 1.0)&lt;br /&gt;
|-&lt;br /&gt;
! Accuracy&lt;br /&gt;
| 0.662280 || 0.669799|| 0.678571         &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Note: the results above use only sq and pattern.&lt;/div&gt;</summary>
		<author><name>Caoli</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Z-MERT</id>
		<title>Z-MERT</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Z-MERT"/>
				<updated>2014-12-09T10:14:51Z</updated>
		
		<summary type="html">&lt;p&gt;Caoli：/* Test result */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;*Test conditions&lt;br /&gt;
:*(../res/corpus/20141016凉山州/3文本/testJ.txt)&lt;br /&gt;
:*Only Lucene&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Test result=&lt;br /&gt;
{| border=&amp;quot;2px&amp;quot;&lt;br /&gt;
|+ Results of different methods in Lucene&lt;br /&gt;
|-&lt;br /&gt;
! method !! baseline !! new_index_template !! by Z-MERT&lt;br /&gt;
|-&lt;br /&gt;
! Accuracy&lt;br /&gt;
| 0.662280 || 0.669799|| 0.678571         &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Caoli</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Z-MERT</id>
		<title>Z-MERT</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Z-MERT"/>
				<updated>2014-12-09T10:12:34Z</updated>
		
		<summary type="html">&lt;p&gt;Caoli：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;*Test conditions&lt;br /&gt;
:*(../res/corpus/20141016凉山州/3文本/testJ.txt)&lt;br /&gt;
:*Only Lucene&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Test result=&lt;br /&gt;
{| border=&amp;quot;2px&amp;quot;&lt;br /&gt;
|+ Results of different methods in Lucene&lt;br /&gt;
|-&lt;br /&gt;
! method !! baseline !! new_index_template !! by Z-MERT&lt;br /&gt;
|-&lt;br /&gt;
! Accuracy&lt;br /&gt;
| 0.669799 || 0.669799|| 0.678571         &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Caoli</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Z-MERT</id>
		<title>Z-MERT</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Z-MERT"/>
				<updated>2014-12-09T10:09:16Z</updated>
		
		<summary type="html">&lt;p&gt;Caoli：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;*Test conditions&lt;br /&gt;
:*(../res/corpus/20141016凉山州/3文本/testJ.txt)&lt;br /&gt;
:*Only Lucene&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*test result&lt;br /&gt;
--------|-------------------|-------------------|---------------------------|&lt;br /&gt;
        |                   |                   |                           |&lt;br /&gt;
        |    baseline       | new_index_template| change argument by Z-MERT |      &lt;br /&gt;
        |                   |                   |                           |&lt;br /&gt;
--------|-------------------|-------------------|---------------------------|&lt;br /&gt;
 correct|0.6697994987468672 | 0.6697994987468672|0.6785714285714286         |&lt;br /&gt;
--------|-------------------|-------------------|---------------------------|&lt;/div&gt;</summary>
		<author><name>Caoli</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Multi_query_in_multi_field</id>
		<title>Multi query in multi field</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Multi_query_in_multi_field"/>
				<updated>2014-12-09T07:27:46Z</updated>
		
		<summary type="html">&lt;p&gt;Caoli：/* test result */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=check the detail of Lucene score=&lt;br /&gt;
==data==&lt;br /&gt;
  d0 [{如何，怎么}] {办理，办} {户口，户口本} # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d1 {办理，办} {户口，户口本} [{流程，步骤}] # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d2 [{如何，怎么}] {办理，办} {身份证，身份} # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
  d3 {办理，办} {身份证} [{流程，步骤}] # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
&lt;br /&gt;
==Search==&lt;br /&gt;
 query:&quot;如何办理户口&quot;  =&gt; question:如何 question:办理户口&lt;br /&gt;
==result==&lt;br /&gt;
  doc=0 score=0.114656925 shardIndex=-1|0.114656925 = (MATCH) product of:&lt;br /&gt;
    0.22931385 = (MATCH) sum of:&lt;br /&gt;
      0.22931385 = (MATCH) weight(question:如何 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
        0.22931385 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
        0.4748871 = queryWeight, product of:&lt;br /&gt;
          1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
          0.3687922 = queryNorm&lt;br /&gt;
        0.48288077 = fieldWeight in 0, product of:&lt;br /&gt;
          1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
            1.0 = termFreq=1.0&lt;br /&gt;
          1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
          0.375 = fieldNorm(doc=0)&lt;br /&gt;
   0.5 = coord(1/2)&lt;br /&gt;
*Detailed calculation of score(query, d0)&lt;br /&gt;
*Reference formula: [http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html]&lt;br /&gt;
 [[文件:QQ截图20141128164958.png]]&lt;br /&gt;
:* tf(&quot;如何&quot; in d0) = sqrt(frequency) = sqrt(1) = 1&lt;br /&gt;
:* idf(&quot;如何&quot;) = 1 + ln(numDocs/(docFreq+1)) = 1 + ln(4/(2+1)) = 1.287682&lt;br /&gt;
:* &quot;如何&quot;.getBoost() = 1&lt;br /&gt;
:* coord(q, d0) = 0.5 = coord(1/2)&lt;br /&gt;
    coord(q,d) = overlap / maxOverlap&lt;br /&gt;
    overlap - the number of query terms matched in the document&lt;br /&gt;
    maxOverlap - the total number of terms in the query&lt;br /&gt;
:* queryNorm(q) = 1/sqrt(sumOfSquaredWeights) = 1/sqrt(idf(&quot;如何&quot;)^2 + idf(&quot;办理户口&quot;)^2) = 1/sqrt(1*(1.287682*1.287682+2.386*2.386)) = 0.3687&lt;br /&gt;
   sumOfSquaredWeights = q.getBoost()*q.getBoost()*∑( idf(t)*t.getBoost() )^2&lt;br /&gt;
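&lt;br /&gt;
The single-field numbers in the explain output can be reproduced with a short Python sketch. It assumes the DefaultSimilarity formulas from the TFIDFSimilarity reference page and docFreq=0 for 办理户口 in the question field (inferred from the idf value 2.386 used in the queryNorm calculation); the function and variable names are illustrative only.&lt;br /&gt;

```python
import math

def idf(num_docs, doc_freq):
    # DefaultSimilarity: idf = 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def tf(freq):
    # DefaultSimilarity: tf = sqrt(freq)
    return math.sqrt(freq)

NUM_DOCS = 4
FIELD_NORM_D0 = 0.375  # fieldNorm(doc=0), copied from the explain output

idf_how = idf(NUM_DOCS, 2)    # term 如何, docFreq=2
idf_hukou = idf(NUM_DOCS, 0)  # term 办理户口, docFreq=0 is an assumption

# The query boost and the term boosts are all 1, so they drop out.
query_norm = 1.0 / math.sqrt(idf_how ** 2 + idf_hukou ** 2)

# Only 如何 matches d0, so a single term contributes to the sum.
query_weight = idf_how * query_norm
field_weight = tf(1.0) * idf_how * FIELD_NORM_D0
coord = 1 / 2                 # 1 of the 2 query terms matched
score = coord * query_weight * field_weight

print(round(query_norm, 4), round(score, 6))
```

Under these assumptions the computed queryNorm and score should agree with the explain output up to float rounding.&lt;br /&gt;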
&lt;br /&gt;
=multi=&lt;br /&gt;
==data==&lt;br /&gt;
  d0 [{如何，怎么}] {办理，办} {户口，户口本} # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d1 {办理，办} {户口，户口本} [{流程，步骤}] # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d2 [{如何，怎么}] {办理，办} {身份证，身份} # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
  d3 {办理，办} {身份证} [{流程，步骤}] # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
==Search==&lt;br /&gt;
code&lt;br /&gt;
  BooleanQuery query = new BooleanQuery();&lt;br /&gt;
  query.add(paternQuery, Occur.MUST); // or Occur.SHOULD if this clause is optional&lt;br /&gt;
  query.add(ansQuery, Occur.SHOULD); // or Occur.MUST if this clause is required&lt;br /&gt;
  query.add(sqQuery, Occur.SHOULD);   &lt;br /&gt;
search: &lt;br /&gt;
   +((question:如何 question:办理户口)^0.8) ((answer:如何 answer:办理户口)^0.2) ((standardq:如何 standardq:办理户口)^0.2)&lt;br /&gt;
&lt;br /&gt;
==result==&lt;br /&gt;
* Calculation formula&lt;br /&gt;
:* score(Q)=score(q_PTN)+score(q_ANS)+score(q_STD)&lt;br /&gt;
:* queryNorm(Q), Q=q_PTN+q_ANS+q_STD&lt;br /&gt;
::* sumOfSquaredWeights = ∑{ q.getBoost()*q.getBoost()*∑( idf(t)*t.getBoost() )^2 }, q={q_PTN, q_STD, q_ANS}&lt;br /&gt;
::* queryNorm(Q) = 1/sqrt(sumOfSquaredWeights)&lt;br /&gt;
:* field pattern&lt;br /&gt;
::* tf(&quot;如何&quot; in d0) = sqrt(frequency) = sqrt(1) = 1&lt;br /&gt;
::* idf(&quot;如何&quot;) = 1 + ln(numDocs/(docFreq+1)) = 1 + ln(4/(2+1)) = 1.287682&lt;br /&gt;
::* &quot;如何&quot;.getBoost() = 1&lt;br /&gt;
::* coord(q_PTN, d0) = 0.5 = coord(1/2)&lt;br /&gt;
    coord(q,d) = overlap / maxOverlap&lt;br /&gt;
    overlap - the number of query terms matched in the document&lt;br /&gt;
    maxOverlap - the total number of terms in the query&lt;br /&gt;
::* queryNorm(q_PTN) = queryNorm(Q)*boost(q_PTN)&lt;br /&gt;
::* Norm&lt;br /&gt;
:* &lt;br /&gt;
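The boosted queryNorm relation can be checked numerically with a rough Python sketch. The docFreq values for the answer field and for question:如何 come from the explain output; docFreq=0 for question:办理户口 and for both standardq terms is an assumption (not stated on this page), chosen because it reproduces the reported queryNorm values. All names are illustrative.&lt;br /&gt;

```python
import math

def idf(num_docs, doc_freq):
    # DefaultSimilarity: idf = 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

NUM_DOCS = 4

# (boost, [docFreq of each query term]) per sub-query.
# The docFreq=0 entries are assumptions, not values from the page.
sub_queries = {
    "q_PTN": (0.8, [2, 0]),  # question:如何, question:办理户口
    "q_ANS": (0.2, [4, 2]),  # answer:如何, answer:办理户口
    "q_STD": (0.2, [0, 0]),  # standardq:如何, standardq:办理户口
}

# sumOfSquaredWeights = sum over sub-queries of boost^2 * sum(idf(t)^2)
sum_of_squared_weights = sum(
    boost ** 2 * sum(idf(NUM_DOCS, df) ** 2 for df in doc_freqs)
    for boost, doc_freqs in sub_queries.values()
)
query_norm_Q = 1.0 / math.sqrt(sum_of_squared_weights)

# queryNorm seen by each sub-query = queryNorm(Q) * boost(sub-query)
query_norm_ptn = query_norm_Q * sub_queries["q_PTN"][0]
query_norm_ans = query_norm_Q * sub_queries["q_ANS"][0]

print(round(query_norm_ptn, 6), round(query_norm_ans, 6))
```

If the assumed docFreq values hold, the two results should be close to the queryNorm values 0.3490943 and 0.087273575 reported in the explain output.&lt;br /&gt;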
*detail&lt;br /&gt;
:* field: answer+pattern&lt;br /&gt;
     score(q,field-pattern)+score(q,field-answer)&lt;br /&gt;
     &lt;br /&gt;
  doc=0 score=0.15459718 shardIndex=-1|0.1545972 = (MATCH) product of:&lt;br /&gt;
  0.23189577 = (MATCH) sum of:[all]&lt;br /&gt;
    0.108532876 = (MATCH) product of:[field:pattern]&lt;br /&gt;
      0.21706575 = (MATCH) sum of:&lt;br /&gt;
        0.21706575 = (MATCH) weight(question:如何 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
          0.21706575 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
            0.44952247 = queryWeight, product of:&lt;br /&gt;
              1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
              0.3490943 = queryNorm&lt;br /&gt;
            0.48288077 = fieldWeight in 0, product of:&lt;br /&gt;
              1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
                1.0 = termFreq=1.0&lt;br /&gt;
              1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
              0.375 = fieldNorm(doc=0)&lt;br /&gt;
      0.5 = coord(1/2)&lt;br /&gt;
    0.12336289 = (MATCH) sum of:[field:answer]&lt;br /&gt;
      0.032918826 = (MATCH) weight(answer:如何 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
        0.032918826 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
          0.06779904 = queryWeight, product of:&lt;br /&gt;
            0.7768564 = idf(docFreq=4, maxDocs=4)&lt;br /&gt;
            0.087273575 = queryNorm&lt;br /&gt;
          0.48553526 = fieldWeight in 0, product of:&lt;br /&gt;
            1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
              1.0 = termFreq=1.0&lt;br /&gt;
            0.7768564 = idf(docFreq=4, maxDocs=4)&lt;br /&gt;
            0.625 = fieldNorm(doc=0)&lt;br /&gt;
      0.090444066 = (MATCH) weight(answer:办理户口 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
        0.090444066 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
          0.11238062 = queryWeight, product of:&lt;br /&gt;
            1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
            0.087273575 = queryNorm&lt;br /&gt;
          0.8048013 = fieldWeight in 0, product of:&lt;br /&gt;
            1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
              1.0 = termFreq=1.0&lt;br /&gt;
            1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
            0.625 = fieldNorm(doc=0)&lt;br /&gt;
  0.6666667 = coord(2/3)&lt;br /&gt;
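&lt;br /&gt;
As a cross-check, the explain tree above can be recombined by hand. The following Python sketch copies the idf, queryNorm and fieldNorm values from the explain output and applies the per-term product and the two coord factors; the helper name term_score is hypothetical.&lt;br /&gt;

```python
# Values copied from the explain output for doc=0.
IDF_Q_HOW, IDF_A_HOW, IDF_A_HUKOU = 1.287682, 0.7768564, 1.287682
QN_PTN, QN_ANS = 0.3490943, 0.087273575  # queryNorm per sub-query
FN_QUESTION, FN_ANSWER = 0.375, 0.625    # fieldNorm(doc=0)

def term_score(idf, query_norm, field_norm, tf=1.0):
    # queryWeight * fieldWeight for one matching term
    return (idf * query_norm) * (tf * idf * field_norm)

# pattern sub-query: 1 of its 2 terms matches, so coord(1/2) applies
pattern = 0.5 * term_score(IDF_Q_HOW, QN_PTN, FN_QUESTION)

# answer sub-query: both terms match, so no extra coord factor appears
answer = (term_score(IDF_A_HOW, QN_ANS, FN_ANSWER)
          + term_score(IDF_A_HUKOU, QN_ANS, FN_ANSWER))

# 2 of the 3 top-level clauses match, hence coord(2/3)
score = (pattern + answer) * 2 / 3
print(round(pattern, 6), round(answer, 6), round(score, 6))
```

The three printed values correspond to the [field:pattern] node, the [field:answer] node and the final document score.&lt;br /&gt;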
&lt;br /&gt;
=Z-MERT test result=&lt;br /&gt;
[Z-MERT]&lt;br /&gt;
*Test conditions&lt;br /&gt;
:*(../res/corpus/20141016凉山州/3文本/testJ.txt)&lt;br /&gt;
:*Only Lucene&lt;br /&gt;
&lt;br /&gt;
*The default arguments (pattern:1.0, sq:1.0)&lt;br /&gt;
:*test result:0.6697994987468672&lt;br /&gt;
*Arguments obtained with the MERT method (pattern:1.811676798378926, sq:1.0)&lt;br /&gt;
:*test result:0.6779448621553885&lt;/div&gt;</summary>
		<author><name>Caoli</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Multi_query_in_multi_field</id>
		<title>Multi query in multi field</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Multi_query_in_multi_field"/>
				<updated>2014-12-09T07:17:50Z</updated>
		
		<summary type="html">&lt;p&gt;Caoli：/* test result */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=check the detail of Lucene score=&lt;br /&gt;
==data==&lt;br /&gt;
  d0 [{如何，怎么}] {办理，办} {户口，户口本} # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d1 {办理，办} {户口，户口本} [{流程，步骤}] # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d2 [{如何，怎么}] {办理，办} {身份证，身份} # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
  d3 {办理，办} {身份证} [{流程，步骤}] # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
&lt;br /&gt;
==Search==&lt;br /&gt;
 query:&quot;如何办理户口&quot;  =&gt; question:如何 question:办理户口&lt;br /&gt;
==result==&lt;br /&gt;
  doc=0 score=0.114656925 shardIndex=-1|0.114656925 = (MATCH) product of:&lt;br /&gt;
    0.22931385 = (MATCH) sum of:&lt;br /&gt;
      0.22931385 = (MATCH) weight(question:如何 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
        0.22931385 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
        0.4748871 = queryWeight, product of:&lt;br /&gt;
          1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
          0.3687922 = queryNorm&lt;br /&gt;
        0.48288077 = fieldWeight in 0, product of:&lt;br /&gt;
          1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
            1.0 = termFreq=1.0&lt;br /&gt;
          1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
          0.375 = fieldNorm(doc=0)&lt;br /&gt;
   0.5 = coord(1/2)&lt;br /&gt;
*Detailed calculation of score(query, d0)&lt;br /&gt;
*Reference formula: [http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html]&lt;br /&gt;
 [[文件:QQ截图20141128164958.png]]&lt;br /&gt;
:* tf(&quot;如何&quot; in d0) = sqrt(frequency) = sqrt(1) = 1&lt;br /&gt;
:* idf(&quot;如何&quot;) = 1 + ln(numDocs/(docFreq+1)) = 1 + ln(4/(2+1)) = 1.287682&lt;br /&gt;
:* &quot;如何&quot;.getBoost() = 1&lt;br /&gt;
:* coord(q, d0) = 0.5 = coord(1/2)&lt;br /&gt;
    coord(q,d) = overlap / maxOverlap&lt;br /&gt;
    overlap - the number of query terms matched in the document&lt;br /&gt;
    maxOverlap - the total number of terms in the query&lt;br /&gt;
:* queryNorm(q) = 1/sqrt(sumOfSquaredWeights) = 1/sqrt(idf(&quot;如何&quot;)^2 + idf(&quot;办理户口&quot;)^2) = 1/sqrt(1*(1.287682*1.287682+2.386*2.386)) = 0.3687&lt;br /&gt;
   sumOfSquaredWeights = q.getBoost()*q.getBoost()*∑( idf(t)*t.getBoost() )^2&lt;br /&gt;
&lt;br /&gt;
=multi=&lt;br /&gt;
==data==&lt;br /&gt;
  d0 [{如何，怎么}] {办理，办} {户口，户口本} # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d1 {办理，办} {户口，户口本} [{流程，步骤}] # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d2 [{如何，怎么}] {办理，办} {身份证，身份} # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
  d3 {办理，办} {身份证} [{流程，步骤}] # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
==Search==&lt;br /&gt;
code&lt;br /&gt;
  BooleanQuery query = new BooleanQuery();&lt;br /&gt;
  query.add(paternQuery, Occur.MUST); // or Occur.SHOULD if this clause is optional&lt;br /&gt;
  query.add(ansQuery, Occur.SHOULD); // or Occur.MUST if this clause is required&lt;br /&gt;
  query.add(sqQuery, Occur.SHOULD);   &lt;br /&gt;
search: &lt;br /&gt;
   +((question:如何 question:办理户口)^0.8) ((answer:如何 answer:办理户口)^0.2) ((standardq:如何 standardq:办理户口)^0.2)&lt;br /&gt;
&lt;br /&gt;
==result==&lt;br /&gt;
* Calculation formula&lt;br /&gt;
:* score(Q)=score(q_PTN)+score(q_ANS)+score(q_STD)&lt;br /&gt;
:* queryNorm(Q), Q=q_PTN+q_ANS+q_STD&lt;br /&gt;
::* sumOfSquaredWeights = ∑{ q.getBoost()*q.getBoost()*∑( idf(t)*t.getBoost() )^2 }, q={q_PTN, q_STD, q_ANS}&lt;br /&gt;
::* queryNorm(Q) = 1/sqrt(sumOfSquaredWeights)&lt;br /&gt;
:* field pattern&lt;br /&gt;
::* tf(&quot;如何&quot; in d0) = sqrt(frequency) = sqrt(1) = 1&lt;br /&gt;
::* idf(&quot;如何&quot;) = 1 + ln(numDocs/(docFreq+1)) = 1 + ln(4/(2+1)) = 1.287682&lt;br /&gt;
::* &quot;如何&quot;.getBoost() = 1&lt;br /&gt;
::* coord(q_PTN, d0) = 0.5 = coord(1/2)&lt;br /&gt;
    coord(q,d) = overlap / maxOverlap&lt;br /&gt;
    overlap - the number of query terms matched in the document&lt;br /&gt;
    maxOverlap - the total number of terms in the query&lt;br /&gt;
::* queryNorm(q_PTN) = queryNorm(Q)*boost(q_PTN)&lt;br /&gt;
::* Norm&lt;br /&gt;
:* &lt;br /&gt;
*detail&lt;br /&gt;
:* field: answer+pattern&lt;br /&gt;
     score(q,field-pattern)+score(q,field-answer)&lt;br /&gt;
     &lt;br /&gt;
  doc=0 score=0.15459718 shardIndex=-1|0.1545972 = (MATCH) product of:&lt;br /&gt;
  0.23189577 = (MATCH) sum of:[all]&lt;br /&gt;
    0.108532876 = (MATCH) product of:[field:pattern]&lt;br /&gt;
      0.21706575 = (MATCH) sum of:&lt;br /&gt;
        0.21706575 = (MATCH) weight(question:如何 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
          0.21706575 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
            0.44952247 = queryWeight, product of:&lt;br /&gt;
              1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
              0.3490943 = queryNorm&lt;br /&gt;
            0.48288077 = fieldWeight in 0, product of:&lt;br /&gt;
              1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
                1.0 = termFreq=1.0&lt;br /&gt;
              1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
              0.375 = fieldNorm(doc=0)&lt;br /&gt;
      0.5 = coord(1/2)&lt;br /&gt;
    0.12336289 = (MATCH) sum of:[field:answer]&lt;br /&gt;
      0.032918826 = (MATCH) weight(answer:如何 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
        0.032918826 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
          0.06779904 = queryWeight, product of:&lt;br /&gt;
            0.7768564 = idf(docFreq=4, maxDocs=4)&lt;br /&gt;
            0.087273575 = queryNorm&lt;br /&gt;
          0.48553526 = fieldWeight in 0, product of:&lt;br /&gt;
            1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
              1.0 = termFreq=1.0&lt;br /&gt;
            0.7768564 = idf(docFreq=4, maxDocs=4)&lt;br /&gt;
            0.625 = fieldNorm(doc=0)&lt;br /&gt;
      0.090444066 = (MATCH) weight(answer:办理户口 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
        0.090444066 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
          0.11238062 = queryWeight, product of:&lt;br /&gt;
            1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
            0.087273575 = queryNorm&lt;br /&gt;
          0.8048013 = fieldWeight in 0, product of:&lt;br /&gt;
            1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
              1.0 = termFreq=1.0&lt;br /&gt;
            1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
            0.625 = fieldNorm(doc=0)&lt;br /&gt;
  0.6666667 = coord(2/3)&lt;br /&gt;
&lt;br /&gt;
=test result=&lt;br /&gt;
[Z-MERT]&lt;br /&gt;
*Test conditions&lt;br /&gt;
:*(../res/corpus/20141016凉山州/3文本/testJ.txt)&lt;br /&gt;
:*Only Lucene&lt;br /&gt;
&lt;br /&gt;
*The default arguments (pattern:1.0, sq:1.0)&lt;br /&gt;
:*test result:0.6697994987468672&lt;br /&gt;
*Arguments obtained with the MERT method (pattern:1.811676798378926, sq:1.0)&lt;br /&gt;
:*test result:0.6779448621553885&lt;/div&gt;</summary>
		<author><name>Caoli</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Multi_query_in_multi_field</id>
		<title>Multi query in multi field</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Multi_query_in_multi_field"/>
				<updated>2014-12-09T07:13:39Z</updated>
		
		<summary type="html">&lt;p&gt;Caoli：/* test result */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=check the detail of Lucene score=&lt;br /&gt;
==data==&lt;br /&gt;
  d0 [{如何，怎么}] {办理，办} {户口，户口本} # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d1 {办理，办} {户口，户口本} [{流程，步骤}] # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d2 [{如何，怎么}] {办理，办} {身份证，身份} # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
  d3 {办理，办} {身份证} [{流程，步骤}] # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
&lt;br /&gt;
==Search==&lt;br /&gt;
 query:&quot;如何办理户口&quot;  =&gt; question:如何 question:办理户口&lt;br /&gt;
==result==&lt;br /&gt;
  doc=0 score=0.114656925 shardIndex=-1|0.114656925 = (MATCH) product of:&lt;br /&gt;
    0.22931385 = (MATCH) sum of:&lt;br /&gt;
      0.22931385 = (MATCH) weight(question:如何 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
        0.22931385 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
        0.4748871 = queryWeight, product of:&lt;br /&gt;
          1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
          0.3687922 = queryNorm&lt;br /&gt;
        0.48288077 = fieldWeight in 0, product of:&lt;br /&gt;
          1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
            1.0 = termFreq=1.0&lt;br /&gt;
          1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
          0.375 = fieldNorm(doc=0)&lt;br /&gt;
   0.5 = coord(1/2)&lt;br /&gt;
*Detailed calculation of score(query, d0)&lt;br /&gt;
*Reference formula: [http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html]&lt;br /&gt;
 [[文件:QQ截图20141128164958.png]]&lt;br /&gt;
:* tf(&quot;如何&quot; in d0) = sqrt(frequency) = sqrt(1) = 1&lt;br /&gt;
:* idf(&quot;如何&quot;) = 1 + ln(numDocs/(docFreq+1)) = 1 + ln(4/(2+1)) = 1.287682&lt;br /&gt;
:* &quot;如何&quot;.getBoost() = 1&lt;br /&gt;
:* coord(q, d0) = 0.5 = coord(1/2)&lt;br /&gt;
    coord(q,d) = overlap / maxOverlap&lt;br /&gt;
    overlap - the number of query terms matched in the document&lt;br /&gt;
    maxOverlap - the total number of terms in the query&lt;br /&gt;
:* queryNorm(q) = 1/sqrt(sumOfSquaredWeights) = 1/sqrt(idf(&quot;如何&quot;)^2 + idf(&quot;办理户口&quot;)^2) = 1/sqrt(1*(1.287682*1.287682+2.386*2.386)) = 0.3687&lt;br /&gt;
   sumOfSquaredWeights = q.getBoost()*q.getBoost()*∑( idf(t)*t.getBoost() )^2&lt;br /&gt;
&lt;br /&gt;
=multi=&lt;br /&gt;
==data==&lt;br /&gt;
  d0 [{如何，怎么}] {办理，办} {户口，户口本} # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d1 {办理，办} {户口，户口本} [{流程，步骤}] # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d2 [{如何，怎么}] {办理，办} {身份证，身份} # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
  d3 {办理，办} {身份证} [{流程，步骤}] # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
==Search==&lt;br /&gt;
code&lt;br /&gt;
  BooleanQuery query = new BooleanQuery();&lt;br /&gt;
  query.add(paternQuery, Occur.MUST); // or Occur.SHOULD if this clause is optional&lt;br /&gt;
  query.add(ansQuery, Occur.SHOULD); // or Occur.MUST if this clause is required&lt;br /&gt;
  query.add(sqQuery, Occur.SHOULD);   &lt;br /&gt;
search: &lt;br /&gt;
   +((question:如何 question:办理户口)^0.8) ((answer:如何 answer:办理户口)^0.2) ((standardq:如何 standardq:办理户口)^0.2)&lt;br /&gt;
&lt;br /&gt;
==result==&lt;br /&gt;
* Calculation formula&lt;br /&gt;
:* score(Q)=score(q_PTN)+score(q_ANS)+score(q_STD)&lt;br /&gt;
:* queryNorm(Q), Q=q_PTN+q_ANS+q_STD&lt;br /&gt;
::* sumOfSquaredWeights = ∑{ q.getBoost()*q.getBoost()*∑( idf(t)*t.getBoost() )^2 }, q={q_PTN, q_STD, q_ANS}&lt;br /&gt;
::* queryNorm(Q) = 1/sqrt(sumOfSquaredWeights)&lt;br /&gt;
:* field pattern&lt;br /&gt;
::* tf(&quot;如何&quot; in d0) = sqrt(frequency) = sqrt(1) = 1&lt;br /&gt;
::* idf(&quot;如何&quot;) = 1 + ln(numDocs/(docFreq+1)) = 1 + ln(4/(2+1)) = 1.287682&lt;br /&gt;
::* &quot;如何&quot;.getBoost() = 1&lt;br /&gt;
::* coord(q_PTN, d0) = 0.5 = coord(1/2)&lt;br /&gt;
    coord(q,d) = overlap / maxOverlap&lt;br /&gt;
    overlap - the number of query terms matched in the document&lt;br /&gt;
    maxOverlap - the total number of terms in the query&lt;br /&gt;
::* queryNorm(q_PTN) = queryNorm(Q)*boost(q_PTN)&lt;br /&gt;
::* Norm&lt;br /&gt;
:* &lt;br /&gt;
*detail&lt;br /&gt;
:* field: answer+pattern&lt;br /&gt;
     score(q,field-pattern)+score(q,field-answer)&lt;br /&gt;
     &lt;br /&gt;
  doc=0 score=0.15459718 shardIndex=-1|0.1545972 = (MATCH) product of:&lt;br /&gt;
  0.23189577 = (MATCH) sum of:[all]&lt;br /&gt;
    0.108532876 = (MATCH) product of:[field:pattern]&lt;br /&gt;
      0.21706575 = (MATCH) sum of:&lt;br /&gt;
        0.21706575 = (MATCH) weight(question:如何 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
          0.21706575 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
            0.44952247 = queryWeight, product of:&lt;br /&gt;
              1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
              0.3490943 = queryNorm&lt;br /&gt;
            0.48288077 = fieldWeight in 0, product of:&lt;br /&gt;
              1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
                1.0 = termFreq=1.0&lt;br /&gt;
              1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
              0.375 = fieldNorm(doc=0)&lt;br /&gt;
      0.5 = coord(1/2)&lt;br /&gt;
    0.12336289 = (MATCH) sum of:[field:answer]&lt;br /&gt;
      0.032918826 = (MATCH) weight(answer:如何 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
        0.032918826 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
          0.06779904 = queryWeight, product of:&lt;br /&gt;
            0.7768564 = idf(docFreq=4, maxDocs=4)&lt;br /&gt;
            0.087273575 = queryNorm&lt;br /&gt;
          0.48553526 = fieldWeight in 0, product of:&lt;br /&gt;
            1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
              1.0 = termFreq=1.0&lt;br /&gt;
            0.7768564 = idf(docFreq=4, maxDocs=4)&lt;br /&gt;
            0.625 = fieldNorm(doc=0)&lt;br /&gt;
      0.090444066 = (MATCH) weight(answer:办理户口 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
        0.090444066 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
          0.11238062 = queryWeight, product of:&lt;br /&gt;
            1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
            0.087273575 = queryNorm&lt;br /&gt;
          0.8048013 = fieldWeight in 0, product of:&lt;br /&gt;
            1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
              1.0 = termFreq=1.0&lt;br /&gt;
            1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
            0.625 = fieldNorm(doc=0)&lt;br /&gt;
  0.6666667 = coord(2/3)&lt;br /&gt;
&lt;br /&gt;
=test result=&lt;br /&gt;
[Z-MERT]&lt;br /&gt;
*The default arguments (pattern:1.0, sq:1.0)&lt;br /&gt;
:*test result:0.6697994987468672&lt;br /&gt;
*Arguments obtained with the MERT method (pattern:1.811676798378926, sq:1.0)&lt;br /&gt;
:*test result:0.6779448621553885&lt;/div&gt;</summary>
		<author><name>Caoli</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Multi_query_in_multi_field</id>
		<title>Multi query in multi field</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Multi_query_in_multi_field"/>
				<updated>2014-12-09T07:05:38Z</updated>
		
		<summary type="html">&lt;p&gt;Caoli：/* test result */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=check the detail of Lucene score=&lt;br /&gt;
==data==&lt;br /&gt;
  d0 [{如何，怎么}] {办理，办} {户口，户口本} # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d1 {办理，办} {户口，户口本} [{流程，步骤}] # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d2 [{如何，怎么}] {办理，办} {身份证，身份} # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
  d3 {办理，办} {身份证} [{流程，步骤}] # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
&lt;br /&gt;
==Search==&lt;br /&gt;
 query:&quot;如何办理户口&quot;  =&gt; question:如何 question:办理户口&lt;br /&gt;
==result==&lt;br /&gt;
  doc=0 score=0.114656925 shardIndex=-1|0.114656925 = (MATCH) product of:&lt;br /&gt;
    0.22931385 = (MATCH) sum of:&lt;br /&gt;
      0.22931385 = (MATCH) weight(question:如何 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
        0.22931385 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
        0.4748871 = queryWeight, product of:&lt;br /&gt;
          1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
          0.3687922 = queryNorm&lt;br /&gt;
        0.48288077 = fieldWeight in 0, product of:&lt;br /&gt;
          1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
            1.0 = termFreq=1.0&lt;br /&gt;
          1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
          0.375 = fieldNorm(doc=0)&lt;br /&gt;
   0.5 = coord(1/2)&lt;br /&gt;
*Detailed calculation of score(query, d0)&lt;br /&gt;
*Reference formula: [http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html]&lt;br /&gt;
 [[文件:QQ截图20141128164958.png]]&lt;br /&gt;
:* tf(&quot;如何&quot; in d0) = sqrt(frequency) = sqrt(1) = 1&lt;br /&gt;
:* idf(&quot;如何&quot;) = 1 + ln(numDocs/(docFreq+1)) = 1 + ln(4/(2+1)) = 1.287682&lt;br /&gt;
:* &quot;如何&quot;.getBoost() = 1&lt;br /&gt;
:* coord(q, d0) = 0.5 = coord(1/2)&lt;br /&gt;
    coord(q,d) = overlap / maxOverlap&lt;br /&gt;
    overlap - the number of query terms matched in the document&lt;br /&gt;
    maxOverlap - the total number of terms in the query&lt;br /&gt;
:* queryNorm(q) = 1/sqrt(sumOfSquaredWeights) = 1/sqrt(idf(&quot;如何&quot;)^2 + idf(&quot;办理户口&quot;)^2) = 1/sqrt(1*(1.287682*1.287682+2.386*2.386)) = 0.3687&lt;br /&gt;
   sumOfSquaredWeights = q.getBoost()*q.getBoost()*∑( idf(t)*t.getBoost() )^2&lt;br /&gt;
&lt;br /&gt;
=multi=&lt;br /&gt;
==data==&lt;br /&gt;
  d0 [{如何，怎么}] {办理，办} {户口，户口本} # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d1 {办理，办} {户口，户口本} [{流程，步骤}] # 到当地派出所办理  # 如何办理户口&lt;br /&gt;
  d2 [{如何，怎么}] {办理，办} {身份证，身份} # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
  d3 {办理，办} {身份证} [{流程，步骤}] # 到当地派出所办理  # 如何办理身份证&lt;br /&gt;
==Search==&lt;br /&gt;
code&lt;br /&gt;
  BooleanQuery query = new BooleanQuery();&lt;br /&gt;
  query.add(paternQuery, Occur.MUST); // or Occur.SHOULD if this clause is optional&lt;br /&gt;
  query.add(ansQuery, Occur.SHOULD); // or Occur.MUST if this clause is required&lt;br /&gt;
  query.add(sqQuery, Occur.SHOULD);   &lt;br /&gt;
search: &lt;br /&gt;
   +((question:如何 question:办理户口)^0.8) ((answer:如何 answer:办理户口)^0.2) ((standardq:如何 standardq:办理户口)^0.2)&lt;br /&gt;
&lt;br /&gt;
==result==&lt;br /&gt;
* Calculation formula&lt;br /&gt;
:* score(Q)=score(q_PTN)+score(q_ANS)+score(q_STD)&lt;br /&gt;
:* queryNorm(Q), Q=q_PTN+q_ANS+q_STD&lt;br /&gt;
::* sumOfSquaredWeights = ∑{ q.getBoost()*q.getBoost()*∑( idf(t)*t.getBoost() )^2 }, q={q_PTN, q_STD, q_ANS}&lt;br /&gt;
::* queryNorm(Q) = 1/sqrt(sumOfSquaredWeights)&lt;br /&gt;
:* field pattern&lt;br /&gt;
::* tf(&quot;如何&quot; in d0) = sqrt(frequency) = sqrt(1) = 1&lt;br /&gt;
::* idf(&quot;如何&quot;) = 1 + ln(numDocs/(docFreq+1)) = 1 + ln(4/(2+1)) = 1.287682&lt;br /&gt;
::* &quot;如何&quot;.getBoost() = 1&lt;br /&gt;
::* coord(q_PTN, d0) = 0.5 = coord(1/2)&lt;br /&gt;
    coord(q,d) = overlap / maxOverlap&lt;br /&gt;
    overlap - the number of query terms matched in the document&lt;br /&gt;
    maxOverlap - the total number of terms in the query&lt;br /&gt;
::* queryNorm(q_PTN) = queryNorm(Q)*boost(q_PTN)&lt;br /&gt;
::* Norm&lt;br /&gt;
:* &lt;br /&gt;
*detail&lt;br /&gt;
:* field: answer+pattern&lt;br /&gt;
     score(q,field-pattern)+score(q,field-answer)&lt;br /&gt;
     &lt;br /&gt;
  doc=0 score=0.15459718 shardIndex=-1|0.1545972 = (MATCH) product of:&lt;br /&gt;
  0.23189577 = (MATCH) sum of:[all]&lt;br /&gt;
    0.108532876 = (MATCH) product of:[field:pattern]&lt;br /&gt;
      0.21706575 = (MATCH) sum of:&lt;br /&gt;
        0.21706575 = (MATCH) weight(question:如何 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
          0.21706575 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
            0.44952247 = queryWeight, product of:&lt;br /&gt;
              1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
              0.3490943 = queryNorm&lt;br /&gt;
            0.48288077 = fieldWeight in 0, product of:&lt;br /&gt;
              1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
                1.0 = termFreq=1.0&lt;br /&gt;
              1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
              0.375 = fieldNorm(doc=0)&lt;br /&gt;
      0.5 = coord(1/2)&lt;br /&gt;
    0.12336289 = (MATCH) sum of:[field:answer]&lt;br /&gt;
      0.032918826 = (MATCH) weight(answer:如何 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
        0.032918826 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
          0.06779904 = queryWeight, product of:&lt;br /&gt;
            0.7768564 = idf(docFreq=4, maxDocs=4)&lt;br /&gt;
            0.087273575 = queryNorm&lt;br /&gt;
          0.48553526 = fieldWeight in 0, product of:&lt;br /&gt;
            1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
              1.0 = termFreq=1.0&lt;br /&gt;
            0.7768564 = idf(docFreq=4, maxDocs=4)&lt;br /&gt;
            0.625 = fieldNorm(doc=0)&lt;br /&gt;
      0.090444066 = (MATCH) weight(answer:办理户口 in 0) [DefaultSimilarity], result of:&lt;br /&gt;
        0.090444066 = score(doc=0,freq=1.0 = termFreq=1.0&lt;br /&gt;
  ), product of:&lt;br /&gt;
          0.11238062 = queryWeight, product of:&lt;br /&gt;
            1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
            0.087273575 = queryNorm&lt;br /&gt;
          0.8048013 = fieldWeight in 0, product of:&lt;br /&gt;
            1.0 = tf(freq=1.0), with freq of:&lt;br /&gt;
              1.0 = termFreq=1.0&lt;br /&gt;
            1.287682 = idf(docFreq=2, maxDocs=4)&lt;br /&gt;
            0.625 = fieldNorm(doc=0)&lt;br /&gt;
  0.6666667 = coord(2/3)&lt;br /&gt;
&lt;br /&gt;
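The arithmetic in the explain output above can be reproduced by hand. The following is a minimal Python sketch, not Lucene code: the function names are hypothetical, the tf and idf formulas are the ones listed in the result section, and the queryNorm and fieldNorm values are copied from the explain output rather than recomputed.&lt;br /&gt;

```python
import math

def idf(doc_freq, num_docs):
    # idf(t) = 1 + ln(numDocs / (docFreq + 1)), as listed in the result section
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def tf(freq):
    # tf(t in d) = sqrt(frequency)
    return math.sqrt(freq)

def term_score(freq, doc_freq, num_docs, query_norm, field_norm):
    # queryWeight = idf * queryNorm; fieldWeight = tf * idf * fieldNorm
    query_weight = idf(doc_freq, num_docs) * query_norm
    field_weight = tf(freq) * idf(doc_freq, num_docs) * field_norm
    return query_weight * field_weight

NUM_DOCS = 4

# question field: only one of the two query terms matches doc 0, so coord = 1/2
pattern_score = term_score(1, 2, NUM_DOCS, 0.3490943, 0.375) * (1 / 2)

# answer field: both query terms match doc 0, so no coord penalty applies
answer_score = (term_score(1, 4, NUM_DOCS, 0.087273575, 0.625)
                + term_score(1, 2, NUM_DOCS, 0.087273575, 0.625))

# two of the three top-level clauses match doc 0, so coord = 2/3
doc_score = (pattern_score + answer_score) * (2 / 3)

print(round(pattern_score, 6), round(answer_score, 6), round(doc_score, 6))
```

Lucene computes these values in single precision, so the last digits of a double-precision recomputation may differ slightly from the logged ones.&lt;br /&gt;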
=test result=&lt;br /&gt;
[z-mert]&lt;/div&gt;</summary>
		<author><name>Caoli</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Li_Cao_14-12-07</id>
		<title>Li Cao 14-12-07</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Li_Cao_14-12-07"/>
				<updated>2014-12-08T02:11:18Z</updated>
		
		<summary type="html">&lt;p&gt;Caoli: Created page with “=== Accomplished this week === * Understand the Minimum Error Rate Training in Lucene.  * Read several paper about MERT === Plan for next week === * according to the...”&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Accomplished this week ===&lt;br /&gt;
* Understood how Minimum Error Rate Training (MERT) applies to Lucene.&lt;br /&gt;
* Read several papers about MERT.&lt;br /&gt;
=== Plan for next week ===&lt;br /&gt;
* Run tests with the MERT method and record the results.&lt;br /&gt;
* Read more papers about MERT.&lt;/div&gt;</summary>
		<author><name>Caoli</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2014-12-07</id>
		<title>2014-12-07</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2014-12-07"/>
				<updated>2014-12-08T01:26:07Z</updated>
		
		<summary type="html">&lt;p&gt;Caoli：&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Xiaoxi Wang 14-12-07]]&lt;br /&gt;
&lt;br /&gt;
[[Dongxu Zhang 14-12-07]]&lt;br /&gt;
&lt;br /&gt;
[[Xiangyu Zeng 14-12-07]]&lt;br /&gt;
&lt;br /&gt;
[[Miao Fan 14-12-07]]&lt;br /&gt;
&lt;br /&gt;
[[Bin Yuan 14-12-07]]&lt;br /&gt;
&lt;br /&gt;
[[Mengyuan Zhao 14-12-07]]&lt;br /&gt;
&lt;br /&gt;
[[Li Cao 14-12-07]]&lt;/div&gt;</summary>
		<author><name>Caoli</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/2014-11-19</id>
		<title>2014-11-19</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/2014-11-19"/>
				<updated>2014-11-19T12:06:51Z</updated>
		
		<summary type="html">&lt;p&gt;Caoli: /* Cause */&lt;/p&gt;
&lt;hr /&gt;
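&lt;div&gt;  Test report for the spell-check module:&lt;br /&gt;
  author CaoLi   date: 2014-11-19&lt;br /&gt;
=Building the test set=&lt;br /&gt;
Domain terms in the test set were first replaced by hand with misspellings, and the sentences were then segmented automatically before testing. Size: 200 entries.&lt;br /&gt;
Example:&lt;br /&gt;
After manually corrupting the domain terms:&lt;br /&gt;
   申请班里高领老人紧贴变更和终止的实现&lt;br /&gt;
After automatic segmentation:&lt;br /&gt;
   申请 班里 高领 老人 紧贴 变更 和 终止 的 实现&lt;br /&gt;
The test set is the first 200 entries of the test file (.\corpus\20141016凉山州\3文本\testJ.txt); note that only the question of each entry is used.&lt;br /&gt;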
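=Evaluation=&lt;br /&gt;
Evaluation criteria for the test results:&lt;br /&gt;
   precision = number of items correctly identified as needing correction / number of items identified as needing correction&lt;br /&gt;
   recall = number of items correctly identified as needing correction / number of items in the test set that actually need correction&lt;br /&gt;
   accuracy = number of items handled correctly / total number of items&lt;br /&gt;
Example:&lt;br /&gt;
Reference:&lt;br /&gt;
  我 真 想 办理 身份证 呀. &lt;br /&gt;
Test case: &lt;br /&gt;
  我 挣 像 办理 神风证 压. &lt;br /&gt;
Result:&lt;br /&gt;
  我 证 想 班里 身份证 压. &lt;br /&gt;
&lt;br /&gt;
Actions:&lt;br /&gt;
  我-&amp;gt;我(correct) 像-&amp;gt;想(correct) 办理-&amp;gt;班里(false) 神风证-&amp;gt;身份证(correct) 挣-&amp;gt;证(false) 压-&amp;gt;压(false) &lt;br /&gt;
Evaluation:&lt;br /&gt;
  Needs correction: precision=3/4. recall=3/4. &lt;br /&gt;
  No correction needed: precision=1/2. recall=1/2. &lt;br /&gt;
  Accuracy: 3/6&lt;br /&gt;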
&lt;br /&gt;
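The figures in the example above can be derived mechanically from the three aligned token lists. The following is a minimal Python sketch under the assumption that the lists align one to one; the function name evaluate and the metric keys are hypothetical, and the sketch does not guard against empty denominators (for instance, a run in which no token is changed).&lt;br /&gt;

```python
from fractions import Fraction

def evaluate(reference, test, result):
    """Token-level spell-check metrics for three aligned token lists."""
    rows = list(zip(reference, test, result))
    needs_fix = [ref != tst for ref, tst, _ in rows]   # token is actually misspelled
    changed = [tst != out for _, tst, out in rows]     # checker modified the token
    correct = [ref == out for ref, _, out in rows]     # final token equals the reference

    flagged_ok = sum(n and c for n, c in zip(needs_fix, changed))
    kept_ok = sum((not n) and (not c) for n, c in zip(needs_fix, changed))
    return {
        "fix_precision": Fraction(flagged_ok, sum(changed)),
        "fix_recall": Fraction(flagged_ok, sum(needs_fix)),
        "keep_precision": Fraction(kept_ok, len(rows) - sum(changed)),
        "keep_recall": Fraction(kept_ok, len(rows) - sum(needs_fix)),
        "accuracy": Fraction(sum(correct), len(rows)),
    }

reference = "我 真 想 办理 身份证 呀".split()
test_case = "我 挣 像 办理 神风证 压".split()
result = "我 证 想 班里 身份证 压".split()

metrics = evaluate(reference, test_case, result)
for name, value in metrics.items():
    print(name, value)
```

Fraction reduces 3/6 to 1/2, so the accuracy prints in lowest terms.&lt;br /&gt;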
=Test results=&lt;br /&gt;
1. Language model: a 3-gram language model trained on the &amp;lt;standard question, answer&amp;gt; columns of the training set &amp;lt;凉山州政务知识训练集1016.xls&amp;gt; (detailed results are in test-model-RESULT.txt).&lt;br /&gt;
&lt;br /&gt;
RESULT: &lt;br /&gt;
  Needs correction:     precision: 498/498 = 1.0          recall: 498/881 = 0.565266&lt;br /&gt;
  No correction needed: precision: 2228/2611 = 0.853313   recall: 2228/2228 = 1.0 &lt;br /&gt;
  Accuracy: 2678/3109 = 0.861370&lt;br /&gt;
    &lt;br /&gt;
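The counts in the RESULT block can be cross-checked against each other. The following Python sketch only rearranges the reported numbers; the variable names are hypothetical, and the last step assumes that the accuracy numerator counts correct tokens left unchanged plus flagged tokens rewritten to the right word, which the report does not state explicitly.&lt;br /&gt;

```python
# Counts copied from the RESULT block above.
flagged, flagged_ok = 498, 498      # tokens the checker changed / of those, truly misspelled
misspelled = 881                    # tokens that actually needed correction
kept, kept_ok = 2611, 2228          # tokens left unchanged / of those, truly correct
final_ok, total = 2678, 3109        # numerator and denominator of the reported accuracy

# Every token is either changed or left alone, and either misspelled or correct.
assert flagged + kept == total
assert misspelled + kept_ok == total

missed = misspelled - flagged_ok    # misspelled tokens the checker never touched
# Assumption: final_ok = kept_ok + flagged tokens rewritten to the right word.
repaired = final_ok - kept_ok
wrong_rewrite = flagged_ok - repaired

print("missed:", missed)
print("repaired correctly:", repaired)
print("flagged but rewritten to a wrong word:", wrong_rewrite)
print("recall:", flagged_ok / misspelled)
```

Under that assumption, the low recall comes entirely from misspelled tokens that were never flagged, not from false alarms.&lt;br /&gt;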
=Analysis=&lt;br /&gt;
&lt;br /&gt;
The results above show that recall for tokens needing correction is low (0.565266).&lt;br /&gt;
&lt;br /&gt;
==Cause==&lt;br /&gt;
&lt;br /&gt;
A likely cause: the domain terms were corrupted by hand first and the text was then segmented automatically against the vocabulary, so the system may split a single domain term into several words.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
 [汝, 河, 进行, 开发商, 新建, 房产, 权, 等级]&lt;br /&gt;
Scoring process:&lt;br /&gt;
 [汝, 河, 进行, 开发商, 新建, 房产, 权, 登机]'score is:29.822336867451668&lt;br /&gt;
 [汝, 河, 进行, 开发商, 新建, 房产, 权, 等级]'score is:29.208215907216072&lt;br /&gt;
 [汝, 河, 进行, 开发商, 新建, 房产, 权, 登记]'score is:27.493204072117805&lt;br /&gt;
 [汝, 河, 进行, 开发商, 新建, 房产, 权, 登基]'score is:29.822336867451668&lt;br /&gt;
test result: 汝 河 进 行 开 发 商 新 建 房 产 权 登 记 &lt;br /&gt;
&lt;br /&gt;
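The selection step in the log above can be sketched as follows. This is an illustration in Python, not the module's code: the function name pick_best is hypothetical, and it assumes that a lower score is better, which is inferred only from the fact that the logged test result keeps 登记, the candidate with the smallest score.&lt;br /&gt;

```python
# Candidate replacements for the last token and their scores, copied from the log above.
candidates = [
    ("登机", 29.822336867451668),
    ("等级", 29.208215907216072),
    ("登记", 27.493204072117805),
    ("登基", 29.822336867451668),
]
prefix = ["汝", "河", "进行", "开发商", "新建", "房产", "权"]

def pick_best(scored):
    # Assumption: a lower score is better, since the logged output keeps 登记.
    return min(scored, key=lambda item: item[1])

best_word, best_score = pick_best(candidates)
print(prefix + [best_word], best_score)
```

Only the last token is varied here; the split tokens 汝 and 河 are never offered as a single candidate, which is the limitation discussed in this section.&lt;br /&gt;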
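Analysis:&lt;br /&gt;
Because “汝河” was split into the two words “汝” and “河”, the system never recombines them into “汝河” and never scores it as a single word.&lt;br /&gt;
&lt;br /&gt;
Proportion of corrupted domain terms that were split by segmentation: 44/98 = 0.448979&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
架势证 ------ 架势  证&lt;br /&gt;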
&lt;br /&gt;
==Improvement==&lt;br /&gt;
&lt;br /&gt;
Possible improvement:&lt;br /&gt;
&lt;br /&gt;
Segmentation could be done on pinyin instead, but this has not been tried yet.&lt;/div&gt;</summary>
		<author><name>Caoli</name></author>	</entry>

	<entry>
		<id>http://index.cslt.org/mediawiki/index.php/Spell_check</id>
		<title>Spell check</title>
		<link rel="alternate" type="text/html" href="http://index.cslt.org/mediawiki/index.php/Spell_check"/>
				<updated>2014-11-19T11:00:15Z</updated>
		
		<summary type="html">&lt;p&gt;Caoli：/* result */&lt;/p&gt;
&lt;hr /&gt;
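&lt;div&gt;==Evaluation criteria==&lt;br /&gt;
Evaluation criteria for spell checking:&lt;br /&gt;
&lt;br /&gt;
precision = number of items correctly identified as needing correction / number of items identified as needing correction.&lt;br /&gt;
&lt;br /&gt;
recall = number of items correctly identified as needing correction / number of items in the test set that actually need correction.&lt;br /&gt;
&lt;br /&gt;
accuracy = number of items handled correctly / total number of items&lt;br /&gt;
&lt;br /&gt;
Note: the correctly identified items are those the spell checker handled correctly, and the total number of identified items is the total number of spell-check actions performed.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
Reference: 我 真 想 办理 身份证 呀. Test case: 我 挣 像 办理 神风证 压. Result: 我 证 想 班里 身份证 压.&lt;br /&gt;
&lt;br /&gt;
Actions: 我-&amp;gt;我(correct) 像-&amp;gt;想(correct) 办理-&amp;gt;班里(false) 神风证-&amp;gt;身份证(correct) 挣-&amp;gt;证(false)  压-&amp;gt;压(false)&lt;br /&gt;
&lt;br /&gt;
Needs correction: precision=3/4. recall=3/4.&lt;br /&gt;
&lt;br /&gt;
No correction needed: precision=1/2. recall=1/2.&lt;br /&gt;
&lt;br /&gt;
Accuracy: 3/6&lt;br /&gt;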
==Some sources==&lt;br /&gt;
* Some algorithms for spelling correction [http://www.quora.com/What-are-some-algorithms-of-spelling-correction-that-were-used-by-search-engine] [https://documentation.devexpress.com/#WindowsForms/CustomDocument2989]&lt;br /&gt;
* How to Write a Spelling Corrector [http://norvig.com/spell-correct.html]&lt;br /&gt;
==result==&lt;br /&gt;
[[2014-11-18]]&lt;br /&gt;
&lt;br /&gt;
[[2014-11-19]]&lt;/div&gt;</summary>
		<author><name>Caoli</name></author>	</entry>

	</feed>