Difference between revisions of "Search method"

From cslt Wiki
synonyms method
Lr (talk | contribs)
lucene method
 
(21 intermediate revisions by the same user not shown)
Line 1: Line 1:
==lucene method==
+
=learning to rank=
*data set
+
v1.0[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/Huilan-learning-to-rank]
:* jiangkaipeng:
+
=MERT-4 Method=
* different method result
+
* [[Optimize the parameter in different data source]]
  
 +
=lucene method=
 +
*[[different method in lucene]]
 +
*[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/Lucene lucene_multi_query]
 +
 +
= boost keyword =
 +
[[boost keyword before search with ITIDF]]
 +
 +
=our method=
 
{| border="2px"
 
{| border="2px"
 
|+ different result in lucene
 
|+ different result in lucene
 
|-
 
|-
! method !!Default !! BM25 !! LMDirichlet !! DFR !! LMJelinekMercer !! IB
+
! method !! lucene !! vsm_idf(haiguan) !! VSM_idf(baidu) !! vsm_idf(train) !! vsm_idf(calculate)
 
|-
 
|-
 
! Accuracy
 
! Accuracy
| 0.66228 || 0.66228 || 0.4091 || 0.65476 || 0.65476 || 0.6666
+
| 0.6628 || 0.6228 || 0.6197 || 0.5827 || 0.5426
|-
+
|}
+
* add boost keyword
+
{| border="2px"
+
|+ boost keyword  in lucene
+
|-
+
! method !! Default !! idf_train !! idf_train_norm !! idf_baidu !! idf_baidu_norm
+
|-
+
! Accuracy
+
| 0.66228 || 0.651629 || 0.57644 || 0.647869 || 0.65288
+
 
|-
 
|-
 
|}
 
|}
  
==our method==
+
=synonyms method=
{| border="2px"
+
|+ different result in lucene
+
|-
+
! method !!lucene  !! BM25 !! VSM
+
|-
+
! Accuracy
+
| 0.6184 || 0.614 || 0.377
+
|-
+
|}
+
==synonyms method==
+
 
* fuzzy match
 
* fuzzy match
 +
:* calculate the similarity value = 1/(5 - 5*av_value), where av_value = average(word2vec, Synonyms forest, hownet).
 +
* lucene
 +
:* lucene4.6 already includes a synonyms method (org.apache.lucene.analysis.synonym[http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/synonym/package-summary.html#package_description]), which can apply rules like (a -> x), (a b -> y), (b c d -> z) or extend the query.
 
:*
 
:*
* lucene
 
:* lucene4.6 already includes a synonyms method (org.apache.lucene.analysis.synonym[http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/synonym/package-summary.html#package_description])
 
:*  (a -> x)  (a b -> y) (b c d -> z)
 
  
==find==
+
=find=
 
* Using the finest-grained word segmentation when building the index for the standard questions (but not for the templates) improves accuracy: 61 => 66.
 
* Do not apply fine-grained segmentation to the input question (59% with fine-grained segmentation, 66% without).
 
* lucene4.6 already includes synonym expansion[http://www.hankcs.com/program/java/lucene-synonymfilterfactory.html]
 +
=bug fix=
 +
* vsm method
 +
:* doesn't clear the pattern before search

Latest revision as of 01:11, 3 February 2015 (Tuesday)

learning to rank

v1.0[1]

MERT-4 Method

lucene method

boost keyword

boost keyword before search with ITIDF
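
The linked page is not reproduced here, so the exact boosting scheme is unknown. A minimal sketch, assuming each keyword's boost is a smoothed IDF computed from a tokenized corpus and handed to Lucene through its `term^boost` query syntax; `idf_table`, `boosted_query` and the sample documents are hypothetical names and data, not taken from the wiki:

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed IDF per term: log((N + 1) / (df + 1)) + 1 (an assumed variant)."""
    n = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}


def boosted_query(terms, idf, default=1.0):
    """Join terms into a query string of the form `term^boost`.

    Terms are assumed to contain no query-parser special characters;
    real input would need escaping first.
    """
    return " ".join(f"{t}^{idf.get(t, default):.2f}" for t in terms)


# Hypothetical, already-segmented documents.
docs = [
    ["customs", "tax", "rate"],
    ["customs", "declaration"],
    ["customs", "tax", "refund"],
]
idf = idf_table(docs)
query = boosted_query(["customs", "refund"], idf)
```

Rare terms such as `refund` receive a larger boost than terms that occur in every document, which is the usual motivation for weighting keywords before the search.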

our method

different result in lucene
method    lucene  vsm_idf(haiguan)  VSM_idf(baidu)  vsm_idf(train)  vsm_idf(calculate)
Accuracy  0.6628  0.6228            0.6197          0.5827          0.5426

synonyms method

  • fuzzy match
  • calculate the similarity value = 1/(5 - 5*av_value), where av_value = average(word2vec, Synonyms forest, hownet).
  • lucene
  • lucene4.6 already includes a synonyms method (org.apache.lucene.analysis.synonym[2]), which can apply rules like (a -> x), (a b -> y), (b c d -> z) or extend the query.
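
The fuzzy-match formula can be written out as a small function. This is a sketch under the assumption that the three component scores (word2vec, Synonyms forest, hownet) each lie in [0, 1]; the function name and the handling of av_value = 1, where the formula divides by zero, are illustrative choices and not part of the original notes:

```python
import math


def fuzzy_similarity(word2vec, synonym_forest, hownet):
    """similarity = 1 / (5 - 5 * av_value), av_value = mean of the three scores."""
    av_value = (word2vec + synonym_forest + hownet) / 3.0
    if av_value >= 1.0:
        # The formula is undefined at av_value == 1; treat it as an exact match.
        return math.inf
    return 1.0 / (5.0 - 5.0 * av_value)


low = fuzzy_similarity(0.0, 0.0, 0.0)   # lower bound of the formula
mid = fuzzy_similarity(0.9, 0.8, 0.7)   # av_value is about 0.8
top = fuzzy_similarity(1.0, 1.0, 1.0)   # all three measures agree completely
```

Under this assumption the value starts at 0.2 when all scores are 0, reaches 1 at av_value = 0.8, and grows without bound as av_value approaches 1, so the formula strongly favors word pairs on which all three measures agree.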

find

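  • Using the finest-grained word segmentation when building the index for the standard questions (but not for the templates) improves accuracy: 61 => 66.
  • Do not apply fine-grained segmentation to the input question (59% with fine-grained segmentation, 66% without).
  • lucene4.6 already includes synonym expansion[3]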
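
To make the rule notation (a -> x), (a b -> y), (b c d -> z) concrete, here is a plain-Python sketch of applying such rules to a token list. It assumes greedy matching, where the rule that starts earliest and covers the most tokens wins; it is not Lucene's implementation, and `apply_synonyms` and its `keep_original` flag are hypothetical names used only for illustration:

```python
def apply_synonyms(tokens, rules, keep_original=False):
    """Rewrite tokens with multi-word synonym rules, longest match first.

    rules maps a tuple of input tokens to one replacement token.
    With keep_original=True the matched tokens are kept and the synonym is
    appended after them (query expansion instead of replacement).
    """
    max_len = max((len(key) for key in rules), default=0)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                if keep_original:
                    out.extend(key)
                out.append(rules[key])
                i += n
                break
        else:
            # No rule starts at this position: copy the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


rules = {("a",): "x", ("a", "b"): "y", ("b", "c", "d"): "z"}
replaced = apply_synonyms(["a", "b", "c", "d"], rules)
expanded = apply_synonyms(["a", "c"], rules, keep_original=True)
```

With these rules the input `a b c d` is rewritten using the second rule, because it starts at the first token and covers more tokens than `a -> x`; the third rule never fires since `b` has already been consumed.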

bug fix

  • vsm method
  • doesn't clear the pattern before search