Difference between revisions of "Search method"

From cslt Wiki
Lr (talk | contribs)

Revision as of 01:07, 21 November 2014 (Friday)

MERT-4 Method

lucene method

  • data set
    • jiangkaipeng:
  • different method result

Results of different methods in Lucene
method     Default   BM25      LMDirichlet   DFR       LMJelinekMercer   IB
Accuracy   0.66228   0.66228   0.4091        0.65476   0.65476           0.6666

boost keyword

  • boost the query keyword using IDF

Keyword boosting in Lucene
method     Default   idf_train   idf_train_norm   idf_baidu   idf_baidu_norm
Accuracy   0.66228   0.651629    0.57644          0.647869    0.65288

  • TF-IDF formula
    • score(q,d) = coord(q,d) * query_boost * query_norm * sum(idf^2 * tf * term_boost * norm(t,d)) [1]
  • add the new keyword value from proMe method
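
The TF-IDF formula and the IDF-based keyword boost listed above can be sketched in plain Python. This is only an illustration, not the code used for the experiments: the function names `idf` and `score` are hypothetical, and the component definitions (tf = sqrt(frequency), idf = 1 + ln(N / (df + 1)), norm = 1 / sqrt(document length), query_norm = 1 / sqrt(sum of squared term weights)) are assumed to follow Lucene's classic TF-IDF similarity.

```python
import math
from collections import Counter


def idf(term, docs):
    """Assumed classic-style IDF: 1 + ln(N / (df + 1))."""
    n_docs = len(docs)
    doc_freq = sum(1 for doc in docs if term in doc)
    return 1.0 + math.log(n_docs / (doc_freq + 1))


def score(query_terms, doc, docs, term_boost=None, query_boost=1.0):
    """coord * query_boost * query_norm * sum(idf^2 * tf * term_boost * norm)."""
    term_boost = term_boost or {}
    unique_terms = set(query_terms)
    freqs = Counter(doc)
    matched = [t for t in unique_terms if freqs[t] > 0]
    if not matched:
        return 0.0

    # coord(q, d): fraction of the query terms that occur in the document.
    coord = len(matched) / len(unique_terms)

    idfs = {t: idf(t, docs) for t in unique_terms}

    # query_norm: makes scores of different queries comparable (assumed form).
    sum_sq = sum((idfs[t] * term_boost.get(t, 1.0)) ** 2 for t in unique_terms)
    query_norm = 1.0 / math.sqrt(sum_sq)

    # norm(t, d): length normalisation of the document (assumed form).
    norm = 1.0 / math.sqrt(len(doc))

    total = sum(
        idfs[t] ** 2 * math.sqrt(freqs[t]) * term_boost.get(t, 1.0) * norm
        for t in matched
    )
    return coord * query_boost * query_norm * total
```

To boost query keywords by IDF, one option is to pass `term_boost={t: idf(t, docs) for t in query_terms}`; whether the experiments above applied the boost exactly this way is not stated in the page.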

our method

Results of different methods in Lucene
method     lucene   vsm_idf(haiguan)   VSM_idf(baidu)   vsm_idf(tain)   vsm_idf(calculate)
Accuracy   0.6628   0.6228             0.6197           0.5827          0.5426

synonyms method

  • fuzzy match
    • calculate the similarity value = 1/(5 - 5*av_value), where av_value is the average of the word2vec, Synonyms forest, and HowNet similarity scores.
  • lucene
    • Lucene 4.6 already provides a synonym mechanism (org.apache.lucene.analysis.synonym [2]) with rules such as (a -> x), (a b -> y), (b c d -> z); alternatively, the query itself can be extended.
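
The fuzzy-match similarity value above can be written out as a short Python sketch. The function names are hypothetical, the three input scores are assumed to lie in [0, 1], and returning infinity when av_value reaches 1 (where the formula divides by zero) is an assumption of this sketch; the page does not say how that case is handled.

```python
import math


def average_similarity(word2vec_sim, forest_sim, hownet_sim):
    """av_value: mean of the three similarity scores (each assumed in [0, 1])."""
    return (word2vec_sim + forest_sim + hownet_sim) / 3.0


def fuzzy_match_value(av_value):
    """similarity value = 1 / (5 - 5 * av_value)."""
    if av_value < 0.0 or av_value > 1.0:
        raise ValueError("av_value is expected to lie in [0, 1]")
    if av_value == 1.0:
        # The formula is undefined here; treating it as the maximum is an assumption.
        return math.inf
    return 1.0 / (5.0 - 5.0 * av_value)
```

With this formula, av_value = 0 gives 0.2 and av_value = 0.8 gives 1.0, so the value grows quickly as the averaged similarity approaches 1.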

find

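  • Using the finest-grained word segmentation when building the index for the standard questions (but not for the templates) improves accuracy: 61% => 66%.
  • Do not apply fine-grained segmentation to the input question (59% with fine-grained segmentation, 66% without).
  • Lucene 4.6 already includes synonym expansion [http://www.hankcs.com/program/java/lucene-synonymfilterfactory.html]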
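
As a rough illustration of rule-based synonym handling with rules of the form (a -> x), (a b -> y), (b c d -> z), the following Python sketch rewrites a token list. It is not Lucene's implementation: the function `apply_synonyms`, its `keep_original` flag, and the greedy longest-match strategy are all assumptions made for this example.

```python
def apply_synonyms(tokens, rules, keep_original=False):
    """Rewrite tokens using multi-token synonym rules.

    rules maps a tuple of input tokens to a single replacement token.
    At each position the longest matching rule wins (assumed strategy).
    With keep_original=True the matched tokens are kept next to the
    replacement, which expands the text instead of substituting it.
    """
    max_len = max((len(key) for key in rules), default=0)
    out = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                if keep_original:
                    out.extend(key)
                out.append(rules[key])
                i += n
                matched = True
                break
        if not matched:
            out.append(tokens[i])
            i += 1
    return out


rules = {("a",): "x", ("a", "b"): "y", ("b", "c", "d"): "z"}
print(apply_synonyms(["a", "b", "c", "d"], rules))
```

Applying the same function to the query with `keep_original=True` corresponds to the "extend the query" option noted in the synonyms section.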

bug fix

  • vsm method
    • the pattern was not cleared before search