Difference between revisions of "Search method"

From cslt Wiki
Editor: Lr

Revision as of 01:13, 21 November 2014 (Friday)

MERT-4 Method

Lucene method

Different methods in Lucene

boost keyword

boost keyword before search with ITIDF

our method

Different results in Lucene

method     lucene   vsm_idf(haiguan)   VSM_idf(baidu)   vsm_idf(tain)   vsm_idf(calculate)
Accuracy   0.6628   0.6228             0.6197           0.5827          0.5426

synonyms method

  • Fuzzy match
      • Compute the similarity value as 1/(5 - 5*av_value), where av_value is the average of the word2vec, Synonyms forest, and HowNet similarity scores.
  • Lucene
      • Lucene 4.6 already includes a synonym method (org.apache.lucene.analysis.synonym [1]) that applies rules such as (a -> x), (a b -> y), (b c d -> z), or extends the query.
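The fuzzy-match formula can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name, the argument names, and the idea that each of the three scores lies in [0, 1] are not stated on this page, and returning infinity when av_value reaches 1 (where the denominator becomes zero) is a choice made here, not something the page specifies.

```python
import math
from statistics import mean


def fuzzy_similarity(word2vec_score, synonym_forest_score, hownet_score):
    """Return 1 / (5 - 5 * av_value) for the three per-resource scores.

    Assumption: every score is a similarity in [0, 1].
    """
    av_value = mean([word2vec_score, synonym_forest_score, hownet_score])
    denominator = 5 - 5 * av_value
    if denominator <= 0:
        # Assumption: av_value == 1 (identical words) is treated as an
        # unbounded similarity instead of raising ZeroDivisionError.
        return math.inf
    return 1 / denominator


if __name__ == "__main__":
    for scores in [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (0.9, 0.8, 0.7)]:
        print(scores, fuzzy_similarity(*scores))
```

Under these assumptions the value starts at 0.2 when the three resources see no similarity and grows without bound as av_value approaches 1, so near-synonyms are weighted much more heavily than loosely related words.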

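Findings

  • Using the finest-grained word segmentation when indexing the standard questions (but not the templates) improves accuracy from 61% to 66%.
  • Do not apply fine-grained segmentation to the input question: accuracy is 59% with it and 66% without.
  • Lucene 4.6 already includes synonym expansion [2].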
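To make the synonym rules concrete, here is a small Python sketch of how rules of the form (a -> x), (a b -> y), (b c d -> z) could be applied to a tokenized query. It is not Lucene's API or implementation: the function name, the greedy longest-match strategy, and the `expand` flag (keep the original tokens and add the synonym) are assumptions made for illustration.

```python
# Hypothetical rule table mirroring the examples on this page.
RULES = {
    ("a",): "x",
    ("a", "b"): "y",
    ("b", "c", "d"): "z",
}


def apply_synonyms(tokens, rules, expand=False):
    """Rewrite tokens with multi-word synonym rules, longest match first.

    With expand=False a matched span is replaced by its synonym.
    With expand=True the original span is kept and the synonym is added,
    which corresponds to extending the query.
    """
    max_len = max((len(key) for key in rules), default=0)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                if expand:
                    out.extend(key)
                out.append(rules[key])
                i += n
                break
        else:
            # No rule matched at this position: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    print(apply_synonyms(["a", "b", "c"], RULES))
    print(apply_synonyms(["a", "c"], RULES, expand=True))
```

Longest match first is used so that the two-token rule (a b -> y) wins over the single-token rule (a -> x) when both apply at the same position.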

bug fix

  • VSM method
      • The pattern was not cleared before each search.