Latest revision as of 01:11, 3 February 2015 (Tuesday)

learning to rank

MERT-4 Method

Optimize the parameter in different data source

lucene method

boost keyword

our method

different result in lucene
method	lucene	vsm_idf(haiguan)	VSM_idf(baidu)	vsm_idf(tain)	vsm_idf(calculate)
Accuracy	0.6628	0.6228	0.6197	0.5827	0.5426

synonyms method

fuzzy match

Calculate the similarity value as 1/(5 - 5*av_value), where av_value is the average of the word2vec, Synonyms forest, and HowNet scores.
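A minimal sketch of the fuzzy-match formula, assuming each of the three component scores lies in [0, 1] and that "average" means the arithmetic mean of the three. The function name and the handling of av_value = 1 (where the formula divides by zero) are assumptions for illustration, not part of the original method.

```python
from statistics import mean


def fuzzy_similarity(word2vec_sim, synonym_forest_sim, hownet_sim):
    """Return 1 / (5 - 5 * av_value) for the three component scores."""
    # av_value: arithmetic mean of the three scores (assumed interpretation)
    av_value = mean([word2vec_sim, synonym_forest_sim, hownet_sim])
    denominator = 5 - 5 * av_value
    if denominator <= 0:
        # The formula is undefined when av_value reaches 1; returning
        # infinity here is an illustrative choice, not the source's rule.
        return float("inf")
    return 1.0 / denominator
```

Under these assumptions the value grows from 0.2 (no similarity in any component) toward infinity as the averaged score approaches 1, so small differences near the top of the range are weighted much more heavily than differences near the bottom.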

lucene

Lucene 4.6 already includes a synonyms method (org.apache.lucene.analysis.synonym[2]), which either applies mapping rules such as (a -> x), (a b -> y), (b c d -> z) or extends the query with synonyms.
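The rule notation above can be illustrated with a small plain-Python sketch. It is not Lucene's implementation or API: the function name, the rule table, and the choice to prefer the longest matching rule at each position are all assumptions made for illustration.

```python
def apply_synonym_rules(tokens, rules):
    """Rewrite a token list with multi-token rules, longest match first."""
    max_len = max((len(src) for src in rules), default=0)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                out.append(rules[key])
                i += n
                break
        else:
            # No rule matched here: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical rule table mirroring (a -> x) (a b -> y) (b c d -> z).
RULES = {("a",): "x", ("a", "b"): "y", ("b", "c", "d"): "z"}
```

For example, `apply_synonym_rules(["a", "c"], RULES)` rewrites only the first token, while `apply_synonym_rules(["b", "c", "d"], RULES)` collapses the three tokens into one.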

find

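Using the finest-granularity word segmentation when building the index for the standard questions (but not for the templates) improves accuracy: 61 => 66.
Do not apply fine-grained segmentation to the input question (59% with fine-grained segmentation, 66% without).
Lucene 4.6 has already added synonym expansion[3]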

bug fix

vsm method

The pattern was not cleared before search.

@@ Line 1: / Line 1: @@
-==lucene method==
+=learning to rank=
-*data set
+v1.0[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/Huilan-learning-to-rank]
-:* jiangkaipeng:
+=MERT-4 Method=
-* different method result
+* [[Optimize the parameter in different data source]]
-{| border="2px"
+=lucene method=
-|+ different result in lucene
+*[[different method in lucene]]
-|-
+*[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/Lucene lucene_multi_query]
-! method !!Default  !! BM25 !! LMDirichlet !! DFR !! LMJelinekMercer !! IB
-|-
-! Accary
-| 0.66228 || 0.66228 || 0.4091 || 0.65476 || 0.65476 || 0.6666
-|-
-|}
-* add boost keyword
-{| border="2px"
-|+ boost keyword  in lucene
-|-
-! method !!Default  !! idf_train !! idf_train_norm!! idf_baidu !! idf_baidu_norm
-|-
-! Accary
-| 0.66228 ||  0.651629 ||0.57644|| 0.647869|| 0.65288
-|-
-|}
-* TFIDF Formula
+= boost keyword =
-：* coord(q,d)*query_boost*query_norm*sum(idf^2 * tf * term_boost * norm(t,d))
+[[boost keyword before search with ITIDF]]
-==our method==
+=our method=
 {| border="2px"
 |+ different result in lucene
 |-
-! method !!lucene  !! BM25 !! VSM
+! method !!lucene  !! vsm_idf(haiguan) !! VSM_idf(baidu) !! vsm_idf(tain) !! vsm_idf(calculate)
 |-
 ! Accary
-| 0.6184 || 0.614 || 0.377
+| 0.6628 || 0.6228 || 0.6197 || 0.5827 || 0.5426
 |-
 |}
-==synonyms method==
+=synonyms method=
 * fuzzy match
 :* calculate the similarity value = 1/(5-5*av_value).where av_value = average(word2vec+Synonyms forest+hownet).
@@ Line 44: / Line 29: @@
 :*
-==find==
+=find=
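 * Using the finest-granularity word segmentation when building the index for the standard questions (but not for the templates) improves accuracy: 61 => 66.
 * Do not apply fine-grained segmentation to the input question (59% with fine-grained segmentation, 66% without).
 * Lucene 4.6 has already added synonym expansion[http://www.hankcs.com/program/java/lucene-synonymfilterfactory.html]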
+=bug fix=
+* vsm method
+:* doesn't clear the pattern before search

Difference between revisions of "Search method"

