Latest revision as of 01:11, 3 February 2015 (Tuesday)

learning to rank

Different results in Lucene
method	lucene	vsm_idf(haiguan)	vsm_idf(baidu)	vsm_idf(train)	vsm_idf(calculate)
Accuracy	0.6628	0.6228	0.6197	0.5827	0.5426

The similarity value is calculated as 1/(5 - 5*av_value), where av_value is the average of the word2vec, Synonyms forest, and HowNet similarity scores.
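The formula above can be sketched in a few lines of Python. This is only an illustration: the function name, the argument names, the assumption that each of the three scores lies in [0, 1], and the choice to return infinity when av_value reaches 1 (where the formula divides by zero) are all assumptions, not part of the original notes.

```python
import math


def combined_similarity(word2vec_sim, synonym_forest_sim, hownet_sim):
    """Return 1 / (5 - 5 * av_value) for three similarity scores.

    Assumption: every input score lies in [0, 1].
    """
    # av_value is the plain arithmetic mean of the three scores.
    av_value = (word2vec_sim + synonym_forest_sim + hownet_sim) / 3.0
    denominator = 5.0 - 5.0 * av_value
    # Assumption: the formula is undefined at av_value == 1, so this sketch
    # returns infinity there instead of raising ZeroDivisionError.
    if denominator <= 0.0:
        return math.inf
    return 1.0 / denominator
```

Under these assumptions the value starts at 0.2 when av_value is 0 and grows without bound as av_value approaches 1.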

Lucene 4.6 already includes synonym support (org.apache.lucene.analysis.synonym[2]). It maps token sequences to synonyms with rules such as (a -> x), (a b -> y), (b c d -> z), and can also be used to expand the query.
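To make the rule shapes above concrete, here is a toy Python sketch of multi-token synonym rewriting. It does not use or reproduce the Lucene API. The function name, the rule dictionary, the greedy longest-match strategy, and the keep_original flag (used to mimic query expansion) are all assumptions made for illustration.

```python
def apply_synonyms(tokens, rules, keep_original=False):
    """Rewrite a token list using multi-token synonym rules.

    rules maps a tuple of input tokens to a list of output tokens.
    Assumption: at each position the longest matching rule wins.
    """
    max_len = max((len(key) for key in rules), default=0)
    output = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                if keep_original:
                    # Expansion mode: keep the matched tokens as well.
                    output.extend(key)
                output.extend(rules[key])
                i += n
                matched = True
                break
        if not matched:
            # No rule starts here, so copy the token unchanged.
            output.append(tokens[i])
            i += 1
    return output


# Hypothetical rules mirroring (a -> x), (a b -> y), (b c d -> z).
RULES = {
    ("a",): ["x"],
    ("a", "b"): ["y"],
    ("b", "c", "d"): ["z"],
}
```

With these rules, `apply_synonyms(["a", "b", "c", "d"], RULES)` matches the two-token rule first and leaves the remaining tokens untouched, while `keep_original=True` keeps the matched tokens alongside their synonyms, which is one simple way to expand a query.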

@@ Line 1: / Line 1: @@
+=learning to rank=
+v1.0[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/Huilan-learning-to-rank]
 =MERT-4 Method=
 * [[Optimize the parameter in different data source]]
 =lucene method=
-*data set
+*[[different method in lucene]]
-:* jiangkaipeng:
+*[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/Lucene lucene_multi_query]
-* different method result
-{| border="2px"
-|+ different result in lucene
-|-
-! method !!Default  !! BM25 !! LMDirichlet !! DFR !! LMJelinekMercer !! IB
-|-
-! Accary
-| 0.66228 || 0.66228 || 0.4091 || 0.65476 || 0.65476 || 0.6666
-|-
-|}
 = boost keyword =
-* boost the query keyword using IDF
+[[boost keyword before search with ITIDF]]
-{| border="2px"
-|+ boost keyword  in lucene
-|-
-! method !!Default  !! idf_train !! idf_train_norm!! idf_baidu !! idf_baidu_norm
-|-
-! Accary
-| 0.66228 ||  0.651629 ||0.57644|| 0.647869|| 0.65288
-|-
-|}
-* TFIDF Formula
-:* coord(q,d)*query_boost*query_norm*sum(idf^2 * tf * term_boost * norm(t,d)) [http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html]
-* add the new keyword value from proMe method
 =our method=