Difference between revisions of "Search method"
From cslt Wiki
(29 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | == | + | =learning to rank= |
− | *data | + | v1.0[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/Huilan-learning-to-rank] |
− | + | =MERT-4 Method= | |
− | + | * [[Optimize the parameter in different data source]] | |
+ | =lucene method= | ||
+ | *[[different method in lucene]] | ||
+ | *[http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/Lucene lucene_multi_query] | ||
+ | |||
+ | = boost keyword = | ||
+ | [[boost keyword before search with ITIDF]] | ||
+ | |||
+ | =our method= | ||
{| border="2px" | {| border="2px" | ||
|+ different result in lucene | |+ different result in lucene | ||
|- | |- | ||
− | ! method !! | + | ! method !!lucene !! vsm_idf(haiguan) !! VSM_idf(baidu) !! vsm_idf(tain) !! vsm_idf(calculate) |
|- | |- | ||
! Accuracy | ! Accuracy | ||
− | | 0. | + | | 0.6628 || 0.6228 || 0.6197 || 0.5827 || 0.5426 |
|- | |- | ||
|} | |} | ||
− | = | + | =synonyms method= |
− | + | * fuzzy match | |
− | + | :* compute the similarity value as 1/(5 - 5*av_value), where av_value = average(word2vec, Synonyms forest, hownet). |
− | + | * lucene | |
− | + | :* Lucene 4.6 already provides synonym support (org.apache.lucene.analysis.synonym[http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/synonym/package-summary.html#package_description]), either through mapping rules such as (a -> x), (a b -> y), (b c d -> z) or by extending the query. |
− | + | :* | |
− | + | ||
− | + | =find= | |
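− | + | * Using the finest-grained word segmentation when building the index for the standard questions (not for the templates) improves accuracy from 61% to 66%. |
− | + | * Do not apply fine-grained segmentation to the input question (59% with fine-grained segmentation, 66% without). |
+ | * Lucene 4.6 already includes synonym expansion[http://www.hankcs.com/program/java/lucene-synonymfilterfactory.html] | ||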
+ | =bug fix= | ||
+ | * vsm method | ||
+ | :* the pattern was not cleared before each search |
Latest revision as of 01:11, 3 February 2015 (Tuesday)
Contents
learning to rank
v1.0[1]
MERT-4 Method
lucene method
boost keyword
boost keyword before search with ITIDF
our method
method | lucene | vsm_idf(haiguan) | VSM_idf(baidu) | vsm_idf(tain) | vsm_idf(calculate) |
---|---|---|---|---|---|
Accuracy | 0.6628 | 0.6228 | 0.6197 | 0.5827 | 0.5426 |
synonyms method
- fuzzy match
- compute the similarity value as 1/(5 - 5*av_value), where av_value = average(word2vec, Synonyms forest, hownet).
- lucene
- Lucene 4.6 already provides synonym support (org.apache.lucene.analysis.synonym[2]), either through mapping rules such as (a -> x), (a b -> y), (b c d -> z) or by extending the query.
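
The fuzzy-match similarity above can be sketched as follows. This is only an illustration of the formula, not the project's code: the function name `fuzzy_similarity` is hypothetical, the three scores are assumed to lie in [0, 1] and to be averaged with equal weight, and returning infinity when av_value reaches 1 is an assumption made here to avoid a division by zero.

```python
import math


def fuzzy_similarity(word2vec_score, forest_score, hownet_score):
    """Similarity = 1 / (5 - 5 * av_value), as stated in the notes.

    av_value is taken to be the plain mean of the three scores
    (assumption: equal weights, each score in [0, 1]).
    """
    av_value = (word2vec_score + forest_score + hownet_score) / 3.0
    denominator = 5.0 - 5.0 * av_value
    if denominator <= 0.0:
        # Assumption: a perfect average (av_value == 1) is treated as
        # an unbounded similarity instead of raising ZeroDivisionError.
        return math.inf
    return 1.0 / denominator


if __name__ == "__main__":
    # Unrelated words give the minimum value 0.2; the value grows
    # quickly as the average approaches 1.
    print(fuzzy_similarity(0.0, 0.0, 0.0))
    print(fuzzy_similarity(0.9, 0.8, 0.7))
```

Under these assumptions the value ranges from 0.2 (av_value = 0) upward without bound, so it emphasises near-synonyms far more than a linear average would.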
find
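- Using the finest-grained word segmentation when building the index for the standard questions (not for the templates) improves accuracy from 61% to 66%.
- Do not apply fine-grained segmentation to the input question (59% with fine-grained segmentation, 66% without).
- Lucene 4.6 already includes synonym expansion[3]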
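
To make the two synonym modes mentioned in these notes concrete (mapping rules such as (a -> x), (a b -> y), (b c d -> z), or extending the query), here is a minimal sketch. It is a simplified greedy longest-match over a token list, not Lucene's SynonymFilter implementation; the names `expand_query`, `RULES` and `keep_original` are hypothetical.

```python
def expand_query(tokens, rules, keep_original=True):
    """Apply multi-token synonym rules to a tokenised query.

    rules maps a tuple of tokens to a single replacement token.
    keep_original=True extends the query (original tokens + synonym);
    keep_original=False replaces the matched tokens with the synonym.
    """
    max_len = max((len(key) for key in rules), default=0)
    out = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest possible rule first, so (a b -> y) wins over (a -> x).
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                if keep_original:
                    out.extend(key)
                out.append(rules[key])
                i += n
                matched = True
                break
        if not matched:
            out.append(tokens[i])
            i += 1
    return out


# The example rules from the notes.
RULES = {("a",): "x", ("a", "b"): "y", ("b", "c", "d"): "z"}

if __name__ == "__main__":
    print(expand_query(["a", "b", "c"], RULES))
    print(expand_query(["a", "b", "c"], RULES, keep_original=False))
```

Extending the query keeps the original terms searchable, which trades some precision for recall; replacing them normalises the query to a single canonical term.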
bug fix
- vsm method
- the pattern was not cleared before each search