Difference between revisions of "Search method"
From cslt Wiki
Revision as of 01:12, 21 November 2014 (Fri)
MERT-4 Method
lucene method
boost keyword
boost keyword before search with ITIDF
our method
method | lucene | vsm_idf(haiguan) | VSM_idf(baidu) | vsm_idf(tain) | vsm_idf(calculate) |
---|---|---|---|---|---|
Accuracy | 0.6628 | 0.6228 | 0.6197 | 0.5827 | 0.5426 |
synonyms method
- fuzzy match
- calculate the similarity value = 1/(5 - 5*av_value), where av_value is the average of the word2vec, Synonyms forest, and hownet scores.
- lucene
- lucene4.6 already provides a synonym method (org.apache.lucene.analysis.synonym[1]) that applies rules such as (a -> x), (a b -> y), (b c d -> z), or extends the query.
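The fuzzy-match similarity above can be sketched in a few lines of Python. This is only an illustration: the function name `similarity`, the assumption that each of the three scores lies in [0, 1], and the choice to return infinity when av_value reaches 1 (where the formula divides by zero) are not taken from the notes.

```python
from statistics import mean


def similarity(word2vec_score, forest_score, hownet_score):
    """Similarity value = 1 / (5 - 5 * av_value).

    av_value is the plain average of the three scores.
    Assumption: each score is in [0, 1].
    """
    av_value = mean([word2vec_score, forest_score, hownet_score])
    denominator = 5 - 5 * av_value
    if denominator <= 0:
        # Assumption: treat av_value >= 1 as a perfect match instead of
        # raising ZeroDivisionError.
        return float("inf")
    return 1 / denominator


if __name__ == "__main__":
    print(similarity(0.5, 0.5, 0.5))
```

Under this formula the value is 0.2 when av_value is 0 and exceeds 1 once av_value passes 0.8, so it is not bounded by 1.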
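findings
- Using the finest-grained word segmentation when building the index for standard questions (not applied to templates) improves accuracy: 61% => 66%.
- Do not apply fine-grained segmentation to the input question (59% with fine-grained segmentation, 66% without).
- lucene4.6 already includes synonym expansion[2]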
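To make the idea of synonym-based query extension concrete, here is a small pure-Python sketch using the rule shapes mentioned earlier, (a -> x), (a b -> y), (b c d -> z). It does not use or reproduce the Lucene API; `expand_query` and `RULES` are hypothetical names introduced only for illustration.

```python
# Hypothetical rule table: a run of source tokens maps to one synonym token.
RULES = {
    ("a",): "x",
    ("a", "b"): "y",
    ("b", "c", "d"): "z",
}


def expand_query(tokens, rules):
    """Return the original tokens plus the synonym of every matching token run."""
    expanded = list(tokens)
    for start in range(len(tokens)):
        for source, target in rules.items():
            end = start + len(source)
            # Compare the run of tokens starting at `start` with the rule source.
            if tuple(tokens[start:end]) == source and target not in expanded:
                expanded.append(target)
    return expanded


if __name__ == "__main__":
    print(expand_query(["a", "b", "c", "d"], RULES))
```

The original tokens are kept and the synonyms are appended, which corresponds to extending the query rather than rewriting it.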
bug fix
- vsm method
- the pattern was not cleared before search