Search method
From cslt Wiki
Revision as of 15:05, 5 November 2014 (Wednesday) by Lr
lucene method
Results with different Lucene similarity models
Method | Default | BM25 | LMDirichlet | DFR | LMJelinekMercer | IB
Accuracy | 0.66228 | 0.66228 | 0.4091 | 0.65476 | 0.65476 | 0.6666
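For reference, the BM25 model in this comparison scores a document by summing, over the query terms, an IDF weight times a saturating term-frequency factor that is normalized by document length. The following is a minimal sketch under stated assumptions: it uses the IDF form and the default parameters (k1 = 1.2, b = 0.75) of Lucene's BM25Similarity, but it works on plain token lists and skips Lucene's lossy encoding of document lengths, so its scores will not match Lucene's exactly. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def bm25_idf(doc_count, doc_freq):
    # BM25 IDF in the form used by Lucene's BM25Similarity: log(1 + (N - n + 0.5) / (n + 0.5))
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Return one BM25 score per tokenized document in docs."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: documents longer than average are penalized
        length_factor = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for t in query_terms:
            freq = tf[t]
            if freq == 0:
                continue
            # Saturating TF factor times IDF; (k1 + 1) only rescales, it does not change the ranking
            score += bm25_idf(n_docs, df[t]) * freq * (k1 + 1) / (freq + length_factor)
        scores.append(score)
    return scores


docs = [["how", "to", "apply", "visa"], ["visa", "fee"], ["customs", "declaration", "form"]]
print(bm25_scores(["visa", "fee"], docs))
```

Here the short document ["visa", "fee"] ranks first because it matches both query terms and is shorter than average; the customs document scores 0.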
boost keyword
- Boost the query keywords by their IDF values.
Keyword boosting in Lucene
Method | Default | idf_train | idf_train_norm | idf_baidu | idf_baidu_norm
Accuracy | 0.66228 | 0.651629 | 0.57644 | 0.647869 | 0.65288
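IDF boosting of keywords can be tried without modifying Lucene, by writing the boosts into the query string with the query parser's term^boost syntax. A minimal sketch, assuming tokenized questions as input: the IDF formula 1 + ln(N / (df + 1)) mirrors Lucene's DefaultSimilarity and is not necessarily how idf_train or idf_baidu were computed, and dividing by the largest IDF in normalize() is only a guess at what the _norm columns mean. All function names are hypothetical.

```python
import math
from collections import Counter


def idf_table(corpus):
    """IDF per term from a tokenized corpus, using 1 + ln(N / (df + 1))."""
    n = len(corpus)
    df = Counter(t for doc in corpus for t in set(doc))
    return {t: 1.0 + math.log(n / (c + 1)) for t, c in df.items()}


def normalize(idf):
    """Hypothetical '_norm' variant: divide every IDF by the largest one."""
    top = max(idf.values(), default=1.0) or 1.0
    return {t: v / top for t, v in idf.items()}


def boosted_query(terms, idf, default=1.0):
    """Render query terms in Lucene query-parser syntax with IDF boosts, e.g. 'fee^1.41'."""
    return " ".join(f"{t}^{idf.get(t, default):.2f}" for t in terms)


corpus = [["visa", "fee"], ["visa", "form"], ["customs", "form"]]
idf = idf_table(corpus)
print(boosted_query(["visa", "fee"], idf))
print(boosted_query(["visa", "fee"], normalize(idf)))
```

Words missing from the IDF table fall back to a neutral boost of 1.0; terms that contain query-parser special characters would need escaping before being sent to Lucene.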
- Lucene scoring formula: score(q,d) = coord(q,d) * query_boost * query_norm * sum over the terms t of q of (idf(t)^2 * tf(t,d) * term_boost(t) * norm(t,d)) [1]
- Add the new keyword values produced by the proMe method.
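The scoring formula can be traced by hand with a small sketch. It assumes the definitions of Lucene 4.x DefaultSimilarity: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), query_norm = 1 / sqrt(query_boost^2 * sum((idf * term_boost)^2)), coord = matched query terms / query terms, and norm(t,d) reduced to the length norm 1 / sqrt(number of terms in d), with no index-time boosts and no byte encoding. Function and parameter names are hypothetical.

```python
import math
from collections import Counter


def idf(doc_freq, num_docs):
    # DefaultSimilarity-style IDF: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def practical_score(query_terms, doc, docs, query_boost=1.0, term_boosts=None):
    """coord * query_boost * query_norm * sum(idf^2 * tf * term_boost * norm) for one doc."""
    if not query_terms or not doc:
        return 0.0
    term_boosts = term_boosts or {}
    num_docs = len(docs)
    df = Counter(t for d in docs for t in set(d))
    freqs = Counter(doc)

    # query_norm = 1 / sqrt(query_boost^2 * sum((idf * term_boost)^2))
    sum_sq = query_boost ** 2 * sum(
        (idf(df[t], num_docs) * term_boosts.get(t, 1.0)) ** 2 for t in query_terms
    )
    query_norm = 1.0 / math.sqrt(sum_sq) if sum_sq > 0 else 1.0

    # coord: fraction of query terms that occur in the document
    matched = [t for t in query_terms if freqs[t] > 0]
    coord = len(matched) / len(query_terms)

    # norm(t, d) reduced to the length norm 1 / sqrt(number of terms in doc)
    length_norm = 1.0 / math.sqrt(len(doc))

    total = sum(
        idf(df[t], num_docs) ** 2 * math.sqrt(freqs[t]) * term_boosts.get(t, 1.0) * length_norm
        for t in matched
    )
    return coord * query_boost * query_norm * total


docs = [["visa", "fee"], ["visa", "form"], ["customs", "form"]]
scores = [practical_score(["visa", "fee"], d, docs) for d in docs]
print(scores)
```

In this sketch query_boost cancels out against query_norm, so it is the per-term boosts (term_boosts), such as IDF-based keyword boosts, that actually change the ranking.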
our method
Results of our method compared with Lucene
Method | Lucene | vsm_idf(haiguan) | vsm_idf(baidu) | vsm_idf(train) | vsm_idf(calculate)
Accuracy | 0.6628 | 0.6228 | 0.6197 | 0.5827 | 0.5426
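The vsm_idf columns appear to differ only in where the IDF table comes from. One plausible reading, sketched here as an assumption since the page does not define the method, is a vector space model: each question becomes a TF-IDF vector built with a given IDF table, and candidate standard questions are ranked by cosine similarity. Swapping the idf dictionary would then give the haiguan, baidu, train and calculate variants. The function names, the toy IDF values and the default IDF of 1.0 for unseen words are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(tokens, idf, default_idf=1.0):
    """Sparse TF-IDF vector; words missing from the IDF table get default_idf."""
    return {t: c * idf.get(t, default_idf) for t, c in Counter(tokens).items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def vsm_rank(question, candidates, idf):
    """Return (index of best candidate, all cosine scores) for a tokenized question."""
    q = tfidf_vector(question, idf)
    scores = [cosine(q, tfidf_vector(c, idf)) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


idf_example = {"visa": 1.0, "fee": 2.0, "form": 1.5}  # stand-in for one IDF source
best, scores = vsm_rank(["how", "much", "visa", "fee"], [["visa", "fee"], ["visa", "form"]], idf_example)
print(best, scores)
```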
synonyms method
- Calculate the similarity value as 1/(5 - 5*av_value), where av_value is the average of the word2vec, Synonyms forest and HowNet similarities.
- Lucene 4.6 already provides synonym handling (org.apache.lucene.analysis.synonym [2]) with rules such as (a -> x), (a b -> y), (b c d -> z); alternatively, the query itself can be expanded with synonyms.
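The similarity formula can be written as a small function. It assumes the three similarity scores have already been computed elsewhere (by word2vec, Synonyms forest and HowNet) and are passed in as numbers; the function name is hypothetical.

```python
def synonym_similarity(word2vec_sim, forest_sim, hownet_sim):
    """similarity = 1 / (5 - 5 * av_value), where av_value is the mean of the three scores."""
    av_value = (word2vec_sim + forest_sim + hownet_sim) / 3.0
    if av_value >= 1.0:
        # The formula divides by zero at av_value = 1 and turns negative above it
        raise ValueError("av_value must be below 1")
    return 1.0 / (5.0 - 5.0 * av_value)


print(synonym_similarity(1.0, 0.5, 0.0))  # 0.4
```

The mapping is steep: av_value = 0 gives 0.2, av_value = 0.8 gives 1.0, and the value grows without bound as av_value approaches 1, so the averaged input has to stay below 1.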
findings
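- Using the finest-grained word segmentation when building the index for the standard questions (but not for the templates) improves accuracy: 61% => 66%.
- Do not apply fine-grained segmentation to the input questions (59% with fine-grained segmentation, 66% without).
- Lucene 4.6 already supports synonym expansion [3].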
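Query expansion with multi-word rules like (a -> x), (a b -> y), (b c d -> z) can be illustrated with a greedy longest-match pass over the query tokens. This is a simplified stand-in, not Lucene's SynonymFilter (which works on token streams with position increments and resolves overlapping matches in its own way): at each position it takes the longest matching rule, keeps the original tokens and appends the synonyms. The function name and the rule format are hypothetical.

```python
def expand_query(tokens, rules):
    """Greedy longest-match synonym expansion; rules maps token tuples to synonym lists."""
    max_len = max((len(k) for k in rules), default=0)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest possible rule first, then shorter ones
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                out.extend(key)          # keep the original tokens
                out.extend(rules[key])   # add their synonyms
                i += n
                break
        else:
            # No rule starts here: copy the token unchanged
            out.append(tokens[i])
            i += 1
    return out


rules = {("a",): ["x"], ("a", "b"): ["y"], ("b", "c", "d"): ["z"]}
print(expand_query(["a", "b", "c", "d"], rules))
```

Because "a b" is matched first, the overlapping rule "b c d" never fires on this input; a different overlap policy would produce a different expansion.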