Difference between revisions of "Hulan-2014-10-31"

From cslt Wiki
Lr (talk | contribs)
improve lucene search
 
(One intermediate revision by the same user not shown)
Line 2:
  ==Algorithm==
  ===Spell mistake===
- :* using ngram to get candidate sentence.
+ :* using ngram to get candidate sentence.(xingchao)
  ===improve lucene search===
  * lucene similarity method
Line 16:
  * our vsm method
- :* our vsm method re-rank(54%),lucene(67%)
+ :* our vsm method re-rank(54%),lucene(66.28%)
  * lucene top50(caoli)
Line 32:
  :* test the different idf vale from baidu sougou in fuzzymatch.
  :* IDF from train-data performance bad than default IDF,from 0.63->0.69.
+
  ==knowledge structure==
  * structure the default answer using attributes of the entity.

Latest revision as of 05:32, 31 October 2014 (Friday)

Dialog system

Algorithm

Spelling mistakes

  • Use n-grams to generate candidate sentences. (xingchao)
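
The note above does not say how the n-grams are used, so the following is only a minimal sketch under assumptions: it ranks a list of known sentences by character-bigram overlap (Dice coefficient) with a possibly misspelled query. The function names, the choice of character bigrams, and the Dice score are all illustrative and not taken from the project.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return a multiset (Counter) of character n-grams of the text."""
    text = text.strip()
    if len(text) < n:
        # Very short strings are treated as a single gram.
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_similarity(a, b, n=2):
    """Dice coefficient over character n-gram multisets, in [0.0, 1.0]."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    overlap = sum((grams_a & grams_b).values())
    return 2.0 * overlap / total


def candidate_sentences(query, known_sentences, n=2, top_k=3):
    """Rank known sentences by n-gram overlap with the query (hypothetical helper)."""
    scored = [(ngram_similarity(query, s, n), s) for s in known_sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:top_k] if score > 0]
```

Character n-grams tolerate single-character typos because most of the surrounding grams still match; a word-level n-gram language model would be a different, equally plausible reading of the note.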

Improve Lucene search

  • Lucene similarity methods
      Results of the different Lucene similarity methods:
      Method     Default   BM25      LMDirichlet   DFR       LMJelinekMercer   IB
      Accuracy   0.66228   0.66228   0.4091        0.65476   0.65476           0.6666
  • Our VSM (vector space model) method
      • Re-ranking with our VSM method: 54%; Lucene: 66.28%
  • Lucene top 50 (caoli)
      • Top 10: 82.95%, top 20: 86.34%, top 50: 90.22%
      • Need to check the remaining ~10% of errors.
  • Lucene optimization (liurong)
      • Rewrite the method that selects the 50 standard questions so that they do not come from the same template.
      • Test boosting keyword weights and extracting synonyms.
      • Check the word segmentation for templates.
      • The min-segment method improves the accuracy (0.61 -> 0.66).
      • Check the query method used to get information from Lucene, and rewrite the scoring method, e.g. the IDF value.
  • IDF (caoli)
      • Test different IDF values from Baidu and Sogou in fuzzymatch.
      • IDF estimated from the training data performs worse than the default IDF (0.63 vs. 0.69).
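
The items above (VSM re-ranking, top-k accuracy, IDF estimated from training data) can be illustrated with a small self-contained sketch. It is not the project's code and does not reproduce Lucene's scoring: the smoothed IDF formula, the cosine re-ranking, and all function names are assumptions chosen for illustration. Candidates are assumed to be already tokenized (id, tokens) pairs, for example the top 50 hits returned by the search engine.

```python
import math
from collections import Counter


def train_idf(tokenized_docs):
    """Estimate a smoothed IDF per term: log((N + 1) / (df + 1)) + 1."""
    n_docs = len(tokenized_docs)
    df = Counter()
    for doc in tokenized_docs:
        df.update(set(doc))  # count each term once per document
    return {t: math.log((n_docs + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf, default_idf=1.0):
    """Sparse TF-IDF vector; unseen terms fall back to default_idf."""
    tf = Counter(tokens)
    return {t: f * idf.get(t, default_idf) for t, f in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank(query_tokens, candidates, idf):
    """Re-rank (id, tokens) candidates by TF-IDF cosine similarity to the query."""
    q = tfidf_vector(query_tokens, idf)
    scored = [(cosine(q, tfidf_vector(toks, idf)), cid) for cid, toks in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cid for _, cid in scored]


def top_k_accuracy(ranked_lists, gold_ids, k):
    """Fraction of queries whose gold id appears among the first k ranked ids."""
    if not gold_ids:
        return 0.0
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)
```

Swapping the dictionary returned by `train_idf` for one built from another source (for example, externally collected document frequencies) is one way to compare IDF tables while keeping the rest of the pipeline fixed.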

Knowledge structure

  • Structure the default answer using the attributes of the entity.
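
As a rough illustration of the idea above, a default answer could be assembled from an entity's attribute table. This is a sketch under assumptions: the function name, the attribute dictionary, and the output format are hypothetical and not described in the notes.

```python
def default_answer(entity_name, attributes, order=None):
    """Build a fallback answer string from an entity's attributes.

    `attributes` maps attribute names to values; `order` optionally fixes
    which attributes are shown and in what sequence (hypothetical parameter).
    """
    keys = order if order is not None else sorted(attributes)
    parts = [str(key) + "=" + str(attributes[key]) for key in keys if key in attributes]
    if not parts:
        return "No information is available about " + entity_name + "."
    return entity_name + ": " + "; ".join(parts)
```

For example, `default_answer("Beijing", {"type": "city", "country": "China"})` lists the attributes in sorted order when no explicit order is given.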

Knowledge Management and labeling system

  • Prepare the interface and functions.

Plan to discuss

  • Add triple search to the QA engine.
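
Since this item is still only a plan, the following is just one possible shape for a triple lookup, assuming knowledge is stored as (subject, predicate, object) triples in memory. The `TripleStore` class, its methods, and the `answer` helper are hypothetical names introduced for illustration.

```python
from collections import defaultdict


class TripleStore:
    """Minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = []
        self._by_sp = defaultdict(list)  # (subject, predicate) -> objects

    def add(self, subject, predicate, obj):
        self._triples.append((subject, predicate, obj))
        self._by_sp[(subject, predicate)].append(obj)

    def objects(self, subject, predicate):
        """All objects stored for a given subject and predicate."""
        return list(self._by_sp.get((subject, predicate), []))

    def match(self, subject=None, predicate=None, obj=None):
        """Pattern search over all triples; None acts as a wildcard."""
        return [
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        ]


def answer(store, entity, attribute, fallback=None):
    """Answer an (entity, attribute) question from the store, else return fallback."""
    values = store.objects(entity, attribute)
    return ", ".join(values) if values else fallback
```

A QA engine could try this lookup first and fall back to the text-retrieval path when no triple matches; whether that ordering suits the system is left open by the notes.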