Latest revision as of 05:32, 31 October 2014 (Friday)

Dialog system

Results of different Lucene similarity methods
Method	Default	BM25	LMDirichlet	DFR	LMJelinekMercer	IB
Accuracy	0.66228	0.66228	0.4091	0.65476	0.65476	0.6666
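To make the BM25 column concrete, here is a minimal sketch of the textbook Okapi BM25 score in Python. It is not Lucene's actual implementation (Lucene also encodes document lengths lossily); the function names, the toy documents, and the defaults `k1=1.2`, `b=0.75` are assumptions chosen for illustration.

```python
import math


def bm25_idf(num_docs, doc_freq):
    """Smoothed, always non-negative IDF commonly used with BM25."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one tokenized document against a tokenized query.

    doc_freqs maps a term to the number of documents containing it
    (hypothetical input; a real index would supply this).
    """
    doc_len = len(doc_terms)
    score = 0.0
    for term in set(query_terms):
        tf = doc_terms.count(term)
        if tf == 0:
            continue  # terms absent from the document contribute nothing
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
        idf = bm25_idf(num_docs, doc_freqs.get(term, 0))
        score += idf * tf * (k1 + 1.0) / (tf + norm)
    return score


# Toy collection of two tokenized standard questions (made-up data).
docs = [["how", "to", "reset", "password"], ["how", "to", "open", "account"]]
doc_freqs = {"how": 2, "to": 2, "reset": 1, "password": 1,
             "open": 1, "account": 1}
avg_len = sum(len(d) for d in docs) / len(docs)
query = ["reset", "password"]
scores = [bm25_score(query, d, doc_freqs, len(docs), avg_len) for d in docs]
```

With this toy data only the first question shares terms with the query, so it receives the only non-zero score.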

Rewrite the method that selects the 50 standard questions so that they do not share the same template.
Test boosting keyword weights and extracting synonyms.
Check the word segmentation for templates.
The min-segment method improves accuracy (0.61 -> 0.66).
Check the query method for retrieving Lucene information, and rewrite the scoring method (e.g., the IDF value).
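As a rough illustration of what an IDF-based rescoring step could look like, the sketch below computes a smoothed IDF table from tokenized questions and scores a candidate by the summed IDF of the terms it shares with the query. The function names, the smoothing formula, and the toy corpus are assumptions for illustration, not the scoring method used in the project.

```python
import math
from collections import Counter


def compute_idf(tokenized_docs):
    """Return {term: idf} using one common smoothed variant:
    idf = log((N + 1) / (df + 1)) + 1."""
    n = len(tokenized_docs)
    df = Counter()
    for doc in tokenized_docs:
        df.update(set(doc))  # count each term once per document
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def idf_overlap_score(query_terms, cand_terms, idf, default_idf):
    """Sum the IDF of terms shared by query and candidate.

    Terms missing from the table get default_idf (treated as rare).
    """
    shared = set(query_terms) & set(cand_terms)
    return sum(idf.get(t, default_idf) for t in shared)


# Toy corpus of tokenized standard questions (made-up data).
corpus = [["open", "account"], ["close", "account"], ["reset", "password"]]
idf = compute_idf(corpus)
unseen_idf = math.log(len(corpus) + 1) + 1.0  # IDF of a term with df = 0

rare_match = idf_overlap_score(["reset", "password"],
                               ["reset", "password", "now"], idf, unseen_idf)
common_match = idf_overlap_score(["open", "account"],
                                 ["close", "account"], idf, unseen_idf)
```

Swapping in an IDF table built from a different source (training data, or an external corpus) only changes the `idf` dictionary, which makes it easy to compare sources on the same test set.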

@@ Line 2: / Line 2: @@
 ==Algorithm==
 ===Spelling mistakes===
-:* Use n-grams to get candidate sentences.
+:* Use n-grams to get candidate sentences. (xingchao)
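One possible reading of the n-gram idea is sketched below: candidate sentences are ranked by the Jaccard overlap of their character n-grams with the (possibly misspelled) query. The function names, the choice of bigrams, and the example sentences are assumptions; the notes do not say which n-gram unit or similarity measure was used.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams of text (works for any script)."""
    text = text.strip()
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_candidates(query, sentences, n=2, top_k=5):
    """Rank known sentences by Jaccard similarity of character n-grams."""
    q = char_ngrams(query, n)
    scored = []
    for s in sentences:
        g = char_ngrams(s, n)
        union = q | g
        if not union:
            continue
        sim = len(q & g) / len(union)
        if sim > 0:  # drop sentences with no n-gram in common
            scored.append((sim, s))
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [s for _, s in scored[:top_k]]


# Made-up standard questions and a query containing two typos.
known = ["how to reset password", "how to open an account", "check balance"]
candidates = ngram_candidates("how to resett pasword", known)
```

Because the comparison is on character n-grams rather than whole words, a query with a few typos still shares most of its n-grams with the intended sentence.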
 ===improve lucene search===
 * lucene similarity method
@@ Line 16: / Line 16: @@
 * our vsm method
-:* our vsm method re-rank (54%), lucene (67%)
+:* our vsm method re-rank (54%), lucene (66.28%)
 * lucene top50(caoli)
@@ Line 32: / Line 32: @@
 :* Test the different IDF values from baidu and sougou in fuzzymatch.
 :* IDF computed from the training data performs worse than the default IDF (0.63 vs. 0.69).
 ==knowledge structure==
 * structure the default answer using attributes of the entity.