Hulan-2014-11-06

From cslt Wiki
Edited by Lr (edit summary: "Multi-Scene Recognition")

Latest revision as of 09:07, 6 November 2014 (Thursday)

Dialog system

Algorithm

Spelling mistakes

  • retrain the n-gram model (caoli)
  • prepare the test and development sets (caoli)
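A minimal sketch of how an n-gram model can drive spelling correction, assuming a word-bigram model with add-one smoothing over already segmented tokens and a hypothetical confusion table (`confusions`) that proposes alternative spellings. The model order, smoothing and candidate generation actually used for the QA system are not stated in these notes.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences, padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def correct(tokens, confusions, unigrams, bigrams):
    """Greedily replace each token (left to right) by the candidate that maximizes
    the sentence log-probability; the original token is always a candidate.
    `confusions` is a hypothetical map from a token to alternative spellings."""
    best = list(tokens)
    for i, tok in enumerate(tokens):
        candidates = [tok] + confusions.get(tok, [])
        best[i] = max(
            candidates,
            key=lambda c: log_prob(best[:i] + [c] + best[i + 1:], unigrams, bigrams),
        )
    return best
```

Retraining the model only changes the counts; the development set mentioned above would be the natural place to check whether a retrained model fixes more errors than it introduces.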

improve fuzzy matching

  • add synonym similarity using the MERT-4 method
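The notes do not define synonym similarity. One minimal sketch, assuming a hypothetical synonym table and two weights (`w_exact`, `w_syn`) of the kind a tuning method such as MERT could set, scores exact token matches and synonym matches separately:

```python
def synonym_similarity(query, candidate, synonyms, w_exact=1.0, w_syn=0.5):
    """Weighted token-overlap similarity in [0, 1].

    An exact token match adds w_exact; a match only through the (hypothetical)
    synonym table adds w_syn. The score is divided by the best possible score,
    i.e. every query token matched exactly.
    """
    if not query:
        return 0.0
    cand = set(candidate)
    score = 0.0
    for tok in query:
        if tok in cand:
            score += w_exact
        elif synonyms.get(tok, set()) & cand:
            score += w_syn
    return score / (w_exact * len(query))
```

For example, with `{"cancel": {"close", "terminate"}}` as the synonym table, "cancel my account" against "close an account" scores 0.5 instead of 1/3 without synonyms.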

improve Lucene search

  • our VSM method
Different results in Lucene
Method     lucene   vsm_idf(haiguan)   vsm_idf(baidu)   vsm_idf(train)   vsm_idf(calculate)
Accuracy   0.6628   0.6228             0.6197           0.5827           0.5426
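The table compares VSM scoring with IDF weights taken from different sources. A minimal sketch of such a VSM, assuming TF-IDF vectors with a smoothed IDF and cosine similarity (the exact weighting used in these experiments is not given). Swapping the `idf` dictionary is where a different IDF source would plug in.

```python
import math
from collections import Counter


def compute_idf(docs):
    """Smoothed IDF, log((N + 1) / (df + 1)) + 1, from tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {term: math.log((n + 1) / (count + 1)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf, default_idf=1.0):
    """Sparse TF-IDF vector; terms missing from the IDF table get default_idf."""
    tf = Counter(tokens)
    return {t: c * idf.get(t, default_idf) for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, questions, idf):
    """Indices of standard questions, sorted by cosine similarity to the query."""
    q = tfidf_vector(query, idf)
    scores = [cosine(q, tfidf_vector(d, idf)) for d in questions]
    return sorted(range(len(questions)), key=lambda i: scores[i], reverse=True)
```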
  • Lucene top-N results
      • top10 (82.95%), top20 (86.34%), top50 (90.23%), top100 (94.11%), top200 (96.18%), top1000 (97.31%), top2000 (97.87%), top5000 (98.75%), top10000 (99.06%)
      • test the results of the top 100, 200 and 1000 candidates in the full QA pipeline (Lucene + fuzzy match) (caoli)
  • Lucene optimization (liurong)
      • rewrite the method for selecting the 50 standard questions so that they do not share the same template (liurong)
      • check the word segmentation of the templates (liurong)
      • boost query keywords using IDF
Boosting keywords in Lucene
Method     Default   idf_train   idf_train_norm   idf_baidu   idf_baidu_norm
Accuracy   0.66228   0.651629    0.57644          0.647869    0.65288
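One way to boost query keywords by IDF in Lucene is the classic query parser's `term^boost` syntax. A minimal sketch that builds such a query string from an IDF table, assuming keywords are already extracted; the normalization option is only a guess at what the `_norm` columns mean, not a documented procedure.

```python
# Characters the classic Lucene query parser treats as syntax.
LUCENE_SPECIAL = set('+-!(){}[]^"~*?:\\/&|')


def escape(term):
    """Backslash-escape query-parser syntax characters in a term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def boosted_query(keywords, idf, normalize=False, default_idf=1.0):
    """Build a query string with each keyword boosted by its IDF (term^boost).

    With normalize=True, boosts are divided by the largest IDF among the
    keywords (an assumed reading of the "_norm" variants).
    """
    boosts = {k: idf.get(k, default_idf) for k in keywords}
    if normalize and boosts:
        top = max(boosts.values())
        boosts = {k: b / top for k, b in boosts.items()}
    return " ".join(f"{escape(k)}^{boosts[k]:.2f}" for k in keywords)
```

For instance, `boosted_query(["account", "close"], {"account": 1.5, "close": 3.0})` gives `account^1.50 close^3.00`, so the rarer keyword weighs more in the match.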
      • use the MERT-4 method to find good values for multiple features, such as IDF, NER, baidu_weight and keyword (liurong, this month)
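MERT itself is not shown here. As a minimal sketch of what it would tune, assuming the features (e.g. IDF, NER, baidu_weight, keyword) are combined into one linear score per candidate, the code scores candidates with a weight dictionary and measures the top-N hit rate, presumably the kind of figure reported in the Lucene top-N list. A tuning procedure such as MERT would search for the weights that maximize such a metric. All feature values below are hypothetical.

```python
def linear_score(features, weights):
    """Weighted sum of named feature values; features without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def top_n_hit_rate(queries, weights, n):
    """Fraction of queries whose correct answer ranks within the top n candidates.

    Each query is (candidates, gold_id), where candidates maps an answer id
    to its feature dict.
    """
    if not queries:
        return 0.0
    hits = 0
    for candidates, gold in queries:
        ranked = sorted(
            candidates,
            key=lambda c: linear_score(candidates[c], weights),
            reverse=True,
        )
        if gold in ranked[:n]:
            hits += 1
    return hits / len(queries)
```

Comparing `top_n_hit_rate` under different weight settings on a development set is the basic loop any weight-tuning method needs.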

Multi-Scene Recognition

  • add triple search to the QA engine
      • discuss the details and write a report (liurong)
  • demo (liurong, two weeks)
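The notes do not describe how triple search will be added to the QA engine. As a minimal sketch, assuming facts are stored as (subject, predicate, object) triples and a question has already been mapped to a subject and/or predicate, a lookup could work as follows; the class and its data are hypothetical.

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory (subject, predicate, object) store with wildcard lookup."""

    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)  # index to avoid scanning every triple

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        self.triples.add(triple)
        self.by_subject[subject].add(triple)

    def search(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        pool = self.by_subject.get(subject, set()) if subject is not None else self.triples
        return sorted(
            t for t in pool
            if (predicate is None or t[1] == predicate) and (obj is None or t[2] == obj)
        )
```

A question such as "what is the price of product_x" would become `search(subject="product_x", predicate="price")`, and the object of the returned triple would be the answer.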

knowledge structure

Knowledge Management and labeling system

  • continue coding.

Patent

  • the GA method to improve QA (liurong, this month)

plan to discuss

  • how to add the spell-check method to the QA engine