Difference between revisions of “QA test”

From cslt Wiki

Edit summaries by Lr: 140905 (earlier revision) → Tool (latest revision). 55 intermediate revisions by the same user are not shown.

==Laboratory==
* [[Opensource: Natural Language Process]]
* [[open system]]

==Tool==
* SEMPRE (QA toolkit) [http://www-nlp.stanford.edu/software/sempre/]
* Z-MERT [http://www.cs.jhu.edu/~ozaidan/zmert/]
* templatemaker [https://github.com/paulsmith/templatemaker]
:* Extracts the invariant parts shared by a set of sample input sentences; the resulting template can then be used for match checking, component extraction and similar tasks. Very useful for cleaning Web data and for simple pattern learning.
* SPMF: A Java Open-Source Pattern Mining Library
:* SPMF is a cross-platform library implemented in Java, specialized for discovering patterns in transaction and sequence databases, such as frequent itemsets, association rules and sequential patterns. It also offers clustering algorithms.

==140901==
===TREC TEST===
* Data set: http://cogcomp.cs.illinois.edu/Data/QA/QC/
* Method: vector space model with TF-IDF weighting (vsm-tfidf); no classifier
* Classes: 9 big classes, 48 small classes
* Result:

{| border="2px"
|+ Classification results
|-
! Training set size !! 1000 !! 2000 !! 3000 !! 4000 !! 5500
|-
! Big class
| 0.678 || 0.718 || 0.708 || 0.708 || 0.73
|-
! Small class
| 0.58 || 0.606 || 0.606 || 0.616 || 0.628
|}
  
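The vsm-tfidf / no-classifier method above is not spelled out on this page. One plausible reading, assumed here, is that each question becomes a TF-IDF vector and is assigned to the class whose centroid vector is most cosine-similar, with no trained classifier involved. The pure-Python sketch below follows that reading; the tokenizer, the smoothed IDF formula and the class name `CentroidQC` are illustrative choices, not the setup used in the experiment.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def idf_table(docs):
    """Smoothed inverse document frequency for every term in `docs` (token lists)."""
    n = len(docs)
    df = Counter()
    for toks in docs:
        df.update(set(toks))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are unit length (or empty), so the dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


class CentroidQC:
    """Nearest-centroid question classifier over TF-IDF vectors (no trained model)."""

    def fit(self, questions, labels):
        docs = [tokenize(q) for q in questions]
        self.idf = idf_table(docs)
        sums = defaultdict(Counter)
        for toks, label in zip(docs, labels):
            sums[label].update(tfidf(toks, self.idf))
        # Normalise each class sum so that cosine() can use a plain dot product.
        self.centroids = {}
        for label, s in sums.items():
            norm = math.sqrt(sum(v * v for v in s.values()))
            self.centroids[label] = {t: v / norm for t, v in s.items()} if norm else {}
        return self

    def predict(self, question):
        vec = tfidf(tokenize(question), self.idf)
        return max(self.centroids, key=lambda label: cosine(vec, self.centroids[label]))
```

With the data set linked above, `fit` would take the training questions with their big-class or small-class labels, and accuracy is the share of test questions for which `predict` returns the gold label. Nearest-centroid keeps the "no classifier" spirit: the only learned quantities are the IDF table and one averaged vector per class.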
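==Paper==
*[[2014-10-08:qa]]
*[[2014-08-22-qalr]]
* Search in ML
:* ML for Search and Ads (Tie-Yan Liu), NLPCC 2014 [http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:L06-ML_for_Search_and_Ads_-_ADL52.pdf]
:* Semantic Matching in Search (ADL) [http://cslt.riit.tsinghua.edu.cn/mediawiki/index.php/%E6%96%87%E4%BB%B6:L04-Semantic_Matching_in_Search_ADL_Jun_XU_final.pdf]
* Knowledge graphs
:* Constructing and Mining Web-scale Knowledge Graphs (KDD 2014) [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/4/4c/Kdd2014_gabrilovich_bordes_knowledge_graphs.pdf]
:* Tools and Applications of Vertical Knowledge Graphs [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/6/6f/%E5%9E%82%E7%9B%B4%E7%9F%A5%E8%AF%86%E5%9B%BE%E8%B0%B1%E5%B7%A5%E5%85%B7%E4%B8%8E%E5%BA%94%E7%94%A810%E6%9C%8816%E6%97%A5.pdf]
:* Knowledge Graph: The Cornerstone of Semantic Linking for Big Data [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/c/c6/%E7%9F%A5%E8%AF%86%E5%9B%BE%E8%B0%B1%EF%BC%9A%E5%A4%A7%E6%95%B0%E6%8D%AE%E8%AF%AD%E4%B9%89%E9%93%BE%E6%8E%A5%E7%9A%84%E5%9F%BA%E7%9F%B3-%E6%9D%8E%E6%B6%93%E5%AD%90_%281%29.pdf]
:* Ontology Reasoning for the Semantic Web and Its Application to Knowledge Graph[]

==NanShanData==
===Data Set===
* Big classes: Education; Social security; Employment; Medical care; Housing; Marriage, childbirth and adoption; Certificates and IDs; Qualification certification; Business start-up; Business operations and taxation; Public utilities
* Small classes:
:* Education: Preschool education; Primary education; Junior high school education; Senior high school education; Vocational education; Continuing education; Special education; Education assistance
:* Social security: Social insurance contributions; Pension insurance; Medical insurance; Work injury insurance; Unemployment insurance; Maternity medical insurance; Elderly welfare; Disability welfare; Child welfare; Minimum living allowance; Special assistance; Temporary assistance; Preferential treatment and pensions; Employment placement
:* Employment: Civil service recruitment; Graduate employment; Talent attraction; Employment of migrant workers in Shenzhen; Re-employment of the unemployed; Veteran resettlement; Skills training; Skills appraisal; Labor rights; Self-employment
:* Medical care: Medical institutions; Outpatient and inpatient care; Drugs and pharmacies; Disease prevention; Food and drug safety; Health inspection; Medical insurance; Medical assistance
:* Housing: Renting; Home sales; Cash subsidies; Buying and selling commercial housing; Second-hand housing transactions; Housing leases; Service agencies and personnel; Housing provident fund account opening; Housing provident fund contributions; Housing provident fund loans
:* Marriage, childbirth and adoption: Marriage; Divorce; Marriage annulment; Childbirth services; Family planning rewards; Family planning technical services; Adoption services
:* Certificates and IDs: Household registration and identity; Exit and entry; Driver's licenses; Education and training; Medical and health; Judicial and legal; Transport and tourism; Construction; Other
:* Qualification certification: Educational institutions; Food businesses; Medical institutions; Employment service agencies; Tourism service agencies; Transportation agencies; Real estate agencies; Construction firms; Other institutions
:* Business start-up: Company name pre-approval; Pre-registration approval; Commercial entity registration; Regulatory approval; Fire safety certificates; Organization code certificate application; Establishment and changes of foreign-invested enterprises; Tax registration
:* Business operations and taxation: Annual corporate reports; Intellectual property; Advertising; Credit and contracts; Tax registration; Invoicing; Tax filing and payment
:* Public utilities: Water supply; Power supply; Gas; Sewage and waste treatment; Culture, sports and leisure; Landscaping and greening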
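Some small-class names in the NanShanData taxonomy occur under more than one big class: medical insurance under both social security and medical care, medical institutions under both medical care and qualification certification, and tax registration under both business start-up and business operations and taxation. A small-class name on its own therefore does not always identify its big class. The minimal sketch below uses a subset of the taxonomy (English translations) and hypothetical helper names to detect such names and to keep labels unambiguous as (big class, small class) pairs.

```python
from collections import defaultdict

# Subset of the NanShanData taxonomy, in English translation.
TAXONOMY = {
    "Social security": [
        "Social insurance contributions", "Pension insurance", "Medical insurance",
        "Work injury insurance", "Unemployment insurance", "Maternity medical insurance",
        "Elderly welfare", "Disability welfare", "Child welfare",
        "Minimum living allowance", "Special assistance", "Temporary assistance",
        "Preferential treatment and pensions", "Employment placement",
    ],
    "Medical care": [
        "Medical institutions", "Outpatient and inpatient care", "Drugs and pharmacies",
        "Disease prevention", "Food and drug safety", "Health inspection",
        "Medical insurance", "Medical assistance",
    ],
    "Qualification certification": [
        "Educational institutions", "Food businesses", "Medical institutions",
        "Employment service agencies", "Tourism service agencies",
        "Transportation agencies", "Real estate agencies", "Construction firms",
        "Other institutions",
    ],
    "Business start-up": [
        "Company name pre-approval", "Pre-registration approval",
        "Commercial entity registration", "Regulatory approval",
        "Fire safety certificates", "Organization code certificate application",
        "Establishment and changes of foreign-invested enterprises", "Tax registration",
    ],
    "Business operations and taxation": [
        "Annual corporate reports", "Intellectual property", "Advertising",
        "Credit and contracts", "Tax registration", "Invoicing", "Tax filing and payment",
    ],
}


def parents_by_small_class(taxonomy):
    """Map each small-class name to every big class that lists it."""
    parents = defaultdict(list)
    for big, smalls in taxonomy.items():
        for small in smalls:
            parents[small].append(big)
    return dict(parents)


def ambiguous_small_classes(taxonomy):
    """Small-class names listed under more than one big class."""
    return {s: bigs for s, bigs in parents_by_small_class(taxonomy).items() if len(bigs) > 1}


def label_pairs(taxonomy):
    """Unambiguous labels for a flat classifier: (big class, small class) pairs."""
    return [(big, small) for big, smalls in taxonomy.items() for small in smalls]
```

For the five big classes included, `ambiguous_small_classes` reports exactly the three names mentioned above. Training a flat classifier on `label_pairs` rather than on bare small-class names keeps every prediction tied to a single big class.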
  
==huilan==
*[[huilian-work]]
*[[qa-Algorithm]]
*[[others]]

==TEST==
*[[TREC TEST]]
*[[NanShanData]]

===140905===
* Test set
:* Labeled the big class of about 1000 queries from NanShanData.
* Result
:* Big-class test accuracy is 0.355 (395/1112) using the title and 0.3444 (383/1112) using title+description.

{| border="2px"
|+ Accuracy of query classification
|-
! Query field !! keyword_beta !! keyword_init !! accuracy
|-
! rowspan=2 | title
| 0 || 0 || 0.355
|-
| 0 || 0 || 0
|-
! title+description
| 0 || 0 || 0.344
|}

{| border="1" cellpadding="5" cellspacing="0"
|-
! Query field !! keyword_beta !! keyword_init !! accuracy
|-
| rowspan=2 | title
| b
| B
|
|-
| C
| D
|
|-
| rowspan=2 | title+description
| H
| I
|
|-
| J
| K
|
|}
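The big-class accuracies reported in the 140905 test are plain ratios of correctly classified queries to labeled queries. The short check below uses a hypothetical `accuracy` helper and reproduces the reported figures from the counts given on this page: 395 of 1112 queries with the title alone and 383 of 1112 with title+description, a gap of 12 queries.

```python
def accuracy(predicted, gold):
    """Share of queries whose predicted big class equals the gold big class."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Counts reported for the NanShanData big-class test (1112 labeled queries).
TOTAL = 1112
CORRECT = {"title": 395, "title+description": 383}

for field, n in CORRECT.items():
    print(f"{field}: {n}/{TOTAL} = {n / TOTAL:.4f}")
# title: 395/1112 = 0.3552
# title+description: 383/1112 = 0.3444
```

Given per-query predictions, `accuracy(predicted, gold)` yields the same ratios directly.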

Latest revision as of 04:02, 29 December 2014 (Monday)

Laboratory

Tool

  • SEMPRE (QA toolkit) [1]
  • Z-MERT[2]
  • templatemaker[3]
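  • Extracts the invariant parts shared by a set of sample input sentences; the resulting template can then be used for match checking, component extraction and similar tasks. Very useful for cleaning Web data and for simple pattern learning.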
  • SPMF: A Java Open-Source Pattern Mining Library
  • SPMF is a cross-platform library implemented in Java, specialized for discovering patterns in transaction and sequence databases, such as frequent itemsets, association rules and sequential patterns. It also offers clustering algorithms.
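The templatemaker entry above describes learning the invariant parts of sample sentences and then matching and extracting the variable parts. The sketch below illustrates that idea without the library: it is not templatemaker's API or its exact algorithm. It uses Python's `difflib.SequenceMatcher` to keep the substrings shared by all samples, marks every differing stretch with a hole marker, and turns the template into a regular expression for extraction. The `HOLE` marker, the `min_len` threshold and the function names are assumptions.

```python
import difflib
import re

HOLE = "\x1f"  # assumed marker for a variable slot; must not occur in the inputs


def merge(template, sample, min_len=2):
    """Keep the parts of `template` that also appear, in order, in `sample`.

    Differing stretches, and shared runs shorter than `min_len`, become one HOLE.
    """
    matcher = difflib.SequenceMatcher(None, template, sample, autojunk=False)
    parts = []
    prev_a = prev_b = 0

    def add_hole():
        if not parts or parts[-1] != HOLE:
            parts.append(HOLE)

    for a, b, size in matcher.get_matching_blocks():
        if a > prev_a or b > prev_b:
            add_hole()  # something differs before this shared run
        if size >= min_len:
            parts.append(template[a:a + size])
        elif size:
            add_hole()  # too short to trust as fixed text
        prev_a, prev_b = a + size, b + size
    return "".join(parts)


def learn_template(samples, min_len=2):
    """Fold every sample into a single template with HOLE markers."""
    template = samples[0]
    for sample in samples[1:]:
        template = merge(template, sample, min_len)
    return template


def extract(template, text):
    """Return the strings filling the holes, or None if `text` does not fit."""
    pattern = "(.*?)".join(re.escape(part) for part in template.split(HOLE))
    match = re.fullmatch(pattern, text, flags=re.DOTALL)
    return match.groups() if match else None
```

For example, learning from `"<b>Alice</b> is 30 years old"`, `"<b>Bob</b> is 4 years old"` and `"<b>Carol</b> is 52 years old"` gives a template with two holes, and `extract` on `"<b>Dave</b> is 41 years old"` returns `("Dave", "41")`. Treating very short shared runs as variable keeps a stray common letter from splitting a hole in two.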

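SPMF is a Java library, and nothing below uses its API. To illustrate the frequent-itemset and association-rule tasks mentioned above, here is a minimal Python sketch of Apriori, a classic algorithm for this task (SPMF itself provides Apriori among many others). Support is an absolute transaction count in this sketch, and the names `apriori` and `confidence` are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Frequent itemsets with their support counts (absolute, not relative)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        # Candidate k-itemsets: unions of two frequent (k-1)-itemsets whose
        # (k-1)-subsets are all frequent (the Apriori pruning rule).
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # One pass over the data to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def confidence(itemsets, antecedent, consequent):
    """Confidence of antecedent -> consequent; raises KeyError unless the union is frequent."""
    a = frozenset(antecedent)
    return itemsets[a | frozenset(consequent)] / itemsets[a]
```

For example, with `min_support=3` on the baskets {bread, milk}, {bread, diaper, beer, eggs}, {milk, diaper, beer, cola}, {bread, milk, diaper, beer} and {bread, milk, diaper, cola}, the frequent itemsets are the four items bread, milk, diaper and beer plus four pairs, and the rule diaper → beer has confidence 3/4.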
Paper

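  • ML for Search and Ads (Tie-Yan Liu), NLPCC 2014[4]
  • Semantic Matching in Search (ADL) [5]
  • Knowledge graphs
  • Constructing and Mining Web-scale Knowledge Graphs (KDD 2014)[6]
  • Tools and Applications of Vertical Knowledge Graphs[7]
  • Knowledge Graph: The Cornerstone of Semantic Linking for Big Data[8]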
  • Ontology Reasoning for the Semantic Web and Its Application to Knowledge Graph[]

huilan

TEST