Difference between revisions of "DataBase"

From cslt Wiki
Lxs (talk | contribs)
nolexicion wordlist
 
(6 intermediate revisions by 2 users not shown)
Line 1: Line 1:
==table 1==
+
==lm==
 
{| class="wikitable"
 
{| class="wikitable"
! name !! type!! size !! dir !! description
+
! name !! size !! dir !! description
 
|-
 
|-
|863    ||speech||51h, 76spk || corpora/863 || 863 reading speech database. 16k,16bit
+
|SogouQ.full.train.3gram.gz || 132M || /work/lxs/nlphome/lm/SogouQ-500M || trainData=SougouQ(800M);dict=11w-tecent
 
|-
 
|-
|emotion||speech||22h    ||corpora/emotion||emotional speech for SID, recorded in CSLT. 16k,16bit
+
|SogouT-11w-merge2-1.3gram.gz || 4.1G ||/work/lxs/nlphome/lm/SogouT-140G || trainData=SougouT(140G);dict=11w-tencent
 
|-
 
|-
|callhome ||text||9.02Mb|| corpora/callhome || callhome chinese speech database transcription
+
|SogouT-11w-merge2-2.3gram.gz || 3.9G || /work/lxs/nlphome/lm/SogouT-140G ||
 
|-
 
|-
|tcmsd  ||speech ||34h,60spk ||corpora/tcmsd|| speech database recorded in Tsinghua, 2002. 16k, 16bit
+
|8w8.3gram.tencent.gz || 452M || /work/lxs/nlphome/lm/Tencent ||
 
|-
 
|-
|timit  ||speech ||5.4h  ||corpora/timit || English timit database
+
|musicQuery-ltc.3gram.gz || 28M || /work/lxs/nlphome/lm/TencentQ/musicQuery ||use qa15w-singer-songs.wordlist
 
|-
 
|-
|gigaword||text || 668MW ||corpora/chinese_gigaword || Gigaword text for Chinese
+
|TencentQ.3gram.gz || 1.4G || /work/lxs/nlphome/lm/TencentQ/qa15w ||use qa15w.lexicion
 
|-
 
|-
|ulgur ||speech&text||xju: 141h (tr. 136h) xjnu: 8.54h ||corpora/ulgur ||ulgur speech and text data
+
|mix-corp1-corp2.3gram.gz || 1.3G ||/work/lxs/nlphome/lm/TencentQ/qa15w-nosinger-song||use qa15w-nosinger-song.wordlist
 
|-
 
|-
|tvboard||speech ||-|| corpora/tvboard ||tv and broadcast no-transcribed archieve
+
|mix-corp1_0.5-corp2_0.5.3gram.gz||1.4G||/work/lxs/nlphome/lm/TencentQ/qa15w-singer-song||use qa15w-singer-song.wordlist
 
|-
 
|-
|weibo || text||10Gb|| corpora/weibo || English weibo text data
+
|11w_merge6_kn.3gram.gz||4.3G||/work/lxs/nlphome/lm/TencentQA-100G|| trainData=qa(100G),dict=11w-tencent
 
|-
 
|-
|qa || text||124Gb|| corpora/qa || QA text data
+
|8w8_new_merge6_kn.3gram0.gz||4.5G||/work/lxs/nlphome/lm/TencentQA-100G||trainData=qa(100G),dict=8w8-tencent
 
|-
 
|-
|pvad||speech||5.4h||corpora/puqiang/VAD || speech data for VAD, from Pachira
+
|Hunhe_zhongzi_and_add_and_PPL_5yuan_3e9.lm.utf8.1e-5.3gram.gz||1.4M||/work/lxs/nlphome/lm/jietong||
 
|-
 
|-
|ppoi||speech||208h||corpora/puqiang/poi || 8k telephone speech in poi from Pachira
+
|Hunhe_zhongzi_and_add_and_PPL_5yuan_3e9.lm.utf8.1e-9.5gram.gz||389M||/work/lxs/nlphome/lm/jietong||
 +
|}
 +
 
 +
==lexicion wordlist==
 +
{| class="wikitable"
 +
! name !! size !! dir !! description
 
|-
 
|-
|T400||speech||400h||corpora/tencent ||speech data from Tencent
+
|singer.lexicion||2060 ||/work/lxs/nlphome/dict/lex-wordlist/music/lr ||
 
|-
 
|-
|dt700 ||speech||700h||corpora/tencent/dt700 ||700 hour reading speech data
+
|singer.low.lexicion||2060||/work/lxs/nlphome/dict/lex-wordlist/music/lr||
 
|-
 
|-
|legend-vod || speech ||-||corpora/legend-vod ||some test speech and vod
+
|singer.pinyin||2104||/work/lxs/nlphome/dict/lex-wordlist/music/lr||
 
|-
 
|-
|mobil-eng || speech ||26h||corpora/lenvxx/data/wav/mobil-eng ||english speech of chinese people
+
|song.lexicion||4639||/work/lxs/nlphome/dict/lex-wordlist/music/lr||
 
|-
 
|-
|legend-online || speech ||54h||corpora/lenvxx/data/wav/real-online || online speech data
+
|song.low.lexicion||4639||/work/lxs/nlphome/dict/lex-wordlist/music/lr||
 
|-
 
|-
|legend-wakeup || speech ||1h||corpora/lenvxx/data/wav/wake-up || wake up test speech
+
|song.pinyin||4644||/work/lxs/nlphome/dict/lex-wordlist/music/lr||
 
|-
 
|-
|legend-reading || speech ||21h||corpora/lenvxx/data/wav/haitian || reading speech
+
|qa15w-ch-sinovoice.lexicion||92469||/work/lxs/nlphome/dict/lex-wordlist/qa-check||
 
|-
 
|-
|legend-sel-for-test || speech ||21h||corpora/lenvxx/data/wav/sel_for_test || reading speech
+
|qa15w-ch.pinyin||92469||/work/lxs/nlphome/dict/lex-wordlist/qa-check||
 
|-
 
|-
|POI-lexicon || lexicon ||-||corpora/lenvxx/data/lexicon || lexicon for POI applications
+
|qa15w.lexicion||158404||/work/lxs/nlphome/dict/lex-wordlist/qa-check||
 
|-
 
|-
|NLPR || lexicon,categories ||-||corpora/lenvxx/data/text/nlpcorpus || resources of NLP tasks
+
|11w.lexicion||122172||/work/lxs/nlphome/dict/lex-wordlist/tencent||
 
|-
 
|-
|serviceT || text ||-||corpora/lenvxx/data/text/service_text || text recorded from online service
+
|8w8.lexicion||90795||/work/lxs/nlphome/dict/lex-wordlist/tencent||
 +
|}
 +
 
 +
==nolexicion wordlist==
 +
{| class="wikitable"
 +
! name !! size !! dir !! description
 
|-
 
|-
|sougouText || text ||-||corpora/sogou || sogouQ and sogouT
+
|singer.wordlist||2060||/work/lxs/nlphome/dict/nolex-wordlist/music/lr||
 
|-
 
|-
|wsj  || speech ||100h||corpora/wsj ||wall-street journal speech db
+
|song.wordlist||4639||/work/lxs/nlphome/dict/nolex-wordlist/music/lr||
 
|-
 
|-
|hownet || lexicon || - || corpora/hownet || HowNet relation db
+
|album.txt||11736||/work/lxs/nlphome/dict/nolex-wordlist/music/ltc||
 
|-
 
|-
|casia || speech ||4000 u || corpora/tts/casia || male TTS speech
+
|area.txt||4||/work/lxs/nlphome/dict/nolex-wordlist/music/ltc||
 
|-
 
|-
|huilan-tts || speech ||2000 u || corpora/tts/huilan || male/female TTS speech from Huilan
+
|chart.txt||28||/work/lxs/nlphome/dict/nolex-wordlist/music/ltc||
 
|-
 
|-
|tts-novel || speech ||20h  || corpora/tts/novel || speech data download from internet for tts
+
|drama.txt||517||/work/lxs/nlphome/dict/nolex-wordlist/music/ltc||
 
|-
 
|-
|Sinovoice-tel || speech || 470h+300h || corpora/sinovoice/tel || telephone speech data from Sinovoice
+
|language.txt||35||/work/lxs/nlphome/dict/nolex-wordlist/music/ltc||
 
|-
 
|-
|Sinovoice-16k || speech || 6000h || corpora/sinovoice/16k || mobile 16k speech data from Sinovoice
+
|singer.txt||4456||/work/lxs/nlphome/dict/nolex-wordlist/music/ltc||
 
|-
 
|-
|}
+
|stopwords.txt||894||/work/lxs/nlphome/dict/nolex-wordlist/music/ltc||
 
+
==table 2==
+
{| class="wikitable"
+
! name !! type!! size !! dir !! description
+
 
|-
 
|-
|863    ||speech||51h, 76spk || corpora/863 || 863 reading speech database. 16k,16bit
+
|song.txt||26153||/work/lxs/nlphome/dict/nolex-wordlist/music/ltc||
 
|-
 
|-
 +
|style.txt||562||/work/lxs/nlphome/dict/nolex-wordlist/music/ltc||
 +
|-
 +
|type.txt||3||/work/lxs/nlphome/dict/nolex-wordlist/music/ltc||
 +
|-
 +
|entity.txt||36198||/work/lxs/nlphome/dict/nolex-wordlist/music/ltc||merge album area chart drama language singer song stopwords style type
 +
|-
 +
|qa15w.wordlist||147996||/work/lxs/nlphome/dict/nolex-wordlist/qa-check||
 +
|-
 +
|11w.wordlist||111895||/work/lxs/nlphome/dict/nolex-wordlist/tencent||
 +
|-
 +
|8w8.wordlist||88055||/work/lxs/nlphome/dict/nolex-wordlist/tencent||
 +
|-
 +
|scws20w-utf8.wordlist||284646||/work/lxs/nlphome/dict/nolex-wordlist||
 
|}
 
|}
 +
 +
==lenvxx==
 +
===path:/nfs/corpus/data/corpora/lenvxx===
 +
===description:I settle the data in /nfs/corpus/data/corpora/lenvxx/data/text/nlpcorpus/nlp_corpus===
 +
=====(in this directory,it include 4 subdirectory:ChinaDivision , dict , dict4VOD , document Resource)=====
 +
;1.Directory:/nfs/corpus/data/corpora/lenvxx/data/text/nlpcorpus/nlp_corpus/dict
 +
:;1.include directory :sogou-dict
 +
:::*城市信息:include many provinces' data about the cities' names and places' names in the province,and some localisms,and some cities' information about bus station and the streets' name
 +
:::*电子游戏
 +
::::*单机游戏:include the console games' name from 2001 to 2011,and some game's wordlist.
 +
::::*网游:include the online games' name from 2008 to 2011 and some game's wordlist.
 +
:::*工程与应用科学:include the specialized vocabulary wordlists in project field.
 +
::::*计算机:include the specialized vocabulary wordlists in computer field,and Alibaba's product vocabulary in many fields.
 +
:::*农林鱼畜:include the wordlist about livestock and agriculture.
 +
:::*人文科学
 +
::::*文学:include the wordlist about ancient Chinese literature and masterwork,and some novels' wordlist.
 +
::::*语言:include the wordlists about idiom and Folklore,Network buzzwords.
 +
::::*哲学:include the wordlists about philosophy.for instance,Hegel,Marxism.
 +
::::*宗教:include the wordlists about Taoism,Buddhism,Islam
 +
::::*历史:include the wordlists about the history about Chinese,and Japanese's warring states period,diplomacy.
 +
::::*其他:include the wordlist about the ancient Chinese numerology.
 +
:::*社会科学
 +
::::*法律:include the wordlists about law.
 +
::::*教育:include the wordlists about some universities' architecture,and some wordlist about textbook,list of Chinese univercity and America famous univercity.
 +
::::*金融:include the wordlists about wordlist about financial.
 +
::::*军事:include the wordlists about military.
 +
::::*政治:include the wordlists about Party and government offices,political,and ancient China Official institutions     
 +
::::*其他:include the wordlists about public relations,ethics,anthropology
 +
:::*生活:include the wordlists about many fields in our lief.
 +
:::*医学:include the wordlists about medical science.
 +
:::*艺术
 +
::::*书法篆刻:include the wordlists about sculpture and calligraphy.
 +
::::*舞蹈:include the wordlists about dance and Gymnastics Rhythmic.
 +
::::*戏剧:include the wordlists about drama.
 +
::::*音乐:include the wordlists about music major in Chinese and the west.
 +
::::*其他:include the wordlists of tea,sculpture,er ren zhuan,world heritage,artist.
 +
:::*娱乐
 +
::::*电影电视:include the wordlists about science fiction film.
 +
::::*动漫:include the wordlists about some cartoons.
 +
::::*流行音乐:include the wordlists about a novel of A Song of Ice and Fire,fashionable word or phrase.
 +
::::*明星:include the wordlists about some famous person.
 +
::::*汽车:include the wordlists about car field.
 +
::::*收藏:include the wordlists about advertisement.
 +
::::*时尚品牌:the directory is empty.
 +
:::*运动休闲
 +
::::*F1赛车:the directory is empty.
 +
::::*奥运:include the wordlists of Olympic.
 +
::::*垂钓:include the wordlists of fishing.
 +
::::*轮滑:include a wordlist of roller skating.
 +
::::*棋牌:include the wordlists about mahjong,go,chinese chess,san guo sha.
 +
::::*气功:include the wordlists about qigong.
 +
::::*球类:include the wordlists about football,basketball,ping-bang ball,golf,badminton.
 +
::::*杀人游戏:the directory is empty.
 +
::::*跆拳道:include the wordlists of taekwondo.
 +
::::*太极拳:include the wordlists of ba gua,tai ji quan.
 +
::::*武术:include the wordlists of wu shu.
 +
::::*自行车:the directory is empty.
 +
::::*其他:include the wordlists about fencing,judo,wrestling,yoga.
 +
:::*自然科学
 +
::::*化学:include the wordlists of chemistry.
 +
::::*生物:include the wordlists of biology.
 +
::::*数学:include the wordlists of math.
 +
::::*天文学:include the wordlists of astronomy.
 +
::::*物理:include the wordlists of physics.
 +
::::*其他:include the wordlists of stone.
 +
:;2.include directory :movie(include many wordlists about movie major)
 +
:::*电影:include the movie wordlists of inland,Hongkong and Taiwan,Europe and America,Asian.
 +
:::*明星:include the movie star wordlists of inland,Hongkong and Taiwan,Europe and America,Asian.
 +
:;3.include directory :movie-dict(include the wordlists of actor,director,moviename,roles,style)
 +
:;4.include directory :name(include the wordlists of famous person in inland,Hongkong and Taiwan,Europe and America,Asian.)
 +
:;5.include directory :NER(include the wordlists of person name in English,Japan,Korea,Russia)
 +
:;6.include directory :Pinyin(include a wordlists of duo ying zhi)
 +
:;7.include directory :VOD
 +
:::*电视剧:include a wordlist of teleplay.
 +
:::*电影:include a wordlist of movie.
 +
:::*微电影:include a wordlist of micro film.
 +
:::*音乐:include the wordlists of famous songs in inland,Hongkong and Taiwan,Europe and America,Japan and South Korea
 +
:::*综艺:include a wordlists of show.
 +
:;8.include directory :领域术语(include the wordlists about computer,economy,travel,sports,medicine)
 +
:;9.include directory :语言学词库
 +
:::*基础名词:it include person,abstract noun,nature,person making things,fashion noun.
 +
:::*语言学词汇类别:it include all grammar vocabulary.
 +
;2.Directory:/nfs/corpus/data/corpora/lenvxx/data/text/nlpcorpus/nlp_corpus/dict4VOD
 +
:the directory include the wordlists of movie distribution company,film award,filmfest,actors'name,chinese and english comparison table.
 +
;3.Directory:/nfs/corpus/data/corpora/lenvxx/data/text/nlpcorpus/nlp_corpus/ChinaDivision
 +
:the directory include 4 wordlists,which divide in 4 level(province name,city name,region name,street name)

Latest revision as of 06:24, 26 February 2014 (Wednesday)

==lm==

{| class="wikitable"
! name !! size !! dir !! description
|-
| SogouQ.full.train.3gram.gz || 132M || /work/lxs/nlphome/lm/SogouQ-500M || trainData=SogouQ(800M); dict=11w-tencent
|-
| SogouT-11w-merge2-1.3gram.gz || 4.1G || /work/lxs/nlphome/lm/SogouT-140G || trainData=SogouT(140G); dict=11w-tencent
|-
| SogouT-11w-merge2-2.3gram.gz || 3.9G || /work/lxs/nlphome/lm/SogouT-140G ||
|-
| 8w8.3gram.tencent.gz || 452M || /work/lxs/nlphome/lm/Tencent ||
|-
| musicQuery-ltc.3gram.gz || 28M || /work/lxs/nlphome/lm/TencentQ/musicQuery || uses qa15w-singer-songs.wordlist
|-
| TencentQ.3gram.gz || 1.4G || /work/lxs/nlphome/lm/TencentQ/qa15w || uses qa15w.lexicion
|-
| mix-corp1-corp2.3gram.gz || 1.3G || /work/lxs/nlphome/lm/TencentQ/qa15w-nosinger-song || uses qa15w-nosinger-song.wordlist
|-
| mix-corp1_0.5-corp2_0.5.3gram.gz || 1.4G || /work/lxs/nlphome/lm/TencentQ/qa15w-singer-song || uses qa15w-singer-song.wordlist
|-
| 11w_merge6_kn.3gram.gz || 4.3G || /work/lxs/nlphome/lm/TencentQA-100G || trainData=qa(100G); dict=11w-tencent
|-
| 8w8_new_merge6_kn.3gram0.gz || 4.5G || /work/lxs/nlphome/lm/TencentQA-100G || trainData=qa(100G); dict=8w8-tencent
|-
| Hunhe_zhongzi_and_add_and_PPL_5yuan_3e9.lm.utf8.1e-5.3gram.gz || 1.4M || /work/lxs/nlphome/lm/jietong ||
|-
| Hunhe_zhongzi_and_add_and_PPL_5yuan_3e9.lm.utf8.1e-9.5gram.gz || 389M || /work/lxs/nlphome/lm/jietong ||
|}
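
The page does not state the on-disk format of these models. The `.3gram.gz` and `.5gram.gz` names suggest gzip-compressed ARPA text files, but that is an assumption. Under that assumption, a minimal Python sketch for checking a model's n-gram counts without decompressing the whole file could look like this (`read_arpa_counts` is a hypothetical helper name, not a tool from this wiki):

```python
import gzip
import re
from typing import Dict

# Matches ARPA header lines such as "ngram 3=123456".
_NGRAM_LINE = re.compile(r"^ngram\s+(\d+)\s*=\s*(\d+)\s*$")


def read_arpa_counts(path: str) -> Dict[int, int]:
    """Return {n-gram order: count} from the data header of a gzipped ARPA LM.

    Assumption: the file is ARPA-format text compressed with gzip.
    """
    counts: Dict[int, int] = {}
    in_header = False
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if line == "\\data\\":
                in_header = True
                continue
            if not in_header:
                continue
            match = _NGRAM_LINE.match(line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line.startswith("\\"):
                # The first section marker (e.g. "\1-grams:") ends the header.
                break
    return counts
```

Only the header is read, so the call stays cheap even for the multi-gigabyte models listed above. If a file is not in ARPA format, the function simply returns an empty dictionary.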

==lexicion wordlist==

{| class="wikitable"
! name !! size !! dir !! description
|-
| singer.lexicion || 2060 || /work/lxs/nlphome/dict/lex-wordlist/music/lr ||
|-
| singer.low.lexicion || 2060 || /work/lxs/nlphome/dict/lex-wordlist/music/lr ||
|-
| singer.pinyin || 2104 || /work/lxs/nlphome/dict/lex-wordlist/music/lr ||
|-
| song.lexicion || 4639 || /work/lxs/nlphome/dict/lex-wordlist/music/lr ||
|-
| song.low.lexicion || 4639 || /work/lxs/nlphome/dict/lex-wordlist/music/lr ||
|-
| song.pinyin || 4644 || /work/lxs/nlphome/dict/lex-wordlist/music/lr ||
|-
| qa15w-ch-sinovoice.lexicion || 92469 || /work/lxs/nlphome/dict/lex-wordlist/qa-check ||
|-
| qa15w-ch.pinyin || 92469 || /work/lxs/nlphome/dict/lex-wordlist/qa-check ||
|-
| qa15w.lexicion || 158404 || /work/lxs/nlphome/dict/lex-wordlist/qa-check ||
|-
| 11w.lexicion || 122172 || /work/lxs/nlphome/dict/lex-wordlist/tencent ||
|-
| 8w8.lexicion || 90795 || /work/lxs/nlphome/dict/lex-wordlist/tencent ||
|}
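
The page does not document the `.lexicion` file format or say whether the size column counts lines or distinct words. Assuming a common pronunciation-lexicon layout (one entry per line, with the word as the first whitespace-separated field), the following Python sketch reports both numbers so that a listed size can be checked against either. `lexicon_stats` is a hypothetical helper name.

```python
from collections import Counter
from typing import Tuple


def lexicon_stats(path: str) -> Tuple[int, int]:
    """Return (entry lines, distinct words) for a lexicon file.

    Assumption: one entry per line, word first, then its pronunciation,
    separated by whitespace. Blank lines are ignored.
    """
    per_word: Counter = Counter()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            fields = line.split()
            if fields:
                per_word[fields[0]] += 1
    return sum(per_word.values()), len(per_word)
```

If a word has several pronunciations, the first number exceeds the second; the difference is the number of extra pronunciation variants.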

==nolexicion wordlist==

{| class="wikitable"
! name !! size !! dir !! description
|-
| singer.wordlist || 2060 || /work/lxs/nlphome/dict/nolex-wordlist/music/lr ||
|-
| song.wordlist || 4639 || /work/lxs/nlphome/dict/nolex-wordlist/music/lr ||
|-
| album.txt || 11736 || /work/lxs/nlphome/dict/nolex-wordlist/music/ltc ||
|-
| area.txt || 4 || /work/lxs/nlphome/dict/nolex-wordlist/music/ltc ||
|-
| chart.txt || 28 || /work/lxs/nlphome/dict/nolex-wordlist/music/ltc ||
|-
| drama.txt || 517 || /work/lxs/nlphome/dict/nolex-wordlist/music/ltc ||
|-
| language.txt || 35 || /work/lxs/nlphome/dict/nolex-wordlist/music/ltc ||
|-
| singer.txt || 4456 || /work/lxs/nlphome/dict/nolex-wordlist/music/ltc ||
|-
| stopwords.txt || 894 || /work/lxs/nlphome/dict/nolex-wordlist/music/ltc ||
|-
| song.txt || 26153 || /work/lxs/nlphome/dict/nolex-wordlist/music/ltc ||
|-
| style.txt || 562 || /work/lxs/nlphome/dict/nolex-wordlist/music/ltc ||
|-
| type.txt || 3 || /work/lxs/nlphome/dict/nolex-wordlist/music/ltc ||
|-
| entity.txt || 36198 || /work/lxs/nlphome/dict/nolex-wordlist/music/ltc || merge of album, area, chart, drama, language, singer, song, stopwords, style, type
|-
| qa15w.wordlist || 147996 || /work/lxs/nlphome/dict/nolex-wordlist/qa-check ||
|-
| 11w.wordlist || 111895 || /work/lxs/nlphome/dict/nolex-wordlist/tencent ||
|-
| 8w8.wordlist || 88055 || /work/lxs/nlphome/dict/nolex-wordlist/tencent ||
|-
| scws20w-utf8.wordlist || 284646 || /work/lxs/nlphome/dict/nolex-wordlist ||
|}
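
The description of entity.txt says it merges the ten category lists, but the page does not say how the merge was done. The ten listed sizes add up to 44388, more than the 36198 given for entity.txt, which would be consistent with duplicate entries being dropped if the sizes are entry counts. A minimal Python sketch of such a merge, assuming UTF-8 files with one entry per line and an order-preserving de-duplication (`merge_wordlists` is a hypothetical helper name):

```python
from pathlib import Path
from typing import Iterable, List


def merge_wordlists(paths: Iterable[Path]) -> List[str]:
    """Merge one-entry-per-line wordlists, keeping the first occurrence of each entry.

    Assumption: files are UTF-8; blank lines and surrounding whitespace are ignored.
    """
    seen = set()
    merged: List[str] = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                word = line.strip()
                if word and word not in seen:
                    seen.add(word)
                    merged.append(word)
    return merged
```

To rebuild a list like entity.txt, one would pass the paths of album.txt through type.txt from the music/ltc directory in the table above and write the returned entries out one per line. Whether the original merge kept this order or sorted the result is not stated on the page.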

==lenvxx==

===path: /nfs/corpus/data/corpora/lenvxx===

===description: the NLP resources are organized under /nfs/corpus/data/corpora/lenvxx/data/text/nlpcorpus/nlp_corpus===

=====(this directory contains 4 subdirectories: ChinaDivision, dict, dict4VOD, document Resource)=====
;1. Directory: /nfs/corpus/data/corpora/lenvxx/data/text/nlpcorpus/nlp_corpus/dict
:;1. Subdirectory: sogou-dict
:::*城市信息 (city information): per-province wordlists of city and place names, some local dialect terms, and bus-station and street names for some cities.
:::*电子游戏 (video games)
::::*单机游戏 (standalone games): names of standalone (single-player) games from 2001 to 2011, plus wordlists for some individual games.
::::*网游 (online games): names of online games from 2008 to 2011, plus wordlists for some individual games.
:::*工程与应用科学 (engineering and applied science): specialized vocabulary wordlists for engineering fields.
::::*计算机 (computing): specialized computing vocabulary, plus Alibaba product vocabulary across many fields.
:::*农林鱼畜 (agriculture, forestry, fishery and livestock): wordlists on livestock and agriculture.
:::*人文科学 (humanities)
::::*文学 (literature): wordlists on ancient Chinese literature and classic works, plus wordlists for some novels.
::::*语言 (language): wordlists of idioms, folklore terms, and Internet buzzwords.
::::*哲学 (philosophy): wordlists on philosophy, for instance Hegel and Marxism.
::::*宗教 (religion): wordlists on Taoism, Buddhism, and Islam.
::::*历史 (history): wordlists on Chinese history, Japan's Warring States period, and diplomacy.
::::*其他 (other): a wordlist on ancient Chinese numerology.
:::*社会科学 (social sciences)
::::*法律 (law): wordlists on law.
::::*教育 (education): wordlists on the buildings of some universities, some textbook wordlists, and lists of Chinese universities and famous American universities.
::::*金融 (finance): wordlists on finance.
::::*军事 (military): wordlists on military topics.
::::*政治 (politics): wordlists on Party and government offices, politics, and the official institutions of ancient China.
::::*其他 (other): wordlists on public relations, ethics, and anthropology.
:::*生活 (daily life): wordlists on many areas of daily life.
:::*医学 (medicine): wordlists on medical science.
:::*艺术 (arts)
::::*书法篆刻 (calligraphy and seal carving): wordlists on sculpture and calligraphy.
::::*舞蹈 (dance): wordlists on dance and rhythmic gymnastics.
::::*戏剧 (drama): wordlists on drama.
::::*音乐 (music): wordlists on Chinese and Western music.
::::*其他 (other): wordlists on tea, sculpture, er ren zhuan, world heritage, and artists.
:::*娱乐 (entertainment)
::::*电影电视 (film and television): wordlists on science fiction films.
::::*动漫 (animation and comics): wordlists on some cartoons.
::::*流行音乐 (pop music): wordlists on the novel series A Song of Ice and Fire and on fashionable words and phrases.
::::*明星 (celebrities): wordlists of some famous people.
::::*汽车 (cars): wordlists on the automotive field.
::::*收藏 (collecting): wordlists on advertising.
::::*时尚品牌 (fashion brands): this directory is empty.
:::*运动休闲 (sports and leisure)
::::*F1赛车 (F1 racing): this directory is empty.
::::*奥运 (Olympics): wordlists on the Olympics.
::::*垂钓 (fishing): wordlists on fishing.
::::*轮滑 (roller skating): a wordlist on roller skating.
::::*棋牌 (board and card games): wordlists on mahjong, Go, Chinese chess, and San Guo Sha.
::::*气功 (qigong): wordlists on qigong.
::::*球类 (ball games): wordlists on football, basketball, table tennis, golf, and badminton.
::::*杀人游戏 (Killer party game): this directory is empty.
::::*跆拳道 (taekwondo): wordlists on taekwondo.
::::*太极拳 (tai chi): wordlists on ba gua and tai ji quan.
::::*武术 (martial arts): wordlists on wu shu.
::::*自行车 (cycling): this directory is empty.
::::*其他 (other): wordlists on fencing, judo, wrestling, and yoga.
:::*自然科学 (natural sciences)
::::*化学 (chemistry): wordlists on chemistry.
::::*生物 (biology): wordlists on biology.
::::*数学 (mathematics): wordlists on mathematics.
::::*天文学 (astronomy): wordlists on astronomy.
::::*物理 (physics): wordlists on physics.
::::*其他 (other): wordlists on stones.
:;2. Subdirectory: movie (many wordlists for the film domain)
:::*电影 (films): film wordlists for mainland China, Hong Kong and Taiwan, Europe and America, and Asia.
:::*明星 (stars): film-star wordlists for mainland China, Hong Kong and Taiwan, Europe and America, and Asia.
:;3. Subdirectory: movie-dict (wordlists of actors, directors, movie names, roles, and styles)
:;4. Subdirectory: name (wordlists of famous people from mainland China, Hong Kong and Taiwan, Europe and America, and Asia)
:;5. Subdirectory: NER (wordlists of English, Japanese, Korean, and Russian personal names)
:;6. Subdirectory: Pinyin (a wordlist of duo yin zi, i.e. characters with more than one reading)
:;7. Subdirectory: VOD
:::*电视剧 (TV series): a wordlist of TV series.
:::*电影 (films): a wordlist of films.
:::*微电影 (micro films): a wordlist of micro films.
:::*音乐 (music): wordlists of famous songs from mainland China, Hong Kong and Taiwan, Europe and America, and Japan and South Korea.
:::*综艺 (variety shows): a wordlist of variety shows.
:;8. Subdirectory: 领域术语 (domain terminology; wordlists on computing, economics, travel, sports, and medicine)
:;9. Subdirectory: 语言学词库 (linguistic lexicons)
:::*基础名词 (basic nouns): nouns for persons, abstract concepts, nature, man-made things, and fashion.
:::*语言学词汇类别 (linguistic word categories): all grammatical vocabulary.
;2. Directory: /nfs/corpus/data/corpora/lenvxx/data/text/nlpcorpus/nlp_corpus/dict4VOD
:This directory contains wordlists of film distribution companies, film awards, film festivals, and actors' names, plus a Chinese-English comparison table.
;3. Directory: /nfs/corpus/data/corpora/lenvxx/data/text/nlpcorpus/nlp_corpus/ChinaDivision
:This directory contains 4 wordlists, one per administrative level (province names, city names, region names, and street names).