14-10-19 Bin Yuan
From cslt Wiki
Accomplished this week
- Built the HCLG graph from the WSJ corpus for Liu Rong
- Learned HIT's LTP tools for word segmentation, POS tagging and NER
- Used LTP to process the BaiduHi and BaiduZhidao corpora (365 GB in total); the job is still running as 20 tasks on the JieTong grid, with a total run time of about 3 days
- Prepared a report on the word2vec code
Planned for next week
- The address-tag list is very large; find an appropriate way to reduce its size
- Generate a high-frequency address-tag list
- Generate the tagged corpus
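
One possible way to shrink the address-tag list is a simple frequency cutoff: count how often each tag occurs and keep only the tags seen at least a given number of times. The sketch below is illustrative only. The report does not describe the format of the address-tag list, so the input (one string per tag occurrence), the function name `high_frequency_tags`, and the parameters `min_count` and `top_k` are all assumptions.

```python
from collections import Counter


def high_frequency_tags(tags, min_count=2, top_k=None):
    """Return (tag, count) pairs sorted by descending count.

    tags      -- iterable of tag strings, one per occurrence (assumed format)
    min_count -- hypothetical cutoff: drop tags seen fewer times than this
    top_k     -- optional cap on how many tags are considered (None = all)
    """
    counts = Counter(tags)
    # most_common(None) returns every tag, ordered from most to least frequent.
    return [(tag, n) for tag, n in counts.most_common(top_k) if n >= min_count]


if __name__ == "__main__":
    # Toy input standing in for tags extracted from the corpus.
    sample = ["Beijing", "Shanghai", "Beijing", "Haidian", "Beijing", "Shanghai"]
    for tag, n in high_frequency_tags(sample, min_count=2):
        print(tag, n)
```

The cutoff value would need to be tuned against the real list; a threshold that is too high may drop rare but valid addresses.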