Public tools

From cslt Wiki

Latest revision as of 08:28, 3 May 2020 (Sunday)

GFCC

Time-domain gammatone cepstral coefficients (GFCC), provided by Jun Qi, EE Dept., Tsinghua University, China. Reference: "Auditory feature based on Gammatone filters for robust speech recognition", ISCAS 2013.

Download: http://wangd.cslt.org/public/tools/zip/gfcc.v0.2.tgz

Online CNSC

This package contains my MATLAB code for an online learning approach to convolutive non-negative sparse coding (OLCNSC). Refer to the following Interspeech 2011 paper: Dong Wang, Nicholas Evans, "Online Pattern Learning for Convolutive Non-negative Sparse Coding".

Download: http://wangd.cslt.org/public/tools/zip/olcnsc.tar.gz

Micro-FST Composition Tool

We often need to embed a small FST into a large FST in order to implement a nested LM. This has been used in the pair-word approach for infrequent word enhancement; see the Interspeech 2015 paper at http://wangd.cslt.org/public/pdf/wpair.pdf. This package contains a tool for the FST embedding.

Download: http://pubtools.cslt.org/fstemb/tag-based_lm.tar.gz
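The idea of embedding a small FST into a large one can be illustrated with a toy sketch. This is not the code shipped in the package; it is a minimal illustration under several assumptions: FSTs are simplified to acceptors stored as lists of (source, destination, label) arcs, the placeholder label `<NAME>` is hypothetical, and the function name `embed_fst` is invented for this example. Each arc carrying the placeholder is replaced by a fresh copy of the small FST, connected with epsilon arcs.

```python
EPS = "<eps>"  # epsilon label used to glue the copies in (assumed convention)


def embed_fst(big_arcs, tag, small_arcs, small_start, small_finals):
    """Replace every arc labelled `tag` in the big FST with a copy of the small FST.

    Arcs are (src, dst, label) tuples over integer state ids.
    """
    # First unused state id in the big FST.
    next_state = 1 + max(s for arc in big_arcs for s in arc[:2])
    small_states = sorted(
        {s for arc in small_arcs for s in arc[:2]} | {small_start} | set(small_finals)
    )
    result = []
    for src, dst, label in big_arcs:
        if label != tag:
            result.append((src, dst, label))
            continue
        # Give this copy of the small FST its own fresh state ids.
        mapping = {s: next_state + i for i, s in enumerate(small_states)}
        next_state += len(small_states)
        # Enter the copy from the source state of the replaced arc.
        result.append((src, mapping[small_start], EPS))
        for s, d, l in small_arcs:
            result.append((mapping[s], mapping[d], l))
        # Leave the copy towards the destination state of the replaced arc.
        for f in small_finals:
            result.append((mapping[f], dst, EPS))
    return result


# Hypothetical example: a large "sentence" FST with one placeholder arc,
# and a small FST listing two names.
big = [(0, 1, "call"), (1, 2, "<NAME>"), (2, 3, "now")]
small = [(0, 1, "john"), (0, 1, "mary")]
embedded = embed_fst(big, "<NAME>", small, small_start=0, small_finals=[1])
```

The epsilon arcs keep the sketch short; a practical tool would typically remove them and handle determinization afterwards, which this example does not attempt.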

vMF-SNE

vMF-SNE is a tool for spherical data embedding and visualization. The tool is based on t-SNE (http://lvdmaaten.github.io/tsne/) and adapted to deal with spherical (L2-normalized) data.

Download: http://pubtools.cslt.org/vmfsne/zip/vmfsne.tgz

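Refer to the following technical report (TRP) for more details: Mian Wang, Dong Wang, "VMF-SNE: Embedding for Spherical Data", http://cslt.riit.tsinghua.edu.cn/mediawiki/images/0/0b/Cslt-trp-template-vmfsne.pdf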
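As a small illustration of what "L2-normalized" spherical data means, the sketch below scales each vector to unit Euclidean length so that it lies on the unit sphere. It is not part of the vMF-SNE package; the function name `l2_normalize` and the sample vectors are assumptions made for this example.

```python
import math


def l2_normalize(vectors):
    """Scale each vector to unit Euclidean (L2) norm."""
    normalized = []
    for vec in vectors:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            # A zero vector has no direction, so it cannot be placed on the sphere.
            raise ValueError("cannot normalize a zero vector")
        normalized.append([x / norm for x in vec])
    return normalized


# Hypothetical input: two 2-dimensional vectors.
points = l2_normalize([[3.0, 4.0], [0.0, 2.0]])
```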


JCLTS (deprecated)

JCLTS is a toolkit that uses a joint-multigram model (JMM) plus a CRF to predict one stream given another. The JMM serves as a simple model for obtaining a raw alignment of the two streams, and the CRF is then trained as a more powerful model based on that alignment. Reference: Dong Wang, Simon King, "Letter-to-Sound Pronunciation Prediction using Conditional Random Field", IEEE Signal Processing Letters, vol. 18, no. 2, February 2011, pp. 122-125.

Download: http://wangd.cslt.org/public/tools/zip/lts.v1.0.tar.gz

FSTi (deprecated)

The FSTi toolkit contains a set of indexing tools developed at CSLT, Tsinghua University. Its main purpose is to provide a quick and easy way to construct a complete spoken term detection (STD) system when combined with some standard tools, including:

HTK from Cambridge: http://htk.eng.cam.ac.uk/

lattice-tool from SRI: http://www-speech.sri.com/projects/srilm/manpages/lattice-tool.1.html

OpenFST: http://www.openfst.org/twiki/bin/view/FST/WebHome

FSTi provides three integration approaches, any of which can be used to construct a full, practical STD system:

1. HTK + lat2fst: standard FST-based indexing[2,3] (liblse from BUT required).

2. HTK + lattice-tool + ridx: standard ngram indexing[4].

3. HTK + lattice-tool + ngram2fst: ngram-based FST indexing[1].

For more details, please refer to the following Interspeech paper [1].

[1] Chao Liu, Dong Wang, "N-gram FST indexing for spoken term detection", Interspeech 2012.

Download: http://wangd.cslt.org/public/tools/zip/FSTi.tar.gz
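To give a rough feel for n-gram indexing in term detection, here is a toy inverted index. It is only a sketch and not how FSTi, ridx, or ngram2fst are implemented: it indexes plain token sequences rather than lattices, and the function names, utterance ids, and phone strings are all hypothetical.

```python
from collections import defaultdict


def build_ngram_index(utterances, n):
    """Map every n-gram of tokens to the set of utterance ids containing it."""
    index = defaultdict(set)
    for utt_id, tokens in utterances.items():
        for i in range(len(tokens) - n + 1):
            index[tuple(tokens[i:i + n])].add(utt_id)
    return index


def search(index, query, n):
    """Return ids of utterances that contain every n-gram of the query.

    These are candidate hits only: the n-grams are not checked for adjacency.
    """
    grams = [tuple(query[i:i + n]) for i in range(len(query) - n + 1)]
    if not grams:
        return set()
    hits = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        hits &= index.get(gram, set())
    return hits


# Hypothetical phone-level transcripts of two utterances.
utterances = {
    "utt1": "k ae t s ae t".split(),
    "utt2": "d ao g".split(),
}
index = build_ngram_index(utterances, n=2)
hits_ae_t = search(index, ["ae", "t"], n=2)
hits_dog = search(index, ["d", "ao", "g"], n=2)
```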

HCNSC (deprecated)

We publish the heterogeneous CNSC code plus an example task on speech separation. For details, please refer to: Dong Wang, Javier Tejedor, "Heterogeneous convolutive non-negative sparse coding", submitted to Interspeech.

This folder contains the following directories:

1. olcnsc: the core of online convolutive non-negative sparse coding, extended with heterogeneous learning.

2. utest: an example task for speech separation. It includes a basic invocation example and two scripts that demonstrate how to search for optimal base distributions.

3. util: utility scripts that assist utest.

You can use and distribute this code freely for research purposes. The authors do not take any responsibility for any damage caused by running the code. Comments, questions, and bug reports are particularly welcome.

Download: http://wangd.cslt.org/public/tools/zip/hcnsc.tar.gz


Crawler based on keywords (deprecated)

This crawler fetches the contents of the top-K relevant web pages returned by a search engine (e.g. Bing) for given keywords (e.g. yahoo, or 百度, the Chinese name of Baidu). It is implemented in Java and supports both Chinese and English keywords. The source code is shared at the link below; please download the jar files, the readme document, and the source code files.

Download: http://pan.baidu.com/s/1bnrmWhx

Sina weibo crawler (deprecated)

This crawler collects Sina weibo information, including the post body, comments, number of forwards, etc. It is implemented in Python. When run uninterrupted for 24 hours, it collects weibo at a rate of 340M/day. The source code is shared at the link below; please download the readme document and the source code files.

Download: http://pan.baidu.com/s/1qW9LqH2