"ASR Status Report 2017-12-25": Difference between revisions

From cslt Wiki
 
Latest revision as of 06:53, 25 December 2017 (Monday)

Date People Last Week This Week Task Tracking
2017.12.25


Miao Zhang
  • Read the 16k model script
  • The cough recognition code left by Xiaofei
  • check the trivial database and make it more reasonable
  • test the 16k model on the database
Ying Shi
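  • some functions for Voice-printer
    • speaker vector per utterance [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/6/63/SpkerVector2.png here]
    • speaker vector minus base speaker vector [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/6/6b/Spkear_vector.png here]
  • CTC for Haibo Wang (token accuracy 92.80% on the train set, 89.74% on the cv set); not yet tested on the test set
  • QRcode
    • speaker vector merge phone grayscale [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/f/f3/Speaker_factor_gray.png here]
    • speaker vector merge phone black-and-white map [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/9/97/1514176866%281%29.png here]
    • speaker vector merge phone black-and-white map minus base vector [http://cslt.riit.tsinghua.edu.cn/mediawiki/images/4/4e/SpeakerQrCode2.png here]
  • i-vector baseline for Kazakh-Uyghur LRE: performance is 81.85% (utterance level)
  • Finish the Voice-checker software copyright and submit it this Wednesday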
Lantian Li
  • Complete the recipe for `VV_FACTOR`.
  • 16K and 8K deep speaker model comparison.[1]
  • Patent for `VV_QuickMark`.
  • Complete the demo for `VV_FACTOR`. [Assigned to Shouyi Dai]
  • Phonetic speaker embedding.
  • Overlap training for speaker features.
Zhiyuan Tang
  • word-level pronunciation accuracy based on likelihood (label each word '0' if well pronounced or '1' if badly pronounced)
  • model adaptation
  • if possible, an alpha version of Parrot for testing inside the lab, to collect some data for a better configuration




Date People Last Week This Week Task Tracking
2017.12.18


Ying Shi
  • Finish the Voice-printer program
  • Apply the software copyright of Voice-printer
  • APSIPA 2017
  • Finish the software copyright of Voice-checker
  • Baseline of similar language recognition system (i-vector, DNN, PTN)
  • focus on function other than UI
  • i-vector LID first
Lantian Li
  • Optimize the demo of `VV_Seg` and `VV_QuickMark`.
  • Phone-aware scoring on deep speaker feature. [2]
  • Phone-aware scoring.
  • Overlap training for speaker features.
  • test on trivial dataset
Zhiyuan Tang
  • easy-to-read interfaces for Parrot
  • phone-level likelihood for detailed diagnosis, and an alpha version of Parrot for testing inside the lab