We present the RUSLAN spoken language corpus, the largest open Russian speech corpus for a single speaker for the text-to-speech task. It consists of 22,200 text-audio pairs with a total audio duration of 31 hours 32 minutes, and it exceeds the second-largest single-speaker Russian corpus by 50%. We evaluate the sufficiency and completeness of our corpus by training an end-to-end text-to-speech neural network on RUSLAN. Our model achieves 4.05 for intelligibility and 3.78 for naturalness on a 5-point MOS scale.
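As a rough sanity check on the corpus figures quoted in the abstract, the following sketch derives the average clip length implied by 22,200 pairs and 31 hours 32 minutes of audio. The function names are illustrative and do not come from the paper, and the result is only a mean: it says nothing about how clip lengths are actually distributed in RUSLAN.

```python
# Corpus totals as stated in the abstract.
NUM_PAIRS = 22_200
TOTAL_HOURS = 31
TOTAL_MINUTES = 32


def total_seconds(hours: int, minutes: int) -> int:
    """Convert an hours/minutes duration to whole seconds."""
    return hours * 3600 + minutes * 60


def mean_clip_seconds(hours: int, minutes: int, num_pairs: int) -> float:
    """Average audio length per text-audio pair, in seconds."""
    if num_pairs <= 0:
        raise ValueError("num_pairs must be positive")
    return total_seconds(hours, minutes) / num_pairs


if __name__ == "__main__":
    seconds = total_seconds(TOTAL_HOURS, TOTAL_MINUTES)
    mean = mean_clip_seconds(TOTAL_HOURS, TOTAL_MINUTES, NUM_PAIRS)
    print(f"total audio: {seconds} s")
    print(f"mean clip length: {mean:.2f} s per pair")
```

Under these totals the average works out to roughly five seconds per pair, which is consistent with a corpus made of sentence-length utterances.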