Emilia: a speech corpus for Argentine Spanish text to speech synthesis

Torres, H.M.; Gurlekian, J.A.; Evin, D.A.; Cossio Mercado, C.G.

doi:10.1007/s10579-019-09447-7

Torres, H.M.; Gurlekian, J.A.; Evin, D.A.; Cossio Mercado, C.G. "Emilia: a speech corpus for Argentine Spanish text to speech synthesis" (2019) Language Resources and Evaluation

https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_1574020X_v_n_p_Torres

We are working to add this article to the repository.

Abstract:

This paper introduces Emilia, a speech corpus created to build a female voice of the Spanish spoken in Buenos Aires for the Aromo text-to-speech system. Aromo is a unit-selection text-to-speech system that uses diphones as synthesis units. The key requirements and design criteria for Emilia were to synthesize any Spanish text into high-quality speech and to do so with a minimal corpus size. The text corpus was designed to guarantee phonetic and prosodic coverage, following a three-stage strategy: in the first stage, 741 sentences were designed to contain all the syllables of Spanish as spoken in Argentina, both stressed and unstressed, in every position within the word; in the second stage, 852 sentences were added to balance the distribution of diphones; and in the third and final stage, carried out after a perceptual evaluation of the synthesized speech quality, 625 sentences were added to reach the specified unit coverage and to introduce sentences with more complex syntactic and prosodic structures. Issues arising in all three corpus-building stages are reported. The paper also presents the results of perceptual quality evaluations of the synthesized voice. Emilia has a total duration of 3 hours and 15 minutes; speech synthesized from it with the Aromo system reaches a quality similar to that of commercial systems, with a real-time ratio below one. © 2019, Springer Nature B.V.
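To make two quantities in the abstract more concrete, diphone coverage of a sentence set and the real-time ratio, here is a minimal Python sketch. It is not Aromo's implementation or the authors' selection procedure: it assumes sentences are already transcribed into lists of phone symbols, and the silence symbol `_`, the toy transcriptions, the target inventory, and the timing figures are all hypothetical. Only the per-stage sentence counts come from the abstract.

```python
from collections import Counter

# Sentence counts per design stage, as reported in the abstract.
STAGE_SENTENCES = {
    "stage 1: syllable coverage": 741,
    "stage 2: diphone balancing": 852,
    "stage 3: post-evaluation additions": 625,
}

SILENCE = "_"  # hypothetical boundary symbol; not taken from Aromo


def diphones(phones):
    """Return the diphones (adjacent phone pairs) of one transcribed sentence.

    The sentence is padded with a silence symbol at both ends so that
    silence-to-phone and phone-to-silence transitions are counted too
    (an assumption made for this sketch).
    """
    seq = [SILENCE, *phones, SILENCE]
    return list(zip(seq, seq[1:]))


def diphone_counts(corpus):
    """Count every diphone occurrence across a list of transcribed sentences."""
    counts = Counter()
    for phones in corpus:
        counts.update(diphones(phones))
    return counts


def coverage(corpus, target):
    """Fraction of a target diphone inventory seen at least once in the corpus."""
    seen = set(diphone_counts(corpus))
    return len(seen & target) / len(target)


def real_time_ratio(synthesis_seconds, audio_seconds):
    """Processing time divided by the duration of the audio produced.

    A ratio below 1 means speech is generated faster than it plays back.
    """
    return synthesis_seconds / audio_seconds


if __name__ == "__main__":
    total = sum(STAGE_SENTENCES.values())
    print(total)  # 741 + 852 + 625 = 2218

    # Toy broad transcriptions of "casa" and "mesa" (illustrative only).
    corpus = [["k", "a", "s", "a"], ["m", "e", "s", "a"]]
    # Hypothetical target inventory; ("o", "s") never occurs in the toy corpus.
    target = {("k", "a"), ("a", "s"), ("s", "a"), ("e", "s"), ("o", "s")}
    print(coverage(corpus, target))  # 0.8
    print(diphone_counts(corpus)[("s", "a")])  # 2

    # Made-up timing: 2 s of processing for 4 s of audio.
    print(real_time_ratio(2.0, 4.0))  # 0.5
```

The counts are kept in a `Counter` rather than a plain set because coverage alone only says whether a diphone occurs, while the balancing goal of the second stage concerns how often each one occurs.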

Record:

Document:	Article
Title:	Emilia: a speech corpus for Argentine Spanish text to speech synthesis
Authors:	Torres, H.M.; Gurlekian, J.A.; Evin, D.A.; Cossio Mercado, C.G.
Affiliations:	Laboratorio de Investigaciones Sensoriales, INIGEM, CONICET-UBA, Av. Córdoba 2351, 9th floor, room 2, C.A.B.A. (1120), Buenos Aires, Argentina; Center for Research and Transfer in Acoustics (CINTRA), UTN-FRC UA CONICET, corner of Maestro M. López and Cruz Roja Argentina, Ciudad Universitaria, Córdoba Capital, X5016ZAA, Argentina; Departamento de Computación, FCEN, UBA, Ciudad Universitaria, Buenos Aires, C1428EGA, Argentina
Keywords:	Argentine Spanish; Phonetic corpus; Phonetic transcription; Speech corpus design; Text-to-speech
Year:	2019
DOI:	http://dx.doi.org/10.1007/s10579-019-09447-7
Journal title:	Language Resources and Evaluation
Abbreviated journal title:	Lang. Resour. Eval.
ISSN:	1574-020X
Record URL:	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_1574020X_v_n_p_Torres

Citations:

---------- APA ----------

Torres, H.M., Gurlekian, J.A., Evin, D.A. & Cossio Mercado, C.G. (2019). Emilia: a speech corpus for Argentine Spanish text to speech synthesis. Language Resources and Evaluation.
http://dx.doi.org/10.1007/s10579-019-09447-7

---------- CHICAGO ----------

Torres, H.M., Gurlekian, J.A., Evin, D.A., Cossio Mercado, C.G. "Emilia: a speech corpus for Argentine Spanish text to speech synthesis." Language Resources and Evaluation (2019).
http://dx.doi.org/10.1007/s10579-019-09447-7

---------- MLA ----------

Torres, H.M., Gurlekian, J.A., Evin, D.A., Cossio Mercado, C.G. "Emilia: a speech corpus for Argentine Spanish text to speech synthesis." Language Resources and Evaluation, 2019.
http://dx.doi.org/10.1007/s10579-019-09447-7

---------- VANCOUVER ----------

Torres, H.M., Gurlekian, J.A., Evin, D.A., Cossio Mercado, C.G. Emilia: a speech corpus for Argentine Spanish text to speech synthesis. Lang. Resour. Eval. 2019.
http://dx.doi.org/10.1007/s10579-019-09447-7