Article

Ferrer, L.; Nandwana, M.K.; McLaren, M.; Castan, D.; Lawson, A. "Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option" (2019) IEEE/ACM Transactions on Audio Speech and Language Processing. 27(1):140-153

Abstract:

The output scores of most speaker recognition systems are not directly interpretable as stand-alone values. For this reason, a calibration step is usually performed on the scores to convert them into proper likelihood ratios, which have a clear probabilistic interpretation. The standard calibration approach transforms the system scores using a linear function trained on data selected to closely match the evaluation conditions. This selection, though, is not feasible when the evaluation conditions are unknown. In previous work, we proposed a calibration approach for this scenario called trial-based calibration (TBC). TBC trains a separate calibration model for each test trial using data that is dynamically selected from a candidate training set to match the conditions of the trial. In this work, we extend the TBC method, proposing: 1) a new similarity metric for selecting training data that results in significant gains over the one proposed in the original work; 2) a new option that enables the system to reject a trial when not enough matched data are available for training the calibration model; and 3) the use of regularization to improve the robustness of the calibration models trained for each trial. We test the proposed algorithms on a development set composed of several conditions and on the Federal Bureau of Investigation multi-condition speaker recognition dataset, and we demonstrate that the proposed approach reduces calibration loss to values close to 0 for most of the conditions when matched calibration data are available for selection, and that it can reject most of the trials for which relevant calibration data are unavailable. © 2014 IEEE.
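
The three ingredients named in the abstract (per-trial data selection, a reject option, and a regularized linear calibration) can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity measure (negative Euclidean distance between hypothetical per-trial condition vectors), the threshold `sim_threshold`, the minimum count `min_per_class`, and the choice to regularize toward the identity map are all assumptions made for illustration, and the data are synthetic.

```python
import numpy as np


def fit_linear_calibration(scores, labels, l2=0.01, steps=2000, lr=0.05):
    """Fit llr = a * score + b by class-balanced logistic regression.

    The L2 penalty pulls (a, b) toward the identity map (1, 0); that
    regularization target is an assumption of this sketch.
    """
    y = np.where(labels == 1, 1.0, -1.0)
    # Give each class a total weight of 0.5 so the logit reads as a log-LR.
    w = np.where(labels == 1, 0.5 / np.sum(labels == 1), 0.5 / np.sum(labels == 0))
    a, b = 1.0, 0.0
    for _ in range(steps):
        z = a * scores + b
        g = -w * y / (1.0 + np.exp(y * z))  # d(loss)/dz for each trial
        a -= lr * (np.sum(g * scores) + 2.0 * l2 * (a - 1.0))
        b -= lr * (np.sum(g) + 2.0 * l2 * b)
    return a, b


def calibrate_trial(trial_score, trial_cond, cand_scores, cand_labels, cand_conds,
                    sim_threshold=-1.0, min_per_class=20, l2=0.01):
    """Return a calibrated log-likelihood ratio, or None to reject the trial."""
    # Hypothetical similarity: negative distance between condition vectors.
    sims = -np.linalg.norm(cand_conds - trial_cond, axis=1)
    keep = sims >= sim_threshold
    n_target = int(np.sum(cand_labels[keep] == 1))
    n_impostor = int(np.sum(cand_labels[keep] == 0))
    if min(n_target, n_impostor) < min_per_class:
        return None  # reject: not enough matched calibration data
    a, b = fit_linear_calibration(cand_scores[keep], cand_labels[keep], l2=l2)
    return a * trial_score + b


# Synthetic candidate set: 100 target and 100 impostor trials, one condition.
rng = np.random.default_rng(0)
cand_labels = np.repeat([1, 0], 100)
cand_scores = np.where(cand_labels == 1, 2.0, -2.0) + rng.normal(size=200)
cand_conds = rng.normal(scale=0.1, size=(200, 2))

llr_matched = calibrate_trial(3.0, np.zeros(2), cand_scores, cand_labels, cand_conds)
llr_unmatched = calibrate_trial(3.0, np.full(2, 10.0), cand_scores, cand_labels, cand_conds)
print("matched trial:", llr_matched)
print("unmatched trial:", llr_unmatched)
```

In this sketch a trial whose condition vector lies far from every candidate is rejected (the function returns `None`) rather than calibrated with mismatched data, which mirrors the fail-safe behavior the abstract describes.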

Record:

Document: Article
Title: Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option
Author: Ferrer, L.; Nandwana, M.K.; McLaren, M.; Castan, D.; Lawson, A.
Affiliation: Instituto de Investigación en Ciencias de la Computación, Consejo Nacional de Investigaciones Científicas y Técnicas, Universidad de Buenos Aires, Buenos Aires, B105, Argentina
Speech Technology and Research Laboratory, SRI International, Menlo Park, CA 94025, United States
Keywords: forensic voice comparison; Speaker recognition; trial-based calibration; Calibration; Data structures; Logistics; Mathematical transformations; Personnel training; Statistical tests; Computational model; Forensic voice comparisons; Forensics; Probabilistic interpretation; Similarity metrics; Speaker recognition system; Standard calibration; Speech recognition
Year: 2019
Volume: 27
Issue: 1
Start page: 140
End page: 153
DOI: http://dx.doi.org/10.1109/TASLP.2018.2875794
Journal title: IEEE/ACM Transactions on Audio Speech and Language Processing
Abbreviated journal title: IEEE ACM Trans. Audio Speech Lang. Process.
ISSN: 2329-9290
Record: http://digital.bl.fcen.uba.ar/collection/paper/document/paper_23299290_v27_n1_p140_Ferrer

Citations:

---------- APA ----------
Ferrer, L., Nandwana, M.K., McLaren, M., Castan, D. & Lawson, A. (2019). Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option. IEEE/ACM Transactions on Audio Speech and Language Processing, 27(1), 140-153.
http://dx.doi.org/10.1109/TASLP.2018.2875794
---------- CHICAGO ----------
Ferrer, L., Nandwana, M.K., McLaren, M., Castan, D., Lawson, A. "Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option". IEEE/ACM Transactions on Audio Speech and Language Processing 27, no. 1 (2019): 140-153.
http://dx.doi.org/10.1109/TASLP.2018.2875794
---------- MLA ----------
Ferrer, L., Nandwana, M.K., McLaren, M., Castan, D., Lawson, A. "Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option". IEEE/ACM Transactions on Audio Speech and Language Processing, vol. 27, no. 1, 2019, pp. 140-153.
http://dx.doi.org/10.1109/TASLP.2018.2875794
---------- VANCOUVER ----------
Ferrer, L., Nandwana, M.K., McLaren, M., Castan, D., Lawson, A. Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option. IEEE ACM Trans. Audio Speech Lang. Process. 2019;27(1):140-153.
http://dx.doi.org/10.1109/TASLP.2018.2875794