Material Detail

Modelling Intra-Speaker Variability for Improved Speaker Recognition

Modelling Intra-Speaker Variability for Improved Speaker Recognition

This video was recorded at Workshop on Subspace, Latent Structure and Feature Selection Techniques: Statistical and Optimisation Perspectives, Bohinj 2005. In this paper we present a speaker recognition algorithm that models explicitly intra-speaker inter-session variability. Such variability, which is caused by channel, noise and temporary speaker characteristics (mood, fatigue, etc.), is not modeled explicitly by the state-of-the-art speaker recognition algorithms. We define a session-space in which each session (either train or test spoken utterance) is a vector. We then calculate a rotation of the session-space for which the estimated intra-speaker subspace is trivially isolated and can be modeled explicitly. Due to the high dimensionality of the session-space, it is impossible to use standard orthogonalization methods. We therefore used QR factorization based on Givens rotations to calculate the projection. On the NIST-2004 evaluation corpus, recognition error rate was reduced by 23% compared to the classic GMM state-of-the-art algorithm.

Quality

  • User Rating
  • Comments
  • Learning Exercises
  • Bookmark Collections
  • Course ePortfolios
  • Accessibility Info

More about this material

Comments

Log in to participate in the discussions or sign up if you are not already a MERLOT member.