Cocktail Party Problem as Binary Classification

This video was recorded at the Machine Learning Summer School (MLSS), Chicago, 2009.

Speech segregation, or the cocktail party problem, has proven to be extremely challenging. Part of the challenge stems from the lack of a carefully analyzed computational goal. While the separation of every sound source in a mixture is considered the gold standard, I argue that such an objective is neither realistic nor what the human auditory system does. Motivated by the auditory masking phenomenon, we have instead suggested the ideal time-frequency (T-F) binary mask as a main goal for computational auditory scene analysis. Ideal binary masking retains the mixture energy in T-F units where the local signal-to-noise ratio exceeds a certain threshold, and rejects the mixture energy in other T-F units. Recent psychophysical evidence shows that ideal binary masking leads to large speech intelligibility improvements in noisy environments for both normal-hearing and hearing-impaired listeners. The effectiveness of the ideal binary mask implies that sound separation may be formulated as a case of binary classification, which opens the cocktail party problem to a variety of pattern classification and clustering methods. As an example, I discuss a recent system that segregates unvoiced speech by supervised classification of acoustic-phonetic features.
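
The masking rule described in the abstract can be illustrated with a minimal sketch. It is not code from the lecture: the function names `ideal_binary_mask` and `apply_mask`, the parameter `lc_db` (the SNR threshold in dB), and the toy energy values are all hypothetical, and the sketch assumes the per-unit target and noise energies are already available as channel-by-frame grids and that they simply add in the mixture.

```python
def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Return a 0/1 mask: 1 where the local SNR in dB exceeds lc_db.

    Inputs are 2-D lists (channels x frames) of non-negative energies.
    The test s > n * 10**(lc_db / 10) is equivalent to
    10 * log10(s / n) > lc_db but avoids dividing by zero noise.
    """
    ratio = 10.0 ** (lc_db / 10.0)
    return [
        [1 if s > ratio * n else 0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(target_energy, noise_energy)
    ]


def apply_mask(mixture_energy, mask):
    """Keep mixture energy where the mask is 1 and reject it elsewhere."""
    return [
        [m * e for e, m in zip(e_row, m_row)]
        for e_row, m_row in zip(mixture_energy, mask)
    ]


# Toy data (made up for illustration): 2 channels x 3 frames.
target = [[4.0, 0.5, 9.0], [0.1, 2.0, 0.0]]
noise = [[1.0, 2.0, 1.0], [1.0, 1.0, 3.0]]
# Assumption: target and noise energies add in each T-F unit.
mixture = [[5.0, 2.5, 10.0], [1.1, 3.0, 3.0]]

mask = ideal_binary_mask(target, noise, lc_db=0.0)
masked = apply_mask(mixture, mask)
print(mask)    # [[1, 0, 1], [0, 1, 0]]
print(masked)  # [[5.0, 0.0, 10.0], [0.0, 3.0, 0.0]]
```

Computing this mask requires the premixed target and noise, which is why it is called ideal. In the classification framing of the abstract, the 0/1 values would presumably serve as the labels that a classifier, given only features of the mixture, tries to predict for each T-F unit.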
