Material Detail

Finding Acoustic Regularities in Speech: From Words to Segments

Finding Acoustic Regularities in Speech: From Words to Segments

This video was recorded at Center for Language and Speech Processing (CLSP) Seminar Series. The development of an automatic speech recognizer is typically a highly supervised process involving the specification of phonetic inventories, lexicons, acoustic and language models, along with annotated training corpora. Although some model parameters may be modified via adaptation, the overall structure of the speech recognizer remains relatively static thereafter. While this approach has been effective for problems when there is adequate human expertise and labeled corpora, it is challenged by less-supervised or unsupervised scenarios. It also stands in stark contrast to human processing of speech and language where learning is an intrinsic capability. From a machine learning perspective, a complementary alternative is to discover unit inventories in an unsupervised manner by exploiting the structure of repeating acoustic patterns within the speech signal. In this work we use pattern discovery methods to automatically acquire lexical entities, as well as speaker and topic segmentations directly from an untranscribed audio stream. Our approach to unsupervised word acquisition utilizes a segmental variant of a widely used dynamic programming technique, which allows us to find matching acoustic patterns between spoken utterances. By aggregating information about these matching patterns across audio streams, we demonstrate how to group similar acoustic sequences together to form clusters corresponding to lexical entities such as words and short multi-word phrases. On a corpus of lecture material, we demonstrate that clusters found using this technique exhibit high purity and that many of the corresponding lexical identities are relevant to the underlying audio stream. We have applied the acoustic pattern matching and clustering methods to several important problems in speech and language processing. In addition to showing how this methodology applies across different languages, we demonstrate two methods to automatically determine the identify of speech clusters. Finally, we also show how it can be used to provide an unsupervised segmentation of speakers and topics. Joint work with Alex Park, Igor Malioutov, and Regina Barzilay.


  • User Rating
  • Comments
  • Learning Exercises
  • Bookmark Collections
  • Course ePortfolios
  • Accessibility Info

More about this material


Log in to participate in the discussions or sign up if you are not already a MERLOT member.