Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods

Joseph Keshet (Editor), Samy Bengio (Co-Editor)

ISBN: 978-0-470-69683-5

Hardcover

268 pages

February 2009

List Price:	US $156.00
Government Price:	US $89.56

Table of Contents

List of Contributors.

Preface.

I Foundations.

1 Introduction (Samy Bengio and Joseph Keshet).

1.1 The Traditional Approach to Speech Processing.

1.2 Potential Problems of the Probabilistic Approach.

1.3 Support Vector Machines for Binary Classification.

1.4 Outline.

References.

2 Theory and Practice of Support Vector Machines Optimization (Shai Shalev-Shwartz and Nathan Srebro).

2.1 Introduction.

2.2 SVM and L₂-regularized Linear Prediction.

2.3 Optimization Accuracy From a Machine Learning Perspective.

2.4 Stochastic Gradient Descent.

2.5 Dual Decomposition Methods.

2.6 Summary.

References.

3 From Binary Classification to Categorial Prediction (Koby Crammer).

3.1 Multi-category Problems.

3.2 Hypothesis Class.

3.3 Loss Functions.

3.4 Hinge Loss Functions.

3.5 A Generalized Perceptron Algorithm.

3.6 A Generalized Passive–Aggressive Algorithm.

3.7 A Batch Formulation.

3.8 Concluding Remarks.

3.9 Appendix. Derivations of the Duals of the Passive–Aggressive Algorithm and the Batch Formulation.

References.

II Acoustic Modeling.

4 A Large Margin Algorithm for Forced Alignment (Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer and Dan Chazan).

4.1 Introduction.

4.2 Problem Setting.

4.3 Cost and Risk.

4.4 A Large Margin Approach for Forced Alignment.

4.5 An Iterative Algorithm.

4.6 Efficient Evaluation of the Alignment Function.

4.7 Base Alignment Functions.

4.8 Experimental Results.

4.9 Discussion.

References.

5 A Kernel Wrapper for Phoneme Sequence Recognition (Joseph Keshet and Dan Chazan).

5.1 Introduction.

5.2 Problem Setting.

5.3 Frame-based Phoneme Classifier.

5.4 Kernel-based Iterative Algorithm for Phoneme Recognition.

5.5 Nonlinear Feature Functions.

5.6 Preliminary Experimental Results.

5.7 Discussion: Can We Hope for Better Results?

References.

6 Augmented Statistical Models: Using Dynamic Kernels for Acoustic Models (Mark J. F. Gales).

6.1 Introduction.

6.2 Temporal Correlation Modeling.

6.3 Dynamic Kernels.

6.4 Augmented Statistical Models.

6.5 Experimental Results.

6.6 Conclusions.

Acknowledgements.

References.

7 Large Margin Training of Continuous Density Hidden Markov Models (Fei Sha and Lawrence K. Saul).

7.1 Introduction.

7.2 Background.

7.3 Large Margin Training.

7.4 Experimental Results.

7.5 Conclusion.

References.

III Language Modeling.

8 A Survey of Discriminative Language Modeling Approaches for Large Vocabulary Continuous Speech Recognition (Brian Roark).

8.1 Introduction.

8.2 General Framework.

8.3 Further Developments.

8.4 Summary and Discussion.

References.

9 Large Margin Methods for Part-of-Speech Tagging (Yasemin Altun).

9.1 Introduction.

9.2 Modeling Sequence Labeling.

9.3 Sequence Boosting.

9.4 Hidden Markov Support Vector Machines.

9.5 Experiments.

9.6 Discussion.

References.

10 A Proposal for a Kernel Based Algorithm for Large Vocabulary Continuous Speech Recognition (Joseph Keshet).

10.1 Introduction.

10.2 Segment Models and Hidden Markov Models.

10.3 Kernel Based Model.

10.4 Large Margin Training.

10.5 Implementation Details.

10.6 Discussion.

Acknowledgements.

References.

IV Applications.

11 Discriminative Keyword Spotting (David Grangier, Joseph Keshet and Samy Bengio).

11.1 Introduction.

11.2 Previous Work.

11.3 Discriminative Keyword Spotting.

11.4 Experiments and Results.

11.5 Conclusions.

Acknowledgements.

References.

12 Kernel-based Text-independent Speaker Verification (Johnny Mariéthoz, Samy Bengio and Yves Grandvalet).

12.1 Introduction.

12.2 Generative Approaches.

12.3 Discriminative Approaches.

12.4 Benchmarking Methodology.

12.5 Kernels for Speaker Verification.

12.6 Parameter Sharing.

12.7 Is the Margin Useful for This Problem?

12.8 Comparing all Methods.

12.9 Conclusion.

References.

13 Spectral Clustering for Speech Separation (Francis R. Bach and Michael I. Jordan).

13.1 Introduction.

13.2 Spectral Clustering and Normalized Cuts.

13.3 Cost Functions for Learning the Similarity Matrix.

13.4 Algorithms for Learning the Similarity Matrix.

13.5 Speech Separation as Spectrogram Segmentation.

13.6 Spectral Clustering for Speech Separation.

13.7 Conclusions.

References.

Index.

