Lecture 9 - Sound processing & speech recognition
Teacher: Julie Mauclair (IRIT)
Contents
- How to encode audio?
  - Raw signal / sampling / DFT / iDFT.
  - Mel Frequency Cepstral Coefficients (MFCC).
- Automatic speech recognition (ASR) from 1970 to 2010.
  - Isolated word recognition.
  - Noisy channel models.
  - Standard ASR systems.
- Automatic speech recognition from 2010 to the present.
  - End-to-end ASR systems.
  - Convolutional neural networks for speech processing.
  - Recurrent neural networks.
  - Connectionist temporal classification (CTC).
  - Hybrid Transformers with CTC.