Lecture 10 - Sound processing & speech recognition
Teacher: Ismail Khalfaoui Hassani (ANITI)
Very important: due to a strong response in the community, we are splitting the class into two sessions. This page is for the advanced session, which is appropriate for students with prior training in Maths, Engineering and/or Computer Science. If you are here by mistake, go to [the main/basic session page].
Lecture video
View the recorded lecture here (this will only be available for approximately 6 weeks after the course)
Contents
- How to encode audio?
  - Raw signal / sampling / DFT / iDFT.
  - Mel Frequency Cepstral Coefficients (MFCC).
- Automatic speech recognition (ASR) from 1970 to 2010.
  - Isolated word recognition.
  - Noisy channel models.
  - Standard ASR systems.
- Automatic speech recognition from 2010 to the present.
  - End-to-end ASR systems.
  - Convolutional neural networks for speech processing.
  - Recurrent neural networks.
  - Connectionist temporal classification (CTC).
  - Hybrid Transformers with CTC.