Lecture 10 - Sound processing & speech recognition

Teacher: Ismail Khalfaoui Hassani (ANITI)

Very important: due to strong interest from the community, we are splitting the class into two sessions. This page is for the advanced session, intended for students with prior training in mathematics, engineering and/or computer science. If you are here by mistake, go to [the main/basic session page].

Lecture video

View the recorded lecture here (available for only about 6 weeks after the course).

How to encode audio?
- Raw signal / sampling / DFT / iDFT.
- Mel Frequency Cepstral Coefficients (MFCC).
Automatic speech recognition (ASR) from 1970 to 2010.
- Isolated word recognition.
- Noisy channel models.
- Standard ASR systems.
Automatic speech recognition from 2010 to the present.
- End-to-end ASR systems.
- Convolutional neural networks for speech processing.
- Recurrent neural networks.
- Connectionist temporal classification (CTC).
- Hybrid Transformers with CTC.
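
As a small illustration of the first item in the outline (raw signal, sampling, DFT and iDFT), here is a minimal sketch in plain Python. It is not taken from the lecture material: the function names `dft` and `idft`, the 8000 Hz sample rate, the 64-sample window and the 1000 Hz test tone are all assumptions chosen for the example. It samples a sine wave, computes a naive O(N^2) discrete Fourier transform, locates the dominant frequency bin, and checks that the inverse transform recovers the original samples.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) discrete Fourier transform of a list of samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns complex values (imaginary part ~0 for real input)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


# Assumed example values, not from the lecture.
sample_rate = 8000   # samples per second
n_samples = 64       # length of the analysis window
tone_hz = 1000       # frequency of the test tone

# Sampling: evaluate the continuous sine wave at t / sample_rate seconds.
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(n_samples)]

spectrum = dft(signal)

# Only the first half of the bins is informative for a real-valued signal.
peak_bin = max(range(n_samples // 2), key=lambda k: abs(spectrum[k]))
peak_hz = peak_bin * sample_rate / n_samples

# The inverse transform should give back the original samples.
reconstructed = [value.real for value in idft(spectrum)]
max_error = max(abs(a - b) for a, b in zip(signal, reconstructed))

print(f"peak bin: {peak_bin}, peak frequency: {peak_hz} Hz")
print(f"max reconstruction error: {max_error:.2e}")
```

Bin k corresponds to the frequency k * sample_rate / n_samples, which is why the window length and the sample rate together set the frequency resolution. In practice a fast Fourier transform would replace the double loop used here.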
