Lecture 8 - Multimodal Learning | Intro2AI-class

Teacher: Leopold Maytie (ANITI)

Contents

Introduction
Recall
- MLP
- CNN
- RNN
- Transformers
- How to train a model
How to learn from multiple modalities?
- Datasets
- Fusion and coordination methods
- Foundation Models
Multimodal Tasks
- Image Captioning
- Visual Question Answering
- Multimodal conversational AI system
- Vision-and-Language Navigation
Examples of Models
- CoDi
- ImageBind
- BLIP-2
- CoCa
- Inner Monologue
- PaLM-E
Conclusion

Notes

Download the slides here

Tutorial Notebook

Link to Colab Notebook
