Lecture 8 - Multimodal Learning
Teacher: Leopold Maytie (ANITI)
Contents
- Introduction
- Recall
- MLP
- CNN
- RNN
- Transformers
- How to train a model
- How to learn from multimodality?
- Datasets
- Methods of fusion and coordination
- Foundation Models
- Multimodal Tasks
- Image Captioning
- Visual Question Answering
- Multimodal conversational AI system
- Vision-and-Language Navigation
- Examples of Models
- CoDi
- ImageBind
- BLIP-2
- CoCa
- Inner Monologue
- PaLM-E
- Conclusion
Notes
Download the slides here