This series consists of ten two-hour sessions on the mathematics of machine learning. It presents the main concepts without going into the details of the proofs. The title of each session links to its transcript, and basic notes are available to guide you through the structure and progression of the content.
Course #1 - Introduction and Smooth Optimization
Content:
- Introduction and motivation
- Gradients, Jacobians, Hessians
- Gradient descent and acceleration (see the sketch after this list)
- Stochastic Gradient Descent (SGD)
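To make the gradient descent and acceleration items concrete, here is a minimal NumPy sketch on an illustrative strongly convex quadratic; the matrix, the step size 1/L, and the k/(k+3) momentum schedule are standard textbook choices, not taken from the course materials.

```python
import numpy as np

# Illustrative strongly convex quadratic: f(x) = 0.5 x^T A x - b^T x
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

def grad(x):
    return A @ x - b

L = np.linalg.eigvalsh(A).max()  # Lipschitz constant of the gradient
tau = 1.0 / L                    # classical step size 1/L

# Plain gradient descent: x_{k+1} = x_k - tau * grad f(x_k)
x = np.zeros(2)
for _ in range(200):
    x = x - tau * grad(x)

# Nesterov acceleration: take the gradient step at an extrapolated point
x_acc, x_prev = np.zeros(2), np.zeros(2)
for k in range(200):
    y = x_acc + k / (k + 3) * (x_acc - x_prev)  # momentum extrapolation
    x_prev = x_acc
    x_acc = y - tau * grad(y)

x_star = np.linalg.solve(A, b)  # closed-form minimizer, for comparison
print(np.linalg.norm(x - x_star), np.linalg.norm(x_acc - x_star))
```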
Course #2 - Convergence Proofs and Linear Models
Content:
- Proofs of gradient descent and acceleration
- Linear models and regularization
- Ridge versus Lasso
- The ISTA algorithm (see the sketch after this list)
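A minimal sketch of ISTA applied to the Lasso, alternating a gradient step on the smooth data-fidelity term with soft-thresholding; the problem size, noise-free data, and regularization weight lam are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))        # illustrative underdetermined system
x_true = np.zeros(100)
x_true[:5] = rng.standard_normal(5)       # 5-sparse ground truth
y = A @ x_true

lam = 0.1                                 # Lasso weight (illustrative)
tau = 1.0 / np.linalg.norm(A, 2) ** 2     # step 1/L with L = ||A||^2

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(100)
for _ in range(500):
    # ISTA: gradient step on 0.5*||Ax - y||^2, then soft-thresholding
    x = soft_threshold(x - tau * A.T @ (A @ x - y), tau * lam)
print(np.count_nonzero(np.abs(x) > 1e-6))  # size of the recovered support
```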
Course #3 - Non-smooth Optimization
Content:
- Examples of non-smooth functionals (Lasso, TV regularization, constraints)
- Subgradient and proximal operators
- Forward-backward splitting, connection with FISTA
- ADMM, Douglas-Rachford (DR), primal-dual methods (an ADMM sketch follows this list)
- Compressive sensing theory
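As one concrete instance of the splitting methods above, a minimal sketch of ADMM on the Lasso, written as minimizing 0.5*||Ax - y||^2 + lam*||z||_1 subject to x = z; the penalty parameter rho and the problem data are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[:5] = rng.standard_normal(5)
y = A @ x_true
lam, rho = 0.1, 1.0                    # regularization and ADMM penalty (illustrative)

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

M = A.T @ A + rho * np.eye(100)        # x-update solves (A^T A + rho I) x = rhs
Aty = A.T @ y

x, z, u = np.zeros(100), np.zeros(100), np.zeros(100)
for _ in range(200):
    x = np.linalg.solve(M, Aty + rho * (z - u))   # quadratic subproblem
    z = soft_threshold(x + u, lam / rho)          # prox of the l1 term
    u = u + x - z                                 # dual update on the constraint x = z
print(np.count_nonzero(np.abs(z) > 1e-6))
```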
Bibliography:
- A Mathematical Introduction to Compressive Sensing, by S. Foucart and H. Rauhut (advanced)
- Convex Optimization, by S. Boyd and L. Vandenberghe
- Proximal Algorithms, by N. Parikh and S. Boyd
Course #4 - From Kernels to Deep Architectures
Content:
- Transition from ridge regression to kernels (see the sketch after this list)
- Multilayer Perceptron (MLP)
- Convolutional Neural Networks (CNN)
- ResNet architecture
- Transformer models
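To illustrate the first item, a small kernel ridge regression sketch: the ridge solution is kernelized by solving (K + lam*I) alpha = y, with predictions given by weighted kernel evaluations. The Gaussian kernel, bandwidth, and data are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))               # illustrative 1-D inputs
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(40)

def gaussian_kernel(X1, X2, sigma=0.3):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

lam = 1e-3
K = gaussian_kernel(X, X)
# Kernel ridge: solve (K + lam I) alpha = y in place of the ridge normal equations
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

X_test = np.linspace(-1, 1, 5)[:, None]
y_pred = gaussian_kernel(X_test, X) @ alpha        # f(x) = sum_i alpha_i k(x, x_i)
print(y_pred)
```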
Course #5 - Theory of MLPs and Backpropagation
Content:
- Review of MLP and its variants (CNN, ResNet)
- Theoretical framework of two-layer MLPs
- Gradient and Jacobians in neural networks
- Introduction to backpropagation (see the sketch after this list)
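A minimal sketch of backpropagation through a two-layer MLP, with the chain rule written out by hand in NumPy; the ReLU activation, layer width, learning rate, and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 3))               # illustrative inputs
y = np.sin(X.sum(axis=1, keepdims=True))       # illustrative regression target

W1 = rng.standard_normal((3, 16)) * 0.5; b1 = np.zeros(16)
W2 = rng.standard_normal((16, 1)) * 0.5; b2 = np.zeros(1)
lr = 0.1

for _ in range(500):
    # Forward pass: f(x) = W2^T relu(W1^T x + b1) + b2
    h = X @ W1 + b1
    a = np.maximum(h, 0.0)
    pred = a @ W2 + b2
    loss = ((pred - y) ** 2).mean()

    # Backward pass: the chain rule applied layer by layer (backpropagation)
    g_pred = 2 * (pred - y) / len(X)
    g_W2 = a.T @ g_pred; g_b2 = g_pred.sum(0)
    g_a = g_pred @ W2.T
    g_h = g_a * (h > 0)                        # derivative of ReLU
    g_W1 = X.T @ g_h; g_b1 = g_h.sum(0)

    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2
print(loss)
```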
Course #6 - Automatic Differentiation
Content:
- Recap on Gradient and Jacobian
- Forward- and reverse-mode automatic differentiation (see the sketch after this list)
- Introduction to PyTorch
- The adjoint method in computational mathematics
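A short sketch contrasting reverse mode (one vector-Jacobian product yields the full gradient of a scalar function) with forward mode (one Jacobian-vector product yields a directional derivative), assuming PyTorch 2.x and its torch.func API; the function f is an arbitrary example.

```python
import torch

def f(x):
    return torch.sin(x[0]) * x[1] + x[1] ** 2

x = torch.tensor([0.5, 2.0])

# Reverse mode: one backward pass gives the full gradient of the scalar f
g = torch.func.grad(f)(x)

# Forward mode: one pass gives the directional derivative along v (a JVP)
v = torch.tensor([1.0, 0.0])
_, dfdv = torch.func.jvp(f, (x,), (v,))

print(g, dfdv)   # dfdv equals torch.dot(g, v)
```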
Course #7 - SGD, Langevin Dynamics and Diffusion Models
Content:
- Refresher on Stochastic Gradient Descent (SGD)
- Introduction to Langevin dynamics (see the sketch after this list)
- Overview of diffusion models
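A minimal sketch of the unadjusted Langevin algorithm, which samples approximately from a density proportional to exp(-U) by adding Gaussian noise to gradient descent on U; the double-well potential, step size, and number of parallel chains are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target density proportional to exp(-U), with U(x) = (x^2 - 1)^2 a double well
def grad_U(x):
    return 4 * x * (x ** 2 - 1)

tau = 1e-3                          # step size (illustrative)
x = np.zeros(1000)                  # run many chains in parallel
for _ in range(5000):
    # Langevin step: gradient descent on U plus sqrt(2 tau) Gaussian noise
    x = x - tau * grad_U(x) + np.sqrt(2 * tau) * rng.standard_normal(1000)

print(x.mean(), x.std())            # samples concentrate near the wells at +-1
```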
Course #8 - Generative Models and Architectures
Content:
- Overview of different generative model concepts
- Introduction to generative models (VAE, GANs, U-Net, diffusion)
- Self-supervised learning and next-token prediction
- Tokenizers
- Transformer architectures, FlashAttention (a self-attention sketch follows this list)
- State space models
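A minimal NumPy sketch of single-head, unmasked scaled dot-product self-attention, the core operation of the Transformer; the dimensions and weight matrices are illustrative, and the memory-efficient tiling of FlashAttention is not reproduced here.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product self-attention: softmax(Q K^T / sqrt(d)) V
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(-1, keepdims=True))  # stable softmax
    weights /= weights.sum(-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))     # 5 tokens, embedding dimension 8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 8)
```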
Course #9 - Generative Models as Density Fitting
Content:
- Understanding generative models as density-fitting techniques
- Basics of maximum likelihood estimation and f-divergences
- Gaussian mixtures and the Expectation-Maximization (EM) algorithm (see the sketch after this list)
- Variational Autoencoders (VAE)
- Introduction to Normalizing Flows
- Generative Adversarial Networks (GANs), Wasserstein GANs (WGANs)
- Diffusion models
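As a pointer for the EM item above, a minimal sketch of Expectation-Maximization for a two-component one-dimensional Gaussian mixture; the data, initialization, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data drawn from two Gaussians
x = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(1, 1.0, 300)])

# Initial mixture weights, means, and standard deviations
pi, mu, sig = np.array([0.5, 0.5]), np.array([-1.0, 0.5]), np.array([1.0, 1.0])

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(100):
    # E-step: posterior responsibility of each component for each point
    r = pi[None, :] * gauss(x[:, None], mu[None, :], sig[None, :])
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate the parameters from the weighted samples
    n = r.sum(axis=0)
    pi = n / len(x)
    mu = (r * x[:, None]).sum(axis=0) / n
    sig = np.sqrt((r * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / n)

print(pi, mu, sig)
```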
Course #10 - Optimal Transport
Content:
- Introduction to the Monge and Kantorovich formulations
- The Sinkhorn algorithm (see the sketch after this list)
- Training of generative models
- Duality and Wasserstein GANs
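A minimal sketch of the Sinkhorn algorithm for entropically regularized optimal transport between two discrete measures, alternating marginal-matching scalings of the Gibbs kernel; the point clouds, squared-distance cost, and regularization epsilon are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two illustrative discrete distributions on the line
x, y = rng.uniform(0, 1, 30), rng.uniform(0, 1, 40)
a, b = np.full(30, 1 / 30), np.full(40, 1 / 40)

C = (x[:, None] - y[None, :]) ** 2      # squared-distance cost matrix
eps = 0.01                              # entropic regularization (illustrative)
K = np.exp(-C / eps)                    # Gibbs kernel

# Sinkhorn iterations: alternately rescale to match the two marginals
u = np.ones(30)
for _ in range(1000):
    v = b / (K.T @ u)
    u = a / (K @ v)

P = u[:, None] * K * v[None, :]         # regularized transport plan
print((P * C).sum())                    # approximate OT cost
```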