This series consists of ten two-hour sessions on the mathematics of machine learning. It presents the main concepts without going into the details of the proofs. The title of each session links to its transcript, and basic notes are available to guide you through the structure and progression of the content.
Course #1 - Introduction and Smooth Optimization
Content:
- Introduction and motivation
- Gradients, Jacobians, Hessians
- Gradient descent and acceleration (see the sketch after this list)
- Stochastic Gradient Descent (SGD)
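To make the gradient descent and acceleration items concrete, here is a minimal NumPy sketch on an illustrative strongly convex quadratic; the matrix, the step size 1/L, and the k/(k+3) momentum schedule are standard textbook choices, not taken from the course materials.

```python
import numpy as np

# Illustrative strongly convex quadratic: f(x) = 0.5 x^T A x - b^T x
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

def grad(x):
    return A @ x - b

L = np.linalg.eigvalsh(A).max()  # Lipschitz constant of the gradient
tau = 1.0 / L                    # classical step size 1/L

# Plain gradient descent: x_{k+1} = x_k - tau * grad f(x_k)
x = np.zeros(2)
for _ in range(200):
    x = x - tau * grad(x)

# Nesterov acceleration: take the gradient step at an extrapolated point
x_acc, x_prev = np.zeros(2), np.zeros(2)
for k in range(200):
    y = x_acc + k / (k + 3) * (x_acc - x_prev)  # momentum extrapolation
    x_prev = x_acc
    x_acc = y - tau * grad(y)

x_star = np.linalg.solve(A, b)  # closed-form minimizer, for comparison
print(np.linalg.norm(x - x_star), np.linalg.norm(x_acc - x_star))
```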
Course #2 - Convergence Proofs and Linear Models
Content:
- Proofs of gradient descent and acceleration
- Linear models and regularization
- Ridge versus Lasso
- The ISTA algorithm (see the sketch after this list)
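A minimal sketch of ISTA applied to the Lasso, alternating a gradient step on the smooth data-fidelity term with soft-thresholding; the problem size, noise-free data, and regularization weight lam are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))        # illustrative underdetermined system
x_true = np.zeros(100)
x_true[:5] = rng.standard_normal(5)       # 5-sparse ground truth
y = A @ x_true

lam = 0.1                                 # Lasso weight (illustrative)
tau = 1.0 / np.linalg.norm(A, 2) ** 2     # step 1/L with L = ||A||^2

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(100)
for _ in range(500):
    # ISTA: gradient step on 0.5*||Ax - y||^2, then soft-thresholding
    x = soft_threshold(x - tau * A.T @ (A @ x - y), tau * lam)
print(np.count_nonzero(np.abs(x) > 1e-6))  # size of the recovered support
```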
Course #3 - Non-smooth Optimization
Content:
- Examples of non-smooth functionals (Lasso, TV regularization, constraints)
- Subgradient and proximal operators
- Forward-backward splitting, connection with FISTA
- ADMM, Douglas-Rachford (DR), primal-dual methods (an ADMM sketch follows this list)
- Compressive sensing theory
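As one concrete instance of the splitting methods above, a minimal sketch of ADMM on the Lasso, written as minimizing 0.5*||Ax - y||^2 + lam*||z||_1 subject to x = z; the penalty parameter rho and the problem data are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[:5] = rng.standard_normal(5)
y = A @ x_true
lam, rho = 0.1, 1.0                    # regularization and ADMM penalty (illustrative)

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

M = A.T @ A + rho * np.eye(100)        # x-update solves (A^T A + rho I) x = rhs
Aty = A.T @ y

x, z, u = np.zeros(100), np.zeros(100), np.zeros(100)
for _ in range(200):
    x = np.linalg.solve(M, Aty + rho * (z - u))   # quadratic subproblem
    z = soft_threshold(x + u, lam / rho)          # prox of the l1 term
    u = u + x - z                                 # dual update on the constraint x = z
print(np.count_nonzero(np.abs(z) > 1e-6))
```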
Bibliography:
- A Mathematical Introduction to Compressive Sensing, by S. Foucart and H. Rauhut (advanced)
- Convex Optimization, by S. Boyd and L. Vandenberghe
- Proximal Algorithms, by N. Parikh and S. Boyd
Course #4 - From Kernels to Deep Architectures
Content:
- Transition from ridge regression to kernels (see the sketch after this list)
- Multilayer Perceptron (MLP)
- Convolutional Neural Networks (CNN)
- ResNet architecture
- Transformer models
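To illustrate the first item, a small kernel ridge regression sketch: the ridge solution is kernelized by solving (K + lam*I) alpha = y, with predictions given by weighted kernel evaluations. The Gaussian kernel, bandwidth, and data are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))               # illustrative 1-D inputs
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(40)

def gaussian_kernel(X1, X2, sigma=0.3):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

lam = 1e-3
K = gaussian_kernel(X, X)
# Kernel ridge: solve (K + lam I) alpha = y in place of the ridge normal equations
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

X_test = np.linspace(-1, 1, 5)[:, None]
y_pred = gaussian_kernel(X_test, X) @ alpha        # f(x) = sum_i alpha_i k(x, x_i)
print(y_pred)
```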
Course #5 - Theory of MLPs and Backpropagation
Content:
- Review of MLP and its variants (CNN, ResNet)
- Theoretical framework of two-layer MLPs
- Gradient and Jacobians in neural networks
- Introduction to backpropagation (see the sketch after this list)
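A minimal sketch of backpropagation through a two-layer MLP, with the chain rule written out by hand in NumPy; the ReLU activation, layer width, learning rate, and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 3))               # illustrative inputs
y = np.sin(X.sum(axis=1, keepdims=True))       # illustrative regression target

W1 = rng.standard_normal((3, 16)) * 0.5; b1 = np.zeros(16)
W2 = rng.standard_normal((16, 1)) * 0.5; b2 = np.zeros(1)
lr = 0.1

for _ in range(500):
    # Forward pass: f(x) = W2^T relu(W1^T x + b1) + b2
    h = X @ W1 + b1
    a = np.maximum(h, 0.0)
    pred = a @ W2 + b2
    loss = ((pred - y) ** 2).mean()

    # Backward pass: the chain rule applied layer by layer (backpropagation)
    g_pred = 2 * (pred - y) / len(X)
    g_W2 = a.T @ g_pred; g_b2 = g_pred.sum(0)
    g_a = g_pred @ W2.T
    g_h = g_a * (h > 0)                        # derivative of ReLU
    g_W1 = X.T @ g_h; g_b1 = g_h.sum(0)

    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2
print(loss)
```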
Course #6 - Automatic Differentiation
Content:
- Recap on Gradient and Jacobian
- Forward- and reverse-mode automatic differentiation (see the sketch after this list)
- Introduction to PyTorch
- The adjoint method in computational mathematics
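A short sketch contrasting reverse mode (one vector-Jacobian product yields the full gradient of a scalar function) with forward mode (one Jacobian-vector product yields a directional derivative), assuming PyTorch 2.x and its torch.func API; the function f is an arbitrary example.

```python
import torch

def f(x):
    return torch.sin(x[0]) * x[1] + x[1] ** 2

x = torch.tensor([0.5, 2.0])

# Reverse mode: one backward pass gives the full gradient of the scalar f
g = torch.func.grad(f)(x)

# Forward mode: one pass gives the directional derivative along v (a JVP)
v = torch.tensor([1.0, 0.0])
_, dfdv = torch.func.jvp(f, (x,), (v,))

print(g, dfdv)   # dfdv equals torch.dot(g, v)
```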
Course #7 - SGD, Langevin Dynamics and Diffusion Models
Content:
- Refresher on Stochastic Gradient Descent (SGD)
- Introduction to Langevin dynamics (see the sketch after this list)
- Overview of diffusion models
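A minimal sketch of the unadjusted Langevin algorithm, which samples approximately from a density proportional to exp(-U) by adding Gaussian noise to gradient descent on U; the double-well potential, step size, and number of parallel chains are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target density proportional to exp(-U), with U(x) = (x^2 - 1)^2 a double well
def grad_U(x):
    return 4 * x * (x ** 2 - 1)

tau = 1e-3                          # step size (illustrative)
x = np.zeros(1000)                  # run many chains in parallel
for _ in range(5000):
    # Langevin step: gradient descent on U plus sqrt(2 tau) Gaussian noise
    x = x - tau * grad_U(x) + np.sqrt(2 * tau) * rng.standard_normal(1000)

print(x.mean(), x.std())            # samples concentrate near the wells at +-1
```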
Course #8 - Generative Models and Architectures
Content:
- Overview of different generative model concepts
- Introduction to generative models (VAE, GANs, U-Net, diffusion)
- Self-supervised learning and next-token prediction
- Tokenizers
- Transformer architectures, FlashAttention (a self-attention sketch follows this list)
- State space models
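A minimal NumPy sketch of single-head, unmasked scaled dot-product self-attention, the core operation of the Transformer; the dimensions and weight matrices are illustrative, and the memory-efficient tiling of FlashAttention is not reproduced here.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product self-attention: softmax(Q K^T / sqrt(d)) V
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(-1, keepdims=True))  # stable softmax
    weights /= weights.sum(-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))     # 5 tokens, embedding dimension 8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 8)
```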
Course #9 - Generative Models as Density Fitting
Content:
- Understanding generative models as density-fitting techniques
- Basics of maximum likelihood estimation and f-divergences
- Gaussian mixtures and the Expectation-Maximization (EM) algorithm (see the sketch after this list)
- Variational Autoencoders (VAE)
- Introduction to Normalizing Flows
- Generative Adversarial Networks (GANs), Wasserstein GANs (WGANs)
- Diffusion models
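As a pointer for the EM item above, a minimal sketch of Expectation-Maximization for a two-component one-dimensional Gaussian mixture; the data, initialization, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data drawn from two Gaussians
x = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(1, 1.0, 300)])

# Initial mixture weights, means, and standard deviations
pi, mu, sig = np.array([0.5, 0.5]), np.array([-1.0, 0.5]), np.array([1.0, 1.0])

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(100):
    # E-step: posterior responsibility of each component for each point
    r = pi[None, :] * gauss(x[:, None], mu[None, :], sig[None, :])
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate the parameters from the weighted samples
    n = r.sum(axis=0)
    pi = n / len(x)
    mu = (r * x[:, None]).sum(axis=0) / n
    sig = np.sqrt((r * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / n)

print(pi, mu, sig)
```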
Course #10 - Optimal Transport
Content:
- Introduction to the Monge and Kantorovich formulations
- The Sinkhorn algorithm (see the sketch after this list)
- Training of generative models
- Duality and Wasserstein GANs
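A minimal sketch of the Sinkhorn algorithm for entropically regularized optimal transport between two discrete measures, alternating marginal-matching scalings of the Gibbs kernel; the point clouds, squared-distance cost, and regularization epsilon are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two illustrative discrete distributions on the line
x, y = rng.uniform(0, 1, 30), rng.uniform(0, 1, 40)
a, b = np.full(30, 1 / 30), np.full(40, 1 / 40)

C = (x[:, None] - y[None, :]) ** 2      # squared-distance cost matrix
eps = 0.01                              # entropic regularization (illustrative)
K = np.exp(-C / eps)                    # Gibbs kernel

# Sinkhorn iterations: alternately rescale to match the two marginals
u = np.ones(30)
for _ in range(1000):
    v = b / (K.T @ u)
    u = a / (K @ v)

P = u[:, None] * K * v[None, :]         # regularized transport plan
print((P * C).sum())                    # approximate OT cost
```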