Areas of Training
- Big Data & HPC
- AI
- Data Science
- Quantum Computing
- IoT & Edge Computing
Skills block 1: Basics for Data Science & AI
Module 1: Python For Data Science
Objective:
In this course, you will learn how to use pandas DataFrames, NumPy multi-dimensional arrays, and the SciPy libraries to work with various datasets. The course introduces the pandas library, which is used to load, manipulate, analyze, and visualize datasets.
Prerequisite:
Programming skills and Linux.
Topics:
Installing an IDE and Jupyter Notebook for Python 3
Objects, Variables and data types
Control flow and loops
Data Formatting and Data Normalization
Data Aggregation and Grouping
Data Cleaning and Handling Missing Values
Descriptive Statistics
Reporting and Data visualisation
Transforming a Jupyter notebook into a standalone, interactive web application accessible via Voila and Binder.
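The cleaning, normalization, grouping and descriptive-statistics topics above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions: the toy dataset, its column names (`sector`, `amount`) and its values are hypothetical, and median imputation and min-max scaling are only two of several reasonable choices.

```python
import pandas as pd

# Hypothetical toy dataset: column names and values are invented for illustration.
df = pd.DataFrame({
    "sector": ["banking", "banking", "health", "health", "climate"],
    "amount": [100.0, None, 50.0, 150.0, 80.0],
})

# Data cleaning: fill the missing amount with the median of the observed values.
median_amount = df["amount"].median()
df["amount"] = df["amount"].fillna(median_amount)

# Data normalization: min-max scaling of the column to the [0, 1] range.
lo, hi = df["amount"].min(), df["amount"].max()
df["amount_scaled"] = (df["amount"] - lo) / (hi - lo)

# Data aggregation and grouping: mean amount per sector.
mean_by_sector = df.groupby("sector")["amount"].mean()

# Descriptive statistics (count, mean, std, quartiles) for the cleaned column.
summary = df["amount"].describe()

print(mean_by_sector)
print(summary)
```

Median imputation is used here because it is less sensitive to outliers than the mean; in a real use case the imputation strategy depends on why the values are missing.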
Organization:
10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework and exams can be done in French or English.
Validation:
3 homework assignments implementing algorithms in Python
1 final exam
1 project on a use case: insurance, banking, health, climate, etc.
References:
https://www.python.org
Module 2: Numerical Computing For Data Science
Objective:
This course in linear algebra and matrix calculus is essential: it gives you the foundations needed to take up studies in data science and artificial intelligence. It will also allow you to carry out exploratory data analysis.
Prerequisite:
L2 or L3 level in linear algebra and functional analysis, and Python or R programming.
Topics:
Scalars, Vectors, Matrices and Tensors
Multiplying Matrices and Vectors
Identity and Inverse Matrices
Eigendecomposition
The Moore-Penrose Pseudoinverse
The Trace Operator
Singular Value Decomposition (SVD)
Principal Component Analysis (PCA)
Linear Discriminant Analysis (LDA)
Matrix Methods in Signal Processing
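Several of the topics above (singular value decomposition, the Moore-Penrose pseudoinverse and PCA) are closely linked, which a short NumPy sketch can make concrete. The matrices `A` and `X` below are hypothetical examples chosen for illustration; the pseudoinverse formula used here assumes all singular values are nonzero.

```python
import numpy as np

# Hypothetical 3x2 matrix with full column rank.
A = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# Thin SVD: A = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_rebuilt = U @ np.diag(s) @ Vt

# Moore-Penrose pseudoinverse from the SVD: V @ diag(1/s) @ U^T.
# Valid as written only because every singular value of A is nonzero.
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# PCA via SVD of the centered data matrix (rows = samples, columns = features).
X = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]])
X_centered = X - X.mean(axis=0)
_, s_x, Vt_x = np.linalg.svd(X_centered, full_matrices=False)

# Share of the total variance carried by each principal component.
explained_variance_ratio = s_x**2 / np.sum(s_x**2)

# Coordinates of the samples along the first principal direction.
first_component_scores = X_centered @ Vt_x[0]

print(np.allclose(A_pinv, np.linalg.pinv(A)))
print(explained_variance_ratio)
```

For rank-deficient matrices, `np.linalg.pinv` is the safer choice because it discards singular values below a tolerance instead of inverting them.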
Organization:
10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework and exams can be done in French or English.
Validation:
3 homework assignments implementing algorithms in Python
1 final exam
1 project on a use case: insurance, banking, health, climate, etc.
References:
G. James, D. Witten, T. Hastie and R. Tibshirani. An Introduction to Statistical Learning, with Applications in R. Springer, 2021.
https://www.python.org
Module 3: Probability, Statistics and Modelling
Objective:
This course in probability and statistical modelling is essential: it gives you the foundation to undertake studies in data science and artificial intelligence. In machine learning and deep learning, we manipulate inferential statistical models in the form of large vectors.
Prerequisite:
L2 or L3 level in linear algebra, probability, statistics and Python or R programming.
Topics:
Random Variables
Probability Distributions
Marginal Probability
Conditional Probability
Expectation, Variance and Covariance
Estimating the Correlation
Linear and Logistic Regression
Least Squares and Maximum Likelihood
Multiple Regression
Model Selection
Multivariate Statistics
Organization:
10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework and exams can be done in French or English.
Validation:
3 homework assignments implementing algorithms in Python
1 final exam
References:
G. James, D. Witten, T. Hastie and R. Tibshirani. An Introduction to Statistical Learning, with Applications in R. Springer, 2021.
Module 4: Optimization For Data Science
Objective:
This course describes the mathematical tools needed to optimize statistical learning models. It will give the mathematical foundations of convex optimization, and describe the different approaches used for the construction of efficient convex optimization algorithms.
Prerequisite:
Course code: NCD1.2, and functional analysis.
Topics:
Convexity
Gradient Methods
Proximal algorithms
Coordinate Descent Methods
Subgradient Methods
Primal-Dual context and certificates
Lagrange and Fenchel Duality
Second-Order Methods
Quasi-Newton Methods
Gradient-Free and Zero-Order Optimization.
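As a concrete instance of the gradient methods listed above, the following minimal sketch runs fixed-step gradient descent on a small convex quadratic. The objective function, step size and iteration count are assumptions chosen for illustration, not values prescribed by the course.

```python
def gradient_descent(grad, x0, step, n_iters):
    """Fixed-step gradient descent: x <- x - step * grad(x)."""
    x = list(x0)
    for _ in range(n_iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical convex objective f(x, y) = (x - 1)^2 + 2 * (y + 2)^2,
# whose unique minimizer is (1, -2).
def grad_f(point):
    x, y = point
    return [2.0 * (x - 1.0), 4.0 * (y + 2.0)]


minimizer = gradient_descent(grad_f, x0=[0.0, 0.0], step=0.1, n_iters=200)
print(minimizer)
```

The step size 0.1 is below 2/L, where L = 4 is the largest curvature of this quadratic, which is the classical condition for fixed-step gradient descent to converge on a smooth convex function.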
Organization:
10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework and exams can be done in French or English.
Validation:
3 homework assignments implementing algorithms in Python
1 final exam
References:
S. Boyd and L. Vandenberghe. Convex Optimization. CUP.
Y. Nesterov. Introductory Lectures on Convex Optimization. Springer.
Skills block 2: Applied Machine and Deep Learning
Module 5: Computational Optimisation for Data Science
Objective:
This course will help you understand and implement convex optimization algorithms that are widely used in industry. Convex optimization is essential in machine learning and deep learning.
Prerequisite:
Convex optimization, or course code: ODS1.4.
Topics:
Conjugate gradient for linear systems
Conjugate gradient for general functions
Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm
Davidon-Fletcher-Powell algorithm
Batch Gradient Descent
Stochastic Gradient Descent
Mini-batch Gradient Descent
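The first topic above, conjugate gradient for linear systems, can be sketched in a few lines of plain Python. This is an illustrative implementation for a symmetric positive definite matrix; the helper names (`dot`, `matvec`) and the 2x2 system are hypothetical and chosen so the result is easy to check by hand.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iters=None):
    """Solve A x = b for a symmetric positive definite matrix A."""
    n = len(b)
    if max_iters is None:
        max_iters = 10 * n
    x = [0.0] * n
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iters):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)          # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old               # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical SPD system; the exact solution is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution = conjugate_gradient(A, b)
print(solution)
```

In exact arithmetic the method reaches the solution of an n-dimensional system in at most n iterations, which is why it converges here after two steps.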
Organization:
10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework and exams can be done in French or English.
Validation:
3 homework assignments implementing algorithms in Python
1 final exam
1 project on a use case: insurance, banking, health, climate, etc.
References:
S. Boyd and L. Vandenberghe. Convex Optimization. CUP.
Y. Nesterov. Introductory Lectures on Convex Optimization. Springer.
Module 6: Machine Learning I: Supervised Machine Learning
Objective:
This course is hands-on and will allow you to process text, image, sound and time series data (discrete, continuous, qualitative and quantitative variables) in different use cases.
Prerequisite:
Probability and statistics, linear algebra, optimization and basic Python programming skills, or course codes: NCD1.2 / PSM1.3 / CODS2.1.
Topics:
Feature Extraction and Labeling
Splitting Data for Training, Validation and Testing
Model Setting and Training / Model Evaluation
Overfitting / Underfitting / Training Data / Stepping Back
Hyperparameter Tuning and Model Selection
Linear, Polynomial, Ridge, Lasso, Logistic, Softmax Regression
Regularized Linear and Elastic Net
Training and Cost Function
Decision Boundaries
Training a Binary Classifier
Measuring Accuracy Using Cross-Validation / Confusion Matrix
Precision / Recall Tradeoff / ROC Curve
Multiclass Classification / SVM Classification
Adding Similarity Features
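The evaluation topics above (confusion matrix, precision and recall) reduce to simple counting for a binary classifier. The sketch below uses plain Python; the label vectors `y_true` and `y_pred` are hypothetical and stand in for the output of any trained classifier.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


# Hypothetical ground-truth labels and classifier predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

# Precision: share of predicted positives that are truly positive.
precision = tp / (tp + fp)
# Recall: share of true positives that the classifier found.
recall = tp / (tp + fn)
# Accuracy: share of all predictions that are correct.
accuracy = (tp + tn) / len(y_true)

print(tp, fp, fn, tn, precision, recall, accuracy)
```

Raising the decision threshold of a classifier typically increases precision and lowers recall, which is the tradeoff that the ROC curve and precision/recall curve visualize.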
Organization:
10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework and exams can be done in French or English.
Validation:
3 homework assignments implementing algorithms in Python
1 final exam
1 project on a use case: insurance, banking, health, climate, etc.
References:
G. James, D. Witten, T. Hastie and R. Tibshirani. An Introduction to Statistical Learning, with Applications in R. Springer, 2021.
C. M. Bishop. Pattern Recognition and Machine Learning. New York: Springer, 2006.
Module 7: Deep Learning
Objective:
This course is hands-on and will allow you to process text, image, sound and time series data (discrete, continuous, qualitative and quantitative variables) in various use cases such as computer vision and NLP. TensorFlow is the main programming tool.
Prerequisite:
Machine learning, optimization and Python programming.
Topics:
Deep Learning concepts
Neural Network (NN)
Convolutional neural network (CNN)
Time Series Forecasting
Recurrent Neural Network (RNN)
Long Short-Term Memory (LSTM)
Deep Scattering Transform Network (DSN)
Hybrid Recurrent Scattering Neural Network
DeepDream and style transfer
Word embedding, Machine Translation and Seq2Seq (NLP)
kNN, SVM, SoftMax, two-layer network
PyTorch / TensorFlow
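The softmax classifier mentioned in the topics above is the output layer of most classification networks. The following framework-free sketch shows a numerically stable softmax and the associated cross-entropy loss; the score vector is a hypothetical example, and a real model would compute it with TensorFlow or PyTorch.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target_index])


# Hypothetical raw class scores (logits) for a 3-class problem.
probs = softmax([1.0, 2.0, 3.0])
loss = cross_entropy(probs, target_index=2)

print(probs, loss)
```

Subtracting the maximum score leaves the result unchanged mathematically but prevents overflow when the scores are large.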
Organization:
10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework and exams can be done in French or English.
Validation:
3 homework assignments implementing algorithms in Python
1 project on a use case: insurance, banking, health, climate, robotics, etc.
1 final exam
References:
G. James, D. Witten, T. Hastie and R. Tibshirani. An Introduction to Statistical Learning, with Applications in R. Springer, 2021.
Module 8: Deep & Reinforcement Learning
Objective:
This course tackles the problems of learning and decision making under uncertainty and focuses on reinforcement learning and the multi-armed bandit.
Prerequisite:
Probability, stochastic calculus, deep learning and Python programming.
Topics:
Markov decision processes and dynamic programming
Stochastic and adversarial multi-armed bandits
Tabular reinforcement learning
Deep learning for reinforcement learning
Deep Q-Network (DQN)
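The stochastic multi-armed bandit listed above can be illustrated with an epsilon-greedy strategy, one of the simplest ways to balance exploration and exploitation. This sketch is an assumption-laden toy: the Bernoulli arm means, the value of epsilon, the number of steps and the seed are all hypothetical choices.

```python
import random


def epsilon_greedy_bandit(true_means, n_steps, epsilon, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy policy.

    Returns the number of pulls and the estimated mean reward of each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(n_steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running average reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


# Hypothetical arms with success probabilities 0.2, 0.5 and 0.8.
counts, estimates = epsilon_greedy_bandit([0.2, 0.5, 0.8], n_steps=5000, epsilon=0.1)
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])

print(counts, estimates, best_arm)
```

With a constant epsilon the policy keeps exploring forever, so its regret grows linearly; decaying epsilon or using an upper-confidence-bound rule are common refinements.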
Organization:
10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework and exams can be done in French or English.
Validation:
3 homework assignments implementing algorithms in Python
1 project on a use case: insurance, banking, health, climate, robotics, etc.
1 final exam
References:
R. Sutton and A. Barto. Reinforcement Learning: An Introduction.
O. Sigaud and O. Buffet (eds.). Processus décisionnels de Markov et intelligence artificielle, 2008.
Cs. Szepesvári. Algorithms for Reinforcement Learning, 2009.
Skills block 3: Advanced Statistical Learning
Module 9: High Dimension Statistics
Objective:
The theory of high-dimensional statistics will help you better understand machine learning and deep learning. In this course we focus on non-asymptotic statistical problems where the number of variables can be much greater than the sample size (p >> n). This phenomenon is called the "curse of dimensionality" because, unlike in asymptotic statistics, it leads to problems of numerical bias, inference and estimation.
Prerequisite:
Knowledge of linear algebra, matrix calculus, graph theory, optimization, probability and statistics. R or Python programming language. Course codes: NCFD1.2 / PSM1.3 / OAMDLL2.1.
Topics:
Sub-Gaussian Random Variables
Linear Regression Model
Misspecified Linear Models
Minimax Lower Bounds
Matrix Estimation
Organization:
10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework and exams can be done in French or English.
Validation:
3 homework assignments implementing algorithms in Python
1 project on a use case: insurance, banking, health, climate, robotics, etc.
1 final exam
References:
P. Bühlmann and S. van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics. DOI 10.1007/978-3-642-20192-9.
S. Boyd and L. Vandenberghe. Convex Optimization. CUP.
Y. Nesterov. Introductory Lectures on Convex Optimization. Springer.
Module 10: Probabilistic Graphical Models
Objective:
The objective of this course is to familiarize you with the statistical modeling of complex multivariate data via probabilistic graphical models, including Bayesian networks. Applications in signal processing, computer vision and AI illustrate these models.
Prerequisite:
Knowledge of linear algebra, matrix calculus, graph theory, optimization, probability and statistics. R or Python programming language. Course codes: HDS3.1 or NCFD1.2 / PSM1.3 / OAMDLL2.1.
Topics:
Directed and undirected graphical models
Maximum likelihood
Linear regression
Logistic regression
Gaussian Mixture Models and clustering
Exponential family distributions
Sum-product algorithm and exact inference
Hidden Markov models
Approximate inference
Bayesian methods
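For hidden Markov models, the sum-product algorithm listed above specializes to the forward algorithm, which computes the likelihood of an observation sequence without enumerating every hidden path. The sketch below is a minimal illustration; the two-state model and its probabilities are hypothetical.

```python
def forward_likelihood(initial, transition, emission, observations):
    """Likelihood of an observation sequence under a discrete HMM.

    initial[s]        : probability of starting in state s
    transition[i][j]  : probability of moving from state i to state j
    emission[s][o]    : probability of emitting symbol o in state s
    """
    n_states = len(initial)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emission[j][obs] * sum(alpha[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical 2-state, 2-symbol model.
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.5, 0.5], [0.1, 0.9]]

likelihood = forward_likelihood(initial, transition, emission, [0, 1])
print(likelihood)
```

The recursion costs a number of operations proportional to the sequence length times the square of the number of states, instead of growing exponentially with the sequence length as brute-force enumeration does.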
Organization:
10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework and exams can be done in French or English.
Validation:
3 homework assignments implementing algorithms in Python
1 project on a use case: insurance, banking, health, climate, robotics, etc.
1 final exam
References:
P. Bühlmann and S. van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics. DOI 10.1007/978-3-642-20192-9.
S. Boyd and L. Vandenberghe. Convex Optimization. CUP.
Y. Nesterov. Introductory Lectures on Convex Optimization. Springer.
Module 11: Distributed High dimension statistics
Objective:
As part of our R&D and teaching activities, these workshops keep us up to date. In this workshop, we review research papers on distributed deep learning, with the aim of making deep learning workloads efficient and scalable on distributed and parallel systems.
Prerequisite:
Course codes: HDS3.1 / PGM3.2 / DL2.4
Topics:
High dimension statistics and complexity
Federated Learning
Deep Learning on HPC systems
Deep Learning on heterogeneous infrastructure (GPU, CPU)
Quantum Computing for Deep & Machine Learning
Organization:
10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework and exams can be done in French or English.
Validation:
3 homework assignments implementing algorithms in Python
1 project on a use case: insurance, banking, health, climate, robotics, etc.
1 final exam
References:
P. Bühlmann and S. van de Geer. Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics. DOI 10.1007/978-3-642-20192-9.
S. Boyd and L. Vandenberghe. Convex Optimization. CUP.
Y. Nesterov. Introductory Lectures on Convex Optimization. Springer.
Skills block 4: Artificial Intelligence Industrialization
Module 12: Hybrid Cloud and Edge Computing
Objective:
This course will enable you to set up a big data architecture, and to implement your data projects and put them into production.
Prerequisite:
Knowledge of the Linux operating system (shell) and Python programming.
Topics:
Overview of Cloud Technologies
IaaS Programming Interfaces
The REST and Python programming interfaces of AWS
OpenStack and CloudStack
Containerized Kubernetes
Hybrid Cloud Infrastructures
Edge Computing and IoT
Organization:
8 lectures and tutorials of 3 hours each (total: 24 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework and exams can be done in French or English.
Validation:
2 homework assignments implementing algorithms in Python
1 project
References:
https://kubecampus.io/kubernetes/courses/
https://www.balena.io/etcher
https://www.ubuntu-fr.org
Module 13: Machine Learning Engineering For Production (MLOps)
Objective:
This course will enable you to better manage the life cycle of deep learning models (versioning and storage of the data and the model, history, traceability, monitoring, etc.). It also covers how to scale up trained models, deploy web applications for clients, and orchestrate the various data processing operations linked to the models.
Prerequisite:
Knowledge of the Linux operating system (shell), Python programming, machine learning, deep learning and Kubernetes.
Topics:
MLOps challenges
MLflow, model storage with S3
Creating an API with TensorFlow Serving, Flask, Celery, FastAPI and Redis
Use of widgets to add interactivity to web applications (Streamlit, Dash and Panel)
Task orchestration with Airflow
Introduction to the features and architecture of Apache Airflow
Navigating the user interface and using the CLI
Organization:
10 lectures and tutorials of 3 hours each (total: 30 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework and exams can be done in French or English.
Validation:
1 project
References:
https://mlflow.org
https://airflow.apache.org
https://fastapi.tiangolo.com
Module 14: Data Storage and Parallel Computing
Objective:
As part of our R&D and teaching activities, these workshops keep us up to date. In this workshop, we review research papers on distributed deep learning, with the aim of making deep learning workloads efficient and scalable on distributed and parallel systems.
Prerequisite:
Course code: This course will enable you to set up a big data architecture, and to implement your AI algorithms and put them into production in distributed mode.
Topics:
Distributed big data architecture and data lake via Hadoop
NoSQL distributed data storage
Parallel machine learning via PySpark MLlib
Data migration and partitioning via Dataiku
Dynamic dashboards via Azure ML Service, Power BI, NLP, ML
Organization:
12 lectures and tutorials of 3 hours each (total: 36 hours), or 1 week of full immersion. All classes and materials will be in English or French. All interactions, homework and exams can be done in French or English.
Validation:
1 project
References:
https://spark.apache.org
https://www.elastic.co/fr/what-is/elk-stack
https://www.tensorflow.org/guide/distributed_training?hl=fr