Introduction to Machine Learning Syllabus

This course introduces fundamental machine learning techniques, combining theoretical understanding with practical applications. Each session consists of 1.5 hours of lecture followed by 1.5 hours of practical exercises (labs), which are graded individually. The final session is dedicated to an exam.

Prerequisites

Students should have working knowledge of:

  • Python programming fundamentals
  • Basic probability and statistics concepts
  • Linear algebra fundamentals (matrices, vectors)
  • Calculus basics (derivatives)

Required libraries: NumPy, Pandas, Scikit-learn, and PyTorch.
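
To verify your setup before the first session, a quick sanity check like the following should run without errors (a minimal sketch; the course does not prescribe specific versions, any recent release should do):

```python
# Import each required library and print its installed version.
import numpy
import pandas
import sklearn
import torch

for lib in (numpy, pandas, sklearn, torch):
    print(f"{lib.__name__:10s} {lib.__version__}")
```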

Course Outline

🚨 DO NOT SEND IPYNB FILES FOR THE LABS!!! I WANT TO RECEIVE ONLY PY FILES!!! 🚨

Session 1: Introduction to Machine Learning - Foundations

  • Lecture: Overview of machine learning and its applications. Introduction to key types of tasks (classification, clustering, and recommendation). Fundamentals of probability in the ML context. Introduction to the ML workflow, with a focus on data preprocessing, feature engineering, and data visualization techniques.
  • Lab: Hands-on data exploration and preprocessing on a real-world dataset. Practice data cleaning, handling missing values, feature scaling, and basic visualization techniques. Introduction to the programming framework used throughout the course. (A code sketch follows this list.)
    • Lab skeleton
    • Dataset
    • Submit your lab here before noon on Sunday, December 1. 🚨 Don’t forget to enter your name in the file AND in the file name.
  • Learning Objectives: After this session, students will be able to prepare datasets for ML algorithms, identify different types of ML problems, and understand the basic ML workflow.
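
A minimal sketch of the preprocessing steps covered in this session, using Pandas and scikit-learn. The file name data.csv and the columns age and city are hypothetical placeholders; the actual lab uses the dataset linked above:

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("data.csv")  # hypothetical file; the lab dataset is linked above

# Fill missing values in a numeric column with the column median.
df[["age"]] = SimpleImputer(strategy="median").fit_transform(df[["age"]])

# One-hot encode a categorical column.
df = pd.get_dummies(df, columns=["city"])

# Scale the numeric feature to zero mean and unit variance.
df[["age"]] = StandardScaler().fit_transform(df[["age"]])

print(df.describe())
```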

Session 2: Supervised Learning - Linear Models

  • Lecture: Comprehensive coverage of linear models for both regression and classification tasks. Introduction to gradient descent optimization. Detailed examination of logistic regression and linear regression, including loss functions and decision boundaries. Introduction to model evaluation metrics and cross-validation techniques. Coverage of regularization techniques (L1/L2) and their importance.
  • Lab: Implementation of both regression and classification tasks on the same dataset to understand their differences and similarities. Practice with feature engineering and selection of appropriate evaluation metrics. (A code sketch follows this list.)
    • Lab skeleton
    • Submit your lab here before noon on Sunday, December 8. 🚨 Don’t forget to enter your name in the file AND in the file name.
  • Learning Objectives: Students will implement linear models from scratch, understand optimization techniques, apply regularization, and evaluate model performance using appropriate metrics.
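
In the from-scratch spirit of this lab, a minimal sketch of batch gradient descent for linear regression with NumPy (the toy data is invented for illustration, not the lab data):

```python
import numpy as np

# Toy data: y = 3x + 1 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3 * X[:, 0] + 1 + 0.1 * rng.standard_normal(100)

# Append a bias column, then fit the weights by batch gradient descent on the MSE.
Xb = np.hstack([X, np.ones((100, 1))])
w = np.zeros(2)
lr = 0.1
for _ in range(500):
    grad = 2 * Xb.T @ (Xb @ w - y) / len(y)  # gradient of the mean squared error
    w -= lr * grad

print(w)  # should be close to [3.0, 1.0]
```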

Session 3: Decision Trees & Evaluation

  • Lecture: Comprehensive coverage of decision trees, including information gain, entropy, and Gini impurity measures. Techniques for handling both categorical and numerical features. Tree pruning strategies and methods to prevent overfitting. Detailed examination of hyperparameter tuning and its impact.
  • Lab: Build and visualize decision trees, implement different splitting criteria, and practice pruning techniques. Analyze the impact of various hyperparameters on model performance. (A code sketch follows this list.)
  • Learning Objectives: Students will understand decision tree construction, implement different splitting criteria, and apply techniques to prevent overfitting.
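
A minimal sketch of this lab's core loop, assuming scikit-learn; the built-in Iris dataset stands in for the lab data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compare splitting criteria; max_depth acts as a simple pre-pruning control.
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=3, random_state=0)
    tree.fit(X_train, y_train)
    print(criterion, tree.score(X_test, y_test))

# sklearn.tree.plot_tree(tree) draws the fitted tree for visual inspection.
```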

Session 4: Ensemble Methods - Random Forests and Boosting

  • Lecture: Introduction to ensemble learning principles. Detailed coverage of bagging techniques with focus on Random Forests. Exploration of boosting techniques including Gradient Boosting and AdaBoost. Analysis of feature importance in ensemble methods.
  • Lab: Implementation of Random Forest classifiers and comparison with previously studied models. Experimentation with different ensemble techniques and analysis of their impact on model performance. (A code sketch follows this list.)
  • Learning Objectives: Students will understand ensemble methods, implement Random Forests, and know when to apply different ensemble techniques.
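
A minimal sketch comparing bagging and boosting with scikit-learn (the built-in breast-cancer dataset stands in for the lab data):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Bagging (Random Forest) vs. boosting (Gradient Boosting) on the same task.
for model in (RandomForestClassifier(n_estimators=200, random_state=0),
              GradientBoostingClassifier(random_state=0)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, round(scores.mean(), 3))

# Feature importances are exposed after fitting an ensemble.
rf = RandomForestClassifier(random_state=0).fit(X, y)
print(rf.feature_importances_[:5])
```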

Session 5: Support Vector Machines

  • Lecture: Comprehensive introduction to Support Vector Machines. Detailed examination of the kernel trick and different kernel functions. Coverage of margin optimization and support vector concepts. Discussion of SVM hyperparameter tuning and practical applications.
  • Lab: Implementation and comparison of linear and non-linear SVMs. Experimentation with different kernel functions and hyperparameter optimization. (A code sketch follows this list.)
  • Learning Objectives: Students will understand SVM principles, implement different kernel functions, and optimize SVM models.
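
A minimal sketch of kernel comparison and hyperparameter tuning with scikit-learn, on a synthetic dataset invented for illustration:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Two interleaved half-moons: not linearly separable, so the kernel matters.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# Grid-search the kernel and the regularization strength C.
params = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}
grid = GridSearchCV(SVC(), params, cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```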

Session 6: Clustering & Dimensionality Reduction

  • Lecture: Introduction to unsupervised learning through K-means clustering. Detailed coverage of cluster validation techniques and silhouette analysis. Comprehensive examination of Principal Component Analysis (PCA) for dimensionality reduction. Integration of PCA with clustering techniques for improved performance.
  • Lab: Implementation of K-means clustering with and without PCA preprocessing. Practice with cluster validation techniques and visualization of high-dimensional data. (A code sketch follows this list.)
  • Learning Objectives: Students will implement clustering algorithms, apply dimensionality reduction techniques, and evaluate clustering quality.
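
A minimal sketch of K-means with and without PCA preprocessing, validated by silhouette analysis; the built-in digits dataset stands in for the lab data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

X, _ = load_digits(return_X_y=True)  # 64-dimensional digit images

# Cluster the raw features, then cluster a 2-component PCA projection.
for data in (X, PCA(n_components=2).fit_transform(X)):
    labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(data)
    print(data.shape[1], "dims, silhouette:", round(silhouette_score(data, labels), 3))
```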

Session 7: Neural Networks Basics

  • Lecture: Introduction to neural network fundamentals and the perceptron model. Coverage of key concepts including activation functions, forward propagation, and basic architectures. Discussion of learning rates, momentum, and practical training considerations.
  • Lab: Implementation of a simple neural network. Experimentation with different activation functions and basic architectures. Practice with neural network training and hyperparameter tuning. (A code sketch follows this list.)
  • Learning Objectives: Students will understand neural network basics, implement simple architectures, and train basic neural networks.
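
A minimal sketch of the forward pass, backpropagation, and momentum-based training loop in PyTorch; the XOR toy task is invented for illustration:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy task: learn XOR with a tiny two-layer network.
X = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)  # forward pass
    loss.backward()              # backpropagation
    optimizer.step()             # parameter update

print(torch.sigmoid(model(X)).round())  # expect [0, 1, 1, 0] once trained
```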

Session 8: Final Exam

  • Comprehensive assessment covering all course topics, focusing on both theoretical understanding and practical skills learned throughout the course.

Assessment Structure

The final grade will be based on two components:

  • Laboratory Work (50%)
  • Final Exam (50%)

Bibliography

  • Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python, by Sebastian Raschka, Yuxi (Hayden) Liu, and Vahid Mirjalili.
  • The Hundred-Page Machine Learning Book, by Andriy Burkov.
  • Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, by Aurélien Géron.
Pierre-Henri Paris
Associate Professor in Artificial Intelligence

My research interests include Knowledge Graphs, Information Extraction, and NLP.