Paris-Saclay University

Introduction to Machine Learning

Foundations

Summary

  • What is Machine Learning?
  • Types of Machine Learning Tasks
  • Numpy and Pandas
  • Fundamentals of Probability in ML
  • Workflow and Data Preprocessing Techniques
  • Feature Engineering
  • Data Visualization Techniques
  • Conclusion
  • Glossary

What is Machine Learning?

What is Machine Learning?

Machine Learning (ML) is a subset of artificial intelligence that focuses on enabling computers to learn from data and improve over time without explicit programming.

  • Learning from data
  • Making predictions or decisions
  • Improving performance through experience

Why Machine Learning?

  • Motivation: Handling large volumes of data, automating model building, uncovering hidden patterns

Real-world Applications

  • Healthcare: Predicting patient outcomes, disease prediction
  • Finance: Algorithmic trading, fraud detection
  • E-commerce: Personalized recommendations
  • Transportation: Self-driving cars
  • Social Media: Content curation, recommendations
  • Technology: Voice assistants

Historical Context and Evolution of ML

Understanding where machine learning fits within the evolution of computing and AI:

  • 1950s: Alan Turing proposes the idea of machines that learn (Turing Test).
  • 1960s-1980s: Early rule-based systems and symbolic AI.
  • 1980s-1990s: Rise of statistical approaches (e.g., decision trees, SVMs).
  • 2000s: Explosion of data (Big Data) and computational power enable larger models.
  • 2010s: Deep learning revolutionizes fields like image recognition and NLP.
  • Today: ML powers applications in almost every industry, from healthcare to finance.

Machine learning has evolved from basic algorithms to sophisticated models shaping modern technology.

Ethical and Social Considerations in ML

Machine learning can have profound social impacts. Key considerations include:

  • Fairness: Ensuring ML models do not discriminate against certain groups.
  • Bias: Recognizing and mitigating biases in training data and algorithms.
  • Privacy: Protecting user data when training and deploying ML models.
  • Transparency: Making models interpretable and decisions explainable.
  • Accountability: Determining who is responsible for the outcomes of ML systems.

These issues are essential for deploying ML responsibly and building trust in AI systems.

Limitations of Machine Learning

While powerful, ML has inherent challenges:

  • Data Dependency: ML models require high-quality, large-scale data.
  • Interpretability: Complex models (e.g., deep learning) can be hard to understand.
  • Overfitting: Models may perform well on training data but fail to generalize.
  • Resource Intensive: Training large models can be computationally and energy expensive.
  • Limited Generalization: ML struggles with tasks outside its training data (e.g., edge cases).

Recognizing these limitations is crucial for effectively using ML in real-world applications.

Types of Machine Learning Tasks

Overview of ML Tasks

  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning

Supervised Learning

Learning from labeled data to make predictions.

  • Types:
    • Classification: Predict categorical outcomes
    • Regression: Predict continuous outcomes
  • Examples: Spam detection, stock price prediction

Unsupervised Learning

Discovering patterns in unlabeled data.

  • Types:
    • Clustering
    • Dimensionality Reduction
  • Examples: Customer segmentation, anomaly detection

Reinforcement Learning

Learning by interacting with an environment to maximize cumulative rewards.

  • Core Elements:
    • Agent: The decision-maker
    • Actions: Choices made by the agent
    • Rewards: Feedback for actions
  • Examples: Game playing (e.g., AlphaGo), robotics, self-driving cars
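
A minimal sketch of the agent/action/reward loop, using a toy two-armed bandit with an epsilon-greedy strategy (all values are hypothetical, purely for illustration):

import numpy as np
rng = np.random.default_rng(0)
true_rewards = [0.3, 0.7]          # hypothetical success probability of each action
estimates, counts = np.zeros(2), np.zeros(2)
epsilon = 0.1                      # exploration rate
for step in range(1000):
    # Agent: explore with probability epsilon, otherwise exploit the best estimate
    action = rng.integers(2) if rng.random() < epsilon else int(np.argmax(estimates))
    reward = rng.random() < true_rewards[action]   # environment feedback
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]
print("Estimated action values:", estimates)       # the agent learns action 1 pays more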

Classification

Assigning inputs to predefined categories.

  • Use Cases: Email spam vs. not spam, image recognition (e.g., cats vs. dogs)
Figure: decision boundary for classification
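
A minimal classification sketch with scikit-learn (hypothetical features and labels):

from sklearn.tree import DecisionTreeClassifier
X = [[180, 8], [160, 4], [175, 7], [150, 3]]   # hypothetical features (e.g., size, tail length)
y = ['dog', 'cat', 'dog', 'cat']               # categorical labels
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[170, 6]]))                 # predicted category for a new input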

Regression

Predicting a continuous numerical value.

  • Use Cases: House price prediction, forecasting sales
Figure: scatter plot with a regression line
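
A minimal regression sketch with scikit-learn (hypothetical sizes and prices):

from sklearn.linear_model import LinearRegression
X = [[50], [80], [120], [200]]     # hypothetical house sizes (m^2)
y = [150, 240, 350, 580]           # hypothetical prices (in thousands)
reg = LinearRegression().fit(X, y)
print(reg.predict([[100]]))        # predicted price for a 100 m^2 house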

Clustering

Grouping similar data points without predefined labels.

  • Use Cases: Market segmentation, document classification
Figure: scatter plot showing clusters
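
A minimal clustering sketch with scikit-learn (hypothetical 2D points, no labels):

from sklearn.cluster import KMeans
X = [[1, 1], [1.5, 2], [8, 8], [8.5, 9], [0.5, 1.5], [9, 8.5]]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # cluster assignment for each point, learned without labels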

Recommendation Systems

Predicting user preferences to recommend items.

  • Types:
    • Collaborative Filtering
    • Content-based Filtering
  • Examples: Deezer, Netflix, Amazon
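
A minimal user-based collaborative filtering sketch using cosine similarity (hypothetical ratings):

import numpy as np
# Hypothetical user-item rating matrix (rows: users, columns: items, 0 = unrated)
ratings = np.array([[5, 4, 0, 1],
                    [4, 5, 1, 0],
                    [1, 0, 5, 4]])
def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
# Find the user most similar to user 0; their highly rated items become candidates
sims = [cosine(ratings[0], ratings[i]) for i in range(1, 3)]
most_similar = 1 + int(np.argmax(sims))
print("Most similar user:", most_similar)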

Numpy and Pandas

Introduction to NumPy

NumPy (Numerical Python): The foundation for machine learning in Python

  • Core data structure: ndarray (N-dimensional array)
  • Essential features for ML:
    • Matrix operations for neural networks
    • Efficient numerical computations
    • Statistical functions for data preprocessing
    • Random sampling for train/test splits
    • Linear algebra for feature transformations
  • Integration with major ML libraries (scikit-learn, TensorFlow, PyTorch)

ndarray: The Building Block of ML

  • Why crucial for ML:
    • Efficient storage of large datasets
    • Fast matrix operations for model training
    • Memory-efficient data types for large-scale ML
import numpy as np
# Create feature matrix and labels
X = np.array(
    [[1, 2, 3], # sample 1
     [4, 5, 6], # sample 2
     [7, 8, 9]] # sample 3
)  # Feature matrix (3 samples, 3 features -> 9 elements)
y = np.array([0, 1, 1])  # Labels
# Convert types (common in ML preprocessing)
X = X.astype(float)  # Convert to float for ML algorithms
# [[1. 2. 3.]
#  [4. 5. 6.]
#  [7. 8. 9.]]

⚠️ All the code examples of this course can be found here.

Essential NumPy Operations for ML

  • Matrix operations for neural networks
  • Statistical operations for feature scaling
  • Shape manipulation for batch processing
import numpy as np
# Feature matrix from the previous slide (3 samples, 3 features)
X = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
# Matrix multiplication (common in neural networks)
weights = np.random.randn(3, 2)
layer_output = np.dot(X, weights)
# Feature scaling (standardization)
X_normalized = (X - X.mean(axis=0)) / X.std(axis=0)
# Reshape into mini-batches (the number of samples must be divisible by batch_size)
batch_size = 1
X_batches = X.reshape(-1, batch_size, X.shape[1])

NumPy Universal Functions for ML

  • Essential operations for model implementation:
    • Activation functions: np.exp() for softmax
    • Loss calculations: np.log() for cross-entropy
    • Metrics: np.mean(), np.sum()
import numpy as np
# Softmax activation
def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)
# Binary cross-entropy calculation
def binary_cross_entropy(y_true, y_pred):
    return -np.mean(y_true * np.log(y_pred) + 
                   (1 - y_true) * np.log(1 - y_pred))
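
For example, applied to hypothetical logits and predictions:

logits = np.array([[2.0, 1.0, 0.1]])
print(softmax(logits))                                          # probabilities summing to 1
print(binary_cross_entropy(np.array([1.0]), np.array([0.9])))   # about 0.105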

Introduction to Pandas for ML

Pandas: Essential for ML data preprocessing and feature engineering

  • Key ML applications:
    • Loading and cleaning datasets
    • Feature engineering and selection
    • Handling missing values
    • Categorical variable encoding

DataFrames: ML Data Preparation

import pandas as pd
import numpy as np
# Load and prepare ML dataset
df = pd.DataFrame({
    'feature1': [1, 2, np.nan, 4],
    'feature2': ['A', 'B', 'A', 'C'],
    'target': [0, 1, 1, 0]
})
# Handle missing values (mean imputation)
df['feature1'] = df['feature1'].fillna(df['feature1'].mean())
# Encode categorical variables
df_encoded = pd.get_dummies(df, columns=['feature2'])

ML-Specific Pandas Operations

  • Feature engineering techniques:
    • Creating interaction features
    • Time-based feature extraction
    • Statistical aggregations
import pandas as pd
from sklearn.model_selection import train_test_split
# Interaction feature (both columns must be numerical)
df['interaction'] = df['feature1'] * df['feature2']
# Stratified train/test split (preserves class proportions)
train_df, test_df = train_test_split(df, train_size=0.8, stratify=df['target'])
# Statistical features
df['rolling_mean'] = df['feature1'].rolling(window=3).mean()

From Pandas to NumPy for ML

  • Converting preprocessed data to ML-ready format
  • Splitting features and targets
  • Final preprocessing steps
import pandas as pd
import numpy as np
# Convert DataFrame to NumPy arrays
X = df_encoded.drop('target', axis=1).to_numpy()
y = df_encoded['target'].to_numpy()
# Final scaling for ML
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Ready for ML algorithms
print("Feature matrix shape:", X_scaled.shape)
print("Target vector shape:", y.shape)

Selecting Specific Data Types

Use select_dtypes to filter columns by data type:

import pandas as pd
# Create a sample DataFrame
data = pd.DataFrame({
    'numerical': [1, 2, 3],
    'categorical': ['A', 'B', 'C'],
    'boolean': [True, False, True]
})
# Select only numerical columns
numerical_data = data.select_dtypes(include=['number'])
# Select only categorical columns
categorical_data = data.select_dtypes(include=['object'])
print("Numerical Columns:\n", numerical_data)
print("Categorical Columns:\n", categorical_data)

Why? Useful for applying operations to specific types of data (e.g., scaling numerical features).

Summary Statistics and Absolute Values

  • median(): Compute the median (middle value).
  • std(): Calculate standard deviation (measure of spread).
  • np.abs(): Compute absolute values of numerical data.
import pandas as pd
import numpy as np
# Sample data
data = pd.DataFrame({'values': [-10, 20, -30, 40, -50]})
# Median
median_value = data['values'].median()
# Standard deviation
std_value = data['values'].std()
# Absolute values
absolute_values = np.abs(data['values'])
print("Median:", median_value)
print("Standard Deviation:", std_value)
print("Absolute Values:\n", absolute_values)

Why? These functions are essential for understanding data distributions and normalizing values.

Deep Copy vs. Shallow Copy

Use df.copy() to create a true (deep) copy of a DataFrame:

import pandas as pd
# Sample DataFrame
data = pd.DataFrame({'values': [1, 2, 3]})
# Plain assignment creates a reference to the same object (not an independent copy)
shallow_copy = data
# Deep copy (independent of original)
deep_copy = data.copy()
# Modify original
data.loc[0, 'values'] = 999
print("Original:\n", data)
print("Shallow Copy:\n", shallow_copy)  # Changes with original
print("Deep Copy:\n", deep_copy)        # Stays unchanged

Why? Use df.copy() to avoid unintended modifications to the original DataFrame.

Combining DataFrames

Use pd.concat to combine DataFrames vertically or horizontally:

import pandas as pd
# Sample DataFrames
data1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
data2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
# Concatenate vertically (default)
vertical_concat = pd.concat([data1, data2])
# Concatenate horizontally
horizontal_concat = pd.concat([data1, data2], axis=1)
print("Vertical Concatenation:\n", vertical_concat)
print("Horizontal Concatenation:\n", horizontal_concat)

Why? pd.concat is ideal for combining datasets during preprocessing.

Scikit-learn Basics

  • Preprocessing Tools: Imputation, scaling, encoding.
  • Algorithms: Classification, regression, clustering.
  • Model Evaluation: Cross-validation, evaluation metrics.
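
A minimal end-to-end sketch combining these pieces (a hypothetical pipeline on scikit-learn's built-in iris dataset; imputation is included for illustration even though iris has no missing values):

from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
X, y = load_iris(return_X_y=True)
# Preprocessing (imputation, scaling) chained with a classifier
pipeline = make_pipeline(SimpleImputer(strategy='mean'),
                         StandardScaler(),
                         LogisticRegression(max_iter=1000))
scores = cross_val_score(pipeline, X, y, cv=5)   # 5-fold cross-validation
print("Mean accuracy:", scores.mean())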

Fundamentals of Probability in ML

Importance of Probability in ML

Probabilities quantify the uncertainty in predictions.

  • Understanding uncertainty
  • Working with probability distributions
  • Calculating conditional probabilities

Random Variables and Distributions

A random variable is a variable whose value is subject to variations due to chance.

  • Discrete: Binomial distribution
  • Continuous: Normal distribution
Figures: probability density function of the normal distribution; probability mass function of the binomial distribution

Expected Value, Variance, and Standard Deviation

  • Expected Value (Mean): Average outcome of a random variable.
  • Variance: Measure of how much values differ from the mean.
  • Standard Deviation:
    • Square root of variance.
    • Indicates the spread of data around the mean.
    • Useful for understanding the uncertainty in probability distributions.
import numpy as np
# Sample random variable data
random_variable = np.array([1, 2, 3, 4, 5])
# Expected Value (Mean)
expected_value = np.mean(random_variable)
# Variance
variance = np.var(random_variable)
# Standard Deviation
std_dev = np.std(random_variable)
print(f"Expected Value (Mean): {expected_value}")
print(f"Variance: {variance}")
print(f"Standard Deviation: {std_dev}")

Why is it important? In probability distributions, standard deviation quantifies the uncertainty:

  • Low standard deviation: Data points are close to the mean (narrow spread).
  • High standard deviation: Data points are widely spread (greater uncertainty).

Conditional Probability

The probability of an event A given that event B has occurred.

  • Formula: \( P(A|B) = \frac{P(A \cap B)}{P(B)} \)
  • Key Concepts:
    • Joint Probability: \( P(A \cap B) \): The likelihood of A and B happening together.
    • Marginal Probability: \( P(B) \): The likelihood of event B happening.
  • Application in ML:
    • Naive Bayes classifier
    • Bayesian networks
    • Predictive models with probabilistic outputs
  • Assumptions in ML:
    • Naive Bayes assumes conditional independence among features.
    • Bayesian networks capture conditional dependencies.
# Example of Conditional Probability
# P(A|B) = P(A and B) / P(B)
p_a_and_b = 0.3
p_b = 0.6
p_a_given_b = p_a_and_b / p_b
print(f"P(A|B): {p_a_given_b}")

Joint and Marginal Probabilities

  • Joint Probability: The probability of two events occurring simultaneously.
    • \( P(A \cap B) \): Probability of both A and B.
    • Example: Probability of rain and carrying an umbrella.
  • Marginal Probability: The probability of a single event occurring, irrespective of others.
    • \( P(B) \): Probability of B happening.
    • Example: Probability of rain regardless of carrying an umbrella.
Figure: sample from a multivariate normal distribution
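
A small numerical sketch of the rain/umbrella example (hypothetical probabilities):

import numpy as np
# Hypothetical joint distribution P(rain, umbrella) as a 2x2 table
#                 umbrella=yes  umbrella=no
joint = np.array([[0.25,        0.05],    # rain=yes
                  [0.15,        0.55]])   # rain=no
p_rain = joint[0].sum()            # marginal P(rain) = 0.30
p_umbrella = joint[:, 0].sum()     # marginal P(umbrella) = 0.40
p_rain_and_umbrella = joint[0, 0]  # joint P(rain and umbrella) = 0.25
p_rain_given_umbrella = p_rain_and_umbrella / p_umbrella   # conditional = 0.625
print(p_rain, p_umbrella, p_rain_given_umbrella)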

Bayes' Theorem

  • Formula: \( P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \)
  • Key Terms:
    • \( P(A) \): Prior probability (before observing B).
    • \( P(B|A) \): Likelihood of B given A.
    • \( P(A|B) \): Posterior probability (updated belief after observing B).
  • Use Cases:
    • Updating beliefs with new evidence
    • Spam filtering
    • Medical diagnosis
  • Intuition: Think of Bayes' theorem like updating your belief about the weather (event A) after looking at the sky (evidence B).
# Bayes' Theorem Example
def bayes_theorem(p_a, p_b_given_a, p_b):
    return (p_b_given_a * p_a) / p_b
# Inputs
p_a = 0.2  # Prior probability of A
p_b_given_a = 0.8  # Likelihood of B given A
p_b = 0.5  # Marginal probability of B
# Posterior
p_a_given_b = bayes_theorem(p_a, p_b_given_a, p_b)
print(f"P(A|B): {p_a_given_b}")

Naive Bayes Assumptions

While Naive Bayes is simple and effective, it relies on certain assumptions:

  • Conditional Independence:
    • Assumes features are conditionally independent given the target label.
    • Rarely holds true in real-world datasets (e.g., word dependencies in text).
  • Class Prior Accuracy:
    • Depends on accurate prior probabilities (\( P(A) \)) for each class.
    • Biased or imbalanced data can lead to poor performance.
  • Sensitivity to Feature Representation:
    • Performance depends on appropriate feature engineering.
    • Examples: Word frequencies in text, categorical encoding in structured data.
  • Key Insight:
    • Naive Bayes performs surprisingly well for high-dimensional data like text classification, even when the independence assumption does not hold, thanks to averaging effects across features.

Entropy and Information Gain

  • Entropy: A measure of uncertainty or randomness.
    • Formula: \( H(X) = -\sum P(x) \log_2 P(x) \)
    • Example: High entropy for a fair coin flip (50-50), low entropy for a biased coin (90-10).
  • Information Gain: Reduction in entropy after splitting the data.
  • Use:
    • Decision Trees:
      • Entropy measures the uncertainty in a dataset.
      • Information gain guides feature selection for splits.
    • Clustering:
      • Entropy measures the purity of clusters (e.g., in k-means).
      • Helps evaluate cluster quality during initialization or refinement.
    • Uncertainty in Predictions:
      • Entropy measures model confidence in classification tasks.
      • Used in probabilistic outputs like softmax in neural networks.
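
A minimal sketch of entropy and information gain for a perfectly separable split (hypothetical labels):

import numpy as np
def entropy(labels):
    # H(X) = -sum P(x) log2 P(x), estimated from label frequencies
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))
parent = np.array([0, 0, 0, 1, 1, 1])                 # 50-50 labels: entropy = 1 bit
left, right = np.array([0, 0, 0]), np.array([1, 1, 1])
info_gain = entropy(parent) \
    - (len(left) / len(parent)) * entropy(left) \
    - (len(right) / len(parent)) * entropy(right)
print(entropy(parent), info_gain)                     # 1.0 and 1.0 for this perfect split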

Confidence Intervals

A confidence interval quantifies the range within which a parameter lies with a certain probability.

  • Helps understand prediction reliability.
  • Widely used in regression and probabilistic models.
# Confidence Interval Example
import numpy as np
import scipy.stats as stats
data = [1, 2, 3, 4, 5]
mean = np.mean(data)
# Standard error uses the sample standard deviation (ddof=1)
conf_interval = stats.norm.interval(0.95, loc=mean, scale=np.std(data, ddof=1)/np.sqrt(len(data)))
print(f"95% Confidence Interval: {conf_interval}")

Key Mathematical Foundations for ML

Revisiting these topics will help you better understand machine learning concepts and algorithms:

  • Linear Algebra:
    • Vectors, matrices, and matrix operations
    • Eigenvalues and eigenvectors
    • Applications in dimensionality reduction (e.g., PCA)
  • Calculus:
    • Derivatives and gradients
    • Optimization techniques (e.g., gradient descent)
    • Applications in neural networks and backpropagation
  • Probability and Statistics:
    • Probability distributions (normal, binomial)
    • Conditional probability and Bayes' theorem
    • Applications in probabilistic models (e.g., Naive Bayes)

Consider reviewing these areas if they feel unfamiliar. They are integral to ML concepts and algorithms!
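
A small illustration of two of these ideas with NumPy (hypothetical numbers):

import numpy as np
# Linear algebra: eigen-decomposition of a hypothetical covariance matrix (as used by PCA)
cov = np.array([[2.0, 0.8], [0.8, 1.0]])
eigvals, eigvecs = np.linalg.eig(cov)
# Calculus: one gradient-descent step on f(w) = w^2 (gradient = 2w)
w, lr = 3.0, 0.1
w = w - lr * 2 * w        # move against the gradient
print(eigvals, w)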

Key Takeaways

  • Probability quantifies uncertainty and is central to ML predictions.
  • Random variables and distributions underpin probabilistic models.
  • Bayes' theorem updates beliefs and powers algorithms like Naive Bayes.
  • Entropy measures uncertainty; information gain drives decision tree splits.

Workflow and Data Preprocessing Techniques

Figure: ML workflow diagram

Data Collection

  • Sources:
    • Databases
    • APIs
    • Web scraping
  • Considerations:
    • Data quality
    • Volume and variety
    • Legal and ethical issues

Data Preprocessing

Ensuring data quality is critical to model performance.

  • Data cleaning
  • Preparation for analysis
  • Improving accuracy

Data Cleaning

  • Tasks:
    • Handling missing values
    • Removing duplicates
    • Correcting errors
  • Tools: Pandas functions like dropna(), fillna(), duplicated()

Handling Missing Values

  • Identify Missing Data: Use isnull() and sum() in Pandas.
  • Strategies:
    • Deletion: Listwise (drop rows), Pairwise (drop specific values)
    • Imputation: Mean/Median/Mode replacement, Forward/Backward fill, Interpolation

Missing Values - Code Example

# Identify missing values
missing_values = data.isnull().sum()
# Drop rows with missing values
data_clean = data.dropna()
# Impute missing values with mean
data['column'] = data['column'].fillna(data['column'].mean())
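
The fill and interpolation strategies listed above can be sketched as follows (hypothetical column):

import pandas as pd
import numpy as np
data = pd.DataFrame({'column': [1.0, np.nan, 3.0, np.nan, 5.0]})
data_ffill = data['column'].ffill()          # forward fill
data_bfill = data['column'].bfill()          # backward fill
data_interp = data['column'].interpolate()   # linear interpolation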

Removing Duplicates

  • Why? Duplicates can skew analysis and inflate model performance.
  • Tools: Use duplicated() and drop_duplicates() in Pandas.
# Identify duplicates
duplicates = data.duplicated()
# Remove duplicates
data_clean = data.drop_duplicates()

Feature Scaling

Scaling ensures features contribute equally to the model.

  • Normalization (Min-Max Scaling): Rescales features to [0, 1].
    • Useful when features have bounded ranges (e.g., pixel intensities, neural network inputs).
  • Standardization (Z-score Scaling): Centers features around mean 0 with standard deviation 1.
    • Useful for algorithms that assume centered, comparable-scale features (e.g., SVMs, logistic regression, PCA).

Feature Scaling - Code Example

from sklearn.preprocessing import MinMaxScaler, StandardScaler
# Normalization
scaler = MinMaxScaler()
data_normalized = scaler.fit_transform(data)
# Standardization
scaler = StandardScaler()
data_standardized = scaler.fit_transform(data)

Encoding Categorical Variables

Categorical data must be converted to numerical format for machine learning algorithms.

  • Label Encoding: Assigns a unique number to each category.
  • One-Hot Encoding: Creates binary columns for each category.

Encoding - Code Example

# One-Hot Encoding with Pandas
data_encoded = pd.get_dummies(data, columns=['categorical_column'])
# Label Encoding with Scikit-learn
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
data['category_encoded'] = le.fit_transform(data['categorical_column'])

Algorithm cheat sheet

Figure: algorithm cheat sheet

Feature Engineering

What is Feature Engineering?

Feature engineering involves creating, transforming, or selecting features to improve model performance.

  • Improves model accuracy
  • Reduces complexity

Feature Engineering Techniques

  • Feature Creation: Creating new features based on domain knowledge
    (e.g., total_price = quantity * unit_price).
  • Feature Transformation: Applying transformations to handle skewed data (e.g., log transformation).
  • Feature Selection: Removing irrelevant or redundant features.
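
A brief sketch of feature creation and a log transformation with Pandas and NumPy (hypothetical columns):

import numpy as np
import pandas as pd
df = pd.DataFrame({'quantity': [2, 5, 1], 'unit_price': [10.0, 3.5, 120.0]})
# Feature creation from domain knowledge
df['total_price'] = df['quantity'] * df['unit_price']
# Feature transformation: log1p reduces the skew of long-tailed values
df['log_total_price'] = np.log1p(df['total_price'])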

Feature Engineering - Examples

  • Datetime Features: Extract day, month, year, or weekday from timestamps.
  • Text Data: Convert text to numerical vectors using techniques like TF-IDF.
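
A minimal sketch of both ideas (hypothetical timestamps and documents; TF-IDF via scikit-learn):

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
# Datetime features
events = pd.DataFrame({'timestamp': pd.to_datetime(['2024-01-15', '2024-06-03'])})
events['month'] = events['timestamp'].dt.month
events['weekday'] = events['timestamp'].dt.weekday
# Text to numerical vectors with TF-IDF
docs = ['machine learning is fun', 'learning from data']
tfidf = TfidfVectorizer().fit_transform(docs)
print(tfidf.shape)   # (number of documents, vocabulary size)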

Data Visualization Techniques

Importance of Data Visualization

Data visualization helps to understand data distributions, detect patterns, and spot anomalies.

  • Understand data distributions
  • Identify patterns and trends
  • Detect outliers

Common Visualization Plots

  • Histogram: Shows frequency distribution of a variable.
  • Scatter Plot: Visualizes the relationship between two numerical variables.
  • Box Plot: Displays summary statistics and outliers. Also called a box-and-whisker plot.
Figure: examples of a histogram, a scatter plot, and a box plot

Visualization Libraries

  • Matplotlib: Basic plotting library.
  • Seaborn: Built on Matplotlib with enhanced features for complex visualizations.

Histogram - Code Example

# Histogram using Matplotlib
import matplotlib.pyplot as plt
plt.hist(data['numerical_column'], bins=30)
plt.title('Histogram of Numerical Column')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

Scatter Plot - Code Example

# Scatter Plot using Seaborn
import seaborn as sns
sns.scatterplot(x='feature1', y='feature2', data=data)
plt.title('Feature1 vs. Feature2')
plt.show()

Box Plot - Code Example

# Box Plot using Seaborn
sns.boxplot(x='categorical_column', y='numerical_column', data=data)
plt.title('Numerical Column by Category')
plt.show()
Figure: box plot with details

Conclusion

Key Takeaways

  • Machine Learning: Core principles include learning from data, making predictions, and improving with experience.
  • Types of ML Tasks: Supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), and reinforcement learning.
  • Tools: Mastery of libraries like NumPy, Pandas, and Scikit-learn is critical for data manipulation, preprocessing, and model building.
  • Probability in ML: Probability fundamentals are essential for understanding uncertainty, distributions, and algorithms like Naive Bayes.
  • Data Preprocessing: Techniques such as cleaning, scaling, encoding, and feature engineering significantly impact model performance.
  • Visualization: Effective data visualization with libraries like Matplotlib and Seaborn aids in understanding data patterns and distributions.
  • Workflow: A structured ML workflow—from problem definition to deployment—ensures efficiency and scalability.
  • Real-world Applications: ML impacts diverse domains such as healthcare, finance, e-commerce, and transportation.

Resources and Further Reading

  • Books:
    • Machine Learning with PyTorch and Scikit-Learn by Sebastian Raschka et al.
    • The Hundred-Page Machine Learning Book by Andriy Burkov
  • Online Tutorials: Pandas documentation, Scikit-learn tutorials
  • Documentation: Official library documentation

Glossary

General Concepts

  • Machine Learning (ML): A branch of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed.
  • Artificial Intelligence (AI): The simulation of human intelligence in machines that are programmed to think and learn.
  • Deep Learning: A subset of ML that uses neural networks with many layers (deep neural networks) to model and solve complex problems.
  • Supervised Learning: A type of ML where the model is trained on labeled data to predict outputs for new inputs.
  • Unsupervised Learning: A type of ML that deals with unlabeled data to identify hidden patterns or structures.
  • Reinforcement Learning: A learning paradigm where an agent learns by interacting with its environment and receiving feedback in the form of rewards or penalties.
  • Model: A mathematical representation or algorithm trained on data to make predictions or decisions.
  • Algorithm: A set of rules or processes followed in problem-solving or computations, such as gradient descent or decision trees.
  • Standard Deviation: A measure of the spread of data around the mean, calculated as the square root of the variance. It quantifies uncertainty in probability distributions and is widely used in data analysis and ML.
  • Bias: Systematic error that skews results in one direction, often due to flawed assumptions.
  • Variance: The variability of model predictions for different datasets, contributing to overfitting if too high.

Data and Features

  • Dataset: A collection of data used to train and evaluate ML models.
  • Sample: A single data point or instance from a dataset used for analysis, training, or evaluation of a machine learning model. Samples collectively form the dataset.
  • Feature: An individual measurable property or characteristic of a data point used as input to an ML model.
  • Label: The output variable in supervised learning that the model tries to predict.
  • Feature Engineering: The process of selecting, transforming, and creating features from raw data to improve model performance.
  • Feature Scaling: Techniques to standardize the range of features, such as normalization or standardization.
  • Training Set: A subset of the dataset used to train the model.
  • Test Set: A subset of the dataset used to evaluate the model's performance.
  • Validation Set: A subset of the dataset used during training to tune model parameters and prevent overfitting.
  • Data Augmentation: Techniques to increase the size of a dataset by generating new data points based on existing data.
  • Outlier: A data point that significantly deviates from the rest of the dataset, potentially affecting analysis and model performance.

Model Evaluation

  • Accuracy: The ratio of correctly predicted instances to the total instances in the dataset.
  • Precision: The ratio of true positives to the sum of true positives and false positives.
  • Recall: The ratio of true positives to the sum of true positives and false negatives.
  • F1 Score: The harmonic mean of precision and recall.
  • Confusion Matrix: A table used to evaluate the performance of a classification algorithm, showing true positives, true negatives, false positives, and false negatives.
  • ROC Curve: A graphical representation of a model's performance across different thresholds.
  • AUC (Area Under the Curve): The area under the ROC curve, representing the model's ability to distinguish between classes.
  • Cross-Validation: A technique to assess the model's performance by splitting the data into training and testing sets multiple times.

Optimization and Training

  • Gradient Descent: An optimization algorithm used to minimize the loss function by updating model parameters iteratively.
  • Gradient Vanishing: A problem where gradients become too small during backpropagation, slowing or stopping learning in deep networks.
  • Gradient Exploding: A problem where gradients become excessively large, leading to unstable training.
  • Loss Function: A function that measures the difference between the predicted outputs and the true labels.
  • Learning Rate: A hyperparameter that determines the step size in the gradient descent algorithm.
  • Overfitting: When a model learns the training data too well, including noise, leading to poor generalization.
  • Underfitting: When a model is too simple and fails to capture the underlying patterns in the data.
  • Regularization: Techniques like L1 or L2 to prevent overfitting by adding a penalty to the loss function.
  • Epoch: One complete pass through the entire training dataset.
  • Batch Size: The number of samples processed before the model's internal parameters are updated.
  • Early Stopping: A method to stop training when the performance on the validation set stops improving.

Algorithms and Models

  • Linear Regression: A supervised learning algorithm for predicting continuous outputs by fitting a linear relationship between input and output.
  • Logistic Regression: A supervised learning algorithm for binary classification problems.
  • Decision Tree: A tree-like model used for classification or regression tasks.
  • Random Forest: An ensemble method using multiple decision trees to improve performance and reduce overfitting.
  • Support Vector Machine (SVM): A supervised learning algorithm that separates data into classes using a hyperplane.
  • K-Nearest Neighbors (KNN): A simple algorithm that classifies data points based on the majority class of their k-nearest neighbors.
  • K-Means Clustering: An unsupervised learning algorithm that partitions data into k clusters.
  • Principal Component Analysis (PCA): A dimensionality reduction technique that transforms data into a lower-dimensional space.
  • Neural Network: A set of algorithms modeled after the human brain, consisting of layers of interconnected nodes (neurons).
  • Gaussian Distribution (Normal Distribution): A common probability distribution with a bell-shaped curve, characterized by its mean and standard deviation.

Deep Learning Specific Terms

  • Activation Function: Functions like ReLU or sigmoid that determine the output of a neuron in a neural network.
  • Backpropagation: A method for training neural networks by calculating the gradient of the loss function with respect to weights.
  • Convolutional Neural Network (CNN): A type of neural network designed for image data.
  • Recurrent Neural Network (RNN): A type of neural network designed for sequential data like time series or text.
  • Dropout: A regularization technique where randomly selected neurons are ignored during training.
  • Batch Normalization: A technique to stabilize and accelerate the training of deep neural networks.

Advanced Topics

  • Transfer Learning: A technique where a pre-trained model is adapted to a new but similar task.
  • Ensemble Learning: Combining multiple models to improve overall performance.
  • Bayesian Networks: Probabilistic graphical models representing variables and their dependencies.
  • Markov Decision Process (MDP): A mathematical framework for modeling decision-making in environments with stochastic outcomes.
  • Autoencoder: A neural network used to learn efficient representations of data, typically for dimensionality reduction.
  • Generative Adversarial Network (GAN): A framework where two networks (generator and discriminator) compete to improve each other's performance.
  • Attention Mechanism: A technique in neural networks that focuses on the most relevant parts of the input.

Practical Terms

  • Hyperparameter Tuning: The process of finding the optimal settings for a model's hyperparameters.
  • Pipeline: A sequence of data preprocessing and model training steps.
  • Exploratory Data Analysis (EDA): The process of analyzing datasets to summarize their main characteristics.
  • Reproducibility: The ability to consistently reproduce the same results using the same methodology and data.
  • Explainability: Techniques and methods to make ML model predictions interpretable and understandable.
  • Data Imputation: Techniques for replacing missing values with estimates like the mean, median, or predicted values.