Exercises: Introduction to Neural Networks
Exercise 1: Forward Pass Calculation
Objective: Understand the propagation of inputs through a neural network.
Given:
- Input (single example): $x$
- Weights: $W$
- Biases: $b$
Tasks:
- Calculate the weighted sum $z = W x + b$.
- Apply the ReLU activation function component-wise: $a = \mathrm{ReLU}(z) = \max(0, z)$.
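A minimal NumPy sketch of this forward pass; the concrete numbers for $x$, $W$, and $b$ below are placeholders, not the values from the original exercise:

```python
import numpy as np

# Placeholder data: 3 input features, 2 hidden units (not the original exercise values).
x = np.array([1.0, -2.0, 0.5])            # input vector
W = np.array([[0.2, -0.4, 0.1],
              [0.5,  0.3, -0.2]])         # weight matrix, shape (2, 3)
b = np.array([0.1, -0.1])                 # bias vector, shape (2,)

z = W @ x + b                             # weighted sum z = Wx + b
a = np.maximum(0.0, z)                    # ReLU applied component-wise

print("z =", z)
print("a =", a)
```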
Exercise 2: Backpropagation
Objective: Compute gradients for a simple neural network.
Setup (two-layer network):
- Input: $x$
- First-layer weights: $W^{(1)}$
- First-layer biases (zero for simplicity): $b^{(1)} = 0$
- Hidden representation: $h = \mathrm{ReLU}(W^{(1)} x + b^{(1)})$
- Output-layer weights (binary output): $W^{(2)}$
- Output bias: $b^{(2)}$
- Output pre-activation and prediction: $z = W^{(2)} h + b^{(2)}$, $\hat{y} = \sigma(z)$
- True label: $y$
Tasks:
- Perform the forward pass as defined above (ReLU for the hidden layer, Sigmoid for the output).
- Calculate the binary cross-entropy loss: $L = -\bigl[y \log \hat{y} + (1 - y)\log(1 - \hat{y})\bigr]$
- Derive the gradients $\frac{\partial L}{\partial W^{(2)}}$ and $\frac{\partial L}{\partial W^{(1)}}$ using backpropagation (apply the chain rule).
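A hedged sketch of the forward and backward pass for this two-layer setup, using placeholder numbers rather than the exercise's values. It relies on the standard closed-form gradient of the sigmoid plus binary cross-entropy combination, $\partial L / \partial z = \hat{y} - y$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder values (not the ones from the original exercise).
x  = np.array([0.5, -1.0])                 # input, shape (2,)
W1 = np.array([[0.8, 0.2],
               [-0.3, 0.5]])               # first-layer weights, shape (2, 2)
b1 = np.zeros(2)                           # first-layer biases (zero for simplicity)
W2 = np.array([[0.7, -0.5]])               # output-layer weights, shape (1, 2)
b2 = np.array([0.1])                       # output bias
y  = 1.0                                   # true label

# Forward pass: ReLU hidden layer, sigmoid output.
z1   = W1 @ x + b1
h    = np.maximum(0.0, z1)
z2   = W2 @ h + b2
yhat = sigmoid(z2)

# Binary cross-entropy loss.
L = -(y * np.log(yhat) + (1 - y) * np.log(1 - yhat))

# Backward pass (chain rule).
dz2 = yhat - y                             # dL/dz2 for sigmoid + BCE
dW2 = np.outer(dz2, h)                     # dL/dW2
db2 = dz2                                  # dL/db2
dh  = W2.T @ dz2                           # dL/dh
dz1 = dh * (z1 > 0)                        # ReLU gradient is a 0/1 mask
dW1 = np.outer(dz1, x)                     # dL/dW1

print("loss =", L, "\ndW2 =", dW2, "\ndW1 =", dW1)
```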
Exercise 3: Data Preprocessing
Objective: Explore the impact of scaling on neural networks.
Given:
- Dataset $X$, with rows as samples and columns as features.
Tasks:
- Apply Min–Max scaling per feature to rescale each column to the range $[0, 1]$: $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$
- Standardize each feature to have zero mean and unit variance: $x' = \frac{x - \mu}{\sigma}$
- Compare the two approaches and explain when each would be preferred.
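A small NumPy sketch of both scaling schemes on a made-up dataset; the matrix below is illustrative, not the exercise's $X$:

```python
import numpy as np

# Illustrative dataset: 4 samples (rows) x 2 features (columns); not the exercise's X.
X = np.array([[1.0,  200.0],
              [2.0,  400.0],
              [3.0,  600.0],
              [4.0, 1000.0]])

# Min-Max scaling per feature: rescale each column to [0, 1].
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization per feature: zero mean, unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print("Min-Max scaled:\n", X_minmax)
print("Standardized:\n", X_std)
```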
Exercise 4: Activation Functions
Objective: Compare the behavior of activation functions.
Given:
- A set of input values $x$.
Tasks:
For each $x$, compute the outputs of:
- ReLU: $\mathrm{ReLU}(x) = \max(0, x)$
- Leaky ReLU (with a small negative slope $\alpha$): $\mathrm{LeakyReLU}(x) = \max(\alpha x, x)$
- Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$
- Tanh: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
Sketch the graphs of these functions.
Discuss the advantages and disadvantages of each function.
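A short sketch that evaluates the four functions on a few sample inputs; the grid of values and the Leaky ReLU slope $\alpha = 0.01$ are assumptions, not the exercise's data:

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # assumed sample inputs
alpha = 0.01                                 # assumed Leaky ReLU slope

relu       = np.maximum(0.0, x)
leaky_relu = np.where(x > 0, x, alpha * x)
sigmoid    = 1.0 / (1.0 + np.exp(-x))
tanh       = np.tanh(x)

for name, vals in [("ReLU", relu), ("LeakyReLU", leaky_relu),
                   ("Sigmoid", sigmoid), ("Tanh", tanh)]:
    print(f"{name:10s}", np.round(vals, 4))
```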
Exercise 5: Gradient Checking
Objective: Verify the correctness of computed gradients.
Setup:
- Prediction: $\hat{y} = f(x; w)$, a differentiable function of a scalar parameter $w$.
- Loss: $L(\hat{y}, y)$
- Parameters and data: the parameter $w$ and the data point $(x, y)$.
Tasks:
- Compute the analytical gradient $\frac{\partial L}{\partial w}$.
- Use numerical approximation to compute: $\frac{L(w + \epsilon) - L(w - \epsilon)}{2\epsilon}$
- Compare the two results and explain any differences.
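A sketch of gradient checking with central differences; the linear model $\hat{y} = w x + b$, the squared-error loss, and the numeric values below are assumptions chosen only to make the comparison concrete:

```python
# Assumed model and data (not the exercise's values): y_hat = w * x + b, squared-error loss.
w, b = 0.5, -0.2
x, y = 2.0, 1.0
eps  = 1e-5

def loss(w_):
    y_hat = w_ * x + b
    return (y_hat - y) ** 2

# Analytical gradient: dL/dw = 2 * (y_hat - y) * x.
analytical = 2 * (w * x + b - y) * x

# Numerical gradient via central differences.
numerical = (loss(w + eps) - loss(w - eps)) / (2 * eps)

print("analytical:", analytical)
print("numerical: ", numerical)
print("abs diff:  ", abs(analytical - numerical))
```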
Exercise 6: Regularization
Objective: Understand the effect of L2 regularization on weight updates.
Setup:
- Weights (parameter vector): $w$
- L2 regularization with coefficient $\lambda$.
Tasks:
- Compute the weight penalty term: $\Omega(w) = \frac{\lambda}{2} \lVert w \rVert_2^2$
- Let $g = \nabla_w L$ denote the gradient of the loss without regularization. Derive the gradient descent update rule for $w$ with learning rate $\eta$ when L2 regularization is added, i.e. express $w_{\text{new}}$ in terms of $w$, $\eta$, $\lambda$, and $g$.
- Explain qualitatively how this regularization term affects model training (in particular, the magnitude of the weights).
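A minimal sketch of the penalty and the regularized update $w_{\text{new}} = w - \eta\,(g + \lambda w)$; the weight vector, gradient, learning rate, and $\lambda$ below are placeholder values:

```python
import numpy as np

# Placeholder values (not from the original exercise).
w   = np.array([0.8, -0.5, 0.3])   # current weights
g   = np.array([0.1,  0.2, -0.4])  # gradient of the unregularized loss
eta = 0.1                          # learning rate
lam = 0.01                         # L2 coefficient

penalty = 0.5 * lam * np.sum(w ** 2)    # (lambda / 2) * ||w||^2
w_new   = w - eta * (g + lam * w)       # update with the L2 ("weight decay") term

print("penalty:", penalty)
print("w_new:  ", w_new)
```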
Exercise 7: Neural Network Error Analysis
Objective: Analyze and identify potential issues in a neural network setup.
Setup:
- Input: $x$
- First layer: weights $W^{(1)}$ and biases $b^{(1)}$.
  Hidden pre-activation and activation: $z^{(1)} = W^{(1)} x + b^{(1)}$, $h = f(z^{(1)})$
- Output layer: weights $W^{(2)}$ and bias $b^{(2)}$.
  Output pre-activation and prediction: $z^{(2)} = W^{(2)} h + b^{(2)}$, $\hat{y} = \sigma(z^{(2)})$
- True label: $y$.
Tasks:
- Calculate the loss using binary cross-entropy: $L = -\bigl[y \log \hat{y} + (1 - y)\log(1 - \hat{y})\bigr]$
- Based on the scale of the weights and the network depth, discuss whether this initialization is likely to cause vanishing or exploding gradients.
- Propose changes to the network architecture, initialization scheme, or other hyperparameters to improve training stability and performance.
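A sketch that illustrates the second task: it propagates a random signal through a stack of ReLU layers initialized at different weight scales and prints how the activation magnitudes shrink or grow, which hints at vanishing or exploding gradients. The depth, width, and scales are assumptions, not the exercise's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def activation_norms(scale, depth=10, width=64):
    """Propagate a random input through `depth` ReLU layers initialized at `scale`."""
    x = rng.standard_normal(width)
    norms = []
    for _ in range(depth):
        W = scale * rng.standard_normal((width, width))
        x = np.maximum(0.0, W @ x)
        norms.append(np.linalg.norm(x))
    return norms

# Assumed scales: too small, He-style (sqrt(2 / fan_in)), too large.
for scale in [0.01, np.sqrt(2.0 / 64), 0.5]:
    norms = activation_norms(scale)
    print(f"scale={scale:.3f}  layer1={norms[0]:.2e}  layer10={norms[-1]:.2e}")
```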