Exercises: Introduction to Neural Networks

Exercise 1: Forward Pass Calculation

Objective: Understand the propagation of inputs through a neural network.

  1. Given:

    • Input (single example): $X \in \mathbb{R}^{1 \times 2}$, $X = [1, 0.5]$
    • Weights: $W \in \mathbb{R}^{2 \times 2}$, $W = \begin{bmatrix} 0.2 & 0.8 \\ 0.4 & 0.3 \end{bmatrix}$
    • Biases: $b \in \mathbb{R}^{1 \times 2}$, $b = [0.1, 0.1]$
  2. Tasks:

    • Calculate the weighted sum $Z = XW + b$.
    • Apply the ReLU activation function component-wise: $A = \mathrm{ReLU}(Z)$.
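
A minimal NumPy sketch for checking the hand calculation (the expected value in the comment is my own arithmetic, not part of the exercise statement):

```python
import numpy as np

# Input, weights, and biases from the exercise
X = np.array([[1.0, 0.5]])            # shape (1, 2)
W = np.array([[0.2, 0.8],
              [0.4, 0.3]])            # shape (2, 2)
b = np.array([[0.1, 0.1]])            # shape (1, 2)

Z = X @ W + b                         # weighted sum, expected [[0.5, 1.05]]
A = np.maximum(0.0, Z)                # component-wise ReLU

print("Z =", Z)
print("A =", A)
```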

Exercise 2: Backpropagation

Objective: Compute gradients for a simple neural network.

  1. Setup (two-layer network):

    • Input: $X \in \mathbb{R}^{1 \times 2}$, $X = [0.5, 0.2]$
    • First-layer weights: $W_1 \in \mathbb{R}^{2 \times 2}$, $W_1 = \begin{bmatrix} 0.1 & 0.3 \\ 0.2 & 0.4 \end{bmatrix}$
    • First-layer biases (for simplicity): $b_1 = [0, 0]$
    • Hidden representation: $Z_1 = X W_1 + b_1$, $H = \mathrm{ReLU}(Z_1)$
    • Output-layer weights (binary output): $W_2 \in \mathbb{R}^{2 \times 1}$, $W_2 = \begin{bmatrix} 0.2 \\ 0.5 \end{bmatrix}$
    • Output bias: $b_2 = 0$
    • Output pre-activation and prediction: $z_2 = H W_2 + b_2$, $\hat{y} = \sigma(z_2)$
    • True label: $y = 1$
  2. Tasks:

    • Perform the forward pass as defined above (ReLU for the hidden layer, Sigmoid for the output).
    • Calculate the binary cross-entropy loss: $L = -\left( y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right)$.
    • Derive the gradients $\frac{\partial L}{\partial W_1}$ and $\frac{\partial L}{\partial W_2}$ using backpropagation (apply the chain rule).
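
For self-checking, here is a short NumPy sketch of the forward pass and the standard sigmoid-plus-cross-entropy backward pass under the definitions above (an aid for verifying your derivation, not part of the exercise statement):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass
X  = np.array([[0.5, 0.2]])
W1 = np.array([[0.1, 0.3],
               [0.2, 0.4]])
b1 = np.zeros((1, 2))
W2 = np.array([[0.2],
               [0.5]])
b2 = 0.0
y  = 1.0

Z1 = X @ W1 + b1
H  = np.maximum(0.0, Z1)              # ReLU hidden layer
z2 = H @ W2 + b2
y_hat = sigmoid(z2)
loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Backward pass (chain rule)
dz2 = y_hat - y                       # dL/dz2 for sigmoid + cross-entropy
dW2 = H.T @ dz2                       # dL/dW2
dH  = dz2 @ W2.T                      # dL/dH
dZ1 = dH * (Z1 > 0)                   # ReLU gate
dW1 = X.T @ dZ1                       # dL/dW1

print("loss =", loss)
print("dW1 =\n", dW1)
print("dW2 =\n", dW2)
```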

Exercise 3: Data Preprocessing

Objective: Explore the impact of scaling on neural networks.

  1. Given:

    • Dataset $X \in \mathbb{R}^{3 \times 3}$, with rows as samples and columns as features: $X = \begin{bmatrix} 5 & 20 & 10 \\ 15 & 5 & 25 \\ 10 & 30 & 15 \end{bmatrix}$.
  2. Tasks:

    • Apply Min–Max scaling per feature to rescale each column to the range $[0, 1]$: $x_{\text{new}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$.
    • Standardize each feature to have zero mean and unit variance: $x_{\text{new}} = \frac{x - \mu}{\sigma}$ (see the sketch after this list).
    • Compare the two approaches and explain when each would be preferred.
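
A short NumPy sketch applying both rescalings column-wise (standardization uses the population standard deviation, NumPy's default):

```python
import numpy as np

# Dataset from the exercise: rows are samples, columns are features
X = np.array([[ 5.0, 20.0, 10.0],
              [15.0,  5.0, 25.0],
              [10.0, 30.0, 15.0]])

# Min-Max scaling per feature (column): each column is mapped to [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization per feature: zero mean and unit variance per column
X_standard = (X - X.mean(axis=0)) / X.std(axis=0)

print("Min-Max scaled:\n", X_minmax)
print("Standardized:\n", X_standard)
```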

Exercise 4: Activation Functions

Objective: Compare the behavior of activation functions.

  1. Given:

    • Input values $X = \{-2, -1, 0, 1, 2\}$.
  2. Tasks:

    • For each $x \in X$, compute the outputs of the functions below (a NumPy sketch for evaluating them follows this list):

      • ReLU: $f(x) = \max(0, x)$
      • Leaky ReLU ($\alpha = 0.01$): $f(x) = \max(0.01x, x)$
      • Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$
      • Tanh: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
    • Sketch the graphs of these functions.

    • Discuss the advantages and disadvantages of each function.
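
A quick NumPy sketch evaluating the four activation functions on the given inputs:

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

relu       = np.maximum(0.0, x)                # max(0, x)
leaky_relu = np.where(x > 0, x, 0.01 * x)      # alpha = 0.01
sigmoid    = 1.0 / (1.0 + np.exp(-x))          # 1 / (1 + e^{-x})
tanh       = np.tanh(x)

for name, vals in [("ReLU", relu), ("Leaky ReLU", leaky_relu),
                   ("Sigmoid", sigmoid), ("Tanh", tanh)]:
    print(f"{name:<11}", np.round(vals, 4))
```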

Exercise 5: Gradient Checking

Objective: Verify the correctness of computed gradients.

  1. Setup:

    • Prediction: $\hat{y} = wx + b$
    • Loss: $L = \frac{1}{2}(y - \hat{y})^2$
    • Parameters: $w = 0.5$, $x = 2$, $b = 0.1$, $y = 1$.
  2. Tasks:

    • Compute the analytical gradient $\frac{\partial L}{\partial w}$.
    • Use numerical approximation to compute $\frac{\partial L}{\partial w} \approx \frac{L(w + \varepsilon) - L(w - \varepsilon)}{2\varepsilon}$ with $\varepsilon = 10^{-4}$ (a sketch of this check follows the list below).
    • Compare the two results and explain any differences.
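
A small Python sketch comparing the analytical gradient with the central-difference approximation (the closed-form expression in the code follows from applying the chain rule to the squared-error loss):

```python
w, x, b, y = 0.5, 2.0, 0.1, 1.0
eps = 1e-4

def loss(w_):
    """Squared-error loss L = 0.5 * (y - y_hat)^2 with y_hat = w*x + b."""
    y_hat = w_ * x + b
    return 0.5 * (y - y_hat) ** 2

# Analytical gradient: dL/dw = (y_hat - y) * x
grad_analytical = (w * x + b - y) * x

# Central-difference numerical approximation
grad_numerical = (loss(w + eps) - loss(w - eps)) / (2 * eps)

print("analytical:", grad_analytical)
print("numerical :", grad_numerical)
print("difference:", abs(grad_analytical - grad_numerical))
```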

Exercise 6: Regularization

Objective: Understand the effect of L2 regularization on weight updates.

  1. Setup:

    • Weights (parameter vector): $W = [1, 2, 0.5] \in \mathbb{R}^3$.
    • L2 regularization with coefficient $\lambda = 0.01$.
  2. Tasks:

    • Compute the weight penalty term: $\lambda \sum_j W_j^2 = \lambda \lVert W \rVert_2^2$.
    • Let $g = \frac{\partial L}{\partial W}$ denote the gradient of the loss without regularization. Derive the gradient descent update rule for $W$ with learning rate $\eta = 0.1$ when L2 regularization is added, i.e. express $W_{\text{new}}$ in terms of $W$, $g$, $\lambda$, and $\eta$ (a sketch follows the list below).
    • Explain qualitatively how this regularization term affects model training (in particular, the magnitude of the weights).
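
A small NumPy sketch of the penalty and the regularized update. The exercise does not specify the unregularized gradient $g$, so the value below is a placeholder chosen purely to make the snippet runnable; with the penalty $\lambda \sum_j W_j^2$, its contribution to the gradient is $2\lambda W$:

```python
import numpy as np

W   = np.array([1.0, 2.0, 0.5])
lam = 0.01                          # L2 coefficient lambda
eta = 0.1                           # learning rate

# Penalty term: lambda * ||W||_2^2
penalty = lam * np.sum(W ** 2)

# Placeholder for dL/dW without regularization (NOT given in the exercise)
g = np.array([0.3, -0.1, 0.2])

# Gradient descent step with the L2 term included
W_new = W - eta * (g + 2 * lam * W)

print("penalty =", penalty)
print("W_new   =", W_new)
```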

Exercise 7: Neural Network Error Analysis

Objective: Analyze and identify potential issues in a neural network setup.

  1. Setup:

    • Input: $X \in \mathbb{R}^{1 \times 2}$, $X = [1, 2]$
    • First layer: $W_1 = \begin{bmatrix} 0.5 & 0.2 \\ 0.3 & 0.8 \end{bmatrix}$, $b_1 = [0.1, 0.1]$. Hidden pre-activation and activation: $Z_1 = X W_1 + b_1$, $H = \mathrm{ReLU}(Z_1)$
    • Output layer: $W_2 = \begin{bmatrix} 0.7 \\ -0.6 \end{bmatrix}$, $b_2 = 0.2$. Output pre-activation and prediction: $z_2 = H W_2 + b_2$, $\hat{y} = \sigma(z_2)$
    • True label: $y = 1$.
  2. Tasks:

    • Calculate the loss using binary cross-entropy: $L = -\left( y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right)$ (a forward-pass sketch follows the list below).
    • Based on the scale of the weights and the network depth, discuss whether this initialization is likely to cause vanishing or exploding gradients.
    • Propose changes to the network architecture, initialization scheme, or other hyperparameters to improve training stability and performance.
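
A brief NumPy sketch of the forward pass and the binary cross-entropy loss for this setup, useful for checking the numbers before discussing gradient behaviour and possible fixes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X  = np.array([[1.0, 2.0]])
W1 = np.array([[0.5, 0.2],
               [0.3, 0.8]])
b1 = np.array([[0.1, 0.1]])
W2 = np.array([[0.7],
               [-0.6]])
b2 = 0.2
y  = 1.0

Z1 = X @ W1 + b1
H  = np.maximum(0.0, Z1)              # ReLU hidden activation
z2 = H @ W2 + b2
y_hat = sigmoid(z2)

# Binary cross-entropy loss
loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print("y_hat =", y_hat)
print("loss  =", loss)
```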