Neural Networks from Scratch
Building a neural network from scratch is a foundational exercise in this bootcamp. Before transitioning to high-level frameworks like PyTorch (as seen in the Baseline Model), it is crucial to understand the underlying mathematics—linear algebra, calculus, and optimization—that enable deep learning.
This section covers the implementation of a Multi-Layer Perceptron (MLP) using only Python and NumPy.
Core Concepts
In a "from scratch" implementation, we decompose the neural network into its primary mathematical operations:
- Linear Transformation: $z = Wx + b$
- Activation Function: $a = \sigma(z)$
- Loss Function: $L(y, \hat{y})$
- Backpropagation: $\frac{\partial L}{\partial W}, \frac{\partial L}{\partial b}$
Mathematical Implementation
1. The Forward Pass
Each layer in a neural network performs a weighted sum of its inputs followed by a non-linear activation. For a single layer, the operation is:
$$ a^{[l]} = \sigma(W^{[l]} a^{[l-1]} + b^{[l]}) $$
Where:
- $W$: Weight matrix
- $b$: Bias vector
- $\sigma$: Activation function (e.g., Sigmoid, ReLU, or ELU as used in the `MyNet` baseline)
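As a quick sanity check on the layer equation above, the snippet below pushes a small batch through a single sigmoid layer. The sizes (3 inputs, 2 units, batch of 4) are arbitrary illustration choices, not values from the baseline:

```python
import numpy as np

rng = np.random.default_rng(0)

# A layer with 3 inputs and 2 units, applied to a batch of 4 column vectors
W = rng.standard_normal((2, 3))       # weight matrix W^[l]
b = np.zeros((2, 1))                  # bias vector b^[l]
A_prev = rng.standard_normal((3, 4))  # activations a^[l-1], one column per sample

Z = W @ A_prev + b                    # linear transformation z = Wx + b
A = 1 / (1 + np.exp(-Z))              # sigmoid activation sigma(z)

print(A.shape)  # (2, 4): one activation vector per sample
```

Note the convention: samples are stored as columns, so the bias broadcasts across the batch.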
2. Loss Functions
To train the network, we need to quantify the error. For binary classification tasks (common in Gravitational Wave glitch detection), we typically use Binary Cross-Entropy (BCE):
$$ J = -\frac{1}{m} \sum_{i=1}^{m} [y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})] $$
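The BCE cost above translates directly into NumPy. The `eps` clipping below is a common numerical safeguard against $\log(0)$, not part of the formula itself:

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Mean BCE over m samples; eps-clipping avoids log(0)."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1.0, 0.0, 1.0, 1.0])      # true labels
y_hat = np.array([0.9, 0.1, 0.8, 0.6])  # predicted probabilities
print(binary_cross_entropy(y, y_hat))
```

Confident predictions on the correct class drive the cost toward 0; confident wrong predictions blow it up, which is exactly the pressure that trains the network.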
3. Backpropagation (The Chain Rule)
To update the weights, we calculate the gradient of the loss function with respect to each parameter. This is achieved by applying the chain rule backward from the output layer to the input layer.
$$ \frac{\partial L}{\partial W} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial W} $$
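For a sigmoid output unit trained with BCE, the chain above collapses to the well-known simplification $\frac{\partial L}{\partial z} = a - y$. A finite-difference check confirms this for a scalar example:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(y, a):
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

z, y = 0.7, 1.0
a = sigmoid(z)

# Chain rule: dL/dz = (dL/da) * (da/dz)
dL_da = -(y / a) + (1 - y) / (1 - a)
da_dz = a * (1 - a)
analytic = dL_da * da_dz  # algebraically simplifies to a - y

# Central-difference approximation of the same derivative
h = 1e-6
numeric = (bce(y, sigmoid(z + h)) - bce(y, sigmoid(z - h))) / (2 * h)

print(analytic, numeric)  # both approximately a - y
```

This kind of numerical cross-check is the standard way to catch sign errors in a hand-derived backward pass.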
Implementation Example: A Simple Layer
Below is a conceptual implementation of a single dense layer using NumPy. This mirrors the logic handled internally by the `nn.Conv2d` and `nn.Linear` layers in the `baseline/main.py` script.
```python
import numpy as np

class NeuralLayer:
    def __init__(self, input_size, output_size):
        # He initialization for the weights (scaled for ReLU-family activations)
        self.W = np.random.randn(output_size, input_size) * np.sqrt(2 / input_size)
        self.b = np.zeros((output_size, 1))

    def forward(self, A_prev):
        """Calculates the linear part of the forward pass."""
        self.A_prev = A_prev
        self.Z = np.dot(self.W, A_prev) + self.b
        return self.Z

    def backward(self, dZ, learning_rate):
        """Updates parameters using gradients calculated via the chain rule."""
        m = self.A_prev.shape[1]
        dW = (1 / m) * np.dot(dZ, self.A_prev.T)
        db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)

        # Gradient to pass to the previous layer. This must use the weights
        # from the forward pass, so compute it BEFORE the update below.
        dA_prev = np.dot(self.W.T, dZ)

        # Gradient descent update
        self.W -= learning_rate * dW
        self.b -= learning_rate * db
        return dA_prev
```
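A standard way to validate a from-scratch backward pass like the one above is numerical gradient checking. The sketch below checks $\partial L / \partial W$ for a bare linear map with a mean-squared stand-in loss (a simplified setup for illustration, not the class above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny linear map Z = W @ X with a stand-in loss L = mean(Z**2)
W = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 5))

def loss(W):
    Z = W @ X
    return np.mean(Z ** 2)

# Analytic gradient: dL/dW = (2 / Z.size) * (W @ X) @ X.T
Z = W @ X
dW = (2 / Z.size) * Z @ X.T

# Central-difference estimate for one entry of W
h = 1e-6
E = np.zeros_like(W)
E[0, 1] = h
numeric = (loss(W + E) - loss(W - E)) / (2 * h)

print(abs(dW[0, 1] - numeric))  # tiny: analytic and numeric gradients agree
```

If the two values disagree by more than roughly the finite-difference error, the analytic gradient is wrong; running this check per layer catches most backpropagation bugs.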
Activation Functions
Activation functions introduce non-linearity, allowing the network to learn complex patterns in Gravitational Wave strains.
- Sigmoid: Useful for output layers in binary classification.
- $f(x) = \frac{1}{1 + e^{-x}}$
- ReLU (Rectified Linear Unit): The standard for hidden layers to avoid vanishing gradients.
- $f(x) = \max(0, x)$
- ELU (Exponential Linear Unit): Used in the Bootcamp Baseline Model for its ability to handle negative values more gracefully than ReLU.
- $f(x) = x \text{ if } x > 0 \text{ else } \alpha(e^x - 1)$
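All three activations are one-liners in NumPy; a minimal sketch of the formulas above:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))  # squashed into (0, 1)
print(relu(x))     # negatives clipped to exactly 0
print(elu(x))      # negatives curve smoothly toward -alpha
```

Note how ELU, unlike ReLU, keeps a nonzero gradient for negative inputs, which is the "more graceful" behavior mentioned above.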
From NumPy to PyTorch
While building from scratch is essential for conceptual mastery, real-world Gravitational Wave data analysis requires the computational efficiency of specialized libraries.
| Manual (NumPy) | Bootcamp Baseline (PyTorch) |
| :--- | :--- |
| Manual Gradient Calculation | loss.backward() (Autograd) |
| Manual Weight Updates | optimizer.step() |
| Explicit Matrix Multiplications | nn.Conv2d, nn.Linear |
| CPU-only execution | torch.device('cuda') (GPU acceleration) |
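The correspondence in the table maps onto a PyTorch training step roughly as follows. The `nn.Sequential` model here is a hypothetical stand-in, not the actual bootcamp baseline architecture:

```python
import torch
import torch.nn as nn

# Minimal stand-in model (the real baseline `MyNet` differs)
model = nn.Sequential(nn.Linear(8, 4), nn.ELU(), nn.Linear(4, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(16, 8)                      # batch of 16 samples
y = torch.randint(0, 2, (16, 1)).float()    # binary labels

loss = criterion(model(x), y)  # forward pass + BCE cost
optimizer.zero_grad()
loss.backward()                # autograd replaces the manual chain rule
optimizer.step()               # replaces the manual W -= lr * dW update
```

Each line on the right-hand side of the table corresponds to one call here: autograd computes every `dW` and `db` for you, and the optimizer applies the update.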
Once you understand the manual implementation, proceed to the Deep Learning Baseline section to learn how to apply these concepts to LIGO/Virgo/Taiji data using torch.nn.