Neural Networks from Scratch
Building a neural network from scratch is a foundational exercise in this bootcamp. Before transitioning to high-level frameworks like PyTorch (as seen in the Baseline Model), it is crucial to understand the underlying mathematics—linear algebra, calculus, and optimization—that enable deep learning.
This section covers the implementation of a Multi-Layer Perceptron (MLP) using only Python and NumPy.
Core Concepts
In a "from scratch" implementation, we decompose the neural network into its primary mathematical operations:
- Linear Transformation: $z = Wx + b$
- Activation Function: $a = \sigma(z)$
- Loss Function: $L(y, \hat{y})$
- Backpropagation: $\frac{\partial L}{\partial W}, \frac{\partial L}{\partial b}$
Mathematical Implementation
1. The Forward Pass
Each layer in a neural network performs a weighted sum of its inputs followed by a non-linear activation. For a single layer, the operation is:
$$ a^{[l]} = \sigma(W^{[l]} a^{[l-1]} + b^{[l]}) $$
Where:
- $W$: Weight matrix
- $b$: Bias vector
- $\sigma$: Activation function (e.g., Sigmoid, ReLU, or ELU as used in the `MyNet` baseline)
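As a quick sanity check on the layer equation above, the snippet below pushes a small batch through a single sigmoid layer. The sizes (3 inputs, 2 units, batch of 4) are arbitrary illustration choices, not values from the baseline:

```python
import numpy as np

rng = np.random.default_rng(0)

# A layer with 3 inputs and 2 units, applied to a batch of 4 column vectors
W = rng.standard_normal((2, 3))       # weight matrix W^[l]
b = np.zeros((2, 1))                  # bias vector b^[l]
A_prev = rng.standard_normal((3, 4))  # activations a^[l-1], one column per sample

Z = W @ A_prev + b                    # linear transformation z = Wx + b
A = 1 / (1 + np.exp(-Z))              # sigmoid activation sigma(z)

print(A.shape)  # (2, 4): one activation vector per sample
```

Note the convention: samples are stored as columns, so the bias broadcasts across the batch.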
2. Loss Functions
To train the network, we need to quantify the error. For binary classification tasks (common in Gravitational Wave glitch detection), we typically use Binary Cross-Entropy (BCE):
$$ J = -\frac{1}{m} \sum_{i=1}^{m} [y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})] $$
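The BCE cost above translates directly into NumPy. The `eps` clipping below is a common numerical safeguard against $\log(0)$, not part of the formula itself:

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Mean BCE over m samples; eps-clipping avoids log(0)."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1.0, 0.0, 1.0, 1.0])      # true labels
y_hat = np.array([0.9, 0.1, 0.8, 0.6])  # predicted probabilities
print(binary_cross_entropy(y, y_hat))
```

Confident predictions on the correct class drive the cost toward 0; confident wrong predictions blow it up, which is exactly the pressure that trains the network.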
3. Backpropagation (The Chain Rule)
To update the weights, we calculate the gradient of the loss function with respect to each parameter. This is achieved by applying the chain rule backward from the output layer to the input layer.
$$ \frac{\partial L}{\partial W} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial W} $$
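For a sigmoid output unit trained with BCE, the chain above collapses to the well-known simplification $\frac{\partial L}{\partial z} = a - y$. A finite-difference check confirms this for a scalar example:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(y, a):
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

z, y = 0.7, 1.0
a = sigmoid(z)

# Chain rule: dL/dz = (dL/da) * (da/dz)
dL_da = -(y / a) + (1 - y) / (1 - a)
da_dz = a * (1 - a)
analytic = dL_da * da_dz  # algebraically simplifies to a - y

# Central-difference approximation of the same derivative
h = 1e-6
numeric = (bce(y, sigmoid(z + h)) - bce(y, sigmoid(z - h))) / (2 * h)

print(analytic, numeric)  # both approximately a - y
```

This kind of numerical cross-check is the standard way to catch sign errors in a hand-derived backward pass.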
Implementation Example: A Simple Layer
Below is a conceptual implementation of a single dense layer using NumPy. This mirrors the logic handled internally by the `nn.Conv2d` and `nn.Linear` layers in the `baseline/main.py` script.
```python
import numpy as np

class NeuralLayer:
    def __init__(self, input_size, output_size):
        # He initialization for the weights (scaled for ReLU-family activations)
        self.W = np.random.randn(output_size, input_size) * np.sqrt(2 / input_size)
        self.b = np.zeros((output_size, 1))

    def forward(self, A_prev):
        """Calculates the linear part of the forward pass."""
        self.A_prev = A_prev
        self.Z = np.dot(self.W, A_prev) + self.b
        return self.Z

    def backward(self, dZ, learning_rate):
        """Updates parameters using gradients calculated via the chain rule."""
        m = self.A_prev.shape[1]
        dW = (1 / m) * np.dot(dZ, self.A_prev.T)
        db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)

        # Gradient to pass to the previous layer. This must use the weights
        # from the forward pass, so compute it BEFORE the update below.
        dA_prev = np.dot(self.W.T, dZ)

        # Gradient descent update
        self.W -= learning_rate * dW
        self.b -= learning_rate * db
        return dA_prev
```
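A standard way to validate a from-scratch backward pass like the one above is numerical gradient checking. The sketch below checks $\partial L / \partial W$ for a bare linear map with a mean-squared stand-in loss (a simplified setup for illustration, not the class above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny linear map Z = W @ X with a stand-in loss L = mean(Z**2)
W = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 5))

def loss(W):
    Z = W @ X
    return np.mean(Z ** 2)

# Analytic gradient: dL/dW = (2 / Z.size) * (W @ X) @ X.T
Z = W @ X
dW = (2 / Z.size) * Z @ X.T

# Central-difference estimate for one entry of W
h = 1e-6
E = np.zeros_like(W)
E[0, 1] = h
numeric = (loss(W + E) - loss(W - E)) / (2 * h)

print(abs(dW[0, 1] - numeric))  # tiny: analytic and numeric gradients agree
```

If the two values disagree by more than roughly the finite-difference error, the analytic gradient is wrong; running this check per layer catches most backpropagation bugs.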
Activation Functions
Activation functions introduce non-linearity, allowing the network to learn complex patterns in Gravitational Wave strains.
- Sigmoid: Useful for output layers in binary classification.
- $f(x) = \frac{1}{1 + e^{-x}}$
- ReLU (Rectified Linear Unit): The standard for hidden layers to avoid vanishing gradients.
- $f(x) = \max(0, x)$
- ELU (Exponential Linear Unit): Used in the Bootcamp Baseline Model for its ability to handle negative values more gracefully than ReLU.
- $f(x) = x \text{ if } x > 0 \text{ else } \alpha(e^x - 1)$
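All three activations are one-liners in NumPy; a minimal sketch of the formulas above:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))  # squashed into (0, 1)
print(relu(x))     # negatives clipped to exactly 0
print(elu(x))      # negatives curve smoothly toward -alpha
```

Note how ELU, unlike ReLU, keeps a nonzero gradient for negative inputs, which is the "more graceful" behavior mentioned above.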
From NumPy to PyTorch
While building from scratch is essential for conceptual mastery, real-world Gravitational Wave data analysis requires the computational efficiency of specialized libraries.
| Manual (NumPy) | Bootcamp Baseline (PyTorch) |
| :--- | :--- |
| Manual Gradient Calculation | loss.backward() (Autograd) |
| Manual Weight Updates | optimizer.step() |
| Explicit Matrix Multiplications | nn.Conv2d, nn.Linear |
| CPU-only execution | torch.device('cuda') (GPU acceleration) |
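The correspondence in the table maps onto a PyTorch training step roughly as follows. The `nn.Sequential` model here is a hypothetical stand-in, not the actual bootcamp baseline architecture:

```python
import torch
import torch.nn as nn

# Minimal stand-in model (the real baseline `MyNet` differs)
model = nn.Sequential(nn.Linear(8, 4), nn.ELU(), nn.Linear(4, 1), nn.Sigmoid())
criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(16, 8)                      # batch of 16 samples
y = torch.randint(0, 2, (16, 1)).float()    # binary labels

loss = criterion(model(x), y)  # forward pass + BCE cost
optimizer.zero_grad()
loss.backward()                # autograd replaces the manual chain rule
optimizer.step()               # replaces the manual W -= lr * dW update
```

Each line on the right-hand side of the table corresponds to one call here: autograd computes every `dW` and `db` for you, and the optimizer applies the update.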
Once you understand the manual implementation, proceed to the Deep Learning Baseline section to learn how to apply these concepts to LIGO/Virgo/Taiji data using torch.nn.