Assignments & Peer Review
Assessment Overview
The GWData-Bootcamp follows a progressive assessment structure designed to transition students from foundational programming to advanced deep learning applications in gravitational wave (GW) physics. Evaluation is based on a series of module-specific homework assignments and a final research-oriented project.
Homework Modules
Homework assignments are provided as interactive Jupyter Notebooks (exported as HTML for reference). These modules focus on data preprocessing, statistical analysis, and classical machine learning.
Machine Learning: Credit Scoring & Ensemble Methods
These assignments serve as a bridge to understanding classification problems before tackling complex GW signals.
- Module 1: Credit Scoring Baseline: Implementing logistic regression and decision trees using `scikit-learn`.
- Module 2: Fine-Tuning & Ensembling: Optimizing model hyperparameters and implementing ensemble techniques (Random Forests, Gradient Boosting) to improve F1-scores and AUC.
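As a sketch of the Module 2 workflow, hyperparameter tuning and ensembling with `scikit-learn` might look like the following (the synthetic dataset and the small parameter grid are illustrative stand-ins, not the course's credit-scoring data):

```python
# Sketch only: make_classification stands in for the credit-scoring dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Small illustrative grid; the assignment would explore a wider range.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    scoring="f1",
    cv=3,
)
grid.fit(X_tr, y_tr)

y_pred = grid.predict(X_te)
y_prob = grid.predict_proba(X_te)[:, 1]
print(f"F1:  {f1_score(y_te, y_pred):.3f}")
print(f"AUC: {roc_auc_score(y_te, y_prob):.3f}")
```

The same pattern applies to Gradient Boosting: swap in `GradientBoostingClassifier` and its own grid (e.g., `learning_rate`, `n_estimators`).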
Submission Requirements:
- Complete the missing code blocks in the provided `.ipynb` files.
- Include a brief summary of the model performance (confusion matrix, ROC curves).
- Ensure all plots generated by `matplotlib` are clearly labeled and interpreted.
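One hedged way to produce the required performance summary, again with synthetic stand-in data, is to compute the confusion matrix and plot a fully labeled ROC curve:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, confusion_matrix, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
cm = confusion_matrix(y_te, clf.predict(X_te))
fpr, tpr, _ = roc_curve(y_te, clf.predict_proba(X_te)[:, 1])
roc_auc = auc(fpr, tpr)

# Every plot should carry axis labels, a title, and a legend.
fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f"ROC (AUC = {roc_auc:.2f})")
ax.plot([0, 1], [0, 1], "--", label="Chance")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("Credit-scoring baseline ROC")
ax.legend()
fig.savefig("roc_curve.png")
print("Confusion matrix:\n", cm)
```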
Final Project: Gravitational Wave Signal Detection
The capstone of this bootcamp is the development of a Deep Learning pipeline to detect Binary Black Hole (BBH) signals in noisy strain data. You are provided with a baseline PyTorch implementation to serve as a starting point.
Project Components
Your submission must utilize the following framework:
- Data Generation (`data_prep_bbh.py`): Use the `sim_data` function to generate synthetic GW signals. You must experiment with different signal-to-noise ratios (SNR) to test model robustness.
- Neural Network Architecture (`main.py`): The baseline uses `MyNet`, a Convolutional Neural Network (CNN). You are encouraged to modify the layers, such as `nn.Conv2d` and `nn.BatchNorm2d`, or implement novel architectures (e.g., ResNet, Transformers).
- Training Loop: Implement training and validation logic using the `DatasetGenerator` and `DataLoader` classes.
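The `sim_data` interface is specific to the course code, but the underlying idea of SNR scaling can be sketched with NumPy alone. Note the caveats: the waveform below is a toy chirp, not a physical BBH template, and the amplitude-based SNR definition here may differ from the matched-filter SNR used in the course:

```python
import numpy as np

rng = np.random.default_rng(42)

def inject_at_snr(signal, noise_sigma, target_snr):
    """Rescale `signal` so its amplitude SNR in white Gaussian noise is target_snr.

    Simple amplitude definition used here: sqrt(sum(s^2)) / sigma.
    The course code may define SNR differently (e.g., via matched filtering).
    """
    current = np.sqrt(np.sum(signal**2)) / noise_sigma
    scaled = signal * (target_snr / current)
    noise = rng.normal(0.0, noise_sigma, size=signal.shape)
    return scaled + noise, scaled

t = np.linspace(0, 1, 2048)
chirp = np.sin(2 * np.pi * (20 + 40 * t) * t) * np.exp(-4 * (1 - t))  # toy chirp
strain, scaled = inject_at_snr(chirp, noise_sigma=1.0, target_snr=12.0)
print(np.sqrt(np.sum(scaled**2)))  # 12.0 by construction
```

Sweeping `target_snr` (e.g., 8, 12, 20) gives datasets of varying difficulty for the robustness study.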
Code Example: Extending the Baseline
To customize your model, you might modify the `MyNet` structure:

```python
import torch.nn as nn

class CustomGWNet(nn.Module):
    def __init__(self, output_dim):
        # output_dim is the time-axis length after the conv/pooling stages;
        # it depends on the length of your input strain segments.
        super(CustomGWNet, self).__init__()
        # Example: adding a deeper layer or changing kernel sizes
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(1, 32), stride=1),
            nn.ELU(alpha=0.01),
            nn.BatchNorm2d(16),
            nn.MaxPool2d(kernel_size=(1, 8)),
        )
        # Add your own flatten and fully connected layers
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * output_dim, 2),  # binary classification: noise vs. signal
        )

    def forward(self, x):
        x = self.feature_extractor(x)
        return self.classifier(x)
```
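A minimal training/validation loop along these lines might look as follows. Random tensors stand in for the course's `DatasetGenerator`, and the 512-sample segment length and tiny CNN are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Random stand-ins: 64 segments of length 512, labeled noise (0) or signal (1).
X = torch.randn(64, 1, 1, 512)
y = torch.randint(0, 2, (64,))
train_loader = DataLoader(TensorDataset(X[:48], y[:48]), batch_size=16, shuffle=True)
val_loader = DataLoader(TensorDataset(X[48:], y[48:]), batch_size=16)

# Tiny CNN: Conv (1,16) kernel -> length 497; MaxPool (1,8) -> length 62.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=(1, 16)),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=(1, 8)),
    nn.Flatten(),
    nn.Linear(8 * 62, 2),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(2):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

    model.eval()
    correct = total = 0
    with torch.no_grad():
        for xb, yb in val_loader:
            correct += (model(xb).argmax(dim=1) == yb).sum().item()
            total += yb.numel()
    val_acc = correct / total
    print(f"epoch {epoch}: val accuracy = {val_acc:.2f}")
```

With random labels the accuracy is meaningless; the point is the shape of the loop, into which the real `DatasetGenerator` batches slot directly.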
Submission Process
We utilize a GitHub-centric workflow to simulate real-world research collaboration.
- Fork and Clone: Fork the repository and create a feature branch for your cohort (e.g., `2023-assignments-yourname`).
- Implementation: Save your completed notebooks and scripts in a dedicated directory under your name.
- Pull Request (PR): Submit a PR to the main repository. Ensure your PR description summarizes your findings and any optimizations made to the baseline model.
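The fork-and-branch steps above might look like this on the command line. The clone URL is a placeholder for your own fork; a scratch local repository is used below purely to illustrate the branch and directory conventions:

```shell
# In practice, clone your fork first, e.g.:
#   git clone https://github.com/<your-username>/GWData-Bootcamp.git
# The scratch repo below only demonstrates the naming conventions.
mkdir -p /tmp/gw-bootcamp-demo
cd /tmp/gw-bootcamp-demo
git init -q
git checkout -q -b 2023-assignments-yourname
mkdir -p yourname            # dedicated directory under your name
git branch --show-current    # prints the cohort branch name
```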
Peer Review & Evaluation
The "Peer Review" phase is critical for mastering data-driven modeling. After submitting your PR:
- Review Requirement: Each participant must review at least two other submissions.
- Feedback Criteria:
- Code Clarity: Is the code readable and well-commented?
- Methodology: Are the data augmentation or SNR scaling choices scientifically sound?
- Results: Did the participant provide an "Animator" plot (from `utils.py`) showing the loss/accuracy convergence?
- Discussion: Use the GitHub "Conversations" tab on PRs to ask questions about model architecture or hyperparameter choices.
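The course's `Animator` (in `utils.py`) updates such a plot live during training; a static equivalent with plain `matplotlib` would be the following (the history values are fabricated placeholders; in practice you would append them each epoch):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Placeholder training history for illustration only.
epochs = list(range(1, 11))
train_loss = [0.69, 0.55, 0.44, 0.36, 0.30, 0.26, 0.23, 0.21, 0.19, 0.18]
val_acc = [0.52, 0.61, 0.68, 0.74, 0.78, 0.81, 0.83, 0.84, 0.85, 0.86]

fig, ax1 = plt.subplots()
ax1.plot(epochs, train_loss, "b-", label="train loss")
ax1.set_xlabel("Epoch")
ax1.set_ylabel("Loss")
ax2 = ax1.twinx()  # second y-axis so accuracy and loss share one figure
ax2.plot(epochs, val_acc, "r--", label="val accuracy")
ax2.set_ylabel("Accuracy")
ax1.set_title("Loss/accuracy convergence")
fig.savefig("convergence.png")
```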
Evaluation Rubric
| Criteria | Weight | Description |
| :--- | :--- | :--- |
| Correctness | 40% | Model successfully trains and converges on the BBH detection task. |
| Innovation | 20% | Improvements over the baseline architecture or training strategy. |
| Documentation | 20% | Clear explanation of the methodology and performance analysis. |
| Peer Engagement | 20% | Quality of feedback provided to other students during the review phase. |