Assignments & Peer Review
Assessment Overview
The GWData-Bootcamp follows a progressive assessment structure designed to transition students from foundational programming to advanced deep learning applications in gravitational wave (GW) physics. Evaluation is based on a series of module-specific homework assignments and a final research-oriented project.
Homework Modules
Homework assignments are provided as interactive Jupyter Notebooks (exported as HTML for reference). These modules focus on data preprocessing, statistical analysis, and classical machine learning.
Machine Learning: Credit Scoring & Ensemble Methods
These assignments serve as a bridge to understanding classification problems before tackling complex GW signals.
- Module 1: Credit Scoring Baseline: Implementing logistic regression and decision trees using `scikit-learn`.
- Module 2: Fine-Tuning & Ensembling: Optimizing model hyperparameters and implementing ensemble techniques (Random Forests, Gradient Boosting) to improve F1-scores and AUC.
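As a sketch of the Module 2 workflow, hyperparameter tuning and ensembling with `scikit-learn` might look like the following (the synthetic dataset and the small parameter grid are illustrative stand-ins, not the course's credit-scoring data):

```python
# Sketch only: make_classification stands in for the credit-scoring dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Small illustrative grid; the assignment would explore a wider range.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    scoring="f1",
    cv=3,
)
grid.fit(X_tr, y_tr)

y_pred = grid.predict(X_te)
y_prob = grid.predict_proba(X_te)[:, 1]
print(f"F1:  {f1_score(y_te, y_pred):.3f}")
print(f"AUC: {roc_auc_score(y_te, y_prob):.3f}")
```

The same pattern applies to Gradient Boosting: swap in `GradientBoostingClassifier` and its own grid (e.g., `learning_rate`, `n_estimators`).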
Submission Requirements:
- Complete the missing code blocks in the provided `.ipynb` files.
- Include a brief summary of the model performance (confusion matrix, ROC curves).
- Ensure all plots generated by `matplotlib` are clearly labeled and interpreted.
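One hedged way to produce the required performance summary, again with synthetic stand-in data, is to compute the confusion matrix and plot a fully labeled ROC curve:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, confusion_matrix, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
cm = confusion_matrix(y_te, clf.predict(X_te))
fpr, tpr, _ = roc_curve(y_te, clf.predict_proba(X_te)[:, 1])
roc_auc = auc(fpr, tpr)

# Every plot should carry axis labels, a title, and a legend.
fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f"ROC (AUC = {roc_auc:.2f})")
ax.plot([0, 1], [0, 1], "--", label="Chance")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("Credit-scoring baseline ROC")
ax.legend()
fig.savefig("roc_curve.png")
print("Confusion matrix:\n", cm)
```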
Final Project: Gravitational Wave Signal Detection
The capstone of this bootcamp is the development of a Deep Learning pipeline to detect Binary Black Hole (BBH) signals in noisy strain data. You are provided with a baseline PyTorch implementation to serve as a starting point.
Project Components
Your submission must utilize the following framework:
- Data Generation (`data_prep_bbh.py`): Use the `sim_data` function to generate synthetic GW signals. You must experiment with different signal-to-noise ratios (SNR) to test model robustness.
- Neural Network Architecture (`main.py`): The baseline uses `MyNet`, a Convolutional Neural Network (CNN). You are encouraged to modify the layers, such as `nn.Conv2d` and `nn.BatchNorm2d`, or implement novel architectures (e.g., ResNet, Transformers).
- Training Loop: Implement training and validation logic using the `DatasetGenerator` and `DataLoader` classes.
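The `sim_data` interface is specific to the course code, but the underlying idea of SNR scaling can be sketched with NumPy alone. Note the caveats: the waveform below is a toy chirp, not a physical BBH template, and the amplitude-based SNR definition here may differ from the matched-filter SNR used in the course:

```python
import numpy as np

rng = np.random.default_rng(42)

def inject_at_snr(signal, noise_sigma, target_snr):
    """Rescale `signal` so its amplitude SNR in white Gaussian noise is target_snr.

    Simple amplitude definition used here: sqrt(sum(s^2)) / sigma.
    The course code may define SNR differently (e.g., via matched filtering).
    """
    current = np.sqrt(np.sum(signal**2)) / noise_sigma
    scaled = signal * (target_snr / current)
    noise = rng.normal(0.0, noise_sigma, size=signal.shape)
    return scaled + noise, scaled

t = np.linspace(0, 1, 2048)
chirp = np.sin(2 * np.pi * (20 + 40 * t) * t) * np.exp(-4 * (1 - t))  # toy chirp
strain, scaled = inject_at_snr(chirp, noise_sigma=1.0, target_snr=12.0)
print(np.sqrt(np.sum(scaled**2)))  # 12.0 by construction
```

Sweeping `target_snr` (e.g., 8, 12, 20) gives datasets of varying difficulty for the robustness study.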
Code Example: Extending the Baseline
To customize your model, you might modify the `MyNet` structure:

```python
import torch.nn as nn

class CustomGWNet(nn.Module):
    def __init__(self, output_dim):
        # output_dim is the time-axis length after the conv/pooling stages;
        # it depends on the length of your input strain segments.
        super(CustomGWNet, self).__init__()
        # Example: adding a deeper layer or changing kernel sizes
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(1, 32), stride=1),
            nn.ELU(alpha=0.01),
            nn.BatchNorm2d(16),
            nn.MaxPool2d(kernel_size=(1, 8)),
        )
        # Add your own flatten and fully connected layers
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * output_dim, 2),  # binary classification: noise vs. signal
        )

    def forward(self, x):
        x = self.feature_extractor(x)
        return self.classifier(x)
```
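A minimal training/validation loop along these lines might look as follows. Random tensors stand in for the course's `DatasetGenerator`, and the 512-sample segment length and tiny CNN are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Random stand-ins: 64 segments of length 512, labeled noise (0) or signal (1).
X = torch.randn(64, 1, 1, 512)
y = torch.randint(0, 2, (64,))
train_loader = DataLoader(TensorDataset(X[:48], y[:48]), batch_size=16, shuffle=True)
val_loader = DataLoader(TensorDataset(X[48:], y[48:]), batch_size=16)

# Tiny CNN: Conv (1,16) kernel -> length 497; MaxPool (1,8) -> length 62.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=(1, 16)),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=(1, 8)),
    nn.Flatten(),
    nn.Linear(8 * 62, 2),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(2):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

    model.eval()
    correct = total = 0
    with torch.no_grad():
        for xb, yb in val_loader:
            correct += (model(xb).argmax(dim=1) == yb).sum().item()
            total += yb.numel()
    val_acc = correct / total
    print(f"epoch {epoch}: val accuracy = {val_acc:.2f}")
```

With random labels the accuracy is meaningless; the point is the shape of the loop, into which the real `DatasetGenerator` batches slot directly.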
Submission Process
We utilize a GitHub-centric workflow to simulate real-world research collaboration.
- Fork and Clone: Fork the repository and create a feature branch for your cohort (e.g., `2023-assignments-yourname`).
- Implementation: Save your completed notebooks and scripts in a dedicated directory under your name.
- Pull Request (PR): Submit a PR to the main repository. Ensure your PR description summarizes your findings and any optimizations made to the baseline model.
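The fork-and-branch steps above might look like this on the command line. The clone URL is a placeholder for your own fork; a scratch local repository is used below purely to illustrate the branch and directory conventions:

```shell
# In practice, clone your fork first, e.g.:
#   git clone https://github.com/<your-username>/GWData-Bootcamp.git
# The scratch repo below only demonstrates the naming conventions.
mkdir -p /tmp/gw-bootcamp-demo
cd /tmp/gw-bootcamp-demo
git init -q
git checkout -q -b 2023-assignments-yourname
mkdir -p yourname            # dedicated directory under your name
git branch --show-current    # prints the cohort branch name
```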
Peer Review & Evaluation
The "Peer Review" phase is critical for mastering data-driven modeling. After submitting your PR:
- Review Requirement: Each participant must review at least two other submissions.
- Feedback Criteria:
- Code Clarity: Is the code readable and well-commented?
- Methodology: Are the data augmentation or SNR scaling choices scientifically sound?
- Results: Did the participant provide an "Animator" plot (from `utils.py`) showing the loss/accuracy convergence?
- Discussion: Use the GitHub "Conversations" tab on PRs to ask questions about model architecture or hyperparameter choices.
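The course's `Animator` (in `utils.py`) updates such a plot live during training; a static equivalent with plain `matplotlib` would be the following (the history values are fabricated placeholders; in practice you would append them each epoch):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Placeholder training history for illustration only.
epochs = list(range(1, 11))
train_loss = [0.69, 0.55, 0.44, 0.36, 0.30, 0.26, 0.23, 0.21, 0.19, 0.18]
val_acc = [0.52, 0.61, 0.68, 0.74, 0.78, 0.81, 0.83, 0.84, 0.85, 0.86]

fig, ax1 = plt.subplots()
ax1.plot(epochs, train_loss, "b-", label="train loss")
ax1.set_xlabel("Epoch")
ax1.set_ylabel("Loss")
ax2 = ax1.twinx()  # second y-axis so accuracy and loss share one figure
ax2.plot(epochs, val_acc, "r--", label="val accuracy")
ax2.set_ylabel("Accuracy")
ax1.set_title("Loss/accuracy convergence")
fig.savefig("convergence.png")
```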
Evaluation Rubric
| Criteria | Weight | Description |
| :--- | :--- | :--- |
| Correctness | 40% | Model successfully trains and converges on the BBH detection task. |
| Innovation | 20% | Improvements over the baseline architecture or training strategy. |
| Documentation | 20% | Clear explanation of the methodology and performance analysis. |
| Peer Engagement | 20% | Quality of feedback provided to other students during the review phase. |