Glitch Analysis and the GravitySpy Framework

In gravitational wave (GW) astronomy, a glitch is a transient noise artifact in the detector data that can mimic or obscure actual astrophysical signals. Identifying and classifying these glitches is a critical step in data cleaning and detector characterization.

This section explains how the bootcamp uses machine learning, specifically supervised learning and clustering, to handle instrumental noise, following the approach of the GravitySpy dataset.

Overview of Glitches

Glitches are caused by a variety of environmental and instrumental factors (e.g., laser fluctuations, power line noise, or even seismic activity). Because they are non-Gaussian and non-stationary, they survive the whitening and band-pass filtering that suffice for stationary Gaussian noise, so they cannot be removed with simple linear filters. The bootcamp focuses on:

Time-Frequency Analysis: Visualizing glitches as spectrograms, typically Q-transforms.
Morphological Classification: Grouping glitches by their visual "shape" in the time-frequency plane.
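Q-transforms are usually computed with a dedicated library such as GWpy. As a dependency-free illustration of the same idea, the sketch below computes a plain short-time Fourier transform (STFT) spectrogram, which is a fixed-resolution stand-in for a Q-transform, not a Q-transform itself. The function `stft_spectrogram` and the toy sine-Gaussian "blip" are hypothetical and not part of the bootcamp code.

```python
import numpy as np

def stft_spectrogram(x, fs, nperseg=256, noverlap=192):
    """Magnitude-squared STFT of a 1-D series (a stand-in for a Q-transform).

    Returns (freqs, times, power) with power shaped (n_freqs, n_times).
    """
    step = nperseg - noverlap
    window = np.hanning(nperseg)
    n_frames = 1 + (len(x) - nperseg) // step
    # Slice the series into overlapping, Hann-windowed frames
    frames = np.stack([x[i * step:i * step + nperseg] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # (n_times, n_freqs)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    times = (np.arange(n_frames) * step + nperseg / 2) / fs    # frame centres
    return freqs, times, power.T

# Toy "blip": a sine-Gaussian at 300 Hz centred at t = 0.5 s, in white noise
fs = 4096
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
glitch = 10 * np.exp(-((t - 0.5) / 0.01) ** 2) * np.sin(2 * np.pi * 300 * t)
x = rng.normal(size=t.size) + glitch

freqs, times, power = stft_spectrogram(x, fs)
f_idx, t_idx = np.unravel_index(np.argmax(power), power.shape)
print(f"loudest pixel: {freqs[f_idx]:.0f} Hz at t = {times[t_idx]:.3f} s")
```

The loudest pixel lands near 300 Hz and 0.5 s, within the 16 Hz by 16 ms resolution set by `nperseg=256` at 4096 Hz. A Q-transform instead tiles the plane with constant-Q windows, trading frequency resolution for time resolution at high frequencies; GWpy exposes it as `TimeSeries.q_transform`.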

The GravitySpy Dataset

GravitySpy is a citizen-science and machine learning project that provides labeled examples of various glitch classes (e.g., "Blip," "Whistle," "Koi Fish"). The bootcamp uses these labeled datasets to train deep learning models.
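Supervised training needs class names turned into integer targets. A minimal sketch, assuming labels arrive as strings; the three class names are an illustrative subset with assumed spellings, not the full GravitySpy label set, and `encode_labels` and `one_hot` are hypothetical helpers.

```python
import numpy as np

# Hypothetical subset of GravitySpy class names (spellings assumed)
CLASSES = ["Blip", "Koi_Fish", "Whistle"]
CLASS_TO_INDEX = {name: i for i, name in enumerate(CLASSES)}

def encode_labels(labels):
    """Map class-name strings to integer targets, as a cross-entropy loss expects."""
    return np.array([CLASS_TO_INDEX[name] for name in labels], dtype=np.int64)

def one_hot(indices, n_classes):
    """One-hot encode integer targets, e.g. to compare against per-class outputs."""
    return np.eye(n_classes, dtype=np.float32)[indices]

y = encode_labels(["Whistle", "Blip", "Koi_Fish", "Blip"])
print(y)                                   # [2 0 1 0]
print(np.bincount(y, minlength=len(CLASSES)))  # per-class counts, to spot imbalance
```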

Loading and Preprocessing

The bootcamp provides utilities to generate or load time-series data and convert it into a format suitable for neural networks. As seen in data_prep_bbh.py, the preprocessing pipeline typically involves:

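Whitening: Dividing the data, in the frequency domain, by the amplitude spectral density (the square root of the Power Spectral Density, PSD) of the detector noise, so that all frequencies contribute on an equal footing.
Windowing: Applying a Tukey window to taper the segment edges and suppress edge effects (spectral leakage) in the Fourier transform.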
Resampling: Ensuring the sampling frequency ($f_s$) matches the model requirements (e.g., 8192 Hz).
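The whitening and windowing steps can be sketched in plain NumPy. This is a minimal sketch under stated assumptions, not the code in data_prep_bbh.py: the input is assumed to be one detector's strain as a 1-D array, the PSD comes from a hand-written Welch average, and `tukey`, `welch_psd`, and `whiten` are hypothetical helpers.

```python
import numpy as np

def tukey(n, alpha=0.1):
    """Tukey (tapered-cosine) window: flat top with cosine tapers at both ends."""
    x = np.linspace(0.0, 1.0, n)
    w = np.ones(n)
    left = x < alpha / 2
    right = x >= 1 - alpha / 2
    w[left] = 0.5 * (1 + np.cos(2 * np.pi / alpha * (x[left] - alpha / 2)))
    w[right] = 0.5 * (1 + np.cos(2 * np.pi / alpha * (x[right] - 1 + alpha / 2)))
    return w

def welch_psd(x, fs, nperseg):
    """One-sided PSD: average of Hann-windowed segments with 50% overlap."""
    step = nperseg // 2
    win = np.hanning(nperseg)
    segs = [x[i:i + nperseg] * win for i in range(0, len(x) - nperseg + 1, step)]
    p = np.mean([np.abs(np.fft.rfft(s)) ** 2 for s in segs], axis=0)
    p /= fs * np.sum(win ** 2)
    p[1:-1] *= 2          # fold in negative frequencies (nperseg assumed even)
    return np.fft.rfftfreq(nperseg, d=1.0 / fs), p

def whiten(x, fs, psd_freqs, psd, alpha=0.1):
    """Taper with a Tukey window, then divide by the ASD in the frequency domain."""
    xf = np.fft.rfft(x * tukey(len(x), alpha))
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    asd = np.sqrt(np.interp(f, psd_freqs, psd))
    # Dividing by sqrt(PSD * fs / 2) gives unit variance for stationary noise
    return np.fft.irfft(xf / (asd * np.sqrt(fs / 2)), n=len(x))

fs = 4096
rng = np.random.default_rng(1)
strain = 3.0 * rng.normal(size=16 * fs)            # 16 s of white noise, sigma = 3
freqs, psd = welch_psd(strain, fs, nperseg=fs)     # 1 s segments
white = whiten(strain, fs, freqs, psd)
print(np.std(white[fs:-fs]))    # close to 1 away from the tapered edges
```

Resampling is left out; if SciPy is available, `scipy.signal.resample_poly` is one standard choice. Real detector noise is coloured, so the PSD is typically estimated from a longer stretch of data than the segment being whitened.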

Supervised Learning for Glitch Classification

The primary approach for glitch classification in the bootcamp is a Convolutional Neural Network (CNN). The architecture defined in main.py (the MyNet class) is designed to process multi-detector input.

Usage Example: Initializing the Classifier

You can define a model tailored for glitch detection by stacking convolutional layers that capture spatial features in time-frequency maps.

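```python
import torch

from baseline.main import MyNet

# Initialize the network and switch it to inference mode
model = MyNet()
model.eval()

# The model expects input of shape (batch_size, channels, height, width).
# In this example the two detectors (H1, L1) are stacked along the height axis
# (height = 2) with a single channel, and width = 8192 samples (1 s at 8192 Hz).
example_input = torch.randn(1, 1, 2, 8192)
with torch.no_grad():
    output = model(example_input)
```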
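Since MyNet's internals are not shown here, the following hypothetical NumPy sketch illustrates only what one convolution, ReLU, and max-pooling stage does to a time-frequency map: a hand-picked 3x3 kernel responds where the map contains a short, broadband (vertical) feature. In a trained CNN the kernels are learned, and MyNet's layer sizes may differ.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 2-D cross-correlation with 'valid' padding, as in a CNN layer."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a block are dropped."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Toy 16x16 map (frequency x time) with a short vertical stripe, loosely like a
# broadband Blip; the kernel is hand-picked here, not learned.
spec = np.zeros((16, 16))
spec[4:12, 7] = 1.0
vertical_line = np.array([[-1.0, 2.0, -1.0]] * 3)

features = max_pool2d(relu(conv2d_valid(spec, vertical_line)))
row, col = np.unravel_index(features.argmax(), features.shape)
print(features.shape, int(row), int(col))   # (7, 7) 2 3
```

The strongest activation sits at the stripe's location in the downsampled map; stacking such stages lets deeper layers combine these local detections into class-level shapes.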

Training with `DatasetGenerator`

To train a model on glitches, the DatasetGenerator can be configured to produce noise realizations.

```python
from torch.utils.data import DataLoader

from baseline.main import DatasetGenerator

# Generate training data with a specific SNR and detector configuration
train_data = DatasetGenerator(
    fs=8192,
    T=1,
    snr=20,
    detectors=['H1', 'L1'],
    nsample_perepoch=1000,
)

# Wrap the dataset in a DataLoader for shuffled mini-batches
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
```
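With `shuffle=True` and `batch_size=32`, each epoch visits the samples in a fresh random order. Assuming `len(train_data)` equals `nsample_perepoch` (an assumption about DatasetGenerator), one epoch has ceil(1000 / 32) = 32 batches, the last holding 1000 - 31 x 32 = 8 samples, because DataLoader keeps the partial batch unless `drop_last=True`. The hypothetical helper below mirrors that batching logic in NumPy.

```python
import math

import numpy as np

def iterate_minibatches(n_samples, batch_size, rng, drop_last=False):
    """Yield shuffled index batches, mirroring DataLoader(shuffle=True) semantics."""
    order = rng.permutation(n_samples)
    stop = n_samples - n_samples % batch_size if drop_last else n_samples
    for start in range(0, stop, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(1000, 32, rng))
print(len(batches), len(batches[-1]))   # 32 8
print(math.ceil(1000 / 32))             # 32
```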

Unsupervised Learning: Clustering Glitches

Supervised learning can only assign events to the classes it was trained on, so it cannot recognize noise from previously unknown sources. The bootcamp introduces unsupervised techniques to discover new glitch classes:

Feature Extraction: Using the intermediate layers of a pre-trained CNN (such as MyNet) to extract high-level features from glitch spectrograms.
Dimensionality Reduction: Applying t-SNE or PCA to project features into a 2D or 3D space.
Clustering: Using K-Means or DBSCAN to identify clusters of similar noise events.
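The three steps above can be sketched with NumPy alone. This minimal sketch substitutes synthetic 64-dimensional vectors for real CNN embeddings, uses PCA (via SVD) rather than t-SNE for the reduction step, and implements plain K-Means (Lloyd's algorithm); `pca` and `kmeans` are hypothetical helpers, not bootcamp functions.

```python
import numpy as np

def pca(features, n_components=2):
    """Project mean-centred features onto their top principal components via SVD."""
    centred = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

def kmeans(points, k, rng, n_iter=100):
    """Plain Lloyd's-algorithm K-Means, initialised from k random points."""
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to its members' mean (keep it if the cluster is empty)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Stand-in for CNN embeddings: two synthetic groups of 64-dimensional vectors
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(50, 64))
group_b = rng.normal(5.0, 1.0, size=(50, 64))
features = np.vstack([group_a, group_b])

projected = pca(features, n_components=2)
labels, _ = kmeans(projected, k=2, rng=rng)
print(np.bincount(labels))
```

K-Means needs the number of clusters up front, which is rarely known when hunting for new glitch classes; DBSCAN does not take k as input, which is one reason it suits exploratory clustering. scikit-learn provides `KMeans`, `DBSCAN`, and `TSNE` for real use.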

Practical Workflow

To perform a complete glitch analysis during the bootcamp, follow these steps:

Data Acquisition: Use sim_data to create synthetic glitches or load GravitySpy-labeled samples.
Spectrogram Generation: Convert time-series data into time-frequency representations.
Model Training: Run the main.py script to train the MyNet architecture on known glitch classes.
Validation: Use the utils.Animator class to monitor training loss and accuracy in real time.
Inference: Deploy the trained model to flag glitches in raw strain data before performing gravitational wave searches.
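The inference step amounts to scanning the strain in overlapping windows and flagging those scored as glitches. The sketch assumes whitened strain, 1 s windows with 50% overlap, and a threshold of 0.25. Because no trained model is available here, `excess_power_score` is a hypothetical, deliberately crude stand-in (the fraction of a window's energy in its loudest 1% of samples) that should be replaced by the trained network's glitch probability; all function names are illustrative.

```python
import numpy as np

def sliding_windows(strain, fs, win_sec=1.0, overlap=0.5):
    """Cut a 1-D strain series into overlapping windows; return (start_times, windows)."""
    n = int(win_sec * fs)
    step = int(n * (1 - overlap))
    starts = np.arange(0, len(strain) - n + 1, step)
    return starts / fs, np.stack([strain[s:s + n] for s in starts])

def flag_glitches(windows, score_fn, threshold=0.25):
    """Score every window and flag those above the threshold."""
    scores = np.array([score_fn(w) for w in windows])
    return scores > threshold, scores

def excess_power_score(window):
    """HYPOTHETICAL stand-in for a classifier's glitch probability:
    fraction of the window's energy held by its loudest 1% of samples."""
    e = np.sort(window ** 2)[::-1]
    return e[:max(1, len(e) // 100)].sum() / e.sum()

fs = 4096
rng = np.random.default_rng(2)
t = np.arange(8 * fs) / fs
strain = rng.normal(size=t.size)                       # whitened Gaussian noise
strain += 40 * np.exp(-((t - 4.3) / 0.01) ** 2) * np.sin(2 * np.pi * 300 * t)

starts, windows = sliding_windows(strain, fs)
flags, scores = flag_glitches(windows, excess_power_score)
print(starts[flags])    # windows overlapping the injected glitch at t = 4.3 s
```

For Gaussian noise this score sits near 0.08, so the threshold separates the injected glitch cleanly; with a real classifier the threshold would instead be tuned on validation data.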

Note: For deep learning tasks, check that torch.cuda.is_available() returns True so that training can use GPU acceleration, which substantially speeds up the 8-layer convolutional network used in this curriculum.