Collaborative Research with Git
Collaborative Research Workflow
In scientific research, version control is not just about saving code—it is about reproducibility, transparency, and collaboration. This bootcamp uses Git to manage code, share datasets via submodules, and track progress on assignments.
Setting Up Your Workspace
To begin your research, you should create a personal fork of the bootcamp repository. This allows you to maintain your own experiments while staying synced with the official curriculum.
- Fork the repository on GitHub to your own account.
- Clone your fork locally:
git clone https://github.com/YOUR_USERNAME/GWData-Bootcamp.git
cd GWData-Bootcamp
- Configure the upstream remote to receive updates from the bootcamp organizers:
git remote add upstream https://github.com/autorelorg/GWData-Bootcamp.git
Synchronizing with the Bootcamp
As the semester progresses, new lecture materials and baseline models (like the 2023/deep_learning/baseline code) will be added. To sync your local environment:
# Fetch the latest changes from the official repo
git fetch upstream
# Merge updates into your main branch
git checkout main
git merge upstream/main
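Before merging, it can help to know how far your local main has diverged from the official repository. The following is a minimal sketch, not part of the bootcamp code: it assumes you pass in the text printed by `git rev-list --left-right --count main...upstream/main` (two tab-separated counts), and the names `Divergence`, `parse_divergence`, and `describe` are hypothetical helpers introduced here for illustration.

```python
from typing import NamedTuple


class Divergence(NamedTuple):
    ahead: int   # commits on main that upstream/main does not have
    behind: int  # commits on upstream/main that main does not have


def parse_divergence(rev_list_output: str) -> Divergence:
    """Parse the two counts printed by
    `git rev-list --left-right --count main...upstream/main`."""
    ahead, behind = (int(field) for field in rev_list_output.split())
    return Divergence(ahead, behind)


def describe(div: Divergence) -> str:
    """Summarize what `git merge upstream/main` is expected to do."""
    if div.behind == 0:
        return "nothing new upstream; the merge is a no-op"
    if div.ahead == 0:
        return f"{div.behind} commit(s) behind; the merge can fast-forward"
    return (
        f"{div.ahead} ahead and {div.behind} behind; "
        "the merge will create a merge commit or stop on conflicts"
    )
```

If the summary reports that main is ahead of upstream, experiments have probably been committed to main directly; moving them to a separate branch keeps later syncs simple.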
Branching for Experiments
In gravitational wave research, you will often test different hyperparameters or network architectures. Never work directly on the main branch. Instead, use feature branches for specific experiments.
For example, if you are modifying the MyNet architecture in main.py to test a new filter size:
# Create a new branch for a specific model experiment
git checkout -b experiment/conv-filter-tuning
# After making changes to main.py or data_prep_bbh.py
git add 2023/deep_learning/baseline/main.py
git commit -m "Research: Increase filter size in first layer to (1, 64)"
Best Practice: Use descriptive commit messages that explain the why of the change (e.g., "Adjusted SNR threshold to 15 for sensitivity testing").
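To keep branch names consistent across many experiments, one option is to derive them from a short description. This is only a sketch of one possible convention, modeled on the `experiment/conv-filter-tuning` name used above; the helper `experiment_branch` and its `prefix` parameter are hypothetical and do not exist in the bootcamp repository.

```python
import re


def experiment_branch(description: str, prefix: str = "experiment") -> str:
    """Build a branch name such as 'experiment/conv-filter-tuning'.

    Runs of characters other than lowercase letters and digits are
    collapsed into single hyphens (hypothetical convention).
    """
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    if not slug:
        raise ValueError("description must contain at least one letter or digit")
    return f"{prefix}/{slug}"
```

The returned string can be passed to `git checkout -b`, for example the name produced from the description "Conv filter tuning".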
Working with Submodules
Scientific projects often depend on external libraries (like LALSuite) or large datasets that are maintained in separate repositories. This bootcamp may utilize Git Submodules to manage these dependencies.
Initializing Submodules
If you see empty directories in the project structure, you likely need to initialize the submodules:
git submodule update --init --recursive
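The "empty directory" symptom can also be checked programmatically. The sketch below assumes the repository root contains a standard `.gitmodules` file with `path = ...` entries; it reports submodule paths that are missing or empty on disk. The function names are hypothetical, and the regular expression is a simplification rather than a full parser for Git's config format.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches lines such as "\tpath = libs/tool" inside .gitmodules.
_PATH_LINE = re.compile(r"^[ \t]*path[ \t]*=[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def submodule_paths(gitmodules_text: str) -> list[str]:
    """Return every `path` value declared in the .gitmodules text."""
    return _PATH_LINE.findall(gitmodules_text)


def uninitialized_submodules(repo_root: Path) -> list[str]:
    """List declared submodule paths that are missing or empty on disk."""
    gitmodules = repo_root / ".gitmodules"
    if not gitmodules.is_file():
        return []  # the repository declares no submodules
    pending = []
    for rel_path in submodule_paths(gitmodules.read_text(encoding="utf-8")):
        target = repo_root / rel_path
        if not target.is_dir() or not any(target.iterdir()):
            pending.append(rel_path)
    return pending
```

A non-empty result from `uninitialized_submodules(Path("."))` suggests that `git submodule update --init --recursive` has not been run yet.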
Updating Submodules
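To pull the latest versions of external research tools linked to the bootcamp, run the command below. It moves each submodule to the tip of its tracked remote branch, so commit the updated submodule reference afterward if you want to keep it: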
git submodule update --remote
Managing Bootcamp Assignments
Assignments often involve Jupyter Notebooks (exported as .html in this repo for viewing). Because a notebook file stores its cell outputs, including base64-encoded images, inside one large JSON document, its diffs are noisy and hard to review in Git.
Recommendations for Assignments:
- Keep Notebooks Clean: Clear all cell outputs before committing to keep the repository size manageable.
- Modularize Code: Move stable logic (like the Accumulator or Timer classes found in utils.py) into Python scripts and import them into your notebooks.
- Ignore Data Artifacts: Do not commit large .hdf5 or .npy files generated during training. Ensure your .gitignore includes:
*.log
*.pth
data/raw/
__pycache__/
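Clearing outputs can be scripted rather than done by hand in the notebook interface. The sketch below assumes notebooks in the nbformat 4 JSON layout, where each code cell carries `outputs` and `execution_count` fields; `strip_outputs` and `strip_file` are hypothetical helper names. If nbconvert is installed, `jupyter nbconvert --clear-output --inplace` is a tool-based alternative.

```python
import copy
import json
from pathlib import Path


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with all code-cell outputs removed."""
    cleaned = copy.deepcopy(notebook)  # leave the caller's dict untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def strip_file(path: Path) -> None:
    """Rewrite a .ipynb file in place without its outputs."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    text = json.dumps(strip_outputs(notebook), indent=1, ensure_ascii=False)
    path.write_text(text + "\n", encoding="utf-8")
```

Calling `strip_file` on each notebook before `git add` keeps plots and printed arrays out of the commit while leaving the source cells unchanged.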
Submitting Your Work
When you have completed an assignment (e.g., the CreditScoring machine learning task):
- Push your changes to your fork:
git push origin feature/my-assignment-solution
- Open a Pull Request (PR) against the autorelorg/GWData-Bootcamp repository if instructed by the teaching team.
- Use the PR description to summarize your findings, such as the final accuracy of your sklearn_model_ensemble.