2026-05-19
My goal for this workshop is to give everyone the tools to:
Ensuring that your workflow is transparent is important for:
Past/Current/Future You
WWALK Lab
Collaborators
Other grad students
Scientific Community
PUBLIC
Good file structure is important because it
Best practices include (but are not limited to)
There are some tasks that do not need to be “as reproducible” (e.g., fixing typos) - these can be done in OpenRefine.
In general if you are:
Combining data sources
Making decisions about the data itself (e.g., removing or adding data)
Performing calculations
Renaming things
Do this in R (you will be grateful later!)
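As a minimal sketch of what "doing it in R" can look like, the script below walks through the four kinds of tasks listed above using base R. The data frames, column names, and values are invented for illustration and stand in for files you would read from raw-data/.

```r
# Hypothetical data: two small tables standing in for files in raw-data/.
sites <- data.frame(
  site_id = c("A", "B", "C"),
  habitat = c("forest", "wetland", "forest")
)
counts <- data.frame(
  site_id   = c("A", "B", "C", "C"),
  n_obs     = c(12, 7, NA, 5),
  effort_hr = c(2, 1, 1, 2)
)

# Combining data sources: join counts to site information by a shared key.
combined <- merge(counts, sites, by = "site_id")

# Making a decision about the data: drop rows with a missing count.
cleaned <- combined[!is.na(combined$n_obs), ]

# Performing a calculation: observations per hour of effort.
cleaned$obs_per_hr <- cleaned$n_obs / cleaned$effort_hr

# Renaming things: give a column a clearer name.
names(cleaned)[names(cleaned) == "n_obs"] <- "observations"

print(cleaned)
```

Because every decision is written down as a line of code, the same steps can be re-run, reviewed, or changed later without touching the raw data.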
project
├───raw-data/
├───output/
├───R/
├───graphics/
└───README.md
Let’s set up a new project using RStudio Projects
Add raw-data, output, R, and graphics folders
(Bonus: I recommend you have a folder on your computer dedicated to all R projects)
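The folders can also be created from the R console rather than by hand. This is a sketch using base R only. `project_dir` is a placeholder pointing at a temporary folder so the example is safe to run; inside an RStudio Project you would likely use "." (the project root) instead.

```r
# Placeholder location for the example; replace with "." inside an RStudio Project.
project_dir <- file.path(tempdir(), "my-project")

# The four folders suggested for every project.
folders <- c("raw-data", "output", "R", "graphics")
for (folder in folders) {
  dir.create(file.path(project_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

# Add an empty README.md if one does not exist yet.
readme <- file.path(project_dir, "README.md")
if (!file.exists(readme)) file.create(readme)

# Check what was created.
list.files(project_dir)
```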
Piled Higher and Deeper
GitHub is a website that hosts Git repositories; Git is the version control software that documents your progress on a project
If you save rough drafts of your writing as you go along - that is version control
Really useful for when you want to go back/change your mind/re-run a test/etc.
Facilitates lower mental load + reproducible science + collaboration/sharing
biost@ts Git Tutorial

Git tracks the changes you make to your repository on your computer
After making changes, you have to select, describe, and commit them
After committing, you push your changes to your remote repository
If multiple people are contributing to a project, make sure you pull from the remote repository before starting your work
Same button as push (ctrl + shift + P)
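For anyone curious what the buttons do behind the scenes, the select, describe, commit cycle can be sketched from R by calling the git command-line tool. This assumes git is installed and on your PATH. The `git()` helper, the temporary repository, and the placeholder name and email are all made up for this example. Push and pull are shown only as comments because the sketch has no remote repository.

```r
# A throwaway repository in a temporary folder stands in for a real project.
repo <- file.path(tempdir(), "demo-project")
dir.create(repo, showWarnings = FALSE)

# Hypothetical helper: run a git command inside `repo` and return its output.
git <- function(...) {
  system2("git", c("-C", shQuote(repo), ...), stdout = TRUE, stderr = TRUE)
}

git("init")

# Make a change: add a file to the project.
writeLines("# Demo project", file.path(repo, "README.md"))

# 1. Select the changes you want to record (staging).
git("add", "README.md")

# 2. Describe and commit them. The -c options supply a placeholder identity
#    for this one command only; your own git setup normally provides it.
git("-c", "user.name=Demo", "-c", "user.email=demo@example.com",
    "commit", "-m", shQuote("Add README"))

# 3. With a remote repository configured (none exists in this sketch):
# git("pull")  # get collaborators' changes before starting work
# git("push")  # send your commits to the remote repository

# Inspect the recorded history.
commit_log <- git("log", "--oneline")
print(commit_log)
```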
The .gitignore file in your project directory tells Git to ignore specific files or folders
There is a syntax for specifying different file types or folders, which can be found in the link above
# Ignore all .txt files
*.txt
# But don't ignore important.txt
!important.txt
# Ignore all files in large directory
large/*
Archiving your project in the lab requires 4 things:
These things can be organized however you’d like, as long as they are easily understood by someone after you are gone.
A GitHub repository does not have a DOI, so it does not point to a specific moment in time
It can be changed continuously
It is not dedicated to longevity
You can import a GitHub repository into a true data archive
Zenodo is a great option for archiving data
Easily links to GitHub repositories
Preserves file structures
Can be updated after reviews/changes with a new DOI
FREE
Other options include Dryad, figshare, and more topic-specific archives (e.g., GenBank)
As always, use what works for you
To connect and archive your code/data with Zenodo from GitHub, there are three main steps
(see an example walkthrough here)
NOTE: you do not need to use Git to use Zenodo; you can also upload local files


This workshop, including examples & code, can be found here, and formatted slides are here
Software Carpentry: R for Reproducible Scientific Analysis & Version Control with git
Data Carpentry: Data Analysis & Visualization in R for Ecologists & Data Organization in Spreadsheets for Ecologists
biost@ts: Version Control with Git and GitHub
Happy Git: happygitwithr
University of Bergen: Open Access to Research Data
Smart People I Know: Dr. Christie Bahlai’s Reproducible Quantitative Methods Course & Wildlife Ecology & Evolution Lab’s Guide by Alec Robitaille & Val Lucet’s Git Workshop