Git: a practical workflow
Git is the backbone of collaborative development. It also protects reproducibility: you can point to an exact state of code, configs, and decisions that produced a result.
On this page
- The branch model
- Setup
- Daily commands
- Commit best practices
- Workflow
- State migrations and infrastructure changes
- Hotfix
- Partial merge
- Cleaning up history
- Stash and undo
- Reproducibility
The branch model
Use three long-lived branches:
- main (protected): what is released. It should not break.
- staging (protected): what is about to be released (release candidates). It should not break.
- dev (not protected): where day-to-day work integrates. It can break.
Supporting branches:
- feat/<topic>: feature / experiment branches created from dev.
- hotfix/<topic>: urgent fixes created from main.
This gives you speed on dev and control on staging + main.
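As a rough illustration, the branch layout above can be created in a throwaway repository. This is a minimal sketch: the topic names, the temporary directory, and the example identity are placeholders, and branch protection itself is configured on the hosting platform, not by these commands.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway directory for the sketch
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # name the first branch "main" regardless of local defaults
git config user.name "Example Dev"        # placeholder identity, local to this repo
git config user.email "dev@example.com"
git commit -q --allow-empty -m "chore: initial commit"

# Long-lived branches: staging starts from main, dev starts from staging
git branch staging main
git branch dev staging

# Supporting branches: features from dev, hotfixes from main
git branch feat/example-topic dev
git branch hotfix/example-topic main

git branch --list
```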
Setup
- Clone an existing repo: git clone <repository-url>
- Or init a new repo: git init, then git remote add origin <your-repo-url>
- Identity (so commits are attributable): git config --global user.name "Your Name", git config --global user.email "your.email@example.com"
- Ignore local state early via .gitignore (see .gitignore.example)
Sometimes you will also use .gitattributes to standardize things like line endings, file treatment, and merge behavior (see .gitattributes.example).
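One way to confirm that ignore rules behave as intended is git check-ignore. The sketch below uses a throwaway repository and a three-line .gitignore taken from the template; the file names are made up for the example.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway directory for the sketch
cd "$repo"
git init -q

# A tiny subset of the template, enough to test with
printf '%s\n' '__pycache__/' '.env' '*.tfstate' > .gitignore

mkdir -p src/__pycache__
touch .env src/app.py src/__pycache__/app.cpython-312.pyc

# check-ignore exits 0 and reports the matching rule when a path is ignored
git check-ignore -v .env
git check-ignore -v src/__pycache__/app.cpython-312.pyc

# Ignored paths do not show up as untracked files
git status --short
```

For a path that is not ignored, git check-ignore exits with status 1, which makes it convenient in scripts.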
Resources (copy/paste templates)
Use these when you want a strong baseline for repo hygiene without reinventing the “standard ignores” wheel.
.gitignore.example
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[codz]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py.cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
# Pipfile.lock
# UV
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# uv.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
# poetry.lock
# poetry.toml
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
# pdm.lock
# pdm.toml
.pdm-python
.pdm-build/
# pixi
# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
# pixi.lock
# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
# in the .venv directory. It is recommended not to include this directory in version control.
.pixi
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# Redis
*.rdb
*.aof
*.pid
# RabbitMQ
mnesia/
rabbitmq/
rabbitmq-data/
# ActiveMQ
activemq-data/
# SageMath parsed files
*.sage.py
# Environments
.env
.envrc
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
# .idea/
# Abstra
# Abstra is an AI-powered process automation framework.
# Ignore directories containing user credentials, local state, and settings.
# Learn more at https://abstra.io/docs
.abstra/
# Visual Studio Code
# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
# and can be added to the global gitignore or merged into this file. However, if you prefer,
# you could uncomment the following to ignore the entire vscode folder
# .vscode/
# Ruff stuff:
.ruff_cache/
# PyPI configuration file
.pypirc
# Marimo
marimo/_static/
marimo/_lsp/
__marimo__/
# Streamlit
.streamlit/secrets.toml
# Local .terraform directories
.terraform/
# .tfstate files
*.tfstate
*.tfstate.*
# Crash log files
crash.log
crash.*.log
# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# password, private keys, and other secrets. These should not be part of version
# control as they are data points which are potentially sensitive and subject
# to change depending on the environment.
*.tfvars
*.tfvars.json
# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json
# Ignore transient lock info files created by terraform apply
.terraform.tfstate.lock.info
# Include override files you do wish to add to version control using negated pattern
# !example_override.tf
# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*
# Ignore CLI configuration files
.terraformrc
terraform.rc
# Optional: ignore graph output files generated by `terraform graph`
# *.dot
# Optional: ignore plan files saved before destroying Terraform configuration
# Uncomment the line below if you want to ignore planout files.
# planout
# MLflow
mlruns/
mlartifacts/
outputs/
mlruns*.db
mlflow.db
# SQLite WAL mode temporary files
*-shm
*-wal
.gitattributes.example
# Auto detect text files and normalize to LF
* text=auto eol=lf
# Force LF for shell scripts (critical for Linux/Mac)
*.sh text eol=lf
# Force LF for common text files
*.ts text eol=lf
*.tsx text eol=lf
*.js text eol=lf
*.jsx text eol=lf
*.json text eol=lf
*.md text eol=lf
*.yml text eol=lf
*.yaml text eol=lf
*.css text eol=lf
*.html text eol=lf
*.sql text eol=lf
*.patch text eol=lf
*.py text eol=lf
*.pyi text eol=lf
*.toml text eol=lf
*.ini text eol=lf
*.cfg text eol=lf
*.env text eol=lf
*.txt text eol=lf
*.rst text eol=lf
# Common repo files without extensions
Dockerfile text eol=lf
Makefile text eol=lf
poetry.lock text eol=lf
uv.lock text eol=lf
# Notebooks
# If you use nbdime, you can enable better diffs via `nbdime config-git --enable`.
*.ipynb text eol=lf
# Binary files
*.png binary
*.jpg binary
*.jpeg binary
*.gif binary
*.webp binary
*.ico binary
*.woff binary
*.woff2 binary
*.ttf binary
*.eot binary
*.pdf binary
*.zip binary
*.tar binary
*.gz binary
*.7z binary
# Common data / ML artifacts (treat as binary to avoid noisy diffs)
*.parquet binary
*.feather binary
*.arrow binary
*.h5 binary
*.hdf5 binary
*.npz binary
*.npy binary
*.pkl binary
*.joblib binary
*.onnx binary
*.pt binary
*.pth binary
# Optional Git LFS examples (uncomment if you use LFS for large artifacts)
# *.parquet filter=lfs diff=lfs merge=lfs -text
# *.onnx filter=lfs diff=lfs merge=lfs -text
Daily commands
- What’s going on: git status
- What changed: git diff, git diff --staged
- History: git log --oneline --graph --all
- Stage + commit: git add <file>, git add ., git commit -m "feat: short summary"
- Sync: git pull, git push
If you keep typing the same long commands, add an alias once and use it forever. Create one with git config --global alias.<name> "<command>", for example: git config --global alias.st status, git config --global alias.lg "log --oneline --graph --all".
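A quick way to try the two aliases without touching your global configuration is to define them in a scratch repository first. In this sketch the --global flag is deliberately dropped, so the aliases only exist inside the temporary repo; the identity is a placeholder.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway directory for the sketch
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity, local to this repo
git config user.email "dev@example.com"
git commit -q --allow-empty -m "chore: initial commit"

# Repo-local aliases (add --global to make them available everywhere)
git config alias.st status
git config alias.lg "log --oneline --graph --all"

git st --short
git lg
```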
Commit best practices
Treat commits as a project log. A good commit message helps reviewers understand intent and helps you debug and audit later.
Commit types: feat, ref, fix, test, docs, chore, style, data, exp.
The message format
- Use the pattern type: short summary.
- Write the summary in the imperative mood (“add”, “fix”, “remove”, “refactor”).
- Keep it specific to the outcome, not the file names.
What each type means
- feat: user-facing capability or pipeline capability added (for example “feat: add batch inference job”).
- ref: refactor that preserves behavior but improves structure/clarity (for example “ref: split feature engineering into module”).
- fix: defect correction without changing intended scope (for example “fix: handle empty dataset gracefully”).
- test: test-only change (for example “test: add regression for leakage bug”).
- docs: documentation-only change (for example “docs: clarify release process”).
- chore: maintenance work that is not a feature or fix (for example “chore: bump pre-commit hooks”).
- style: formatting, whitespace, linting, or cosmetic changes with no behavior change (for example “style: apply black formatting to utils module”).
- data: dataset tracking or transformations that change the data contract (for example “data: update schema for customer events v2”).
- exp: AI/ML experiments (for example “exp: compare xgboost vs logistic regression”).
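If you want to check the format mechanically, a small shell function is enough. The sketch below is one possible check, not a standard tool: check_commit_message is a hypothetical helper name, and the pattern only validates the first line against the type list above.

```shell
# Hypothetical helper: succeeds when the first line looks like "type: summary"
check_commit_message() {
  printf '%s\n' "$1" | head -n 1 |
    grep -Eq '^(feat|ref|fix|test|docs|chore|style|data|exp): [^ ].*$'
}

if check_commit_message "feat: add batch inference job"; then
  echo "accepted"
fi

if ! check_commit_message "updated stuff"; then
  echo "rejected"
fi
```

The same check could be wired into a commit-msg hook, which Git calls with the path of the message file as its first argument.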
Practical rules
- Keep commits small and coherent (one intent per commit).
- Prefer committing work that can be reviewed (avoid mixing formatting, refactors, and behavior changes in one commit).
- If something is temporary or broken, keep it on your branch; do not merge it into dev.
- Keep branches short-lived and delete them after merge.
- Never commit secrets or large artifacts (use .gitignore, Git LFS, or DVC depending on the artifact).
Workflow
Feature development
Start from dev and branch:
- git checkout dev
- git pull
- git checkout -b feat/<topic>
Work in small commits:
- git add .
- git commit -m "feat: <meaningful summary>"
- git push
If this is the first time you push the branch, Git may ask you to set an upstream. Copy/paste the command it suggests, then continue using git push.
When ready:
- Sync with dev before the PR: git pull origin dev
- If you hit conflicts, resolve them, then git add ., git commit -m "fix: resolve merge conflict", and git push
- Open a PR: feat/<topic> → dev
- If dev moves during review, re-sync: git pull origin dev, resolve conflicts, then git push
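The feature flow can be rehearsed end to end against a local bare repository standing in for the remote. This is a sketch under assumptions: origin.git, the topic name, and app.py are invented for the example, the upstream is set explicitly with -u on the first push, and opening the PR itself happens on the hosting platform.

```shell
set -eu
work="$(mktemp -d)"                               # throwaway directory for the sketch
git init -q --bare "$work/origin.git"             # stand-in for the shared remote
git clone -q "$work/origin.git" "$work/clone"
cd "$work/clone"
git config user.name "Example Dev"                # placeholder identity, local to this clone
git config user.email "dev@example.com"

# Seed the remote with a dev branch so there is something to branch from
git symbolic-ref HEAD refs/heads/dev
git commit -q --allow-empty -m "chore: initial commit"
git push -q -u origin dev

# Start from dev and branch
git checkout -q dev
git pull -q
git checkout -q -b feat/example-topic

# Work in small commits
echo "print('hello')" > app.py
git add .
git commit -q -m "feat: add example script"
git push -q -u origin feat/example-topic          # first push sets the upstream

# Sync with dev before opening the PR, then publish the result
git pull -q origin dev
git push -q
```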
Promote to release candidate
When dev is in a releasable state:
- Open a PR: dev → staging
- Run the checks that matter for production-like validation on staging
staging is protected, so changes land via PR merge (no direct pushes).
Release
When staging is validated:
- Open a PR: staging → main
- Create the release in GitHub (tag like v1.2.0, release notes from PRs)
main is protected, so releases land via PR merge (no direct pushes).
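GitHub can create the tag for you when you publish the release. If you prefer to create it from the command line, an annotated tag is one option; the sketch below assumes a throwaway repository and reuses the example version v1.2.0.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway directory for the sketch
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"        # placeholder identity, local to this repo
git config user.email "dev@example.com"
git commit -q --allow-empty -m "chore: initial commit"

# Annotated tag on the commit that was released
git tag -a v1.2.0 -m "Release v1.2.0"
git describe --tags

# In a real repository you would then publish it with: git push origin v1.2.0
```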
State migrations and infrastructure changes
State changes are different from code changes: they modify shared resources. Typical examples are database migrations (for example Alembic) and infrastructure changes (for example Terraform). The goal is to keep feature work moving while applying state changes in a controlled, repeatable way.
Realistic flow
- Do the work on a feature branch like any other change and open a PR to dev.
- Apply state changes only after the change lands in dev, so a single person or small group can run them consistently without every feature branch touching shared state.
- Promote dev → staging via PR, then apply the same state changes in the staging environment.
- Promote staging → main via PR, then apply the same state changes in production.
This keeps state aligned with the code that expects it, without blocking unrelated features on dev. It is acceptable for dev to break; staging and main should not.
Testing from a feature branch
If you need to validate a migration before it merges:
- Use an isolated environment that cannot impact other developers, such as a local database, a dedicated sandbox, or an ephemeral preview environment.
Hotfix
When production is broken, fix from main:
- git checkout main
- git pull
- git checkout -b hotfix/<topic>
- git add .
- git commit -m "fix: <summary>"
- git push
Then:
- Open a PR: hotfix/<topic> → main
- After merge, sync back so you do not reintroduce the bug:
  - main → staging (protected): open a PR main → staging and merge it
  - main → dev (not protected): git checkout dev, git pull, git merge main, git push
Direct merge + push is acceptable on dev because it is not protected.
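The sync-back step is the part that is easy to forget, so here is a local rehearsal. It is a sketch with invented file and topic names; the merge into main stands in for the PR merge that would normally happen on the hosting platform, and the pull and push steps are omitted because there is no remote.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway directory for the sketch
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"        # placeholder identity, local to this repo
git config user.email "dev@example.com"
git commit -q --allow-empty -m "chore: initial commit"
git branch dev

# dev has moved ahead with unrelated feature work
git checkout -q dev
echo "work in progress" > feature.txt
git add .
git commit -q -m "feat: start feature"

# Fix production from main
git checkout -q main
git checkout -q -b hotfix/example-topic
echo "patched" > fix.txt
git add .
git commit -q -m "fix: patch production bug"

# Stand-in for the merged PR hotfix/<topic> -> main
git checkout -q main
git merge -q --no-ff -m "fix: merge hotfix/example-topic" hotfix/example-topic

# Sync back into dev so the fix is not lost on the next release
git checkout -q dev
git merge -q -m "chore: sync main into dev" main
git log --oneline --graph --all
```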
Partial merge
Use a partial merge when you need specific commits or files on staging without promoting everything from dev.
Cherry-pick a commit
- git checkout staging
- git pull
- git checkout -b <type>/<topic>
- git cherry-pick -x <commit-hash>
- git push
- Open a PR: <type>/<topic> → staging
-x appends the original hash to the message so you can trace it back.
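To see what -x records, the cherry-pick can be tried in a scratch repository. The branch and file names below are placeholders, and the pull and push steps are left out because the sketch has no remote.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway directory for the sketch
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"        # placeholder identity, local to this repo
git config user.email "dev@example.com"
git commit -q --allow-empty -m "chore: initial commit"
git branch staging

# Two commits on dev; only the second one is wanted on staging
git checkout -q -b dev
echo "one" > a.txt
git add a.txt
git commit -q -m "feat: add a"
echo "two" > b.txt
git add b.txt
git commit -q -m "fix: add b"
picked="$(git rev-parse HEAD)"

# Branch from staging and pick just that commit
git checkout -q staging
git checkout -q -b fix/example-topic
git cherry-pick -x "$picked" > /dev/null

# The message now ends with "(cherry picked from commit <hash>)"
git log -1 --format=%B
```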
Pull a file from another branch
- git checkout staging
- git pull
- git checkout -b <type>/<topic>
- git checkout dev -- path/to/file
- git commit -m "<type>: <summary>"
- git push
- Open a PR: <type>/<topic> → staging
Then promote staging → main via the normal release PR.
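The file-level variant can be rehearsed the same way. In this sketch config.yml and other.txt are invented files; the point is that git checkout dev -- <path> copies only that path from dev and stages it, which is why the flow goes straight to git commit.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway directory for the sketch
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"        # placeholder identity, local to this repo
git config user.email "dev@example.com"
echo "mode: old" > config.yml
git add config.yml
git commit -q -m "chore: initial commit"
git branch staging

# dev changes the config and adds an unrelated file
git checkout -q -b dev
echo "mode: new" > config.yml
echo "unrelated" > other.txt
git add .
git commit -q -m "feat: switch mode and add other file"

# Take only config.yml from dev onto a branch cut from staging
git checkout -q staging
git checkout -q -b chore/example-topic
git checkout dev -- config.yml
git status --short                        # the file is already staged
git commit -q -m "chore: take config.yml from dev"
```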
Cleaning up history
Use squash-merge when merging PRs to keep dev, staging, and main clean with one commit per feature.
When merging a PR, select “Squash and merge”. This collapses all commits from your feature branch into one commit with a clean message.
Commit freely during development, then squash everything when merging.
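The “Squash and merge” button has a rough local counterpart in git merge --squash. The sketch below uses it only to illustrate the effect on history; the branch and file names are placeholders, and the button on the hosting platform remains the workflow described above.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway directory for the sketch
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"        # placeholder identity, local to this repo
git config user.email "dev@example.com"
git commit -q --allow-empty -m "chore: initial commit"
git checkout -q -b dev

# Three small commits on a feature branch
git checkout -q -b feat/example-topic
for step in 1 2 3; do
  echo "step $step" >> feature.txt
  git add feature.txt
  git commit -q -m "feat: work in progress $step"
done

# Squash them into a single commit on dev
git checkout -q dev
git merge -q --squash feat/example-topic
git commit -q -m "feat: add example feature"

git log --oneline dev
```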
Stash and undo
- Temporarily park work: git stash, then later git stash pop
- Discard local edits to a file: git restore <file-path>
- Undo the last commit (keep changes): git reset --soft HEAD~1
- Undo a pushed commit safely: git revert <commit-hash>, then git push
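These four commands are safe to practise in a scratch repository before you need them for real. The sketch below walks through each one on an invented notes.txt; the push after the revert is omitted because there is no remote.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway directory for the sketch
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity, local to this repo
git config user.email "dev@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "docs: add notes"

# Park an edit, then bring it back
echo "draft" >> notes.txt
git stash -q
git stash pop -q

# Discard the local edit entirely
git restore notes.txt

# Undo the last commit but keep its changes staged
echo "v2" > notes.txt
git commit -q -am "docs: v2"
git reset -q --soft HEAD~1
git status --short                        # notes.txt is still staged
git commit -q -m "docs: rewrite notes as v2"

# Undo a commit by adding a new commit that reverses it
git revert --no-edit HEAD > /dev/null
cat notes.txt
```

Unlike reset, revert leaves existing history intact, which is why it is the safe choice once a commit has been pushed.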
Reproducibility
What to commit
Commit what you can review and what defines reproducibility:
- Code, configs, pipeline definitions, documentation
- Dependency constraints / locks
- Notebooks (ideally with clean diffs)
What not to commit
Avoid committing:
- Raw/large datasets, model binaries, checkpoints
- Secrets and tokens
- Generated outputs that are easy to reproduce
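Pointing to the exact state that produced a result is easier when each run records the commit it started from. The sketch below is one possible way to do that; run_metadata.txt and the output directory are hypothetical names, and the "dirty" flag simply notes whether uncommitted changes were present.

```shell
set -eu
repo="$(mktemp -d)"                       # throwaway repository for the sketch
out="$(mktemp -d)"                        # hypothetical output directory, outside the repo
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity, local to this repo
git config user.email "dev@example.com"
echo "learning_rate: 0.1" > config.yml
git add config.yml
git commit -q -m "exp: add baseline config"

# Record the exact commit and whether the working tree had uncommitted changes
commit="$(git rev-parse HEAD)"
if [ -n "$(git status --porcelain)" ]; then
  dirty=true
else
  dirty=false
fi
printf 'commit=%s\ndirty=%s\n' "$commit" "$dirty" > "$out/run_metadata.txt"

cat "$out/run_metadata.txt"
```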
Large files
If you need to version large artifacts:
- Git LFS (large binaries in Git via pointers): git lfs install, git lfs track "<pattern>"
- DVC (data/model artifacts + remote storage): dvc init, dvc add <data-path>, then dvc push
Notebooks
If notebook diffs are too noisy:
- Clean outputs: install nbstripout, then nbstripout --install
- Better diffs: install nbdime, then nbdime config-git --enable