Git: a practical workflow

Git is the backbone of collaborative development. It also protects reproducibility: you can point to an exact state of code, configs, and decisions that produced a result.



The branch model

Use three long-lived branches:

  • main (protected): what is released. It should not break.
  • staging (protected): what is about to be released (release candidates). It should not break.
  • dev (not protected): where day-to-day work integrates. It can break.

Supporting branches:

  • feat/<topic>: feature / experiment branches created from dev.
  • hotfix/<topic>: urgent fixes created from main.

This gives you speed on dev and control on staging + main.
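
The layout above can be sketched in a throwaway repository. The scratch directory, the placeholder identity, and the `example-topic` branch names are assumptions for illustration; in a real project the branches live on your hosting platform, which is also where protection rules are configured.

```shell
set -eu

# Scratch repository so the sketch never touches a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

# Branches need a commit to point at.
git commit -q --allow-empty -m "chore: initial commit"

# Long-lived branches: staging and dev start from main.
git branch staging main
git branch dev main

# Supporting branches: features start from dev, hotfixes from main.
git branch feat/example-topic dev
git branch hotfix/example-topic main

git branch --list
```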


Setup

  • Clone an existing repo: git clone <repository-url>
  • Or init a new repo: git init, then git remote add origin <your-repo-url>
  • Identity (so commits are attributable): git config --global user.name "Your Name", git config --global user.email "your.email@example.com"
  • Ignore local state early via .gitignore (see .gitignore.example)

Sometimes you will also use .gitattributes to standardize things like line endings, file treatment, and merge behavior (see .gitattributes.example).
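
To confirm that ignore rules behave as expected before the first commit, `git check-ignore` can be used. The following is a minimal sketch in a scratch repository; the one-line .gitignore is a reduced stand-in for the full template, and the file names are illustrative.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q

# Reduced stand-in for .gitignore.example (one pattern taken from it).
printf '%s\n' '.env' > .gitignore
touch .env app.py

# check-ignore exits 0 when a path is ignored and 1 when it is not.
if git check-ignore -q .env; then env_state="ignored"; else env_state="not-ignored"; fi
if git check-ignore -q app.py; then app_state="ignored"; else app_state="not-ignored"; fi

echo ".env is $env_state, app.py is $app_state"
```

Adding `-v` instead of `-q` prints the file and pattern responsible for a match, which helps when a large template ignores more than you intended.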

Resources (copy/paste templates)

Use these when you want a strong baseline for repo hygiene without reinventing the “standard ignores” wheel.

.gitignore.example
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[codz]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#   Usually these files are written by a python script from a template
#   before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py.cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
# Pipfile.lock

# UV
#   Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
# uv.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
# poetry.lock
# poetry.toml

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#   pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
#   https://pdm-project.org/en/latest/usage/project/#working-with-version-control
# pdm.lock
# pdm.toml
.pdm-python
.pdm-build/

# pixi
#   Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
# pixi.lock
#   Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
#   in the .venv directory. It is recommended not to include this directory in version control.
.pixi

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# Redis
*.rdb
*.aof
*.pid

# RabbitMQ
mnesia/
rabbitmq/
rabbitmq-data/

# ActiveMQ
activemq-data/

# SageMath parsed files
*.sage.py

# Environments
.env
.envrc
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#   JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#   be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#   and can be added to the global gitignore or merged into this file.  For a more nuclear
#   option (not recommended) you can uncomment the following to ignore the entire idea folder.
# .idea/

# Abstra
#   Abstra is an AI-powered process automation framework.
#   Ignore directories containing user credentials, local state, and settings.
#   Learn more at https://abstra.io/docs
.abstra/

# Visual Studio Code
#   Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore 
#   that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
#   and can be added to the global gitignore or merged into this file. However, if you prefer, 
#   you could uncomment the following to ignore the entire vscode folder
# .vscode/

# Ruff stuff:
.ruff_cache/

# PyPI configuration file
.pypirc

# Marimo
marimo/_static/
marimo/_lsp/
__marimo__/

# Streamlit
.streamlit/secrets.toml

# Local .terraform directories
.terraform/

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log
crash.*.log

# Exclude all .tfvars files, which are likely to contain sensitive data, such as
# password, private keys, and other secrets. These should not be part of version
# control as they are data points which are potentially sensitive and subject
# to change depending on the environment.
*.tfvars
*.tfvars.json

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Ignore transient lock info files created by terraform apply
.terraform.tfstate.lock.info

# Include override files you do wish to add to version control using negated pattern
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*

# Ignore CLI configuration files
.terraformrc
terraform.rc

# Optional: ignore graph output files generated by `terraform graph`
# *.dot

# Optional: ignore plan files saved before destroying Terraform configuration
# Uncomment the line below if you want to ignore planout files.
# planout

# MLflow
mlruns/
mlartifacts/
outputs/
mlruns*.db
mlflow.db
# SQLite WAL mode temporary files
*-shm
*-wal

.gitattributes.example
# Auto detect text files and normalize to LF
* text=auto eol=lf

# Force LF for shell scripts (critical for Linux/Mac)
*.sh text eol=lf

# Force LF for common text files
*.ts text eol=lf
*.tsx text eol=lf
*.js text eol=lf
*.jsx text eol=lf
*.json text eol=lf
*.md text eol=lf
*.yml text eol=lf
*.yaml text eol=lf
*.css text eol=lf
*.html text eol=lf
*.sql text eol=lf
*.patch text eol=lf
*.py text eol=lf
*.pyi text eol=lf
*.toml text eol=lf
*.ini text eol=lf
*.cfg text eol=lf
*.env text eol=lf
*.txt text eol=lf
*.rst text eol=lf

# Common repo files without extensions
Dockerfile text eol=lf
Makefile text eol=lf
poetry.lock text eol=lf
uv.lock text eol=lf

# Notebooks
# If you use nbdime, you can enable better diffs via `nbdime config-git --enable`.
*.ipynb text eol=lf

# Binary files
*.png binary
*.jpg binary
*.jpeg binary
*.gif binary
*.webp binary
*.ico binary
*.woff binary
*.woff2 binary
*.ttf binary
*.eot binary
*.pdf binary
*.zip binary
*.tar binary
*.gz binary
*.7z binary

# Common data / ML artifacts (treat as binary to avoid noisy diffs)
*.parquet binary
*.feather binary
*.arrow binary
*.h5 binary
*.hdf5 binary
*.npz binary
*.npy binary
*.pkl binary
*.joblib binary
*.onnx binary
*.pt binary
*.pth binary

# Optional Git LFS examples (uncomment if you use LFS for large artifacts)
# *.parquet filter=lfs diff=lfs merge=lfs -text
# *.onnx filter=lfs diff=lfs merge=lfs -text

Daily commands

  • What’s going on: git status
  • What changed: git diff, git diff --staged
  • History: git log --oneline --graph --all
  • Stage + commit: git add <file>, git add ., git commit -m "feat: short summary"
  • Sync: git pull, git push

If you keep typing the same long commands, add an alias once and use it forever. Create one with git config --global alias.<name> "<command>", for example: git config --global alias.st status, git config --global alias.lg "log --oneline --graph --all".


Commit best practices

Treat commits as a project log. A good commit message helps reviewers understand intent and helps you debug and audit later.

Commit types: feat, ref, fix, test, docs, chore, style, data, exp.

The message format

  • Use the pattern type: short summary.
  • Write the summary in the imperative mood (“add”, “fix”, “remove”, “refactor”).
  • Keep it specific to the outcome, not the file names.

What each type means

  • feat: user-facing capability or pipeline capability added (for example “feat: add batch inference job”).
  • ref: refactor that preserves behavior but improves structure/clarity (for example “ref: split feature engineering into module”).
  • fix: defect correction without changing intended scope (for example “fix: handle empty dataset gracefully”).
  • test: test-only change (for example “test: add regression for leakage bug”).
  • docs: documentation-only change (for example “docs: clarify release process”).
  • chore: maintenance work that is not a feature or fix (for example “chore: bump pre-commit hooks”).
  • style: formatting, whitespace, linting, or cosmetic changes with no behavior change (for example “style: apply black formatting to utils module”).
  • data: dataset tracking or transformations that change the data contract (for example “data: update schema for customer events v2”).
  • exp: AI/ML experiments (for example “exp: compare xgboost vs logistic regression”).
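
One possible way to enforce the format is a small shell check. The function name `is_valid_commit_message` and the exact regular expression are assumptions for this sketch, not part of the workflow itself.

```shell
# Returns 0 when the first line of a message matches "type: short summary"
# with one of the commit types listed above; returns non-zero otherwise.
is_valid_commit_message() {
    printf '%s\n' "$1" | head -n 1 |
        grep -Eq '^(feat|ref|fix|test|docs|chore|style|data|exp): [^ ].*$'
}

for msg in "feat: add batch inference job" "update stuff" "fix:missing space"; do
    if is_valid_commit_message "$msg"; then
        echo "ok:       $msg"
    else
        echo "rejected: $msg"
    fi
done
```

Git passes the path of the message file as the first argument to a commit-msg hook, so a hook could call the function with the contents of that file and exit non-zero to reject the commit.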

Practical rules

  • Keep commits small and coherent (one intent per commit).
  • Prefer committing work that can be reviewed (avoid mixing formatting, refactors, and behavior changes in one commit).
  • If something is temporary or broken, keep it on your branch; do not merge it into dev.
  • Keep branches short-lived and delete them after merge.
  • Never commit secrets or large artifacts (use .gitignore, Git LFS, or DVC depending on the artifact).

Workflow

Feature development

Start from dev and branch:

  • git checkout dev
  • git pull
  • git checkout -b feat/<topic>

Work in small commits:

  • git add .
  • git commit -m "feat: <meaningful summary>"
  • git push

If this is the first time you push the branch, Git may ask you to set an upstream. Copy/paste the command it suggests, then continue using git push.

When ready:

  • Sync with dev before the PR: git pull origin dev
  • If you hit conflicts, resolve them, then git add ., git commit -m "fix: resolve merge conflict", and git push
  • Open a PR: feat/<topic> → dev
  • If dev moves during review, re-sync: git pull origin dev, resolve conflicts, then git push
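
The feature flow can be rehearsed end to end without touching a real project. In this sketch a local bare repository stands in for the hosted remote; the paths, identity, branch name, and file contents are all placeholders, and opening the PR itself still happens on your hosting platform.

```shell
set -eu

work="$(mktemp -d)"

# A local bare repository stands in for the hosted remote.
git init -q --bare "$work/origin.git"

git init -q "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/dev
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "chore: initial commit"
git push -q -u origin dev

# Start from dev and branch.
git checkout -q dev
git pull -q
git checkout -q -b feat/example-topic

# Work in small commits.
echo "print('hello')" > app.py
git add .
git commit -q -m "feat: add hello script"
git push -q -u origin feat/example-topic   # first push sets the upstream

# Sync with dev before opening the PR, then publish the result.
git pull -q origin dev
git push -q

git ls-remote --heads origin
```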

Promote to release candidate

When dev is in a releasable state:

  • Open a PR: dev → staging
  • Run the checks that matter for production-like validation on staging

staging is protected, so changes land via PR merge (no direct pushes).

Release

When staging is validated:

  • Open a PR: staging → main
  • Create the release in GitHub (tag like v1.2.0, release notes from PRs)

main is protected, so releases land via PR merge (no direct pushes).


State migrations and infrastructure changes

State changes are different from code changes: they modify shared resources. Typical examples are database migrations (for example Alembic) and infrastructure changes (for example Terraform). The goal is to keep feature work moving while applying state changes in a controlled, repeatable way.

Realistic flow

  • Do the work on a feature branch like any other change and open a PR to dev.
  • Apply state changes only after the change lands in dev, so a single person or small group can run them consistently without every feature branch touching shared state.
  • Promote dev → staging via PR, then apply the same state changes in the staging environment.
  • Promote staging → main via PR, then apply the same state changes in production.

This keeps state aligned with the code that expects it, without blocking unrelated features on dev. It is acceptable for dev to break; staging and main should not.

Testing from a feature branch

If you need to validate a migration before it merges:

  • Use an isolated environment that cannot impact other developers, such as a local database, a dedicated sandbox, or an ephemeral preview environment.

Hotfix

When production is broken, fix from main:

  • git checkout main
  • git pull
  • git checkout -b hotfix/<topic>
  • git add .
  • git commit -m "fix: <summary>"
  • git push

Then:

  • Open a PR: hotfix/<topic> → main
  • After merge, sync back so you do not reintroduce the bug:
    • main → staging (protected): open a PR main → staging and merge it
    • main → dev (not protected): update your local main first (git checkout main, git pull), then git checkout dev, git pull, git merge main, git push

Direct merge + push is acceptable on dev because it is not protected.
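
The sync-back step can be illustrated locally. This is a sketch in a scratch repository with no remote: the local merge into main stands in for merging the hotfix PR, and the branch and file names are placeholders.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
git commit -q --allow-empty -m "chore: initial commit"
git branch dev

# Hotfix branch created from main.
git checkout -q -b hotfix/example-topic
echo "patched" > fix.txt
git add .
git commit -q -m "fix: patch production bug"

# Stand-in for merging the hotfix PR into main.
git checkout -q main
git merge -q --no-ff --no-edit hotfix/example-topic

# Sync back to dev so the next promotion does not drop the fix.
git checkout -q dev
git merge -q --no-edit main

# merge-base --is-ancestor exits 0 when dev contains everything on main.
if git merge-base --is-ancestor main dev; then synced="yes"; else synced="no"; fi
echo "dev contains main: $synced"
```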


Partial merge

Use a partial merge when you need specific commits or files on staging without promoting everything from dev.

Cherry-pick a commit

  • git checkout staging
  • git pull
  • git checkout -b <type>/<topic>
  • git cherry-pick -x <commit-hash>
  • git push
  • Open a PR: <type>/<topic> → staging

-x appends the original hash to the message so you can trace it back.
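
A small local rehearsal of the cherry-pick, assuming a scratch repository in which dev holds one commit you want on staging and one you do not. Branch names, file names, and messages are placeholders.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
git commit -q --allow-empty -m "chore: initial commit"
git branch staging

# Two commits on dev: only the first one should reach staging.
git checkout -q -b dev
echo "wanted" > wanted.txt
git add wanted.txt
git commit -q -m "fix: wanted change"
picked="$(git rev-parse HEAD)"
echo "unwanted" > unwanted.txt
git add unwanted.txt
git commit -q -m "feat: not ready for staging"

# Branch from staging and pick the single commit.
git checkout -q staging
git checkout -q -b fix/example-topic
git cherry-pick -x "$picked" > /dev/null

# The line that -x appended to the commit message.
trailer="$(git log -1 --format=%B | grep 'cherry picked from commit')"
echo "$trailer"
```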

Pull a file from another branch

  • git checkout staging
  • git pull
  • git checkout -b <type>/<topic>
  • git checkout dev -- path/to/file
  • git commit -m "<type>: <summary>"
  • git push
  • Open a PR: <type>/<topic> → staging
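
The same idea for a single file, again as a sketch in a scratch repository with placeholder names. `git checkout <branch> -- <path>` copies the file into both the working tree and the index, which is why no separate `git add` is needed before the commit.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
git commit -q --allow-empty -m "chore: initial commit"
git branch staging

# dev carries two new files; only one should reach staging.
git checkout -q -b dev
mkdir -p config
echo "timeout: 30" > config/settings.yml
echo "draft" > notes.txt
git add .
git commit -q -m "feat: add settings and notes"

# Branch from staging and take just the one file from dev.
git checkout -q staging
git checkout -q -b chore/example-topic
git checkout dev -- config/settings.yml
git commit -q -m "chore: bring settings from dev"

files="$(git ls-tree -r --name-only HEAD)"
echo "$files"
```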

Then promote staging → main via the normal release PR.


Cleaning up history

Use squash-merge when merging PRs to keep dev, staging, and main clean with one commit per feature.

When merging a PR, select “Squash and merge”. This collapses all commits from your feature branch into one commit with a clean message.

Commit freely during development, then squash everything when merging.
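
The “Squash and merge” button is the hosted route. Locally, `git merge --squash` gives a comparable single-commit result, which the sketch below uses to show the effect on history; the scratch repository, branch name, and messages are assumptions.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/dev
git config user.name "Example Dev"
git config user.email "dev@example.com"
git commit -q --allow-empty -m "chore: initial commit"

# Commit freely on the feature branch.
git checkout -q -b feat/example-topic
for step in 1 2 3; do
    echo "step $step" >> work.txt
    git add work.txt
    git commit -q -m "feat: work in progress $step"
done

# --squash stages the combined changes without creating a commit,
# so the single clean commit is written explicitly afterwards.
git checkout -q dev
git merge -q --squash feat/example-topic
git commit -q -m "feat: add example feature"

count="$(git rev-list --count dev)"
echo "commits on dev: $count"
```

Here dev ends up with two commits (the initial one plus the squashed feature) even though the feature branch had three.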


Stash and undo

  • Temporarily park work: git stash, then later git stash pop
  • Discard local edits to a file: git restore <file-path>
  • Undo the last commit (keep changes): git reset --soft HEAD~1
  • Undo a pushed commit safely: git revert <commit-hash>, then git push
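
The commands above can be tried safely in a scratch repository. This sketch walks through each one in turn; the file name and contents are placeholders, and `--no-edit` is added to `git revert` only so the example runs without opening an editor.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "feat: add file"

# Park work in progress, then bring it back.
echo "wip" >> file.txt
git stash -q
parked="$(cat file.txt)"          # working tree is back to the committed state
git stash pop -q

# Discard the local edits to the file.
git restore file.txt

# Undo the last commit but keep its changes staged.
echo "v2" > file.txt
git commit -q -am "feat: update file"
git reset -q --soft HEAD~1
staged="$(git diff --staged --name-only)"
git commit -q -m "feat: update file again"

# Undo a commit by adding a new commit that reverses it.
git revert --no-edit HEAD > /dev/null
final="$(cat file.txt)"

echo "parked=$parked staged=$staged final=$final"
```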

Reproducibility

What to commit

Commit what you can review and what defines reproducibility:

  • Code, configs, pipeline definitions, documentation
  • Dependency constraints / locks
  • Notebooks (ideally with clean diffs)

What not to commit

Avoid committing:

  • Raw/large datasets, model binaries, checkpoints
  • Secrets and tokens
  • Generated outputs that are easy to reproduce

Large files

If you need to version large artifacts:

  • Git LFS (large binaries in Git via pointers): git lfs install, git lfs track "<pattern>"
  • DVC (data/model artifacts + remote storage): dvc init, dvc add <data-path>, then dvc push

Notebooks

If notebook diffs are too noisy:

  • Clean outputs: install nbstripout, then nbstripout --install
  • Better diffs: install nbdime, then nbdime config-git --enable
