Pipelines and Pizza 🍕

Managing Large Git Repositories: LFS, Partial Clone, and Sparse Checkout

In the previous posts, we covered Git hooks for code quality and submodules for shared code. Now let’s tackle the final challenge: repositories that have grown large enough that git clone becomes a coffee break.

Whether you’re dealing with large binaries, years of history, or a monorepo with dozens of projects, Git has strategies to keep things fast.


The Problem

Large repos slow everything down:

  • Large binaries (ISOs, archives, ML models) bloat the repo permanently, because every committed version stays in history
  • Deep history means cloning downloads every commit ever made
  • Monorepos force you to check out code you’ll never touch

Let’s fix each one.


Git LFS for Large Binaries

Git LFS (Large File Storage) replaces large files in the repository with small pointer files. The actual content lives on a separate LFS server and is downloaded when you check out a commit that needs it.
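To make the pointer idea concrete, here is a minimal sketch that builds the pointer text by hand. It assumes GNU sha256sum is available. lfs_pointer and sample.bin are hypothetical names for illustration; real pointers are written by git-lfs itself, not by a script like this.

```shell
# Sketch: print the pointer text Git LFS would commit in place of a file.
# lfs_pointer is a hypothetical helper, not part of git-lfs.
lfs_pointer() {
  file=$1
  oid=$(sha256sum "$file" | cut -d' ' -f1)   # SHA-256 of the real content
  size=$(wc -c < "$file" | tr -d ' ')        # size of the real content in bytes
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

# Try it on a tiny throwaway file.
demo=$(mktemp -d)
printf 'hello\n' > "$demo/sample.bin"
lfs_pointer "$demo/sample.bin"
```

However large the original file is, the pointer stays at three short lines, which is why tracked binaries stop inflating the repository. If git-lfs is installed, git lfs pointer --file=<path> prints the equivalent pointer for a real file.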

Set up LFS

git lfs install
git lfs track "*.iso"
git lfs track "*.tar.gz"
git add .gitattributes
git commit -m "chore(lfs): track large files"

The .gitattributes file tells Git which patterns to handle with LFS.
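If you want to see what that file does, the sketch below writes the line that git lfs track "*.iso" normally produces and asks Git which filter applies to a path. It runs in a throwaway repo and does not need git-lfs installed; the file names are made up for illustration.

```shell
# Sketch: inspect how a .gitattributes pattern routes files to LFS.
repo=$(mktemp -d)
git init -q "$repo"

# This is the line `git lfs track "*.iso"` appends to .gitattributes.
printf '*.iso filter=lfs diff=lfs merge=lfs -text\n' > "$repo/.gitattributes"

# check-attr works on path names, so the files do not have to exist.
git -C "$repo" check-attr filter -- images/disk.iso   # images/disk.iso: filter: lfs
git -C "$repo" check-attr filter -- README.md         # README.md: filter: unspecified
```

The same check is handy when a file you expected to be in LFS was committed as a regular blob: if the filter shows as unspecified, the pattern does not match that path.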

CI needs LFS too

- uses: actions/checkout@v6
  with:
    lfs: true

Or manually:

- run: git lfs install
- run: git lfs pull

When to use LFS

  • Binary files over 1MB that change occasionally
  • Assets that most developers don’t need locally
  • Files that would otherwise bloat clone times
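To find candidates for LFS, it can help to list the large blobs already sitting in history. The following is a sketch, not a definitive audit tool: large_blobs is a hypothetical helper name, and the 1 MiB default simply mirrors the rule of thumb above.

```shell
# Sketch: list blobs above a size threshold anywhere in history, biggest first.
# Run from inside the repository you want to inspect.
# usage: large_blobs            # blobs over 1 MiB
#        large_blobs 10485760   # blobs over 10 MiB
large_blobs() {
  min_bytes=${1:-1048576}
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v min="$min_bytes" '$1 == "blob" && $2 > min { sub(/^blob /, ""); print }' |
    sort -rn
}
```

Each output line is a size in bytes followed by the path the blob was committed under. Files that show up here remain in history even after you delete them from the working tree, so tracking them with LFS only helps for future commits.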

Partial Clone: Skip What You Don’t Need

Partial clone downloads repository metadata but skips blob content until you actually need it.

git clone --filter=blob:none git@<host>:yourorg/giant-repo.git

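The clone still downloads the blobs needed to check out the default branch. Blobs for older versions are fetched on demand when a command such as git checkout, git diff, or git blame needs them. Those later operations are slower, but the initial clone is much faster.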
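You can watch the on-demand fetch happen in a sandbox. This sketch uses a throwaway local repository as the "remote" and assumes a reasonably recent Git; the directory names, notes.txt, and the demo identity are all made up. The two uploadpack settings are only needed because the sandbox repo acts as its own server.

```shell
# Sketch: observe a partial clone fetching a blob on demand.
work=$(mktemp -d)
git init -q "$work/origin"
git -C "$work/origin" config uploadpack.allowFilter true          # let clients use --filter
git -C "$work/origin" config uploadpack.allowAnySHA1InWant true   # let clients request single blobs

# Two commits, so the first version of notes.txt exists only in history.
for v in one two; do
  echo "version $v" > "$work/origin/notes.txt"
  git -C "$work/origin" add notes.txt
  git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com commit -q -m "notes $v"
done

git clone -q --filter=blob:none "file://$work/origin" "$work/clone"

# Objects the clone knows about but has not downloaded are printed with a leading "?".
count_missing() {
  git -C "$work/clone" rev-list --objects --missing=print HEAD | grep -c '^?' || true
}

missing_before=$(count_missing)                                  # the old blob is absent
git -C "$work/clone" cat-file -p HEAD~1:notes.txt > /dev/null    # reading it triggers a fetch
missing_after=$(count_missing)
echo "missing blobs: before=$missing_before after=$missing_after"
```

The count should drop to zero once the old version has been read, which is the same thing that happens behind the scenes when you diff or blame across history in a real partial clone.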

Best for

  • CI/CD pipelines that only touch specific paths
  • Developers who rarely need old file versions locally (the commit history itself is still complete)
  • Repos with large files that aren’t tracked by LFS

Sparse Checkout: Work on a Subset

Sparse checkout limits which directories appear in your working tree. Combined with partial clone, you only download what you need.

git clone --filter=blob:none --sparse git@<host>:yourorg/monorepo.git
cd monorepo
git sparse-checkout init --cone
git sparse-checkout set platform/terraform modules/network

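Your working directory now contains only those paths, plus the files in the repository root and in the parent directories of those paths, which cone mode always includes. Everything else exists in Git but isn’t checked out.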
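Here is a small sandbox version of the same idea, useful for seeing what cone mode keeps and drops before trying it on a real monorepo. The directory names mirror the example above; the file contents and the demo identity are invented for illustration.

```shell
# Sketch: cone-mode sparse checkout in a throwaway repo.
mono=$(mktemp -d)
git init -q "$mono"
mkdir -p "$mono/platform/terraform" "$mono/services/api"
echo 'terraform {}' > "$mono/platform/terraform/main.tf"
echo 'print("api")' > "$mono/services/api/app.py"
echo '# monorepo' > "$mono/README.md"
git -C "$mono" add .
git -C "$mono" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial layout"

git -C "$mono" sparse-checkout init --cone
git -C "$mono" sparse-checkout set platform/terraform

ls "$mono"                        # README.md and platform/ remain; services/ is gone
git -C "$mono" ls-files | wc -l   # the index still tracks all three files
```

Nothing is deleted from the repository: services/api/app.py is still tracked and comes back as soon as its directory is added to the sparse set.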

Add more paths later

git sparse-checkout add services/api

See what’s included

git sparse-checkout list

Shallow Clone for CI

When CI only needs recent history (not the full repo), use a shallow clone:

git clone --depth=20 git@<host>:yourorg/app.git

This limits history to 20 commits back from the branch tip and, by default, fetches only that one branch. Fast for pipelines that just need to build and test.

Limitations

  • git log only shows shallow history
  • Some operations (blame, bisect) may need to fetch more
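  • Merges and rebases that need commits beyond the shallow boundary fail until you deepen the clone (pushing from a shallow clone works in current Git, though old versions refused it)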
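When one of these limitations bites, you can deepen the clone instead of starting over. The sketch below demonstrates this against a throwaway local repository; the five empty commits and the demo identity exist only to give the sandbox some history.

```shell
# Sketch: make a shallow clone of a throwaway local repo, then deepen it.
src=$(mktemp -d)
git init -q "$src/origin"
for n in 1 2 3 4 5; do
  git -C "$src/origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "commit $n"
done

# --depth is ignored for plain local paths, hence the file:// URL.
git clone -q --depth=2 "file://$src/origin" "$src/shallow"

git -C "$src/shallow" rev-parse --is-shallow-repository   # true
depth_before=$(git -C "$src/shallow" rev-list --count HEAD)

# Fetch the rest of the history (use --deepen=<n> to add only n more commits).
git -C "$src/shallow" fetch -q --unshallow
depth_after=$(git -C "$src/shallow" rev-list --count HEAD)
echo "commits visible: before=$depth_before after=$depth_after"
```

In CI, a common pattern is to start shallow and run git fetch --deepen=<n> or git fetch --unshallow only in the jobs that need blame, bisect, or a merge base.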

Hands-On Lab: Configure LFS and CI

Building on the repo from previous articles:

Step 1: Configure LFS

cd git-hooks-lab
git lfs install
git lfs track "*.tar.gz"
git add .gitattributes
git commit -m "chore(lfs): track archives"

Step 2: Create CI workflow

.github/workflows/infra-ci.yaml:

name: infra-ci
on: [pull_request]
jobs:
  preflight:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          submodules: recursive
          lfs: true
      - uses: hashicorp/setup-terraform@v3
      - run: terraform fmt -check -recursive
      - run: terraform validate || true

This workflow:

  • Checks out code with submodules and LFS
  • Validates Terraform formatting
  • Runs terraform validate, with || true so a failure doesn’t fail the job for now

Choosing the Right Strategy

| Situation | Solution |
| --- | --- |
| Large binaries (ISOs, models, archives) | Git LFS |
| Slow clones due to history size | Partial clone (--filter=blob:none) |
| Monorepo, only need some directories | Sparse checkout |
| CI just needs to build, not full history | Shallow clone (--depth=N) |

You can combine these. For a monorepo with large binaries:

git clone --filter=blob:none --sparse git@<host>:yourorg/monorepo.git
cd monorepo
git lfs install
git sparse-checkout set my-project/

Troubleshooting Guide

| Problem | Cause | Fix |
| --- | --- | --- |
| CI fails to fetch LFS | Missing LFS setup | Add lfs: true to checkout action |
| Clone takes forever | Large files in history | Use partial clone or LFS |
| “Blob not found” errors | Partial clone needs to fetch the blob but can’t reach the remote | Check that the remote is reachable, then access the file again so Git can fetch it |
| Sparse checkout missing files | Path not in sparse set | git sparse-checkout add <path> |

Quick Reference

# LFS
git lfs install                          # Set up LFS hooks and filters
git lfs track "*.iso"                    # Track file type with LFS
git lfs ls-files                         # List LFS-tracked files

# Partial clone
git clone --filter=blob:none <url>       # Clone without blobs

# Sparse checkout
git sparse-checkout init --cone          # Enable sparse checkout
git sparse-checkout set path/to/dir      # Checkout only specific paths
git sparse-checkout list                 # Show current sparse paths

# Shallow clone
git clone --depth=20 <url>               # Clone with limited history

What’s Next

Next week: Ansible Vault - Securing Secrets in Playbooks. We’ll build a workflow for managing credentials without committing them to Git, integrate with CI, and avoid the leaks that make security teams nervous.

Happy automating!