Fetching a specific commit hash from a Git repository is surprisingly non-trivial. Here's why — and the progressive strategy that makes it fast.
You have a Git URL and a commit hash — say deadbeef. You want the code at exactly that revision. Simple, right?
git clone https://github.com/example/repo
git checkout deadbeef

This works, but it's slow. Cloning a large repository can take anywhere from a few seconds to several hours depending on its history. You don't need the whole history — just one commit. Can we do better?
It turns out the answer is yes, but not in the way you might expect. Git's protocol has a subtle constraint that makes fetching arbitrary commits surprisingly tricky.
Advertised Refs: What Git Exposes
When you connect to a Git server, it doesn't expose every commit — it only tells you about its advertised refs: the tips of branches and tags. You can inspect them without cloning at all:
git ls-remote https://github.com/example/repo
# a3f1c9d2... refs/heads/master
# 9c14a8b5... refs/tags/v1.2.0
# ...

Branch tips and tags are the only commits the server is guaranteed to tell you about. Everything else — every commit buried in history — is hidden by default.
Commits at branch tips and tags are advertised. All other commits are unadvertised. The difference matters a great deal for fetching.
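You can see the distinction with a throwaway local repository — git ls-remote works on a local path just as it does on a URL, and it speaks the same protocol. A minimal sketch (all paths, identities, and commit messages here are made up for the demonstration):

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"

# Two commits: the first will end up buried behind the tip.
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "first"
buried=$(git rev-parse HEAD)
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "second"
tip=$(git rev-parse HEAD)

# ls-remote lists refs only: the tip's hash appears, the buried one's doesn't.
git ls-remote "$tmp/repo" | grep -q "$tip"    && echo "tip is advertised"
git ls-remote "$tmp/repo" | grep -q "$buried" || echo "buried commit is not advertised"
```

The buried commit is perfectly reachable from the tip — it just never shows up in the ref advertisement, which is all ls-remote (or a fetch negotiation) gets to see.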
Why Not Just Fetch the Commit Directly?
You might think: git fetch origin deadbeef. On most hosted services — including GitHub — this fails:
error: Server does not allow request for unadvertised object

GitHub's server does not serve arbitrary commit objects on demand. Only advertised refs can be fetched directly. If your target commit is not a branch tip or tag, you cannot fetch it this way.
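This is server policy, not a GitHub quirk: upload-pack refuses requests for unadvertised objects unless the host opts in (the uploadpack.allowAnySHA1InWant family of config options). A file:// remote speaks the same smart protocol with the same defaults, so you can reproduce the refusal locally — a sketch, with hypothetical paths:

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/origin"
git -C "$tmp/origin" -c user.email=a@b -c user.name=a commit -q --allow-empty -m one
buried=$(git -C "$tmp/origin" rev-parse HEAD)
git -C "$tmp/origin" -c user.email=a@b -c user.name=a commit -q --allow-empty -m two

git init -q "$tmp/clone"
cd "$tmp/clone"
git remote add origin "file://$tmp/origin"

# Fetching an unadvertised commit by hash is refused by default...
git fetch -q origin "$buried" 2>/dev/null || echo "refused: unadvertised object"

# ...while fetching an advertised ref succeeds.
branch=$(git -C "$tmp/origin" symbolic-ref --short HEAD)
git fetch -q origin "$branch" && echo "fetched $branch"
```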
Neither Branches Nor Tags Are Immutable
There's another complication: even the advertised refs are mutable.
- Branch tips move. A commit that was once the tip of master may now be three hundred commits behind. Or it may have been merged and the branch deleted entirely.
- Tags are supposed to be immutable, but git push --force can move them. And even if the tag stays put, new work on that branch may carry the commit you care about somewhere else in history.
This means you can't rely on an advertised ref to tell you where in the history your target commit lives — you have to go looking.
A Smarter Strategy: Progressive Deepening
The key insight is that we can use shallow clones to look at history incrementally, without ever fetching more than we need.
We want to fetch commit `deadbeef`. It lives somewhere in the history of the repository — but we don't know exactly where yet.
Step by Step
1. Check ls-remote first.
Before touching any objects, query the advertised refs. If deadbeef happens to be a branch tip or tag right now, you can do a shallow clone and check it out immediately — no deepening required.
2. Shallow clone at depth 1.
If it's not advertised, do git clone --depth=1. This fetches only the very tip of the default branch. The rest of history is behind a shallow boundary — the server won't send those objects until you ask.
3. Check if the commit is present.
git cat-file -e deadbeef 2>/dev/null && echo found — a purely local check, no network needed.
4. Deepen progressively.
If not found, run git fetch --deepen 1 (or --deepen N for larger jumps). This extends the shallow clone one step further. Repeat until the commit is found or history is exhausted.
5. Full clone as last resort.
If the commit genuinely cannot be found by deepening — perhaps the branch hint is wrong, or the commit was force-pushed away — fall back to a full git clone.
The key property of this strategy: each check before a network call is local and cheap. You only deepen when the local check confirms the commit isn't there yet. The expensive I/O is deferred as long as possible.
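The loop above can be sketched as a small shell function. This is a minimal illustration, not Buckaroo's implementation — the function name, its arguments, and the error handling are all invented for the example; it assumes you already have a branch hint:

```shell
# fetch_commit <url> <full-commit-hash> <branch-hint> [step]
fetch_commit() {
  url=$1; target=$2; branch=$3; step=${4:-10}

  # Start from the cheapest possible state: just the hinted branch's tip.
  git clone -q --depth=1 --branch "$branch" "$url" repo
  cd repo

  # Purely local presence check; deepen only while the commit is missing.
  while ! git cat-file -e "$target^{commit}" 2>/dev/null; do
    before=$(git rev-list --count HEAD)
    git fetch -q --deepen="$step"
    after=$(git rev-list --count HEAD)
    if [ "$after" -eq "$before" ]; then
      # No new commits arrived: history is exhausted and the target is absent.
      echo "commit $target not found on branch $branch" >&2
      return 1
    fi
  done
  git checkout -q "$target"
}
```

Note the stopping condition: git fetch --deepen succeeds even when there is nothing left to deepen, so the loop compares the commit count before and after each fetch to detect that history is exhausted. The function leaves you inside the clone, checked out at the target commit.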
Why We Need a Branch Hint
--deepen walks backwards from a known branch tip. If you don't know which branch contains your commit, you'd have to deepen every branch — which is expensive.
This is why Buckaroo stores a branch hint in its lock-file alongside the commit hash:
[lock."github.com/buckaroo-pm/boost-config"]
versions = [ "branch=master" ]
revision = "deadbeef..."

At resolution time, Buckaroo records which branch was live when the lock was generated. At install time, it uses that hint to target the right branch for deepening — avoiding unnecessary work on unrelated branches.
The GitHub / GitLab Shortcut
If the repository is hosted on GitHub, GitLab, or Bitbucket, there's a faster path: their APIs can serve commit metadata and archives directly.
# GitHub: download a tarball of any commit
curl -L https://api.github.com/repos/example/repo/tarball/deadbeef -o src.tar.gz

This bypasses the Git protocol entirely. No clone, no ls-remote, no deepening — just an HTTP request. For a one-off download of a specific commit it is hard to beat.
The trade-off appears on subsequent updates. A tarball is a complete snapshot — when you upgrade a dependency from deadbeef to the next version, you download the full archive again even if only a handful of files changed. Git, by contrast, negotiates with the server about what you already have: git fetch transfers only the objects that differ between your local store and the remote, packed as deltas. For a package you've installed before, the incremental fetch can be a tiny fraction of the full archive size.
Use the platform API tarball for a first install or a one-off fetch — it's the simplest path. Switch back to Git fetch (--deepen or a full clone with a local cache) when you're upgrading a dependency you already have on disk, so only the diff travels over the wire.
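The incremental half of that trade-off is easy to see with two local repositories standing in for upstream and your installed package (paths and commit messages are hypothetical; a local path plays the role of the network remote):

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/upstream"
git -C "$tmp/upstream" -c user.email=a@b -c user.name=a commit -q --allow-empty -m "v1"

# First install: the full history comes down once.
git clone -q "$tmp/upstream" "$tmp/pkg"

# Upstream moves on...
git -C "$tmp/upstream" -c user.email=a@b -c user.name=a commit -q --allow-empty -m "v2"
next=$(git -C "$tmp/upstream" rev-parse HEAD)

# ...and the upgrade fetch transfers only the objects new since v1.
git -C "$tmp/pkg" fetch -q origin
git -C "$tmp/pkg" checkout -q "$next"
```

A tarball-based upgrade would have re-downloaded the entire tree at v2; the fetch here only had to send the v2 commit and whatever it changed.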
Local Cache as a Remote
One last trick worth knowing: Git repositories can use other local Git repositories as remotes. Buckaroo keeps a global on-disk Git cache. When installing a package, it first tries to fetch from the local cache before hitting the network. If the commit is already cached from a previous install, no network call is made at all.
Because each package in the packages folder is itself a Git repository, upgrading is also cheap — only the diff between the current and next version needs to travel over the wire.
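The mechanics are plain Git — a remote is just somewhere to fetch from, and a directory on disk qualifies. A sketch of the pattern (the cache layout and paths are hypothetical, not Buckaroo's actual directory structure):

```shell
set -e
tmp=$(mktemp -d)

# Stand-in for the upstream repository.
git init -q "$tmp/upstream"
git -C "$tmp/upstream" -c user.email=a@b -c user.name=a commit -q --allow-empty -m "release"
want=$(git -C "$tmp/upstream" rev-parse HEAD)

# The global cache: a bare mirror, filled once from the network.
git clone -q --mirror "$tmp/upstream" "$tmp/cache.git"

# Installing a package: add the cache as a remote and fetch from disk.
git init -q "$tmp/pkg"
git -C "$tmp/pkg" remote add cache "$tmp/cache.git"
git -C "$tmp/pkg" fetch -q cache
git -C "$tmp/pkg" checkout -q "$want"
```

If the commit is already in the mirror, the install never touches the network; if it isn't, you fetch from upstream into the mirror first, and every later install gets it for free.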
Summary
Fetching an arbitrary commit hash from a Git repository is non-trivial because:
- Only branch tips and tags are advertised — arbitrary commits cannot be fetched directly on most hosts.
- Advertised refs are mutable — branches move, tags can be force-pushed, the commit you want may not be at the tip anymore.
- Shallow clones + progressive deepening let you walk backwards through history incrementally, fetching only as much as needed.
- A branch hint (stored in the lock-file) focuses the deepening on the right branch.
- Platform APIs (GitHub, GitLab, Bitbucket) offer a fast shortcut that bypasses the Git protocol entirely.
- A local cache means repeated installs of the same version are free.
| Strategy | When to use | Cost |
|---|---|---|
| ls-remote check | Always first | Very cheap — no objects fetched |
| Platform API tarball | GitHub / GitLab / Bitbucket | Fast HTTP download |
| Shallow clone + checkout | Commit is at branch tip | Minimal — only tip fetched |
| --deepen loop with hint | Commit buried in known branch | Proportional to depth |
| Full clone | No hint, host doesn't support deepening | Expensive |
The right strategy depends on what you know about the commit. Storing that knowledge — the branch hint — at resolution time is what makes subsequent installs fast.