How to Find a #deadbeef: Fetching Arbitrary Git Commits

Fetching a specific commit hash from a Git repository is surprisingly non-trivial. Here's why — and the progressive strategy that makes it fast.


You have a Git URL and a commit hash — say deadbeef. You want the code at exactly that revision. Simple, right?

git clone https://github.com/example/repo
cd repo
git checkout deadbeef

This works, but it's slow. Cloning a large repository can take anywhere from a few seconds to several hours depending on its history. You don't need the whole history — just one commit. Can we do better?

It turns out the answer is yes, but not in the way you might expect. Git's protocol has a subtle constraint that makes fetching arbitrary commits surprisingly tricky.

Advertised Refs: What Git Exposes

When you connect to a Git server, it doesn't expose every commit — it only tells you about its advertised refs: the tips of branches and tags. You can inspect them without cloning at all:

git ls-remote https://github.com/example/repo
# a3f1c9d2...  refs/heads/master
# 9c14a8b5...  refs/tags/v1.2.0
# ...
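
To turn that listing into a yes/no answer, you can filter it for the hash you care about. The sketch below is one way to do it; the helper name is_advertised is made up for illustration, and it assumes you pass the full 40-character hash, since ls-remote prints full hashes.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): succeeds if the given full hash
# is currently the target of any advertised ref on the remote.
# Usage: is_advertised <url> <full-40-char-hash>
is_advertised() {
    url=$1
    hash=$2
    # ls-remote prints one "<hash><TAB><refname>" line per advertised ref.
    # awk exits 0 if any line's first field equals the hash, 1 otherwise.
    git ls-remote "$url" |
        awk -v h="$hash" '$1 == h { found = 1 } END { exit !found }'
}

# Example (needs network access, so it is left commented out):
# is_advertised https://github.com/example/repo "$full_hash" && echo "advertised"
```

If the remote cannot be reached, ls-remote prints nothing to the pipe and the helper reports "not advertised", so a caller that needs to tell the two cases apart should check connectivity separately.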

Branch tips and tags are the only commits the server is guaranteed to tell you about. Everything else — every commit buried in history — is hidden by default.

Commits at branch tips and tags are advertised. All other commits are unadvertised. The difference matters a great deal for fetching.

Why Not Just Fetch the Commit Directly?

You might think: git fetch origin deadbeef. Whether that works depends on the server. Some hosts accept a request for a full 40-character hash, but a server that only serves what it advertises rejects it:

error: Server does not allow request for unadvertised object

Such a server does not hand out arbitrary commit objects on demand. Only advertised refs can be fetched directly, so if your target commit is not a branch tip or tag, you cannot count on fetching it this way from an arbitrary Git URL.
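
Since the outcome depends on the server, a cheap attempt is harmless as long as the failure is handled. A minimal sketch, assuming the current directory is already a Git repository (an empty git init is enough) and that the hash is given in full; try_direct_fetch is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: ask the server for exactly one commit by full hash.
# Returns 0 if the server served it, non-zero if the request was refused.
# Usage (inside a Git repository): try_direct_fetch <url> <full-hash>
try_direct_fetch() {
    url=$1
    hash=$2
    # --depth=1 limits the transfer to the commit itself, without its ancestors.
    if git fetch --quiet --depth=1 "$url" "$hash"; then
        echo "fetched $hash"
    else
        echo "could not fetch $hash directly; try another strategy" >&2
        return 1
    fi
}
```

On success the commit is available locally and recorded in FETCH_HEAD. On failure nothing has been downloaded, so falling through to the strategies below costs only one round trip.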

Neither Branches Nor Tags Are Immutable

There's another complication: even the advertised refs are mutable.

  • Branch tips move. A commit that was once the tip of master may now be three hundred commits behind. Or it may have been merged and the branch deleted entirely.
  • Tags are supposed to be immutable, but git push --force can move them. And even if the tag stays put, new work on that branch may carry the commit you care about somewhere else in history.

This means you can't rely on an advertised ref to tell you where in the history your target commit lives — you have to go looking.

A Smarter Strategy: Progressive Deepening

The key insight is that we can use shallow clones to look at history incrementally, without ever fetching more than we need.

[Interactive visualisation, step 1 of 6: "The target commit". Commit graph: f501bc3 (master), 9c14a8b (tag v1.2.0), b82e4a1, a3f1c9d, deadbee, e9d7f23.]

We want to fetch commit deadbeef. It lives somewhere in the history of this repository, but we don't know exactly where yet.

Step by Step

1. Check ls-remote first. Before touching any objects, query the advertised refs. If deadbeef happens to be a branch tip or tag right now, you can do a shallow clone and check it out immediately — no deepening required.

2. Shallow clone at depth 1. If it's not advertised, do git clone --depth=1. This fetches only the very tip of the default branch. The rest of history is behind a shallow boundary — the server won't send those objects until you ask.

3. Check if the commit is present. git cat-file -e deadbeef 2>/dev/null && echo found — a purely local check, no network needed.

4. Deepen progressively. If not found, run git fetch --deepen 1 (or --deepen N for larger jumps). This extends the shallow clone one step further. Repeat until the commit is found or history is exhausted.

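5. Full clone as last resort. If the commit genuinely cannot be found by deepening (perhaps the branch hint is wrong, or the branch was force-pushed and the commit now lives elsewhere), fall back to a full git clone, which fetches every branch and tag. If the commit is no longer reachable from any ref, even a full clone will not find it.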

The key property of this strategy: each check before a network call is local and cheap. You only deepen when the local check confirms the commit isn't there yet. The expensive I/O is deferred as long as possible.
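
The steps above can be sketched as a single shell function. This is an illustration under stated assumptions, not Buckaroo's actual implementation: the function name fetch_commit, the DEEPEN_STEP variable and its default of 50 are all invented here, the hash must be given in full, and the ls-remote check from step 1 is left out because a depth-1 clone already covers the case where the commit is the branch tip.

```shell
#!/bin/sh
# Hypothetical sketch of progressive deepening.
# Usage: fetch_commit <url> <branch> <full-hash> <dest-dir>
# DEEPEN_STEP (assumed name) controls how many commits each round adds.
fetch_commit() {
    url=$1 branch=$2 hash=$3 dest=$4
    step=${DEEPEN_STEP:-50}

    # Step 2: shallow clone containing only the tip of the given branch.
    git clone --quiet --depth=1 --single-branch --branch "$branch" \
        "$url" "$dest" || return 1

    # Step 3: local check, no network. Loop until the commit object exists.
    while ! git -C "$dest" cat-file -e "$hash^{commit}" 2>/dev/null; do
        if [ "$(git -C "$dest" rev-parse --is-shallow-repository)" = "false" ]; then
            # Step 5: this branch's history is exhausted, so fetch every branch.
            git -C "$dest" fetch --quiet origin \
                '+refs/heads/*:refs/remotes/origin/*' || return 1
            git -C "$dest" cat-file -e "$hash^{commit}" 2>/dev/null || return 1
            break
        fi
        # Step 4: push the shallow boundary back by $step commits.
        git -C "$dest" fetch --quiet --deepen="$step" origin "$branch" || return 1
    done

    # The commit is present locally; check it out as a detached HEAD.
    git -C "$dest" checkout --quiet --detach "$hash"
}
```

A larger step means fewer round trips but more over-fetching when the commit is close to the tip. A small step suits commits expected near the tip, and a step that grows each round is a reasonable middle ground.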

Why We Need a Branch Hint

--deepen walks backwards from a known branch tip. If you don't know which branch contains your commit, you'd have to deepen every branch — which is expensive.

This is why Buckaroo stores a branch hint in its lock-file alongside the commit hash:

[lock."github.com/buckaroo-pm/boost-config"]
versions  = [ "branch=master" ]
revision  = "deadbeef..."

At resolution time, Buckaroo records which branch was live when the lock was generated. At install time, it uses that hint to target the right branch for deepening — avoiding unnecessary work on unrelated branches.
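
As a rough illustration of how such a hint could be consumed, the sketch below pulls the branch name out of a string like "branch=master" and uses it for a single-branch shallow clone. Only the "branch=..." form shown in the lock-file above is handled; both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the branch name from a hint such as "branch=master".
# Fails for anything that is not of the form branch=<name>.
branch_from_hint() {
    case $1 in
        branch=?*) printf '%s\n' "${1#branch=}" ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: shallow-clone only the hinted branch, so that later
# deepening never downloads history belonging to unrelated branches.
# Usage: clone_hinted_branch <url> <hint> <dest-dir>
clone_hinted_branch() {
    url=$1 hint=$2 dest=$3
    branch=$(branch_from_hint "$hint") || return 1
    git clone --quiet --depth=1 --single-branch --branch "$branch" "$url" "$dest"
}
```

With --single-branch, the clone's fetch configuration covers only that branch, so a later git fetch --deepen stays on it by default.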

The GitHub / GitLab Shortcut

If the repository is hosted on GitHub, GitLab, or Bitbucket, there's a faster path: their APIs can serve commit metadata and archives directly.

# GitHub: download a tarball of any commit
curl -L https://api.github.com/repos/example/repo/tarball/deadbeef -o src.tar.gz

This bypasses the Git protocol entirely. No clone, no ls-remote, no deepening — just an HTTP request. For a one-off download of a specific commit it is hard to beat.
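
Wrapped up as a script, the download and unpack might look like the following. It is a sketch that assumes the GitHub endpoint shown above and an archive with a single top-level directory, which --strip-components=1 removes; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the GitHub tarball URL for a commit.
# Usage: github_tarball_url <owner> <repo> <revision>
github_tarball_url() {
    printf 'https://api.github.com/repos/%s/%s/tarball/%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: download a commit's tarball and unpack it into <dest-dir>.
# Usage: download_commit_tarball <owner> <repo> <revision> <dest-dir>
download_commit_tarball() {
    owner=$1 repo=$2 rev=$3 dest=$4
    mkdir -p "$dest" || return 1
    # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects.
    curl -fsSL "$(github_tarball_url "$owner" "$repo" "$rev")" |
        tar -xz -C "$dest" --strip-components=1
}

# Example (needs network access, so it is left commented out):
# download_commit_tarball example repo deadbeef ./src
```

The unpacked directory is plain files with no .git folder, so nothing in it can be reused by a later git fetch.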

The trade-off appears on subsequent updates. A tarball is a complete snapshot: when you upgrade a dependency from deadbeef to the next version, you download the full archive again even if only a handful of files changed. Git, by contrast, is a content-addressable object store, and git fetch transfers only the objects you are missing, delta-compressed into a packfile. For a package you've installed before, the incremental fetch can be a tiny fraction of the full archive size.

Tip: Use the platform API tarball for a first install or a one-off fetch, since it is the simplest path. Switch back to Git fetch (--deepen or a full clone with a local cache) when you're upgrading a dependency you already have on disk, so only the diff travels over the wire.

Local Cache as a Remote

One last trick worth knowing: Git repositories can use other local Git repositories as remotes. Buckaroo keeps a global on-disk Git cache. When installing a package, it first tries to fetch from the local cache before hitting the network. If the commit is already cached from a previous install, no network call is made at all.

Because each package in the packages folder is itself a Git repository, upgrading is also cheap — only the diff between the current and next version needs to travel over the wire.
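
The cache idea can be sketched in a few lines of shell. This is an assumption-laden illustration rather than a description of Buckaroo's internals: the cache is modelled as one bare repository, the function name fetch_via_cache is invented, and the commit is assumed to live on the given branch.

```shell
#!/bin/sh
# Hypothetical sketch: use a bare repository on disk as a cache in front of
# the real remote. Run it from inside the working repository.
# Usage: fetch_via_cache <cache-dir> <url> <branch> <full-hash>
fetch_via_cache() {
    cache=$1 url=$2 branch=$3 hash=$4

    # Create the cache on first use.
    [ -d "$cache" ] || git init --quiet --bare "$cache" || return 1

    # Local check: only go to the network if the cache lacks the commit.
    if ! git -C "$cache" cat-file -e "$hash^{commit}" 2>/dev/null; then
        git -C "$cache" fetch --quiet "$url" \
            "+refs/heads/$branch:refs/heads/$branch" || return 1
    fi

    # The cache is just another remote, addressed by its filesystem path.
    git fetch --quiet "$cache" \
        "+refs/heads/$branch:refs/remotes/cache/$branch" || return 1

    # Confirm the commit actually arrived in the working repository.
    git cat-file -e "$hash^{commit}"
}
```

On a cache hit the only Git traffic is between two directories on the same disk, which is why a repeated install needs no network access at all.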

Summary

Fetching an arbitrary commit hash from a Git repository is non-trivial. The key points:

  1. Only branch tips and tags are advertised; a server is not obliged to serve any other commit directly, and many refuse.
  2. Advertised refs are mutable — branches move, tags can be force-pushed, the commit you want may not be at the tip anymore.
  3. Shallow clones + progressive deepening let you walk backwards through history incrementally, fetching only as much as needed.
  4. A branch hint (stored in the lock-file) focuses the deepening on the right branch.
  5. Platform APIs (GitHub, GitLab, Bitbucket) offer a fast shortcut that bypasses the Git protocol entirely.
  6. A local cache means repeated installs of the same version are free.

| Strategy | When to use | Cost |
|---|---|---|
| ls-remote check | Always first | Very cheap: no objects fetched |
| Platform API tarball | GitHub / GitLab / Bitbucket | Fast HTTP download |
| Shallow clone + checkout | Commit is at branch tip | Minimal: only tip fetched |
| --deepen loop with hint | Commit buried in known branch | Proportional to depth |
| Full clone | No hint, host doesn't support deepening | Expensive |

The right strategy depends on what you know about the commit. Storing that knowledge — the branch hint — at resolution time is what makes subsequent installs fast.
