Git


I now know far more about Git than I ever wanted to.

At its base, a git repository is a collection of directory images. Of course, they are actually stored far more cleverly than this, but from the point of view of a user each one is a snapshot of an entire directory. These are called “commits” in git-speak. Each one has a globally unique 20-byte SHA-1 hash. Each one carries metadata: name/email of the creator, an optional cryptographic signature, and the id of the parent commit (plus secondary parents if this commit is the result of a merge).
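To make the 20-byte id less mysterious, here is a minimal sketch of how such an id can be derived from content, assuming the classic SHA-1 object format (a header of type and size, a NUL byte, then the body). The author line and the commit messages are made up for illustration.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>' + NUL + body, as a 40-character hex string."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

# The id of a directory image with nothing in it.
EMPTY_TREE = git_object_id("tree", b"")

def commit_body(tree: str, parents: list[str], message: str) -> bytes:
    """Build the text of a commit: its tree, its parents, and its metadata."""
    who = "A. Hacker <hacker@example.com> 0 +0000"  # hypothetical identity
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return ("\n".join(lines) + "\n").encode()

# Two commits holding the same directory image still get different ids,
# because the parent id and message are part of what gets hashed.
root = git_object_id("commit", commit_body(EMPTY_TREE, [], "first"))
child = git_object_id("commit", commit_body(EMPTY_TREE, [root], "second"))
```

Since the parent id is part of the hashed text, a commit id effectively pins down the whole history behind it, not just one snapshot.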

On top of this, the repository has a set of references, organised into a directory hierarchy. The content of a reference is simply the 20-byte ID of a commit – they are pointers. A repository should contain all commits named in any ref it may contain. A repository may prune off commits that are not referred to anywhere.

There are three important top-level directories in which refs are put.

refs/tags contains static names for particular commits. ‘Nuff said.

refs/heads contains branches.

The distinctive thing about a branch is that when you do a checkout into a working directory, git tracks which branch is checked out in a file named HEAD. When you do a checkin – creating a new directory image in the underlying repository – the checked out branch is updated to point at the new node. And that’s all there is to branching.
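As a rough model of that bookkeeping (a toy, not git's actual code), HEAD can be treated as a string that either names a branch or holds a commit id directly. The commit ids "c1" and "c2" are placeholders.

```python
def record_commit(refs: dict, head: str, new_commit: str) -> str:
    """Return the new HEAD after a checkin, updating refs in place if needed."""
    if head.startswith("ref: "):
        # HEAD names a branch: the branch moves, HEAD stays as it is.
        refs[head[len("ref: "):]] = new_commit
        return head
    # No current branch: HEAD itself is the only thing that moves.
    return new_commit

refs = {"refs/heads/master": "c1"}
head = "ref: refs/heads/master"

head = record_commit(refs, head, "c2")
# refs["refs/heads/master"] is now "c2"; head still names the branch.
```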

refs/remotes contains refs that are copied from elsewhere.

Finally, on top of the big bucket of commits and your collection of refs sits a working directory. The salient things about it are which commit it is based on and how it currently differs from that commit. The other thing is which branch you are currently working on, if any.

You can work without a current branch (the ‘detached head’ state), make multiple checkins and so on, but the only reference to those checkins is the fact that you are currently working on them. If you switch to a different commit (by issuing a checkout command), then those checkins will be left dangling in the wind and your repository may eventually delete them. There are several ways to fix this, the easiest being to check in and then create a branch or a tag.
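The dangling business is just graph reachability. Here is a small sketch with a made-up history: c1 and c2 are on master, d1 and d2 were made with a detached head. It ignores the reflog, which in a real repository keeps such commits around for a while.

```python
def reachable(parents: dict, tips) -> set:
    """All commits reachable from the given tips by following parent links."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

# commit id -> list of parent ids (hypothetical history)
parents = {"c1": [], "c2": ["c1"], "d1": ["c2"], "d2": ["d1"]}
refs = {"refs/heads/master": "c2"}

# After checking out master again, nothing names d1 or d2 any more.
dangling = set(parents) - reachable(parents, refs.values())

# Creating a branch (the name "rescue" is made up) before switching away fixes it.
refs["refs/heads/rescue"] = "d2"
dangling_after_fix = set(parents) - reachable(parents, refs.values())
```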

Now, my problem today has been related to mirroring things around the shop, as we are working on a subnet which cannot ssh directly to the subnet where the primary repositories are kept.

The three tasks involved in doing a fetch are:
1 – pull down the commits you want
2 – pull down and manage references
3 – adjust your working directory

Now for step 1, I am just going to pretend that git pulls down all the commits on the other repo. This is not what happens, but notionally it may as well. Once you have decided which refs you want, the contents referred to by those refs are (effectively) pulled down by magic.

It’s the refs that are nasty.

The issue is that “master” may mean different things there vs here. In fact, it will if someone else has been committing to the remote repository.

Furthermore: each remote repository has a simple name that you give it, by which it is referred to in your repo. These names may collide. In fact, whenever you clone a repository, the source is always initially named ‘origin’.

What we want to avoid is this:

repo A has repo B listed as a remote named “fred”.
repo B has repo A listed as a remote named “bill”.

Repo A has a branch named “perth”

  • refs
    • heads
      • perth
    • remotes
      • fred

Repo B has a branch named ‘yass’

  • refs
    • heads
      • yass
    • remotes
      • bill

At this point, repo B fetches all the refs from repo A. Now be aware: ‘fred’ is repo A’s private name for repo B. Repo B does not know or care about this. After the fetch, repo B looks like this:

  • refs
    • heads
      • yass
    • remotes
      • bill
        • heads
          • perth
        • remotes
          • fred

At which point, repo A returns the favour:

  • refs
    • heads
      • perth
    • remotes
      • fred
        • heads
          • yass
        • remotes
          • bill
            • heads
              • perth
            • remotes
              • fred

And so on it goes. Repo A does not know that each time it imports B’s “bill” refs, it is simply importing a copy of itself.

To fix this, there are two steps.

First, each remote is equipped with a default fetch config. A’s ‘fred’ remote config looks like this:

+refs/heads/*:refs/remotes/fred/*

That is: “fetch those branches that are local to fred, and stuff them into remotes/fred”.
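Here is a toy version of that mapping, to show why it stops the nesting. It only handles the simple `prefix/*:prefix/*` form of refspec, and the commit ids "a1" and "b1" are placeholders. The "naive" refspec is my own stand-in for the fetch-everything behaviour in the fred/bill example above.

```python
def apply_refspec(refspec: str, source_refs: dict) -> dict:
    """Map source ref names to local names for a 'src/*:dst/*' style refspec."""
    spec = refspec[1:] if refspec.startswith("+") else refspec
    src, dst = spec.split(":")
    if not (src.endswith("*") and dst.endswith("*")):
        raise ValueError("this sketch only handles wildcard refspecs")
    src_prefix, dst_prefix = src[:-1], dst[:-1]
    return {
        dst_prefix + name[len(src_prefix):]: commit
        for name, commit in source_refs.items()
        if name.startswith(src_prefix)
    }

# Repo B, after it has naively copied everything from A under "bill".
repo_b = {
    "refs/heads/yass": "b1",
    "refs/remotes/bill/heads/perth": "a1",
}

# Copying every ref drags B's copy of A back into A.
naive = apply_refspec("+refs/*:refs/remotes/fred/*", repo_b)

# The default config only matches B's own branches.
scoped = apply_refspec("+refs/heads/*:refs/remotes/fred/*", repo_b)
```

With the scoped refspec, A ends up with refs/remotes/fred/yass and nothing else, however many remotes B itself has.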

So this takes care of the remote loops problem. The other problem is that when I fetch from a master repo, I don’t simply want the images – I want my local branches updated. If someone has made a commit and updated branch “prep-for-release”, then I want my “prep-for-release” branch to point there. Additionally, if I currently have that branch checked out, then I want to merge the changes.

So what you do is, for those remote branches that you want your own branches to track, you merge the changes on those remote branches into your branch. So this just creates another commit, right? Well – yes … but here’s the super magic thing. If my branch has no commits of its own, that is, if its tip is an ancestor of the remote tip, then the merge would produce the exact same directory image as what is already on the remote. So for those kinds of merges there is no need for a new commit: simply updating the local ref to be equal to the remote ref is the same as going ahead and doing the merge.

And that’s what git does. In git-speak, this is called a fast-forward merge.
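The test for whether a fast-forward is possible comes down to an ancestry check. A minimal sketch, with a made-up history in which commit "x" is a local commit the remote does not have:

```python
def is_ancestor(parents: dict, older: str, newer: str) -> bool:
    """True if 'older' can be reached from 'newer' by following parent links."""
    seen, stack = set(), [newer]
    while stack:
        commit = stack.pop()
        if commit == older:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False

def merge_kind(parents: dict, local_tip: str, remote_tip: str) -> str:
    """Classify what merging remote_tip into local_tip would involve."""
    if is_ancestor(parents, remote_tip, local_tip):
        return "already up to date"
    if is_ancestor(parents, local_tip, remote_tip):
        return "fast-forward"  # just move the local ref to remote_tip
    return "real merge"        # both sides have commits the other lacks

# c1 <- c2 <- c3 on the remote; x is a local commit on top of c2.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "x": ["c2"]}
```

For example, `merge_kind(parents, "c2", "c3")` is a fast-forward, while `merge_kind(parents, "x", "c3")` needs a real merge commit.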

Now, to manage this process we have a config file as described here.

So. Getting back to the original problem: need to get pushed changes through a repo on a common subnet.

Our master copy is on j25.
I work on a clone of j25, named p. My changes to p are pushed up to j25, and changes to j25 are pulled back.

The fetch config for j25 on p is:
+refs/heads/*:refs/remotes/j25/*

Which means “copy local refs on j25, track them in a directory named j25, and update my tracked ref even if this does not result in a simple fast-forward”. The only situations where there would not be a fast-forward are if history has been rewritten on j25, or if you are making local changes to the tracked ref, which is something you should never do.

A user repository c also wants to work on j25, but can’t access it. So we create an intermediate repo on fz1. How do we set this up?

Well – there are two ways.

1. Clone j25 onto fz1. On c, configure the fetch for fz1 as
+refs/remotes/origin/*:refs/remotes/j25/*
IOW: pull down fz1’s origin and treat it as if it came from j25

2. on fz1, configure the fetch from j25 (origin) as
+refs/*:refs/*
IOW: copy all of j25’s refs onto fz1 as-is.

This is called “mirroring”, and the command ‘git clone --mirror j25 fz1’ will do this automatically.
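To check that the two ways land c in the same place, here is a toy simulation of both. It only models simple wildcard refspecs; the commit ids "m1" and "r1" are placeholders, and the refspec used on c in the second way is my assumption, since only fz1's side is configured above.

```python
def apply_refspec(refspec: str, source_refs: dict) -> dict:
    """Map source ref names to local names for a 'src/*:dst/*' style refspec."""
    spec = refspec[1:] if refspec.startswith("+") else refspec
    src, dst = spec.split(":")
    src_prefix, dst_prefix = src[:-1], dst[:-1]
    return {
        dst_prefix + name[len(src_prefix):]: commit
        for name, commit in source_refs.items()
        if name.startswith(src_prefix)
    }

j25 = {"refs/heads/master": "m1", "refs/heads/prep-for-release": "r1"}

# Way 1: fz1 is an ordinary clone, so j25's branches sit under origin there,
# and c relabels fz1's origin refs as j25.
fz1_clone = apply_refspec("+refs/heads/*:refs/remotes/origin/*", j25)
c_way1 = apply_refspec("+refs/remotes/origin/*:refs/remotes/j25/*", fz1_clone)

# Way 2: fz1 is a mirror, so it has the same ref names as j25,
# and c can fetch from it as if it were j25 (assumed refspec).
fz1_mirror = apply_refspec("+refs/*:refs/*", j25)
c_way2 = apply_refspec("+refs/heads/*:refs/remotes/j25/*", fz1_mirror)
```

Either way c ends up with refs/remotes/j25/master and refs/remotes/j25/prep-for-release. The difference is where the renaming lives: on every client in the first way, on fz1 alone in the second.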

At this point, we are missing two things.

1) what’s the deal with merging to a branch that you have checked out; and
2) can the mirroring be made automatic?

Answer to 1 – not sure.

Answer to 2 – this involves fooling with the ‘hooks’ section. Not sure how to avoid loops.

Oh: other very important thing – can you commit to the mirror and have changes pushed up? Without this, the whole setup is useless.

Must investigate.
