Setting up a long term fork with Git

The context

Recently at GiraffeSoft we started a new project, based on another existing project we already had going. We could call this a long term fork. Let me give you a little bit more context on the situation.

  • These projects will both keep being actively developed in the future;
  • They will have some fundamental differences that will not be reconciled;
  • They will however keep many similarities and we expect that they will benefit from the exchange of some specific patches, as development on both moves forward.

In days past, this problem could have been solved reasonably well by cloning the central repository and then exchanging patches and applying them manually.

As you’ve guessed already, we’ve decided to try using Git to help manage this long term relationship.

I must warn you, however: this is the first time we have attempted to keep a long term fork like this. We’re not sure whether it’s going to be worth the trouble, or whether the divergence of the two projects will eventually prevent us from benefiting from using Git to manage the exchange of patches. We’re not sure whether this is the best way to accomplish this either.

On one hand, all this trouble may or may not be worth the effort. On the other hand, all of this sounds like ultra elite Git-fu. Hence our decision to explore this approach.

Another, less technical reason was also at play in our decision. We wanted to have one wiki per project on GitHub :-)

What we’re going to do

So at first, we have a remote repository for the initial project. There are of course an arbitrary number of client side clones of the repository (roughly what Subversion calls working copies). None of them will be affected in any way.

The first thing I will do is clone the initial project to a new client side repository. I’ll set this up in such a way that it won’t use the initial project as its default origin. On the other hand I’ll make sure it has one branch that interacts with the initial project for future patch exchanges.

Then from that client-side repository, I’ll initialize a new remote repository, which will serve as the default remote repository for the new project.

What we’ll end up with is 2 remote repositories which will have a lot in common but won’t be linked to one another in any way. It won’t be a GitHub fork, for instance.

There will be exactly one client side repository that knows about the two central repos and that can exchange commits between them. All other new client-side clones of the new project will be plain old regular Git clones.

It would be easy to set up more client side repositories to be aware of both repositories, but to me it doesn’t really seem necessary.

Article too long?

As with my previous articles about Git, I’ll provide detailed instructions for you to follow along on dummy repositories (if you’re interested). When you return to this article to attempt something similar, you may want to skip to the executive summary section, where I list strictly the important operations without all the rambling.

The instructions with the rambling

Setting up a dummy initial project

Find yourself a comfortable directory and run these few commands to initiate a dummy client-side repository and its corresponding dummy remote:

mkdir test; cd $_
mkdir initial_project initial_wd
# Bare repository (no working tree) that will act as the central repo
git init --bare initial_project

# Creating a dummy repo
cd initial_wd
git init
echo foo > file.txt
git add .
git commit -a -m "initial commit"
echo bar >> file.txt
git commit -a -m "modification"

# Setting up what's gonna be the central repo for our initial app
git remote add origin ../initial_project/
git push origin master

git config branch.master.remote origin
git config branch.master.merge refs/heads/master

The last 2 config commands configure your master branch to automatically track remote master when issuing the command ‘git pull’. In other words, it’s equivalent to running the command ‘git branch --track master origin/master’, with the only difference being that ‘branch --track’ is primarily intended to create a new local branch, whereas here we already have our local branch.
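On Git 1.8.0 and later, the same tracking relationship can also be set on an existing branch with a single command, and checked without pulling. Here is a minimal sketch, assuming a throwaway sandbox directory; the ‘demo’ identity and the sandbox path are placeholders of mine, not part of the original setup.

```shell
set -e
# Placeholder identity so the sketch does not depend on your global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d) && cd "$sandbox"

# Same layout as above: a bare "central" repo and a working clone.
git init -q --bare initial_project
git init -q initial_wd
cd initial_wd
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this article
echo foo > file.txt
git add .
git commit -q -m "initial commit"
git remote add origin ../initial_project/
git push -q origin master
git fetch -q origin                       # make sure origin/master exists locally

# One-command alternative to the two 'git config branch.master.*' calls.
git branch -q --set-upstream-to=origin/master master

# Ask Git which branch master tracks.
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}')
echo "$upstream"
```

If the tracking is in place, the last command should print origin/master, and .git/config should contain the same two branch.master entries as the manual version.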

We can now check for the expected behavior:

#   mat@mm initial_wd (master)$ git pull
#   Already up-to-date.

Note also that here I simplify the instructions for following along by creating a dummy remote repository that’s in fact only in another local directory. If for example you wanted to use GitHub, your Git remote command would simply look like:

git remote add origin git@github.com:username/initial_project.git

The actual fork

# Create a directory to host the new remote repository
cd ..
mkdir project2
git init --bare project2

Now we will use ‘git clone’ with the -o option. This lets us give the initial project’s remote a name other than the default ‘origin’. We want to keep the name ‘origin’ for the new repository that will actually track the new project. I decided to name the initial project’s remote ‘ip_origin’.

# Specify origin name, then the path to the shared repo
# and finally a directory name for the local working directory.
git clone -o ip_origin initial_project wd
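To see what -o actually changes, here is a small sketch that clones a stand-in repository and inspects the result. The sandbox directory, the ‘seed’ helper repository and the ‘demo’ identity are assumptions made for the example.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d) && cd "$sandbox"

# Stand-in for the initial project: a bare repo holding one commit on master.
git init -q --bare initial_project
git --git-dir=initial_project symbolic-ref HEAD refs/heads/master
git init -q seed
(
  cd seed
  git symbolic-ref HEAD refs/heads/master
  echo foo > file.txt
  git add .
  git commit -q -m "initial commit"
  git push -q ../initial_project master
)

# Clone with a custom remote name instead of the default 'origin'.
git clone -q -o ip_origin initial_project wd
cd wd

remotes=$(git remote)                                      # names of configured remotes
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}') # what master tracks right now
echo "remotes: $remotes"
echo "master tracks: $upstream"
```

The only remote should be ip_origin, which leaves the name ‘origin’ free for the new project’s repository. Note that right after the clone, master still tracks ip_origin/master; that gets rewired further down.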

Setting up the relationship with the initial repository

We first create a branch specifically to track the initial project’s master branch. Then we set it up to track the initial project.

cd wd
git branch ip_master
git config branch.ip_master.remote ip_origin
git config branch.ip_master.merge refs/heads/master
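If you prefer, ‘git branch --track’ can create the branch and record the same two settings in one step. The sketch below is an alternative under my own assumptions (sandbox directory, ‘seed’ helper repository, ‘demo’ identity), not what we ran at the time.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d) && cd "$sandbox"

# Stand-in for the initial project.
git init -q --bare initial_project
git --git-dir=initial_project symbolic-ref HEAD refs/heads/master
git init -q seed
(
  cd seed
  git symbolic-ref HEAD refs/heads/master
  echo foo > file.txt
  git add .
  git commit -q -m "initial commit"
  git push -q ../initial_project master
)

git clone -q -o ip_origin initial_project wd
cd wd

# Create ip_master from ip_origin/master and set its upstream at the same time.
git branch -q --track ip_master ip_origin/master

# These should match what the two 'git config' commands write by hand.
branch_remote=$(git config --get branch.ip_master.remote)
branch_merge=$(git config --get branch.ip_master.merge)
echo "$branch_remote $branch_merge"
```

Either way, the important part is that ip_master pulls from ip_origin’s master while the rest of the repository will talk to the new project.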

Setting up the relationship with the new project’s repository

git remote add origin ../project2
git push origin master

git config branch.master.remote origin
git config branch.master.merge refs/heads/master

Let’s pretend some new development happened

echo 'shareable modification' > shareable_file.txt
git add shareable_file.txt
git commit -m "shareable modification"

echo "specific to new project" > not_shareable.txt
git add not_shareable.txt
git commit -m "specific to new project"

So I have now begun working on the new project and I already have one commit that could benefit the initial project as well. The progress looks a little like this:

[Figure: a look at gitk’s representation of the new development]

So I switch to the branch that manages the relationship with the initial project and I pick the commit before the last one in master.

git checkout ip_master
git cherry-pick master^
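For those following along without gitk, here is a self-contained sketch of the same cherry-pick in a single throwaway repository (no remotes needed). The sandbox path and ‘demo’ identity are placeholders.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d) && cd "$sandbox"
git init -q wd
cd wd
git symbolic-ref HEAD refs/heads/master
echo foo > file.txt
git add .
git commit -q -m "initial commit"

# ip_master stays at the history shared with the initial project.
git branch ip_master

# Two new commits on master: one shareable, one specific to the new project.
echo 'shareable modification' > shareable_file.txt
git add shareable_file.txt
git commit -q -m "shareable modification"
echo "specific to new project" > not_shareable.txt
git add not_shareable.txt
git commit -q -m "specific to new project"

# master^ is the parent of master's tip, i.e. the shareable commit.
git checkout -q ip_master
git cherry-pick master^ >/dev/null

subject=$(git log -1 --format=%s)
echo "tip of ip_master: $subject"
ls
```

After this, ip_master should contain shareable_file.txt but not not_shareable.txt. The cherry-picked commit is a new commit with its own hash, which is why the two branches still look unrelated in gitk.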

[Figure: gitk, after cherry-picking one commit]

Everything’s dandy so far, except for the subtle fact that I have brought us all to the edge of a cliff.

If I tried to push to the initial repository right now, I’d be in for a nasty surprise:

#   mat@mm wd (ip_master)$ git push
#   Counting objects: 7, done.
#   Compressing objects: 100% (4/4), done.
#   Writing objects: 100% (6/6), 564 bytes, done.
#   Total 6 (delta 1), reused 0 (delta 0)
#   Unpacking objects: 100% (6/6), done.
#   To /Users/mat/blog/long-term-fork/test/initial_project
#      7178a89..3ca0240  master -> master

Oops! By default (with the ‘matching’ push behavior, which was Git’s default before version 2.0), ‘git push’ syncs up all branches of the same name with the current branch’s remote. So that would push the new project’s master branch onto our initial project’s shared master.
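Since Git 2.0 the default push mode is ‘simple’, so to reproduce this surprise on a current Git you have to ask for the old behavior explicitly. The sketch below does that in a sandbox; the directory names, the ‘seed’ helper repository and the ‘demo’ identity are assumptions for illustration.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d) && cd "$sandbox"

# Stand-in for the initial project.
git init -q --bare initial_project
git --git-dir=initial_project symbolic-ref HEAD refs/heads/master
git init -q seed
(
  cd seed
  git symbolic-ref HEAD refs/heads/master
  echo foo > file.txt
  git add .
  git commit -q -m "initial commit"
  git push -q ../initial_project master
)

git clone -q -o ip_origin initial_project wd
cd wd
git branch ip_master
git config branch.ip_master.remote ip_origin
git config branch.ip_master.merge refs/heads/master

# Work that belongs only to the new project.
echo "specific to new project" > not_shareable.txt
git add not_shareable.txt
git commit -q -m "specific to new project"

# From ip_master, push with the pre-2.0 'matching' behavior.
git checkout -q ip_master
git -c push.default=matching push -q

remote_master=$(git --git-dir=../initial_project rev-parse master)
local_master=$(git rev-parse master)
local_ip_master=$(git rev-parse ip_master)
echo "initial project's master: $remote_master"
echo "new project's master:     $local_master"
```

The two hashes printed should be identical: the new project’s master has landed on the initial project, even though we were standing on ip_master. On a real repository, ‘git push --dry-run’ is a cheap way to look before you leap.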

This is obviously not what we want. We only want ip_master to be pushed to the initial project’s master.

Trying to remember to always explicitly run ‘git push ip_origin ip_master:master’ wouldn’t do it for me. To keep the analogy, that would be akin to doing a cartwheel on the edge of said cliff: a lot of fun until you make a mistake. So obviously we’d like a simple ‘git push’ to do the right thing.

So here’s how we configure it:

git config remote.ip_origin.push refs/heads/ip_master:master

Now we can safely issue the ‘git push’ command from ip_master and have Git push ip_master to ip_origin/master.

git push
#   Counting objects: 4, done.
#   Compressing objects: 100% (2/2), done.
#   Writing objects: 100% (3/3), 310 bytes, done.
#   Total 3 (delta 0), reused 0 (delta 0)
#   Unpacking objects: 100% (3/3), done.
#   To /Users/mat/blog/long-term-fork/test/initial_project
#      027a48f..9db6c3f  ip_master -> master
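Here is the whole safe path replayed in a sandbox, so you can convince yourself that only ip_master reaches the initial project. As before, the sandbox directory, the ‘seed’ helper repository and the ‘demo’ identity are placeholders I made up.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d) && cd "$sandbox"

# Stand-in for the initial project.
git init -q --bare initial_project
git --git-dir=initial_project symbolic-ref HEAD refs/heads/master
git init -q seed
(
  cd seed
  git symbolic-ref HEAD refs/heads/master
  echo foo > file.txt
  git add .
  git commit -q -m "initial commit"
  git push -q ../initial_project master
)

git clone -q -o ip_origin initial_project wd
cd wd
git branch ip_master
git config branch.ip_master.remote ip_origin
git config branch.ip_master.merge refs/heads/master

# One shareable commit, one commit specific to the new project.
echo 'shareable modification' > shareable_file.txt
git add shareable_file.txt
git commit -q -m "shareable modification"
echo "specific to new project" > not_shareable.txt
git add not_shareable.txt
git commit -q -m "specific to new project"

git checkout -q ip_master
git cherry-pick master^ >/dev/null

# A plain 'git push' to ip_origin now means: local ip_master -> remote master.
git config remote.ip_origin.push refs/heads/ip_master:master
git push -q

remote_master=$(git --git-dir=../initial_project rev-parse master)
local_ip_master=$(git rev-parse ip_master)
local_master=$(git rev-parse master)
echo "initial project's master: $remote_master"
echo "local ip_master:          $local_ip_master"
```

Because remote.ip_origin.push supplies an explicit refspec, Git no longer falls back on push.default when pushing to ip_origin, whichever version of Git you run.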

There’s now only one remaining annoyance we’re not yet protected against. If new branches are created in the initial project’s central repository, a ‘git pull’ when standing in the ip_master branch would bring them all into our new project’s local repository as remote-tracking branches. I consider this one only an annoyance, since I could just decide not to pay attention to them. On the other hand they will be distracting and may also clash with the branches I create for the development on my new project. So we want to avoid that behavior as well.

To better understand the current Git configuration, let’s have a look at .git/config:

  ...
  [remote "ip_origin"]
    url = /Users/mat/blog/long-term-fork/test/initial_project
    fetch = +refs/heads/*:refs/remotes/ip_origin/*
  [branch "master"]
    remote = origin
    merge = refs/heads/master
  [branch "ip_master"]
    remote = ip_origin
    merge = refs/heads/master
  [remote "origin"]
    url = ../project2
    fetch = +refs/heads/*:refs/remotes/origin/*

So in both ‘remote’ sections I can see that fetch is configured to bring everything locally (the * wildcards).

So here’s how I limit what gets pulled when I pull from the initial project’s repository.

git config remote.ip_origin.fetch +refs/heads/master:refs/remotes/ip_origin/master

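The part before the colon is the name of the interesting branch on the remote server. The part after the colon is the remote-tracking branch in your local Git repo, which Git uses internally to mirror the remote branch (not to be confused with our user branch ip_master). The leading + allows that remote-tracking branch to be updated even when the update is not a fast-forward.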
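To check that the narrowed refspec does its job, the sketch below creates an extra branch in a stand-in initial project and fetches. The branch name ‘experiment’, the sandbox directory, the ‘seed’ helper repository and the ‘demo’ identity are all hypothetical.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d) && cd "$sandbox"

# Stand-in for the initial project.
git init -q --bare initial_project
git --git-dir=initial_project symbolic-ref HEAD refs/heads/master
git init -q seed
(
  cd seed
  git symbolic-ref HEAD refs/heads/master
  echo foo > file.txt
  git add .
  git commit -q -m "initial commit"
  git push -q ../initial_project master
)

git clone -q -o ip_origin initial_project wd
cd wd

# Only mirror the initial project's master from now on.
git config remote.ip_origin.fetch +refs/heads/master:refs/remotes/ip_origin/master

# Someone creates a new branch in the initial project...
git --git-dir=../initial_project branch experiment master

# ...and we fetch.
git fetch -q ip_origin

if git rev-parse -q --verify refs/remotes/ip_origin/experiment >/dev/null; then
  fetched=yes
else
  fetched=no
fi
echo "experiment fetched: $fetched"
git rev-parse -q --verify refs/remotes/ip_origin/master >/dev/null
```

Keep in mind that narrowing the refspec does not delete remote-tracking branches that were fetched earlier; those would have to be removed by hand. As far as I know, ‘git remote set-branches ip_origin master’ writes the same refspec if you’d rather not type it out.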

Setting up remote branches on both projects, pulling, pushing and exchanging more commits is left as an exercise to the reader.

Conclusion

So that’s the gist of it, my friends. I am basically set up to work on my new project like I would in a more typical situation. I also have a special branch set up to interact with the initial project. With this branch I’ll be able to do two things:

  • Pull new developments from the initial project and then cherry-pick only the shareable commits into the new project’s other branches.
  • Cherry-pick in the other direction to bring certain commits from the new project into this branch and then push them up to the initial project.
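The first of these two directions is not shown in the walkthrough, so here is a rough sketch of it: a fix lands in the initial project, gets pulled into ip_master, and is then cherry-picked onto the new project’s master. The file name ‘upstream_fix.txt’, the ‘seed’ repository standing in for another developer, the sandbox directory and the ‘demo’ identity are assumptions for the example.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d) && cd "$sandbox"

# Stand-in for the initial project, plus a developer repo that pushes to it.
git init -q --bare initial_project
git --git-dir=initial_project symbolic-ref HEAD refs/heads/master
git init -q seed
(
  cd seed
  git symbolic-ref HEAD refs/heads/master
  echo foo > file.txt
  git add .
  git commit -q -m "initial commit"
  git push -q ../initial_project master
)

git clone -q -o ip_origin initial_project wd
cd wd
git branch ip_master
git config branch.ip_master.remote ip_origin
git config branch.ip_master.merge refs/heads/master

# The new project diverges on master.
echo "specific to new project" > not_shareable.txt
git add not_shareable.txt
git commit -q -m "specific to new project"

# Meanwhile, the initial project gets a fix.
(
  cd ../seed
  echo fix > upstream_fix.txt
  git add upstream_fix.txt
  git commit -q -m "upstream fix"
  git push -q ../initial_project master
)

# Bring the initial project's news into ip_master only.
git checkout -q ip_master
git pull -q --ff-only

# Pick the one commit we want onto the new project's master.
git checkout -q master
git cherry-pick ip_master >/dev/null   # here the tip of ip_master is the fix

subject=$(git log -1 --format=%s)
echo "tip of master: $subject"
```

In real life you would cherry-pick specific hashes rather than the branch tip, since ip_master will usually have picked up several commits and not all of them will be wanted.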

For this approach to be useful however, we’ll have to make sure our commits are as concise as possible. Gone are the days of committing a whole afternoon’s work in one meaningless commit containing 12 different modifications.

Of course coding sprees of a couple of hours are not out of the question. Features like ‘git add --patch’ are a great help when the time comes to extract meaningful commits out of the result of a few hours of intense coding. For a good introduction to --patch (and a few other powerful features), be sure to read Ryan Tomayko’s The Thing about Git.

The executive summary

So let’s reiterate strictly the interesting bits necessary to set up a long term fork when starting a new project from an existing one.

We already have:

  • the initial project’s repository at url git@github.com:username/initial_project.git
  • the new project’s empty repository, also created at git@github.com:username/new_project.git

# Create new local clone for the new project
git clone -o ip_origin git@github.com:username/initial_project.git new_project
cd new_project

# Create local ip_master, then set up a standard track between initial master and it
git branch ip_master
git config branch.ip_master.remote ip_origin
git config branch.ip_master.merge refs/heads/master

# Automatically push the right branch
git config remote.ip_origin.push refs/heads/ip_master:master
# Don't bring in the other shared branches from initial project
git config remote.ip_origin.fetch +refs/heads/master:refs/remotes/ip_origin/master

# Push the new local repo to the new shared repo
git remote add origin git@github.com:username/new_project.git
git push origin master

# Configure standard track of master with new local repo
git config branch.master.remote origin
git config branch.master.merge refs/heads/master

That’s it! We now only have to cherry-pick like there’s no tomorrow.
