14 January 2016

An effective git branching strategy

We have been using git for a number of years to manage the codebase for all of our client projects. Over that time we have gradually refined our branching strategy to the point where I think it works really well for our setup, which is probably a pretty common one.

Why not git flow?


If you are reading this post you have probably come across git flow. It is a popular strategy, but one that has always struck me as rather over-engineered and actually unworkable for us. I really wanted to like git flow, but the strategy seems (unless I have totally the wrong end of the stick) to have major flaws when working with new features, mainly because for normal development - ignoring hotfixes - you only ever branch off develop.

Another issue, though, is that the first time code is pushed remotely from a developer's local environment it needs to be production ready, because it won't be tested in any remote server environment until it reaches the master branch. If it's not production ready it will hold up future commits and deployments of other features. More often than not this initial push of local code won't be production ready straight away. The problem with git flow is that there is no support for deploying from multiple branches, and so no support for multiple testing environments before code finally reaches production (yes, you can deploy to multiple environments from the same branch, but this still holds up production deployments).

Take for example the following scenario. You develop in a new feature branch (based off the develop branch), and the feature is completed and merged back into develop. Work then starts in another feature branch (based off the updated develop branch), and this is also completed and merged back into develop. While the second feature is being developed, however, a bug is discovered in the first feature, meaning it's not production ready. At this point, because the second feature branch was based off the updated develop branch (which now contains non production ready code), you can't release either the first or the second feature, as the commits for the production ready second feature sit in front of the commits for the non production ready first feature. Essentially there can be no release of any feature until all features in develop are production ready.

Now this is what the release branches are designed for, but realistically the workflow is rarely so cut and dried that you can pick a single point in the develop branch where everything is ready to go to production and create a release there. Also, because the bad code still exists in the develop branch, all further releases are held up until a fix has been added to the release branch and the changes merged back into develop. Realistically, work will always be required in the release branch, and because this can only be merged back into the develop and master branches, the feature branch then lacks a complete history of that feature. Having a complete history is not only extremely useful should you need to work more on that feature in the future, it also makes the changes made to build that feature far easier to review than they would be in the merged develop or master branches. This is because, unless a merge only contains a single commit, you can only review the sum total changes of the merge, not the individual commits which make up that merge.

This is a simplified scenario, but with a team of multiple developers and multiple features on the go at any one time you can imagine things getting very messy very quickly. The only real options for fixing the situation are:
  • Work on the broken feature until it is production ready holding up all releases until it's fixed.
  • Revert all of the commits for the broken feature so that production can continue.
  • Cherry pick the required commits from develop into the release branch excluding the broken code - but this breaks merging and comparison as cherry picks are totally new commits.  It's also time consuming and open to human error as it's basically a manual merge.  Finally it still holds up further releases as the feature is still broken in develop.
  • Deliberately release the broken code and hotfix it once it's in the master branch.
  • Release anyway but deploy commits out of order, skipping over the bad code (which again is open to human error, can produce undesired results further down the line when fixing the deploy order, and means the server is out of sync with the master branch).
None of the above are great options, but the first I would at least consider to be the right option in development terms (but perhaps not in business terms if the fix is not quick).
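To make the second option concrete, here is a rough sketch of backing a merged feature out of develop. It assumes the feature was merged with a merge commit (--no-ff), and everything in it (the temporary sandbox, the feature-a and feature-b branch names, the file names) is made up for illustration rather than taken from a real project.

```shell
set -eu
# Throwaway sandbox; branch and file names are hypothetical.
sandbox=$(mktemp -d)
git init -q "$sandbox/project"
cd "$sandbox/project"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example Dev"        # placeholder identity for the sandbox
git config user.email "dev@example.com"
echo "production code" > site.txt
git add . && git commit -q -m 'initial commit'
git checkout -q -b develop

# First feature: branched off develop and merged back.
git checkout -q -b feature-a develop
echo "buggy feature" > feature-a.txt
git add . && git commit -q -m "Add feature A"
git checkout -q develop
git merge -q --no-ff -m "Merge feature-a" feature-a
merge_a=$(git rev-parse HEAD)             # remember the merge commit

# Second feature: branched off the updated develop and merged back.
git checkout -q -b feature-b develop
echo "working feature" > feature-b.txt
git add . && git commit -q -m "Add feature B"
git checkout -q develop
git merge -q --no-ff -m "Merge feature-b" feature-b

# Back out feature A only. -m 1 treats develop (the first parent) as the
# mainline, so the changes that merge brought in are undone.
git revert --no-edit -m 1 "$merge_a"

ls    # feature-b.txt and site.txt remain; feature-a.txt is gone
```

One thing to be aware of: once a merge has been reverted like this, merging the fixed feature branch again later will not bring the original commits back on its own. You generally need to revert the revert first, which is part of why this option is not a great one.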

You could restrict yourself to developing only one feature branch per release, but however you work it, if non production ready code makes it into the develop branch there can't be any releases until that code has been fixed or removed. If it's not a straightforward issue to resolve, new commits can really pile up in front of the bad feature merge, seriously holding up the project.

In the real world you are inevitably, from time to time, going to get non production ready code in the develop branch however much you test it in your feature branch - that's just the nature of code. This is particularly true as these feature branches generally only exist in the developer's repo, meaning only the developer has tested the code until it gets pushed to a remote repo. This very real scenario is an absolute show stopper when it comes to git flow for us, and I really can't see how it's workable in many git flow setups without majorly holding up production.

Something much simpler

After a fair amount of consideration and testing over time I eventually came to the conclusion that something much simpler than the approach taken by git flow was a far better solution for us. The approach we use is highly flexible, simple for any new developer to pick up, and scalable whatever the size of your development team. It also does not need any software running on top of git - it is in fact somewhat similar to the approach taken by GitHub, who also did not find git flow to be a good fit.

First let me explain the setup we use for our client projects.  In all of our projects we have various tools we use to manage the project, including ticketing for the client to assign tasks to us.  The client will typically have at least one live testing environment we will always deploy to before anything is deployed to production (regardless of how trivial the change is).  The client can then test and approve the changes, and once they are happy things are working correctly, the code gets deployed to the next closest environment to production (which could be production).

So for instance, the client may often have a development environment which is a very clean setup without any of the optimisation measures you are probably going to find on production (e.g. caching). After that they might also have a staging environment which is a very close clone of production, so that the code can be tested somewhere that has all the optimisation measures (or any other changes for that matter) that production has but development does not. Once code has been tested on staging and approved, you've got about as close as you realistically can to testing the code as it will run on production. At this point the code can be considered production ready and therefore be deployed to production for customer use.

In other cases the client won't have a complex production setup, and so only one development/staging environment is required in front of production.

Our branching strategy

So that's the workflow our branching strategy has to support, and I'm sure nothing I've described above is unfamiliar to you. The base idea behind the strategy is that all new branches are based off the production branch that is furthest behind (generally master). By doing this you can be sure that any new feature you develop in its own branch can be safely merged into any other branch, including your production branch, and you will only ever add the code for that feature into the branch and nothing else (which is not the case with git flow). Doing this keeps things very clean and structured from the outset.

So our strategy has the master branch as the branch from which you deploy to production, and everything that is developed is developed in a branch based off master. As well as master you have one additional branch for each live testing environment, so you might have main branches of master, staging and development. These extra branches also start out life based off the master branch, and along with master are considered the 'main' repository branches.

So the starting point for the repository is the master branch containing up to date production environment code, with one extra branch for each live testing environment in the project.  These extra branches are based off, and are initially identical to the master branch.

From this starting point each developer creates a new branch based off master to work on their particular task. In our case, because we use ticketing, we use the ticket number as the branch name, which also makes it quick to visually link an existing branch to a ticket. If the task needs to be worked on by multiple developers then this new branch can be pushed remotely, but typically a ticket will cover work which can be handled by a single developer, and so these branches will mostly only be found in the developer's local environment.

Once the developer is happy that the work in that branch is ready for testing, they can merge the changes into the branch furthest from production (so development for the 3 environment setup described above).  Here the changes can then be deployed to the relevant live testing environment for approval - according to your deployment setup for the branch.  Once approved, the branch would then be merged into staging and deployed, and finally merged into master and deployed to production once approved under staging.  This can be happening simultaneously on any number of branches with any number of features and you are never in danger of putting any code onto production that isn't already there as your branch is only ever based off master.  You also have the flexibility to deploy any combination of changes to any combination of testing environments or production.  The only time you should ever get non production ready code in the master branch is if testing in the other live environments and by any developers involved has not revealed a bug, and this is only discovered later.

Typically you will find the most code to be found on the branch furthest from production, which in this case is development.  This is for obvious reasons - all of the code which needs testing will be found in this branch, and it can often be the case that code in this branch stays here for further development for some time before making it to production.  It's also more than possible that code in this branch may never actually reach production at all if a change is scrapped.  This is the reason why we never base branches for new tasks on anything other than the master branch.
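If you want to convince yourself of this, the following is a small sandbox you can run. It is only a sketch: the temporary directory, the ticket numbers 43 and 44, and the file names are all invented for the example. Two tickets are branched off master and both merged into development, but only ticket 44 is then merged into master.

```shell
set -eu
# Throwaway sandbox; ticket numbers and file names are hypothetical.
sandbox=$(mktemp -d)
git init -q "$sandbox/project"
cd "$sandbox/project"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example Dev"        # placeholder identity for the sandbox
git config user.email "dev@example.com"

echo "production code" > site.txt
git add . && git commit -q -m 'initial commit'
git branch development

# Two tickets, each branched off master and merged into development for testing.
for ticket in 43 44; do
    git checkout -q -b "$ticket" master
    echo "work for ticket $ticket" > "feature-$ticket.txt"
    git add . && git commit -q -m "Ticket $ticket"
    git checkout -q development
    git merge -q --no-ff -m "Merge ticket $ticket into development" "$ticket"
done

# Only ticket 44 has been approved, so only 44 is merged into master.
git checkout -q master
git merge -q --no-ff -m "Merge ticket 44 into master" 44

ls    # feature-44.txt and site.txt; feature-43.txt exists only on development
```

Because branch 44 was based off master rather than development, merging it brings in the ticket 44 work and nothing from ticket 43, even though both were sitting side by side on development.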

Initial setup

To set up a repository ready to use the branching strategy, first initialise the repo with the production codebase:
git init
git add .
git commit -m 'initial commit'
Then create the extra main branches you need based off master, one for each live testing environment:
git branch staging
git branch development
At this point you should push the repo to your remote git hosting and you are ready to start work on the project.
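As a rough sketch of that push step, assuming you call the remote origin: in the example below a bare repository in a temporary directory stands in for your real git hosting, so the paths and the index.html file are placeholders, not anything from a real project.

```shell
set -eu
# Throwaway sandbox: a bare repo stands in for the remote git hosting.
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"

git init -q "$sandbox/project"
cd "$sandbox/project"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example Dev"        # placeholder identity for the sandbox
git config user.email "dev@example.com"

# Same steps as above: initial commit, then one branch per testing environment.
echo "production code" > index.html
git add .
git commit -q -m 'initial commit'
git branch staging
git branch development

# Register the remote and push all three main branches, setting up tracking.
git remote add origin "$sandbox/remote.git"
git push -q -u origin master staging development

git branch -r    # lists origin/development, origin/master and origin/staging
```

With real hosting you would swap the sandbox path for the repository URL your provider gives you.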

Ready to work

Each new developer should clone the repo to their local development environment. Once that is done and they have a task to complete, they should create a new branch for it and work on that:
git checkout -b 43 master
This will create, and switch to a branch named '43' - the branch can be called anything, but this is an example of naming a branch after a ticket number. You then work on and commit the changes for the task and once happy they are ready for testing, merge into the furthest branch from master. A note here - when merging back into any of the 'main' branches (master, staging, development etc) from working on a task like this you should use the --no-ff argument to show the branch you are merging in the repo history.
git checkout development
git merge --no-ff 43
Once merged, push the branch ready for deployment:
git push origin development
This same process should be repeated for each main branch as required so:
git checkout staging
git merge --no-ff 43
git push origin staging
to deploy and test in the staging environment, and finally:
git checkout master
git merge --no-ff 43
git push origin master
when the code is production ready. If a bug is found on production, you just follow the same procedure as normal fixing it in the relevant branch, then merge, push and deploy in the relevant environments. If the fix must go straight to production that's fine, but remember to also merge into the other main branches - the other main branches should always contain everything master does.
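One way to check that a main branch has not fallen behind master is sketched below. Note that because every merge uses --no-ff, master's own merge commits never appear in the other main branches, so a plain ancestry check would always report a difference. The sketch therefore counts only the non-merge commits that are on master but missing from each branch. The sandbox, the ticket number 57 and the file name are assumptions for the example, and changes made only inside a merge commit (conflict resolutions, say) would not be picked up by this check.

```shell
set -eu
# Throwaway sandbox; ticket number and file names are hypothetical.
sandbox=$(mktemp -d)
git init -q "$sandbox/project"
cd "$sandbox/project"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example Dev"        # placeholder identity for the sandbox
git config user.email "dev@example.com"

echo "production code" > site.txt
git add . && git commit -q -m 'initial commit'
git branch staging
git branch development

# An urgent fix goes straight to production...
git checkout -q -b 57 master
echo "urgent fix" >> site.txt
git commit -q -a -m "Ticket 57: urgent fix"
git checkout -q master
git merge -q --no-ff -m "Merge ticket 57 into master" 57

# ...and so far has been merged into staging but not development.
git checkout -q staging
git merge -q --no-ff -m "Merge ticket 57 into staging" 57

# Count the non-merge commits on master that each main branch is missing.
for branch in staging development; do
    missing=$(git rev-list --no-merges --count "$branch..master")
    echo "$branch: $missing commit(s) from master not yet merged"
done
```

Here staging reports 0 and development reports 1, which is the reminder that ticket 57 still needs merging into development.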

If, while working on a branch, it is decided that the task should be passed to another developer, or that multiple developers should work on it, then that branch in the developer's local setup can just be pushed to the remote repo. From here all relevant developers can just pull down the branch and work on it.
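A hand-over like this might look roughly as follows. This is a sketch under assumptions: the remote is called origin, a bare repository in a temporary directory plays the part of the remote hosting, and the two working copies, developer names and ticket number 43 are all made up for the example.

```shell
set -eu
# Throwaway sandbox: a bare repo stands in for the remote git hosting.
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/master

# First developer: has unpublished work on local branch 43.
git init -q "$sandbox/first-dev"
cd "$sandbox/first-dev"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "First Dev"          # placeholder identity for the sandbox
git config user.email "first@example.com"
echo "production code" > site.txt
git add . && git commit -q -m 'initial commit'
git remote add origin "$sandbox/remote.git"
git push -q -u origin master
git checkout -q -b 43 master
echo "half-finished work" > feature-43.txt
git add . && git commit -q -m "Ticket 43: work in progress"

# Hand-over: publish the ticket branch to the remote.
git push -q -u origin 43

# Second developer: fetch the new branch and create a local tracking branch.
git clone -q "$sandbox/remote.git" "$sandbox/second-dev"
cd "$sandbox/second-dev"
git fetch -q origin                       # a no-op right after cloning; needed in an existing clone
git checkout -q --track origin/43

git rev-parse --abbrev-ref HEAD           # prints 43
```

From then on both developers push to and pull from origin/43 as they would with any shared branch, and the merges into the main branches carry on exactly as before.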

A side note. As branching is so cheap in git there is rarely any need to delete your local branches, so we generally keep them all - you never know, that old ticket could still re-open. This means that over time the number of branches in your local repo will gradually increase, and if you need to work on an old branch it could be quite out of date. In that case you just need to merge master back in to update the branch (this time without --no-ff).
git checkout 43
git merge master


Not much to say here except that's it, pretty simple isn't it? Feel free to give this branching strategy a try yourself and if you have any feedback leave it in the comments below.
