Sunday, November 30, 2008

Early Branching

Recently a coworker and I were looking at a versioning problem with some code that had been integrated into the current release branch (from some parallel branch), and we stopped and asked ourselves: "Why are these integration issues always so complicated, and why do we always hit them at the end of a release cycle?"

I have had the fortunate experience of working for software companies where the build was a transparent luxury that developers knew almost nothing about (It just worked!), and for companies where the build was some Machiavellian Rube Goldberg machine that worked only if no one made any mistakes and sage was burned at the right hour on the night before a release. The interesting thing is that, regardless of the technologies, languages, or platforms the build system used, the one thing that seemed to make the biggest difference was when the branches were cut.

Of all the branching strategies I have come across, the one I've witnessed the greatest success with is early branching. Why does this seem to work better than late branching (or variants of merge/propagate early/often)? I think for a couple of key reasons:
  1. Branching is done for clear and coherent reasons. As soon as a release is planned, a branch is cut. This ties clear release requirements to a physical code base against which those requirements can be evaluated at any given moment.
  2. It isolates potentially conflicting parallel work and helps to minimize developer collisions and build downtime.
  3. It reduces concurrent branch explosion. (I have seen companies with 9-10 concurrent branches, all hoping they can merge them together at the 11th hour and release.)
  4. Potentially underestimated tasks (Hey, we need to support a new platform!) are identified early, and release plans (or requirements) can be adjusted accordingly.
While I think #1 is probably the one that gives developers and project managers the biggest benefit, #4 is where I've seen hours, days, even weeks of time saved. It is self-evident that learning about potential problems early in a software cycle is better than learning about them late, but it is surprising how often this is missed because no one foresaw any major problems. Developers are also human. Compared with strategies that are more laissez-faire in their version control restrictions (often relying on the wise developer to remember to do all the right integrations), this approach minimizes mistakes and is more forgiving when mistakes do happen. And if we have learned anything from books like Microserfs or Dreaming in Code, it's that software is hard enough without adding unforgiving process into the mix.

1 comment:

Carter said...

*shill alert*
I'm the "fellow employee" mentioned in the post. I totally agree that early branching is a good approach. From a build and release perspective, it offers these advantages:


1. Developers don't have to rebase their checkouts from branch A to branch B at the appointed branch moment.
Almost every organization that does late branching has a few checkins go to the wrong branch during the week the release is branched.


2. Fewer build infrastructure failures at branch time.
If you branch late, then at the end of a release you have to add new entries to your CI loop, perhaps add new machines to support the new platforms, etc. All of this is done at the most inconvenient time of the release.


3. Clarity when examining diffs.
If you look at the change history for a file with late branching, you have to hopscotch from one branch to another to diff early changes from late changes in the release.


4. Clarity about what work went into what branch.
A year after a release, the memory of when its branch was cut will be very distant. It takes a little archeology to figure out which segment on main corresponds to which release. There is no such problem with early branching.