Branching in version control systems: why do it, and what's the benefit?

in #steemit, 7 years ago

What is version control?

A version control system takes its input from developers who are making changes to the code base. ("Version control" is synonymous with "source control.") The system saves every change so that it can be reviewed, backed out, and so on. (Generally it saves only the difference, or "diff", between one version of a file and the next, to save disk space.)
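To show what "saving only the diff" means, here is a small sketch using Python's standard difflib module on two made-up versions of a file (the file name and the "@41"/"@42" revision labels are invented for the example). It produces a human-readable unified diff; real version control systems use their own, more compact delta formats, so treat this as an illustration of the idea rather than of any particular system's storage.

```python
import difflib

# Two hypothetical versions of the same file, as lists of lines.
old_version = ["def greet(name):\n", "    print('Hello, ' + name)\n"]
new_version = ["def greet(name):\n", "    print(f'Hello, {name}!')\n"]

# Keep only what changed between the two versions, not a second full copy.
diff = list(difflib.unified_diff(old_version, new_version,
                                 fromfile="greet.py@41", tofile="greet.py@42"))
print("".join(diff))
```

The unchanged first line appears only as context; the one changed line shows up as a "-" line and a "+" line.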

Some changes may introduce bugs. When resolving these bugs, it helps to know:

  1. Who made the change?
  2. When was the change made?
  3. What other changes were made along with it?
  4. Why was the change made?

A good version control system can answer all of these questions for the bug-fixing developer, even if that developer is not the original developer, who may be sick, on vacation, or gone to greener pastures.

I am most familiar with Perforce and Subversion, but these concepts should apply to almost any version control system.

Who made the change?

Source control systems have "users," which may be mapped to an existing user database, such as Windows' Active Directory (AD), or created specifically in the source control system itself. In larger organizations there's a benefit to mapping to, e.g., AD: there's one less "list of users" for the IT department to maintain, since they can just add the source control system's permissions to the user's AD entry. Then, when an employee leaves, there's only one account to disable. It's more efficient for the employees as well, who have one fewer username/password combination to remember. Generally, they don't even need to type their password: if they're logged into their Windows machine with their AD credentials, they should have seamless access to any other systems that also use AD credentials.

However the source control system is configured, it is simple to query it for the change that "broke the build" (or "broke the tests") and determine which developer checked that change in.
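As a sketch of such a query in Subversion: "svn log --xml -v -r 42" prints the author, date, changed paths, and checkin comment for revision 42 as XML. So that the example runs without a repository, a hand-written, trimmed sample of that output stands in for the real thing (the author, paths, and message are made up), and describe_revision is a hypothetical helper name. It answers who, when, what, and why in one pass.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Made-up, trimmed sample of what "svn log --xml -v -r 42" prints.
SAMPLE_LOG = """<log>
<logentry revision="42">
<author>alice</author>
<date>2017-09-01T18:04:11.123456Z</date>
<paths>
<path action="M" kind="file">/trunk/src/storage.c</path>
<path action="M" kind="file">/trunk/include/storage.h</path>
</paths>
<msg>BUG-1234: report an error instead of crashing when the disk is full</msg>
</logentry>
</log>"""


def describe_revision(log_xml: str, revision: int) -> Dict[str, object]:
    """Answer who/when/what/why for one revision of "svn log --xml" output."""
    root = ET.fromstring(log_xml)
    for entry in root.iter("logentry"):
        if int(entry.get("revision")) == revision:
            return {
                "who": entry.findtext("author"),
                "when": entry.findtext("date"),
                "what": [p.text for p in entry.iter("path")],  # files in the same checkin
                "why": entry.findtext("msg"),
            }
    raise KeyError(f"revision {revision} not found in log")


info = describe_revision(SAMPLE_LOG, 42)
print(info["who"], info["when"])
```

Perforce offers similar information through its own commands; the parsing above only applies to Subversion's XML log format.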

When was the change made?

Similarly, and generally with the same query, one can determine exactly when the change was made. This helps determine how long something has been broken, especially if it's an "edge case," i.e., something that doesn't normally happen. For instance, "how this app behaves when it runs out of disk space" is always a possibility: this laptop currently has 3.89 GB free out of 442 GB, which is still some room, but if I download another couple of Ubuntu ISOs it will fill up (I need to do a cleanup soon).

In some cases, especially with edge-case issues, it will be difficult to determine when something actually broke. One way to find out is to keep old builds around, so that a binary search can be performed to find the build in which it broke.

For instance, if we just built build #100 and it had an issue while testing, we could test build #50: did that have the issue? If it did, then we know the bad change was checked in at or before build #50, so we keep "halving" (i.e., repeat the above with build #25) until we get to the specific checkin that caused the different behavior. Conversely, if build #50 didn't have the issue, then we know the bug was introduced after build #50, so we'd test build #75, and repeat.

With 100 builds, in the worst case (where the errant functionality was introduced in build #2, and build #1 behaved correctly), we'd test #50, #25, #12, #6, #3, #2, and then #1, find that the bug was not present in #1, and conclude that it must have been introduced in build #2. That's only seven tests, compared to testing each build going backwards, which would take 99 tests (#99 first, then working back through #1) and, counting #100, turn up 99 broken builds[1]. Thus the binary search needs roughly 14 times fewer tests in this case (99 / 7 ≈ 14).
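The halving procedure above is a standard binary search. Here is a minimal Python sketch, assuming builds are numbered consecutively, the newest build is already known to be broken, and the bug, once introduced, stays in every later build. The names first_broken_build and is_broken are made up for this example. Because it rounds midpoints slightly differently, it tests #13, #7 and #4 where the text tests #12, #6 and #3, but it still needs seven tests in the worst case for 100 builds.

```python
from typing import Callable, List, Tuple


def first_broken_build(first: int, last: int,
                       is_broken: Callable[[int], bool]) -> Tuple[int, List[int]]:
    """Return the first broken build in [first, last] and the builds tested.

    Assumes `last` is already known to be broken and that every build after
    the first broken one is also broken.
    """
    lo, hi = first, last
    tested: List[int] = []
    while lo < hi:
        mid = (lo + hi) // 2
        tested.append(mid)
        if is_broken(mid):
            hi = mid          # bug was introduced at or before `mid`
        else:
            lo = mid + 1      # bug was introduced after `mid`
    return lo, tested


# Simulate the worst case from the text: build #2 introduced the bug.
bad, tested = first_broken_build(1, 100, lambda build: build >= 2)
print(bad, tested)  # 2 [50, 25, 13, 7, 4, 2, 1]
```

In practice is_broken would be "install that old build and run the failing test," which is why cutting 99 tests down to 7 matters so much.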

What other changes were made along with it?

In robust source control systems, a developer is able to check in multiple changes "at once," under the same checkin number. This is very useful because some languages, like C and C++, have "source files" and "header files." The header files tell the compiler how a function should be called when the function resides in another module that is linked in later. The source files contain the actual code for the function, and are generally larger. Splitting code into separately compiled modules like this also makes it possible to ship some of them as shared libraries (e.g., Windows DLLs, Dynamic-Link Libraries), which can reduce the size of the resulting executable.

If every file had to be checked in under a separate checkin number, and a function's inputs or outputs needed to change (for instance, by adding an input parameter), then the function's "declaration" would change. That change would need to happen in both the source file and the header file. If the header were checked in first, the build would fail, because code calling the function would pass fewer inputs than it now requires; similarly, if the source were checked in first, the build would fail, because it would pass more inputs than the old declaration allows.

In such a system, the build team would come to know that "the build routinely breaks" in situations like these. But build breaks always need to be investigated, so it wastes the build team's time (and the developers') to look into each one and find "oh, the API changed; it'll be fixed in the next build when the other file is included."

Thus, source control systems evolved the ability to say "check these N files in all at once," so a developer could check in both the source and the header, and never experience a build break when only one or the other change was included.
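A toy Python model makes the difference visible. It assumes a "repository" is just a list of snapshots (one per checkin) and that the "build" succeeds when the header's declaration matches the source's definition; commit, builds and the add.h/add.c contents are all hypothetical stand-ins, not how a real system or compiler works.

```python
from typing import Dict, List

Snapshot = Dict[str, str]  # file name -> contents (here: just the signature)


def commit(history: List[Snapshot], changes: Snapshot) -> int:
    """Apply all `changes` as ONE new revision; return its index in `history`."""
    snapshot = dict(history[-1]) if history else {}
    snapshot.update(changes)
    history.append(snapshot)
    return len(history) - 1


def builds(snapshot: Snapshot) -> bool:
    """Toy 'build': the header's declaration must match the source's definition."""
    return snapshot["add.h"] == snapshot["add.c"]


OLD = "int add(int a, int b)"
NEW = "int add(int a, int b, int c)"

# One file per checkin: the revision in between is broken.
separate: List[Snapshot] = []
commit(separate, {"add.h": OLD, "add.c": OLD})
commit(separate, {"add.h": NEW})
commit(separate, {"add.c": NEW})
print([builds(s) for s in separate])  # [True, False, True]

# Both files in one checkin: every revision builds.
atomic: List[Snapshot] = []
commit(atomic, {"add.h": OLD, "add.c": OLD})
commit(atomic, {"add.h": NEW, "add.c": NEW})
print([builds(s) for s in atomic])  # [True, True]
```

The middle revision of the first history is exactly the "it'll be fixed next build" break that the build team would otherwise have to investigate.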

Why was the change made?

Version control systems ask for a "comment" when a developer checks in, describing what they were working on. At companies I've worked at, some developers "felt rushed" and just typed "update" as their comment. This is really, really bad, as it doesn't tell others what the change was about, why it was needed, etc. The build team implemented a checkin test for comment = "update" and rejected the checkin if that was the case. Then developers started typing "udpate" instead, and were then spoken to by management... :) Not all issues in software development are "technical" issues; some are "political," and are better resolved by talking than by implementing rules that can be worked around.
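For illustration, here is roughly what such a checkin test could look like. In Subversion, a check like this would typically live in the repository's pre-commit hook, which can read the pending comment with "svnlook log -t TXN REPOS"; the Python function below shows only the check itself, and check_commit_message and the blocklist contents are hypothetical.

```python
# Hypothetical blocklist; a real team would tune this to its own habits.
PLACEHOLDER_MESSAGES = {"update", "updates", "fix", "changes", "wip"}


def check_commit_message(message: str) -> None:
    """Raise ValueError if a checkin comment doesn't explain the change."""
    text = message.strip().lower()
    if not text:
        raise ValueError("checkin comment is empty")
    if text in PLACEHOLDER_MESSAGES:
        raise ValueError(f"checkin comment {message.strip()!r} doesn't say what changed or why")


check_commit_message("Report an error instead of crashing when the disk is full")  # passes
check_commit_message("udpate")  # also passes: exactly the loophole described above
```

As the anecdote shows, no blocklist can tell a deliberate typo from a real comment, so a check like this complements the conversation rather than replacing it.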

Often there will be a bug database as well, for instance, JIRA, Bugzilla, The Bug Genie, etc. It is hugely advantageous to the organization for the developer to include the bug number in their checkin, so that future developers investigating a break in that area can look up the customer's description of the issue in the bug database, and any additional comments by the developers in there, as well as the comments in the source control system.
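If the team agrees on a convention for bug numbers, tools can pull them back out of the checkin comments and link them to the bug database. A minimal sketch, assuming Jira-style keys such as "APP-1234" and two made-up project keys; the names PROJECT_KEYS, BUG_ID and bug_ids are hypothetical. Restricting the pattern to known project keys avoids false matches on things like "UTF-8".

```python
import re
from typing import List

PROJECT_KEYS = ("APP", "WEB")  # hypothetical bug-database project keys
BUG_ID = re.compile(r"\b(?:" + "|".join(map(re.escape, PROJECT_KEYS)) + r")-\d+\b")


def bug_ids(message: str) -> List[str]:
    """Return every bug-database key mentioned in a checkin comment."""
    return BUG_ID.findall(message)


print(bug_ids("APP-1234: report an error when the disk is full (related: APP-1240)"))
# ['APP-1234', 'APP-1240']
```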

Okay, what about branches?

A "branch" in version control can be visualized like a tree. The tree (the visible part, anyway) starts at the root, and goes up as a trunk. Then a branch will branch off from the trunk, and more branches will branch off from that, until finally the leaves appear.

In a source control system, the "trunk" can be considered the main line of development, "branches" are different versions of the same source code (e.g., a "1.0" release, "1.5" release, "2.0" release, etc.), and the "leaves" are the files. The analogy breaks down a little, since a real tree's trunk has no leaves while the source control trunk does contain files, but it should help with understanding.

In Subversion, the root of a repository conventionally has three folders:

  1. trunk
  2. tags
  3. branches

trunk

The "trunk" would be set up with all the files required to build the project. Builds would happen from the trunk; every build would be tagged (see below), and when it's time to release to customers, a branch would be created for each release (again, see below).

tags

The "tags" folder would be used every time the build team performs a build. Let's say it was at checkin #42: they'd create a tag, e.g. "build_42". When "synced" to that tag, a working copy would have the exact state of the version control system after checkin #42 was checked in, excluding all later checkins, so at any time that tag could be synced to and built, and (given the same build tools) it would always produce the same output.
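In Subversion, a tag is made with a cheap server-side copy: "svn copy" with a peg revision (trunk@42) copies the trunk as it was at checkin #42 into the tags folder. The sketch below only builds that command line; the repository URL is made up, and svn_tag_command is a hypothetical helper name.

```python
from typing import List


def svn_tag_command(repo_url: str, revision: int) -> List[str]:
    """Build the "svn copy" command that tags trunk as it was at `revision`."""
    source = f"{repo_url}/trunk@{revision}"       # peg revision: trunk as of `revision`
    target = f"{repo_url}/tags/build_{revision}"
    return ["svn", "copy", source, target, "-m", f"Tag build {revision}"]


cmd = svn_tag_command("https://svn.example.com/repos/myproject", 42)
print(" ".join(cmd))
# To create the tag for real (needs a Subversion client and repository access):
#   import subprocess; subprocess.run(cmd, check=True)
```

Creating a release branch under "branches" works the same way, just with a different destination folder.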

The build team may not build every checkin, as that could be costly. For instance, if a build took four hours, a single build machine could only build six checkins per day, and with a larger team the builds would start backing up until they were weeks behind. At that point the build team has two options: add build machines to distribute the load, or, when it comes time to do the next build, just build everything that has been checked in up to that point.

With the former, there's an expense involved (more machines or VMs, plus more configuration and maintenance). With the latter, breaks become more expensive: every developer who checked in since the previous build would be alerted, and each would need to determine whether their code broke it. There are ways to make that more efficient; for example, the system could determine which module the break was in and then get the list of developers who checked in to that module, so fewer developers would need to investigate after a build broke.
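Here is a sketch of that last idea, assuming the broken module has already been identified (say, from the build log) and that a "module" is simply a directory prefix. The Checkin record, developers_to_notify, and the sample checkins are all hypothetical.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Checkin(NamedTuple):
    revision: int
    author: str
    paths: Tuple[str, ...]  # files touched by this checkin


def developers_to_notify(checkins: Iterable[Checkin], broken_module: str) -> Set[str]:
    """Authors of checkins since the last good build that touched `broken_module`.

    Assumes a "module" is just a directory prefix such as "trunk/net".
    """
    prefix = broken_module.rstrip("/") + "/"
    return {c.author for c in checkins if any(p.startswith(prefix) for p in c.paths)}


since_last_good_build = [
    Checkin(43, "alice", ("trunk/net/socket.c",)),
    Checkin(44, "bob", ("trunk/ui/window.c", "trunk/ui/window.h")),
    Checkin(45, "carol", ("trunk/net/retry.c",)),
]
print(sorted(developers_to_notify(since_last_good_build, "trunk/net")))  # ['alice', 'carol']
```

Bob's checkin only touched the UI module, so he can keep working instead of investigating a break he didn't cause.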

Efficiency is always a good goal in setting these systems up. The fewer developers who need to be taken away from their ongoing work, the sooner the upcoming release can be completed. And perhaps the developers won't be asked to work nights and weekends just before the release, which, to me, is a management failure, not a failure on the devs' part.

branches

Once the project is stabilized and ready to be put in front of customers (note that using a testing framework like Gated Migrations will help immensely with ensuring stability before letting a customer interact with it!), and management decides to perform "release 1.0," a branch is created under "branches" that starts out with all the source files as of a particular checkin.

Developers can then sync to a release branch, and make ongoing changes in there.

The difference between a branch and a tag is this: a tag is merely a pointer to a set of files, a "snapshot" in time, and the files in a tag are not meant to be modified (in Subversion, tags are ordinary copies, so this is a convention, sometimes enforced by a hook script; a tag can also be updated to point to a different set of files, but that's a different operation). A branch starts out as a pointer to each file, and when a file is changed on the branch, the system saves the "diff" (difference) to the branch rather than to the trunk, or, more precisely, rather than to the place it was branched from. For instance, the trunk could branch to release_1.0, which could branch to release_1.0_SP1, so a change on release_1.0_SP1 would be stored as a diff against release_1.0.
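To make the tag/branch distinction concrete, here is a toy in-memory model in Python. The class and method names are made up, and this is not how Subversion or Perforce store data internally. A branch keeps references to the files it was branched from, plus only the files changed on the branch; a tag is a read-only view of one moment in time.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Dict, Mapping


@dataclass
class Branch:
    """Toy branch: references to the files it was branched from,
    plus only the files changed on this branch."""
    name: str
    base: Dict[str, str]
    changes: Dict[str, str] = field(default_factory=dict)

    def write(self, path: str, text: str) -> None:
        # A real system would store a diff against `base`; we keep the new text.
        self.changes[path] = text

    def snapshot(self) -> Dict[str, str]:
        view = dict(self.base)  # copies references to the strings, not their contents
        view.update(self.changes)
        return view

    def branch(self, name: str) -> "Branch":
        return Branch(name, self.snapshot())

    def tag(self) -> Mapping[str, str]:
        # A read-only view: item assignment raises TypeError.
        return MappingProxyType(self.snapshot())


trunk = Branch("trunk", {"app.c": "v1"})
release = trunk.branch("release_1.0")
sp1 = release.branch("release_1.0_SP1")
build_42 = trunk.tag()

sp1.write("app.c", "v1 + hotfix")
trunk.write("app.c", "v2")

print(release.snapshot()["app.c"])  # v1  (untouched by either change)
print(sp1.snapshot()["app.c"])      # v1 + hotfix
print(build_42["app.c"])            # v1  (the tag still shows the old state)
```

Trying build_42["app.c"] = "x" raises a TypeError, which mirrors the rule that a tag's files are not edited in place.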

Branches allow for separate development teams. For instance, when I worked at Microsoft, we had a "Sustaining Engineering" team, which was tasked with fixing bugs in released products. They did not write new features. They only fixed bugs. They worked primarily on "release" branches, and then they would "merge" their code changes back into the branch theirs was created from, so that a bug fixed in the released product would also be fixed in the "next release" being worked on by the "rock star" developers who add the shiny new features.
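Merging a fix "back up" is, at heart, a three-way comparison between the branch point, the branch, and the parent. Subversion does this for real with "svn merge", working line by line; the Python sketch below is a much cruder file-level version (merge_up and the file contents are hypothetical, and deletions are not handled), just to show the decision it has to make for each file.

```python
from typing import Dict, List, Tuple


def merge_up(base: Dict[str, str], branch: Dict[str, str],
             parent: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """File-level three-way merge of a branch's changes into its parent.

    `base` is the state at branch time. Returns the merged files and the
    paths that both sides changed differently (a human has to resolve those).
    """
    merged = dict(parent)
    conflicts: List[str] = []
    for path, text in branch.items():
        if text == base.get(path):
            continue                    # not changed on the branch
        if parent.get(path) in (base.get(path), text):
            merged[path] = text         # only the branch changed it (or both did the same)
        else:
            conflicts.append(path)      # both sides changed it differently
    return merged, conflicts


base = {"save.c": "v1", "ui.c": "v1"}                       # release_1.0 at branch time
release = {"save.c": "v1 + disk-full fix", "ui.c": "v1"}    # Sustaining Engineering's fix
trunk = {"save.c": "v1", "ui.c": "v2 with shiny feature"}   # next-release work

merged, conflicts = merge_up(base, release, trunk)
print(merged)     # {'save.c': 'v1 + disk-full fix', 'ui.c': 'v2 with shiny feature'}
print(conflicts)  # []
```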

Let me know if I can tell you more about this.

This was my career. I'm now disabled from multiple concussions. But I can still write about things that I remember prior to the first one, and am doing this to help future developers.

Enjoy!

[1] -- One of my favorite songs from childhood, Nena's "99 Red Balloons," comes to mind: "The war machine springs to life, opens up one eager eye, focusing it on the sky, as 99 broken builds go by." :)

It also has a reference to my wife @countrylover's Trekkieness, and in fact, they had a karaoke event in the evenings at the 50th Star Trek Convention we went to last year, in which they sang that song -- and the entire audience belted out "EVERYONE'S A CAPTAIN KIRK"! :D





Can I say how much I love you my sweet, smart, funny man? Thank you for indulging me. As much as you fear what you've lost you are still a brilliant mind.

You have a good husband that compliments you ALL the time. Every marriage should be like yours.

He is a special man that I am blessed to have in my life.

@libertyteeth this is a great thing for the devs, to save time, money, and effort and get things as good as possible in less time. Thanks for sharing this great advancement. I have been following your posts closely for about a week. Upvoted. @gold84

When I install a program it always shows the version; now I have a clear vision of why it's necessary. Thanks for the explanation of the developers' workflow.

Thanks! The "version" of the program is usually different from the "versions" of each source file contained in the source control/version control system -- but you are correct, there are reasons for the version number and they're very similar to the reasons used for version control. When a user reports a bug, it's very important to know which release it came from, to better be able to investigate and reproduce the issue. Take care!

It's very technical to me.

Awesome Post, Keep it up!

Wow @libertyteeth, this write-up is too technical for me to understand right now. All I can say is that it is a brilliant write-up, just like the gated migration you talked about in your other post. I'd also say it will do Steem developers good to notice this write-up to improve their system. How is today?

Thanks!

I'm having a lot more difficulty getting to sleep. Hopefully this is temporary as I stopped drinking a week ago.

Ok. Be strong. Peace!

You know, Liberty, I may not connect with the "code," but I have certainly come to understand your personality. It would seem your brain is certainly wired to fire a lot quicker than other people's synapses. I don't want to comment just for the "sake of commenting." No wonder you are so sensitive when it comes to your concussions. The brain is the most valuable asset, and I believe you are healing much faster by writing out such complex info. I fail at mathematics, but can myself "see" that this is your niche in life. Hope it helps the software community, because my synapses just fried when trying to comprehend this :)

Thanks for putting this out... although I must admit that after a paragraph... I jumped right to the Nena video! I have 0% programming skills and 100% 1980s music listening skills... so it all balances out!

Back when I briefly participated in politics (2012), the Ron Paul folks had an acronym, "WAHOR" -- "We All Have Our Roles". :)

your technology is the best technology & GREAT education
