This is part of a series on good commits.
In this post, we’ll discuss atomicity and frequency. Remember that this series and the talk from which it came are a description of what has worked well for me, not a prescription of what will work well for you. It’s ok to do things differently.
By an atomic commit, I mean there’s just one reason for change included in the commit.
This Commit Is Not Atomic
Fix 4, 5, and 99
4 Adjust font sizes and colors
5 Changed the splash screen timeout from 1 sec to 5 sec
99 Implemented the new ruleset for discounts
4 and 5 are pretty small, so why not include them all in one commit? Here are a few reasons to think about.
- The commit message is not concise. The actual change description comes in the extended details, not the summary.
- None of these can be reverted without reverting the others [1].
- None of these changes can be cherry-picked without the others [2].
- This works against making frequent commits, which we’ll discuss shortly.
We can see some parallels with the debugging and troubleshooting concept of only turning one knob at a time. If you change three things and it gets better, you don’t know if it was one of them, a combination of two of them (which two?!), or all three working together. If all three are needed, then of course commit them as one cohesive unit. If not, consider keeping them separate [3].
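As a rough sketch of the revert point above, assuming Git and a throwaway repository: the file names and contents below are made up for illustration, and only the three commit messages echo the example. With one commit per reason for change, a single git revert undoes the splash timeout without disturbing the other two changes.

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch does not touch real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical starting point: three unrelated settings, one per file.
echo "font-size: 12px" > style.css
echo "splash_timeout=1" > splash.conf
echo "discount=none" > discounts.conf
git add .
git commit -q -m "Initial settings"

# One commit per reason for change.
echo "font-size: 14px" > style.css
git commit -q -am "Adjust font sizes and colors"

echo "splash_timeout=5" > splash.conf
git commit -q -am "Change splash screen timeout from 1 sec to 5 sec"

echo "discount=new-ruleset" > discounts.conf
git commit -q -am "Implement new ruleset for discounts"

# Undo only the splash timeout change (HEAD~1 is the second-newest commit).
git revert --no-edit HEAD~1 >/dev/null

cat splash.conf      # prints splash_timeout=1 (the change is undone)
cat discounts.conf   # prints discount=new-ruleset (untouched)
```

Had all three changes gone in as one commit, the same revert would have rolled back the font and discount changes as well.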
I’ve only heard two common arguments against this, and one is the same “habits” reason we’ve covered in almost every topic in the series.
The other is that the log gets much longer, and that’s certainly true. Whether this is a problem is another matter. I find that it’s often more helpful to see a greater number of granular commits in the log than to see a few gigantic ones. It’s clearer to me what changed, when, and why, especially since the messages are more concise and specific. By the same token, it’s easier to search for a given commit.
There is one more option to keep in mind as you consider what works best for you. With some version control systems and branching strategies, you can have the best of both worlds. When the full history becomes irrelevant as a new feature reaches completion (i.e., in the future, we’ll only care that the feature was implemented, not about each step we took along the way), then you can squash the Work In Progress commits into a smaller set just before you merge them in.
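One way this can look, assuming Git and a topic branch (the branch name, file, and WIP messages here are hypothetical): git merge --squash stages the combined result of the topic branch so that the mainline receives a single commit. An interactive rebase is another route to the same end.

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch does not touch real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Remember the mainline's name, whatever the default branch is called.
mainline=$(git symbolic-ref --short HEAD)

# Granular work-in-progress commits on a topic branch.
git checkout -q -b discount-rules
echo "step 1" >> app.txt
git commit -q -am "WIP: first rule passing"
echo "step 2" >> app.txt
git commit -q -am "WIP: refactor rule lookup"
echo "step 3" >> app.txt
git commit -q -am "WIP: handle edge cases"

# Bring the finished feature into the mainline as one commit.
git checkout -q "$mainline"
git merge --squash -q discount-rules >/dev/null
git commit -q -m "Implement new ruleset for discounts"

# The mainline log now shows two commits; the WIP steps stay on the topic branch.
git log --oneline
```

The granular history is still available on the topic branch until that branch is deleted, so the decision to squash can be deferred until the merge.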
If you’re making atomic commits, you’re probably also committing frequently.
It’s like undo/redo, but
- with named states
- across files
- without loss of the undo stack when the IDE or system restarts
- and you can jump back and forth multiple points at a time

It’s like saving your game right before the boss fight.
I’ll usually commit each time I make forward progress toward my goal, or whenever I’m about to make a significant change across multiple files. This might mean I get one more test passing, or it might mean I’ve created something that “works” but needs to be refactored. The value in the commit is that no matter how badly I break things during the next step, it takes almost no effort to put things right. If I find myself down a terrible path, I just reset to HEAD, and I’m safe at home.
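A minimal sketch of that checkpoint-then-reset loop, assuming Git (the file names and contents are invented for illustration):

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch does not touch real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Checkpoint: a known good state worth protecting.
echo "total = price * quantity" > calc.txt
echo "tests: 1 passing" > status.txt
git add .
git commit -q -m "Get total calculation test passing"

# An experiment across several files that turns out badly.
echo "total = ???" > calc.txt
echo "tests: 0 passing" > status.txt

# Throw away every uncommitted change to tracked files.
# Note: untracked files are left alone by this command.
git reset -q --hard HEAD

cat calc.txt     # prints total = price * quantity
cat status.txt   # prints tests: 1 passing
```

Because a hard reset discards uncommitted work for good, it is only as safe as the most recent commit, which is one more argument for committing often.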
Depending on your branching strategy, you might even push your commits frequently so that you can get early feedback (from your peers and/or a build server). Contrast this with waiting until work is done, when feedback will often be withheld or ignored “to avoid the cost of rework” [4].
So how frequently?
It depends ;) I certainly don’t think of it in terms of time. I think of it more in terms of progress versus risk. Do you have more value than you had before? Do you want to protect it as a known good state to which you can time travel later? Are you about to experiment with a wild idea or undergo a large refactoring? Consider whether there’s value in giving yourself a checkpoint, especially if your VCS lets you squash it away once you realize you didn’t need it.
A lot of people worry that this will cause destabilization.
This is a very valid concern if committing affects the whole team instead of just you [5]. If your commits only live locally until you push them to the server, or if your pushed commits will be isolated in a topic branch, then you’re only affecting yourself.
If the commits (or check-ins) are into a shared mainline, then yes, frequently adding your half-baked work in progress can indeed break builds and impact your team. However, there are ways to avoid breaking things for your team, and there are benefits to earlier integration. You will have to integrate at some point, and delaying that will only make it harder at the end. Do a check-in dance. Find out earlier what’s going to break, and it’ll be easier to correct your course before getting too far down the wrong path [6].
Again, this is what has worked for me and my teams. It might not be best for everyone. It might not even be best for me…just the best I’ve found yet. Please do share the pros and cons of any alternatives that have worked well for you.
1. To be clear, you could manually revert them piece by piece by paying careful attention to what changes went with what commit. I’m talking about a quick, automatic revert, as when using
2. To cherry-pick or not to cherry-pick is another topic. All I’m saying here is that if you were to cherry-pick, three changes come with that commit.
3. If you’ve already made several changes before realizing they should be separate, you can look at git add -p to selectively stage and commit portions of changed files instead of all of them. The danger is that if you separate things that actually needed to be together, one of the commits might be unbuildable. When (if ever) that’s acceptable is another topic.
4. Transitioning from code reviews when work is considered complete to an ongoing, collaborative discussion as soon as work begins is incredibly powerful.
5. And your pair(s).
6. If you are isolating work in topic branches, you’re guarding against destabilization of the mainline, but you’re opening yourself up to the pain of delayed integration. You don’t see conflicts or incompatible semantic changes until you merge. One solution is to regularly incorporate the mainline into your branch (e.g., rebasing or merging master) to see and resolve these problems earlier. Of course, this doesn’t solve the case where Topic A and Topic B are both compatible with master, but not with each other. We will talk more about these tradeoffs in a future post about branching strategies.
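The last note’s suggestion of regularly incorporating the mainline into a topic branch might look like the following sketch, assuming Git. The branch name topic-a and the files are hypothetical, and a merge is shown here where a rebase would serve the same purpose.

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch does not touch real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > shared.txt
git add shared.txt
git commit -q -m "Initial commit"

# Remember the mainline's name, whatever the default branch is called.
mainline=$(git symbolic-ref --short HEAD)

# Work begins on a topic branch.
git checkout -q -b topic-a
echo "topic work" > topic.txt
git add topic.txt
git commit -q -m "Start topic A"

# Meanwhile, the mainline moves on.
git checkout -q "$mainline"
echo "mainline change" >> shared.txt
git commit -q -am "Change shared file on mainline"

# Back on the topic branch, pull in the mainline now rather than at the end,
# so any conflicts surface while they are still small.
git checkout -q topic-a
git merge -q --no-edit "$mainline"

git log --oneline
```

Here the two branches touch different files, so the merge is clean; the benefit shows up when they do overlap and the conflict is found after one small commit instead of many.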