Tuesday, May 26, 2015

Git 2.4.1 and 2.4.2

Today, the v2.4.2 maintenance release was tagged. Compared to v2.4.0 that was released end of April 2015 (i.e. last month), in addition to minor typo-fixes, documentation updates and trivial code clean-ups, today's maintenance release contains the following:
  • The usual git diff, when seeing a file turning into a directory, showed a patchset to remove the file and create all files in the directory, but git diff --no-index simply refused to work.  Also, when asked to compare a file and a directory, imitate POSIX diff and compare the file with the file with the same name in the directory, instead of refusing to run.
  • The default $HOME/.gitconfig file created upon git config --global that edits it had incorrectly spelled user.name and user.email entries in it.
  • git commit --date=now or anything that relies on approxidate lost the daylight-saving-time offset.
  • git cat-file bl $blob failed to barf even though there is no object type that is "bl".
  • Teach the codepaths that read .gitignore and .gitattributes files that these files encoded in UTF-8 may have UTF-8 BOM marker at the beginning; this makes it in line with what we do for configuration files already.
  • Access to objects in repositories that borrow from another one on a slow NFS server unnecessarily got more expensive due to recent code becoming more cautious in a naive way not to lose objects to pruning.
  • We avoid setting core.worktree when the repository location is the .git directory directly at the top level of the working tree, but the code misdetected the case in which the working tree is at the root level of the filesystem (which arguably is a silly thing to do, but still valid).
  • git rev-list --objects $old --not --all to see if everything that is reachable from $old is already connected to the existing refs was very inefficient.
  • hash-object --literally introduced in v2.2 was not prepared to take a really long object type name.
  • git rebase --quiet was not quite quiet when there is nothing to do.
  • The completion for log --decorate= parameter value was incorrect.
  • filter-branch corrupted commit log message that ends with an incomplete line on platforms with some sed implementations that munge such a line.  Work it around by avoiding to use sed.
  • git daemon failed to build from the source under NO_IPV6 configuration (regression in 2.4).
  • git stash pop/apply forgot to make sure that not just the working tree is clean but also the index is clean. The latter is important as a stash application can conflict and the index will be used for conflict resolution.
  • We have prepended $GIT_EXEC_PATH and the path git is installed in (typically /usr/bin) to $PATH when invoking subprograms and hooks for almost eternity, but the original use case the latter tried to support was semi-bogus (i.e. install git as /opt/foo/git and run it without having /opt/foo on $PATH), and more importantly it has become less and less relevant as Git grew more mainstream (i.e. the users would want to have it on their $PATH).  Stop prepending the path in which git is installed to users' $PATH, as that would interfere the command search order people depend on (e.g. they may not like versions of programs that are unrelated to Git in /usr/bin and want to override them by having different ones in /usr/local/bin and have the latter directory earlier in their $PATH).
Git hopefully continues to improve.
Have fun.

Saturday, April 25, 2015

Fun with failing cherry-pick

I just encountered an interesting cherry-pick failure.

The change I was trying to cherry-pick was to remove a hunk of text. Its patch conceptually looked like this:

@@ ... @@
 A
-B
 C

even though the pre-context A, removed text B, and post-context C are all multi-line block.
After doing a significant rewrite to the same original codebase (i.e. that had A, B and then C next to each other), the code I wanted to cherry-pick the above commit moved the text around and the block corresponding to B is now done a lot later. A diff between that state and the original perhaps looked like this:

@@ ... @@
 A
-B
 C
@@ ... @@
 D
+B
 E

And cherry-picking the above change succeeded without doing anything (!?!?).

Logically, this behaviour "makes sense", in the sense that it can be explained. The change wants to make A and C adjacent by removing B, and the three-way merge noticed that the updated codebase already had that removal, so there is nothing that needs to be done. In this particular case, I did not remove B but moved it elsewhere, so what cherry-pick did was wrong, but in other cases I may indeed have removed it without adding the equivalent to anywhere else, so it could have been correct. We simply cannot say. I wonder if we should at least flag this "both sides appear to have removed" case as conflicting, but I am not sure how that should be implemented (let alone implemented efficiently). After all, the moved block B might have gone to a completely different file. Would we scan for the matching block of text for the entire working tree?

This is why you should always look at the output from "git show" for the commit being cherry-picked and the output from "git diff HEAD" before concluding the cherry-pick to see if anything is amiss.


Thursday, April 2, 2015

First release candidate for Git 2.4

This release has a few changes in the user-visible output from Porcelain commands. These are not meant to be parsed by scripts, but the users still may want to be aware of the changes.
  • Output from "git log --decorate" (and "%d" format specifier used in the userformat "--format=<string>" parameter "git log" family of commands take) used to list "HEAD" just like other branch names, separated with a comma in between. E.g.

         $ git log --decorate -1 master
         commit bdb0f6788fa5e3cacc4315e9ff318a27b2676ff4 (HEAD, master)
         ...
    This release updates the output slightly when HEAD refers to the tip of a branch whose name is also shown in the output.  The above is shown as:

         $ git log --decorate -1 master
         commit bdb0f6788fa5e3cacc4315e9ff318a27b2676ff4 (HEAD -> master)
         ...

  • The phrasing "git branch" uses to describe a detached HEAD has been updated to match that of "git status". When the HEAD is at the same commit as it was originally detached, they now both show "detached at <commit object name>". When the HEAD has moved since it was originally detached, they now both show "detached from <commit object name>". Earlier "git branch" always used "from", even when the user hasn't moved HEAD since it was detached.
Otherwise, there are only minor fixes and documentation updates everywhere, and unusually low number of new and shiny toys ;-)
  • "git log --invert-grep --grep=WIP" will show only commits that do not have the string "WIP" in their messages.
  • "git push" has been taught a "--atomic" option that makes push to update more than one ref an "all-or-none" affair.
  • Extending the "push to deploy" added in 2.3, the behaviour of "git push" when updating the branch that is checked out can now be tweaked by push-to-checkout hook. The "push to deploy" implementation in 2.3 has a bug that makes it impossible to bootstrap an empty repository (or an unborn branch), but it can be worked around by using this hook.
  • "git send-email" used to accept a mistaken "y" (or "yes") as an answer to "What encoding do you want to use [UTF-8]? " without questioning.  Now it asks for confirmation when the answer looks too short to be a valid encoding name.
  • "git archive" can now be told to set the 'text' attribute in the resulting zip archive.
  • "git -C '' subcmd" used to refuse to work in the current directory, unlike "cd ''" which silently behaves as a no-op.
  • The versionsort.prerelease configuration variable can be used to specify that v1.0-pre1 comes before v1.0.
  • A new "push.followTags" configuration turns the "--follow-tags" option on by default for the "git push" command.
Please give it a good beating so that we can ship a successful v2.4 final at around the end of the month without regressions compared to v2.3 series.

Thanks.

Monday, March 30, 2015

Fun with Non-Fast-Forward

Your push may fail due to “non fast-forward”. You start from a history that is identical to that of your upstream, commit your work on top of it, and then by the time you attempt to push it back, the upstream may have advanced because somebody else was also working on his own changes.


For example, between the upstream and your repositories, histories may diverge this way (the asterisk denotes the tip of the branch; the time flows from left to right as usual):


Upstream                                You


---A---B---C*      --- fetch -->        ---A---B---C*


                                                    D*
                                                   /
---A---B---C---E*                       ---A---B---C


            D?                                      D*
           /                                       /
---A---B---C---E?   <-- push ---        ---A---B---C


If the push moved the branch at the upstream to point at your commit, you will be discarding other people’s work. To avoid doing so, git push fails with “Non fast-forward”.


The standard recommendation when this happens is to “fetch, merge and then push back”. The histories will diverge and then converge like this:


Upstream                                You


                                                    D*
                                                   /
---A---B---C---E*  --- fetch -->        ---A---B---C---E


                                                       1
                                                    D---F*
                                                   /   /2
---A---B---C---E*                       ---A---B---C---E


               1                                       1
            D---F*                                  D---F*
           /   /2                                  /   /2
---A---B---C---E    <-- push ---        ---A---B---C---E


Now, the updated tip of the branch has the previous tip of the upstream (E) as its parent, so the overall history does not lose other people’s work.


The resulting history, however, is not what the majority of the project participants would appreciate. The merge result records D as its first parent (denoted with 1 on the edge to the parent), as if what happened on the upstream (E) were done as a side branch while F was being prepared and pushed back. In reality, E in the illustration may not be a single commit but can be many commits and many merges done by many people, and these many commits may have been observed as the tips of the upstream’s history by many people before F got pushed.

Even though Git treats all parents of a merge equally at the level of the underlying data model, the users have come to expect that the history they will see by following the first-parent chain tells the overall picture of the shared project history, while second and later parents of merges represent work done on side branches. From this point of view, what "fetch, merge and then push" is not quite a right suggestion to proceed from a failed push due to "non fast-forward".


It is tempting to recommend “fetch, merge backwards and then push back” as an alternative, and it almost works for a simple history:


Upstream                                You


                                                    D*
                                                   /
---A---B---C---E*  --- fetch -->        ---A---B---C---E


                                                       2
                                                    D---F*
                                                   /   /1
---A---B---C---E*                       ---A---B---C---E


               2                                       2
            D---F*                                  D---F*
           /   /1                                  /   /1
---A---B---C---E    <-- push ---        ---A---B---C---E


Then, if you follow the first-parent chain of the history, you will see how the tip of the overall project progressed. This is an improvement over the “fetch, merge and then push back”, but it has a few problems.


One reason why “merge backwards” is wrong becomes apparent when you consider what should happen when the push fails for the second time after the backward merge is made:


Upstream                                You


                                                    D*
                                                   /
---A---B---C---E*  --- fetch -->        ---A---B---C---E


                                                       2
                                                    D---F*
                                                   /   /1
---A---B---C---E*                       ---A---B---C---E


               2                                       2
            D---F?                                  D---F*
           /   /1                                  /   /1
---A---B---C---E---G    <-- push ---    ---A---B---C---E


               2                                       2   2
            D---F?                                  D---F---H*
           /   /1                                  /   /1  /1
---A---B---C---E---G    --- fetch -->   ---A---B---C---E---G


If the upstream side gained another commit G while F was being prepared, “fetch, merge backwards and then push” will end up creating a history like this, hiding D, the only real change you did in the repository, as the tip of the side branch of a side branch!

It also does not solve the problem if the work you did in D is not a single strand of pearls, but has merges from side branches. If D in the above series of illustrations were a few merges X, Y and Z from side branches of independent topics, the picture on your side, after fetching E from the updated upstream, may look like this:


    y---y---y   .
   /         \   .
  .   x---x   \   \
 .   /     \   \   \
.   /       X---Y---Z*
   /       /
---A---B---C---E


That is, hoping that the other people will stay quiet, starting from C, you merged three independent topic branches on top of it with merges X, Y and Z, and hoped that the overall project history would fast-forward to Z. From your perspective, you wanted to make A-B-C-X-Y-Z to be the main history of the project, while x, y, ... were implementation details of X, Y and Z that are hidden behind merges on side branches. And if there were no E, that would indeed have been the overall project history people would have seen after your push.

Merging backwards and pushing back would however make the history’s tip F, with its first parent E, and Z becomes a side branch. The fact that X, Y and Z (more precisely, X^2 and Y^2 and Z^2) were independent topics is lost by doing so:


    y---y---y   .
   /         \   .
  .   x---x   \   \
 .   /     \   \   \
.   /       X---Y---Z
   /       /         \2
---A---B---C---E-------F*
                     1



So "merge backwards" is not a right solution in general. It is only valid if you are building a topic directly on top of the shared integration branch, which is something you should not be doing in the first place. In the earlier illustration of creating a single D on top of C and pushing it, if there were no work from other people (i.e. E), the push would have fast-forwarded, making D as a normal commit directly on the first-parent chain. If there were work from other people like E, “merge in reverse” would instead have recorded D on a side branch. If D is a topic separate and independent from other work being done in parallel, you would consistently want to see such a change appear as a merge of a side branch.

A better recommendation might be to “fetch, rebuild the first-parent chain, and then push back”. That is, you would rebuild X, Y and Z (i.e. “git log --first-parent C..”) on top of the updated upstream E:


    y---y-------y   .
   /             \   .
  .   x-------x   \   \
 .   /         \   \   \
.   /           X’--Y’--Z’*
   /           /
---A---B---C---E


Note that this will work well naturally even when your first-parent chain has non-merge commits. For example, X and Y in the above illustration may be merges while Z is a regular commit that updates the release notes with descriptions of what was recently merged (i.e. X and Y). Rebuilding such a first-parent chain on top of E will make the resulting history very easy to understand when the reader follows the first-parent chain.

The reason why “rebuild the first-parent chain on the updated upstream” works the best is tautological. People do care about the first-parenthood when viewing the history, and you must have cared about the first-parent chain, too, when building your history leading to Z. That first-parenthood you and others care about is what is being preserved here. By definition, we cannot go wrong ;-)

And of course, this will work against a moving upstream that gained new commits while we were fixing things up on our end, because we won't be piling a new merges on top, but will be rebuilding X', Y' and Z' into X'', Y'', and Z'' instead.

To make this work on the pusher’s end, after seeing the initial “non fast-forward” refusal from “git push”, the pusher may need to do something like this:


$ git push ;# fails
$ git fetch
$ git rebase --first-parent @{upstream}


Note that “git rebase --first-parent” does not exist yet; it is one of the topics I would like to see resurrected from old discussions.

But before "rebase --first-parent" materialises, in the scenario illustrated above, the pusher can do these instead of that command:


$ git reset --hard @{upstream}
$ git merge X^2
$ git merge Y^2
$ git merge Z^2


And then, inspect the result thoroughly. As carefully as you checked your work before you attempted your first push that was rejected. After that, hopefully your history will fast-forward the upstream and everybody will be happy.


Friday, March 27, 2015

Stats from recent Git releases

Following up to the previous post, I computed a few numbers for each development cycle in the recent past.

In all the graphs in this article, the horizontal axis counts the number of days into the development cycle, and the vertical axis shows the number of non-merge commits made.

  • The bottom line in each graph shows the number of non-merge commits that went to the contemporary maintenance track.
  • The middle line shows the number of non-merge commits that went to the release but not to the maintenance track (i.e. shiny new toys, oops-fixes to them, and clean-ups that were too minor to be worth merging to the maintenance track), and
  • The top line shows the total number of non-merge commits in the release.

Even though I somehow have a fond memory of v1.5.3, the beginning of the modern Git was unarguably the v1.6.0 release. Its development cycle started in June 2008 and ended in August 2008. We can see that we were constantly adding a lot more new shiny toys (this cycle had the big "no more git-foo in user's $PATH" change) than we were applying fixes to the maintenance track during this period. This cycle lasted for 60 days, 731 commits in total among which 120 went to the maintenance track of its time.


During the development cycle that led to v1.8.0 (August 2012 to October 2012), the pattern is very different. We cook our topics longer in the 'next' branch and we can clearly see that the topics graduate to 'master' in batches, which appear as jumps in the graph.  This cycle lasted for 63 days, 497 commits in total among which 182 went to the maintenance track of its time.


The cycle led to v2.0.0 (February 2014 to June 2014) has a similar pattern, but as another "we now break backward compatibility for ancient UI wart" release, we can see that a large batch of changes were merged in early part of the cycle, hoping to give them better and longer exposure to the testing public; on the other hand, we did not do too many fixes to the maintenance track.  This cycle lasted for 103 days, 475 commits in total among which 90 went to the maintenance track of its time.



The numbers for the current cycle leading to v2.4 (February 2015 to April 2015) are not finalized yet, but we can clearly see that this cycle is more about fixing old bugs than introducing shiny new toys from this graph.  This cycle as of this writing is at its 50th day, 344 commits in total so far among which 115 went to the maintenance track.



Note that we should not be alarmed by the sharp rise at the end of the graph. We just entered the pre-release freeze period and the jump shows the final batch of topics graduating to the 'master' branch. We will have a few more weeks until the final, and during that period the graph will hopefully stay reasonably flat (any rise from this point on would mean we would be doing a last-minute "oops" fixes).

Thursday, March 26, 2015

Git 2.4 will hopefully be a "product quality" release

Earlier in the day, an early preview release for the next release of Git, 2.4-rc0, was tagged. Unlike many major releases in the past, this development cycle turned out to be relatively calm, fixing many usability warts and bugs, while introducing only a few new shiny toys.

In fact, the ratio of changes that are fixes and clean-ups in this release is unusually higher compared to recent releases. We keep a series of patches around each topic, whether it is a bugfix, a clean-up, or a new shiny toy, on its own topic branch, and each branch is merged to the 'master' branch after reviewing and testing, and then fixes and trivial clean-ups are also merged to the 'maint' branch. Because of this project structure, it is relatively easy to sift fixes and enhancement apart. Among new commits in release X since release (X-1), the ones that appear also in the last maintenance track for release (X-1) are fixes and clean-ups, while the remainder is enhancements.

Among the changes that went into v1.9.0 since v1.8.5, 23% of them were fixes that got merged to v1.8.5.6, for example, and this number has been more or less stable throughout the last year. Among the changes in v2.3.0 since v2.2.0, 18% of them were also in v2.2.2. Today's preview v2.4.0-rc0, however, has 333 changes since v2.3.0, among which 110 are in v2.3.4, which means that 33% of the changes are fixes and clean-ups.

These fixes came from 33 contributors in total, but changes from only a few usual suspects dominate and most other contributors have only one or two changes on the maintenance track. It is illuminating to compare the output between

$ git shortlog --no-merges -n -s ^maint v2.3.0..master
$ git shortlog --no-merges -n -s v2.3.0..maint

to see who prefers to work on new shiny toys and who works on product quality by fixing other people's bugs. The first command sorts the contributors by the number of commits since v2.3.0 that are only in the 'master', i.e. new shiny toys, and the second command sorts the contributors by the number of commits since v2.3.0 that are in the 'maint', i.e. fixes and clean-ups.

The output matches my perception (as the project maintainer, I at least look at, if not read carefully, all the changes) of each contributor's strength and weakness fairly well. Some are always looking for new and exciting things while being bad at tying loose ends, while others are more careful perfectionists.

Wednesday, March 25, 2015

Git Rev News

Christian Couder (who is known for his work enhancing the "git bisect" command several years ago) and Thomas Ferris Nicolaisen (who hosts a popular podcast GitMinutes) started producing a newsletter for Git development community and named it Git Rev News.

Here is what the newsletter is about in their words:

Our goal is to aggregate and communicate some of the activities on the Git mailing list in a format that the wider tech community can follow and understand. In addition, we'll link to some of the interesting Git-related articles, tools and projects we come across.

This edition covers what happened during the month of March 2015.

As one of the people who still remembers "Git Traffic", which was meant to be an ongoing summary of the Git mailing list traffic but disappeared after publishing its first and only issue, I find this a very welcome development. Because our mailing list is a fairly high-volume one, it is almost impossible to keep up with everything that happens there, unless you are actively involved in the development process.

I hope their effort will continue and benefit the wider Git ecosystem. You can help them out in various ways if you are interested.

  • They are not entirely happy with how the newsletter is formatted. If you are handy with HTML, CSS or some blog publishing platforms, they would appreciate help in this area.
  • They are not paid full-time editors but doing this as volunteers. They would appreciate editorial help as well.
  • You can contribute by writing your own articles that summarize the discussions you found interesting on the mailing list.

Friday, February 27, 2015

Nexus 4 still live and kicking

My everyday phone for the past few years has been Nexus 4. I also have a Nexus 5, which is slightly larger and with a much better screen, but I never felt a need to switch (I did try to have "Let's use N5 this week" every once in a while, though). Last year's Nexus 6 simply felt too large for me. Besides, it is too expensive for me to buy.

I noticed that my N4 recently stopped picking up NFC and charging via wireless. Later I learned that this is a typical sign that its battery needs replacement. Not because the battery got too weak to hold charge, but because the battery started bloating, pushing against the back cover, which necessary antennas are built onto. By slightly raising the back cover by bulging out, the bloated battery breaks the connection from the motherboard to these antennas, which is made only by contact. And that is how NFC and wireless charging are broken.

At least, that is the story I read.

After learning how to, and getting a replacement battery and a few small screw/torx drivers, I opened the phone (which took me some time) and saw this bloated battery.







No wonder the back cover looked warped. After placing the new battery and closing the back, NFC started picking up very reliably and it charges properly on a wireless charger.

Happy ;-)

Sunday, February 8, 2015

Fun with "git diff -B -M"

Git lets you view a change that renames an original file A to a new location B while doing some minor edits to its contents as a "rename" patch (i.e. "rename A to B, with the following content differences"). You can even view a change that renames two original files, A and B, by swapping their contents and optionally doing some minor edits to them as a patchset that contains two "rename" patches ("rename A to B" and "rename B to A"). These were invented by me early in Git's life, when Linus was still running the project, back in mid 2005. More recently, other tools (including GNU patch) started understanding patches that use these features.

I however recently noticed a few corner cases that git diff and friends produce a wrong patchset, or git apply fails to apply correctly constructed patches, and I have been thinking about the right fixes to these issues. This article will illustrate these tricky cases and describe my current thinking.

In this write-up, I'll use these terms:

patchset::
Output from a single git diff invocation, which may contain one or more patch.

patch::
A part of a patchset from a header line that begins with "diff --git" up to (but excluding) the next such header line.

tree::
git diff compares two collections of files; each collection is a tree. Left and right sides of the comparison are called old tree and new tree, respectively. The tree to which we attempt to apply a patchset is the target tree. The tree we would get after a successful application of a patchset is the resulting tree. A tree does not have to be a tree object—we may be comparing the index and the files in the working tree, for example.

preimage::
postimage::
The preimage consists of lines in a patch that are prefixed by "-" (minus) or " " (space) but not "+" (plus) that denote what the patched file ought to have for the patch to apply. The postimage consists of lines that are prefixed by "+" (plus) or " " (space) that denote what the patched result ought to look like.


1. Basics

First the basics. Let's think about a patchset with a single patch. What does this patchset tell us?

    diff --git a/major-08.txt b/major-08.txt
    index 680c5f6..5de90cb 100644
    --- a/major-08.txt
    +++ b/major-08.txt
    @@ -1,3 +1,3 @@
    -8. Fortitude.
    +8. Strength.

     This is one of the cardinal virtues, of which I shall speak later.

It obviously tells us that the new tree changed "Fortitude", that used to be in the old tree, to "Strength", but it actually tells us a bit more about the old tree. For this patchset to apply, the target tree must have a file "major-08.txt" that begins with lines we see as the preimage in the patch.


2. Renaming a file

Now let's get a bit fancier and study a patchset with a rename patch. What does this patchset tell us?

    diff --git rws/major-08.txt marseille/major-11.txt
    similarity index 97%
    rename from major-08.txt
    rename to major-11.txt
    index 680c5f6..2ab22a0 100644
    --- rws/major-08.txt
    +++ marseille/major-11.txt
    @@ -1,3 +1,3 @@
    -8. Fortitude.
    +11. Fortitude.

     This is one of the cardinal virtues, of which I shall speak later.

We can see that this is going from the same old tree as the previous one's old tree, renames major-08 to major-11 with slight modification.

It tells us more about the trees, compared to the previous example. For this patchset to apply, the target tree must satisfy the same pre-conditon as the previous one about major-08, and in addition it must lack major-11; otherwise we wouldn't be renaming a new file to it.

So far, things are straight-forward.

In summary:

Rule 1.
A patch from file A to file A requires that file A exists in the target tree with contents that match the preimage of the patch.

Rule 2.
A patch renaming file A to file B requires that file A exists the target tree with contents that match the preimage of the patch. It also requires that file B must not exist in the target tree.

Rule 3.
A patch that creates file A requires that file A does not exist in the target tree.

Rule 4.
A patch that deletes file A requires that file A exists in the target tree with contents that match the preimage of the patch.

The latter two I didn't illustrate with examples, but they should be obvious. Also, we can think of Rule 2 (rename) as a natural extension of Rule 3 (creation) and part of Rule 4 (deletion). When you rename file A to file B, optionally with some content changes, you are:

  • creating file B, so the target tree must not have file B already.
  • deleting file A, so the target tree must have file A with the content that matches (part of) it.
Similarly, a patch that creates file B by copying file A, optionally with some content changes, you are creating file B, so the target tree must not have file B already. Also, the target tree must have file A with the contents that match the preimage of the patch.



3. First twist: cross renaming

Now, here is the first twist. What does this patchset mean?

    diff --git rws/major-11.txt marseille/major-08.txt
    similarity index 99%
    rename from major-11.txt
    rename to major-08.txt
    index 517d9f8..44e8d3a 100644
    --- rws/major-11.txt
    +++ marseille/major-08.txt
    @@ -1,3 +1,3 @@
    -11. Justice.
    +8. La Justice

     That the Tarot, though it is of all reasonable antiquity, is not of
    diff --git rws/major-08.txt marseille/major-11.txt
    similarity index 97%
    rename from major-08.txt
    rename to major-11.txt
    index 5de90cb..a101d5f 100644
    --- rws/major-08.txt
    +++ marseille/major-11.txt
    @@ -1,3 +1,3 @@
    -8. Strength.
    +11. La Force

     This is one of the cardinal virtues, of which I shall speak later.

This is a "swap" patchset, that swaps major-08 and major-11 with small edit. You would have done something like this to prepare such a change, starting from an old tree with two files with substantially different contents, both of which are of meaningful sizes:

    $ mv major-11.txt tmp
    $ mv major-08.txt major-11.txt
    $ mv tmp major-11.txt
    $ edit major-08.txt major-11.txt ;# just a bit
    $ git commit -m swap major-08.txt major-11.txt
    $ git diff -B -M HEAD^

A patch renaming major-11 to major-08 (i.e. the first one in this two-patch patchset) still requires that major-11 must exist in the target tree for the patchset to apply, which is the first half of Rule 2.

But the other half of Rule 2 is not satisfied. The target of the rename, major-08, has to exist in the target tree; otherwise we cannot rename it to major-11 in the second patch in the patchset. The rule needs a bit of revising, perhaps like this:

Rule 2.
A patch renaming file A to file B requires that file A exists with contents that match its preimage. And file B must not exist in the target tree, unless another patch in the patchset renames file B to some other file (possibly but not necessarily file A).

Of course, for such an "other patch" to be able to rename file B to somewhere else, the target tree is required to have file B.

It is important to have that "unless" part in the revised Rule 2. We need to make sure that we do not allow the sample patchset in "2. Renaming a file" to overwrite an existing file major-11 in the target tree blindly.

4. Second twist: rewriting and copying

The previous one showed how git diff -B -M can be used to detect cross renaming files and apply the resulting patchset (you can circularly rename more than two, i.e. A -> tmp, B -> A, ..., Z -> Y, tmp -> Z). It can also detect when you did this:

    $ cp major-08.txt major-11.txt
    $ edit major-08.txt ;# extensively
    $ git add major-08.txt major-11.txt
    $ git commit -m 'create 11 out of 08, rewrite 08'
    $ git diff -B -M HEAD^

And you would see:

    diff --git a/major-08.txt b/major-08.txt
    dissimilarity index 99%
    index 5de90cb..44e8d3a 100644
    --- a/major-08.txt
    +++ b/major-08.txt
    @@ -1,10 +1,31 @@
    -8. Strength.
    -
    -This is one of the cardinal virtues, of which I shall speak later.
    -...
    -the principle of all force.
    +8. La Justice
    +
    +That the Tarot, though it is of all reasonable antiquity, is not of
    +...
    +via prudentiæ.
    diff --git a/major-08.txt b/major-11.txt
    similarity index 97%
    copy from major-08.txt
    copy to major-11.txt
    index 5de90cb..a101d5f 100644
    --- a/major-08.txt
    +++ b/major-11.txt
    @@ -1,3 +1,3 @@
    -8. Strength.
    +11. La Force

     This is one of the cardinal virtues, of which I shall speak later.

The first patch in the patchset is "a patch from file A to file A", even though it is an extensive rewrite. The target tree is required to have major-08 whose contents match the preimage of the patch (Rule 1). The second patch copies from major-08 to create a new file major-11. The target tree is required to lack major-11 (Rule 3; copying into A is creation of A). It also must have major-08 that begins with the preimage of the patch.

Another thing to note is that an application of a patchset in Git is not incremental. Even though the first patch in the patchset talks about extensively rewriting major-08, and the second patch talks about creating major-11 by copying major-08 and then making a minor edit to it, the latter patch is the difference between the major-08 in the old tree and the major-11 in the new tree. It is not the difference between these two files in the new tree, i.e. it is not "modify major-08 and then copy the result to major-11 and then edit". If you think about it, this is also consistent with the previous "cross renaming" section. The first patch in the patchset renames major-11 to major-08, and the second patch that renames major-08 to major-11 is not about remaing the file that originally was major-11 that the first patch renamed back to its original position. The two patches are not applied incrementally (or sequentially).

So far, all the examples shown above will work correctly with today's Git (some reimplementations of Git may lack support, but at least the one I maintain does work correctly). When you use the old tree as the target tree, git apply accepts the patchset and recreates the new tree correctly.

But if you use the new tree of this example as the target tree and try to use git apply -R to apply the patchset in reverse, it does not work correctly. It is a bug.

Currently git apply -R does a nonsense for a copying patch. To reverse any patch, it just swaps the preimage and the postimage, and then swaps the names of the files in the old tree and in the new tree.

But the reverse of "create major-11 by copying major-08 into it and then change Strength to La Force" (which is the second patch in the patchset in this section) is not "create major-08 by copying major-11 into it and then change La Force to Strength", which you would get by simply swapping the preimage and the postimage and swapping the names of the files in the second patch.

What should we do to "reverse" a patchset that has copies?

Reverse of "create major-11 by copying major-08" should at least be "remove major-11", and preferably accompanied by "while making sure that major-11 matches the postimage of the patch".

The "preferably" part is a moderately strong preference. When the copying was done without any modification, we would not have any preimage or postimage to enable us to check that the target tree of the reverse application is similar enough to the new tree the patchset was taken from. Instead, we would end up just checking "major-11 exists" and then removing it happily, even if the contents of the file major-11 is vastly different from that of the new tree the patchset was taken from, which feels somewhat unsafe.

Admittedly, the same "it feels unsafe" factor exists when applying a bog-standard pure rename patch (imagine that the example in "2. Renaming a file" was done without editing the first line and kept the original "8. Fortutide." without renumbering it. We would not have any preimage we can use to make sure we are patching the correct file).

But as long as we have patch text that we can use for sanity checking, we should use it, I would think.


5. Third twist: rewriting by copying

If you started from two vastly different files, both of which have contents of meaningful size, and did this:

    $ cp major-08.txt major-11.txt
    $ edit major-11.txt
    $ rm major-08.txt
    $ git commit -m 'rewrite 11 by copying 08' major-08.txt major-11.txt
    $ git diff -B -M HEAD^

You would see this patchset:

    diff --git a/major-08.txt b/major-11.txt
    similarity index 97%
    rename from major-08.txt
    rename to major-11.txt
    index 5de90cb..a101d5f 100644
    --- a/major-08.txt
    +++ b/major-11.txt
    @@ -1,3 +1,3 @@
    -8. Strength.
    +11. La Force

     This is one of the cardinal virtues, of which I shall speak later.

This is another bug. I sent out a warning to both the Git and the Linux kernel mailing list, not to use the "-B -M" options together for this reason.

The revised Rule 2. from "3. First twist" tells us that major-08 must exist in the target tree, which is OK, but also major-11, the target of the rename, must not exist. This makes the resulting patchset unapplicable to the old tree the patchset was taken from, which simply does not make sense.

If you take a diff between states X and Y, you should be able to apply that diff to the state X and the resulting state should be identical to the state Y, and you should be able to apply that diff in reverse to state Y to go back to the state X.

Worse, the reverse of this patchset would apply to the new tree without an error, but does not reproduce the old tree correctly, which is a more serious bug. It instead applies the patch in reverse and recreates the original major-08, but the other file, major-11, is lost.

The patchset does not have enough information for us to recreate its original contents of major-11 we had in the old tree. The patchset says that the contents of major-11 in the new tree came from the contents of major-08 in the old tree, and the major-11 in the new tree does not have any resemblance to major-11 in the old tree. That is not incorrect per-se, but that means that we cannot apply this patchset in reverse.

One possible way to fix this is to include another patch in the same patchset that shows the deletion of major-11. Rule 2. would be further revised to something like:

Rule 2 (revised again).
A patch renaming file A to file B requires that file A exists in the target tree with contents that match the preimage. It also requires that file B does not exist in the target tree, unless another patch in the patchset renames file B to some other file (possibly but not necessarily file A) or removes file B.

Again, that "other patch" in the patchset either renames or removes file B, so that requires that the target tree to have file B with contents that match the preimage of that patch.

More generally, the revised Rule 2. can be split into two parts; the former becomes an extension to Rule 4, and the latter becomes an extension to Rule 3.




  • A patch that causes a file A to disappear (i.e. removing file A, or renaming file A to file B) requires that the target tree to have file A, with contents that match the preimage of the patch.
  • A patch that causes a file B to appear (i.e. creating file B, or renaming/copying file A to file B) requires the target tree to lack file B, unless another patch in the patchset makes file B disappear (i.e. removing file B or renaming file B to something else).
In any case, a fixed patchset would look like this:

    diff --git a/major-08.txt b/major-11.txt
    similarity index 97%
    rename from major-08.txt
    rename to major-11.txt
    index 5de90cb..a101d5f 100644
    --- a/major-08.txt
    +++ b/major-11.txt
    @@ -1,3 +1,3 @@
    -8. Strength.
    +11. La Force

     This is one of the cardinal virtues, of which I shall speak later.
    diff --git a/major-11.txt b/major-11.txt
    deleted file mode 100644
    index 517d9f8..0000000
    --- a/major-11.txt
    +++ /dev/null
    @@ -1,31 +0,0 @@
    -11. Justice.
    -
    -That the Tarot, though it is of all reasonable antiquity, is not of
    -...
    -via prudentiæ.

And these patches, under the re-revised rules, would apply cleanly to the old tree.

What about the reverse application? It would be a patchset that creates major-11 from nothingness (which is the reversal of a "deletion" patch), and creates major-08 by renaming major-11 and editing. Is the Rule 2. re-revised above sufficient?

The new tree (which is the target of the reverse application) only has major-11 and not major-08, so this rename should go through. The reverse of the deletion of major-11 is a creation of it with the contents fully given as the preimage of the (original) patch before reversing it, so that should also be OK with Rule 3 that is revised in a similar way with that "unless" thing. That is, creating major-11 requires that the old tree does not have major-11, but if another patch in the same patchset renames major-11 away or deletes it, then it is OK for a patch to create major-11. And the reversal of the first patch does rename major-11 to major-08, so all is well.

One disturbing thing about the above plan is that we have this comment at the end of diffcore-rename.c:

         * We would output this delete record if:
         *
         * (1) this is a broken delete and the counterpart
         *     broken create remains in the output; or
         * (2) this is not a broken delete, and rename_dst
         *     does not have a rename/copy to move p->one->path
         *     out of existence.
         *
         * Otherwise, the counterpart broken create
         * has been turned into a rename-edit; or
         * delete did not have a matching create to
         * begin with.

That is, we have an explicit logic to omit the missing "delete major-11" patch from the patchset. This comes from the very first commit that introduced "diff -B" (f345b0a0 (Add -B flag to diff-* brothers., 2005-05-30); it is plausible that the above comment came from lack of thinking in the original and not something we did to fix some bugs (if it were the latter, by showing the deletion in the case under discussion to "fix" the patchset in this example would end up breaking the original "fix").

So I would think that the right way to fix this is to stop filtering out the deletion half of the broken pair, even when the other creation-half of the pair no longer is in the output.