Friday, March 27, 2015

Stats from recent Git releases

Following up to the previous post, I computed a few numbers for each development cycle in the recent past.

In all the graphs in this article, the horizontal axis counts the number of days into the development cycle, and the vertical axis shows the number of non-merge commits made.

  • The bottom line in each graph shows the number of non-merge commits that went to the contemporary maintenance track.
  • The middle line shows the number of non-merge commits that went to the release but not to the maintenance track (i.e. shiny new toys, oops-fixes to them, and clean-ups that were too minor to be worth merging to the maintenance track), and
  • The top line shows the total number of non-merge commits in the release.

Even though I somehow have a fond memory of v1.5.3, the beginning of the modern Git was unarguably the v1.6.0 release. Its development cycle started in June 2008 and ended in August 2008. We can see that we were constantly adding a lot more new shiny toys (this cycle had the big "no more git-foo in user's $PATH" change) than we were applying fixes to the maintenance track during this period.

During the development cycle that led to v1.8.0 (August 2012 to October 2012), the pattern is very different. We cook our topics longer in the 'next' branch and we can clearly see that the topics graduate to 'master' in batches, which appear as jumps in the graph.

The cycle that led to v2.0.0 (February 2014 to June 2014) has a similar pattern, but as another "we now break backward compatibility for ancient UI wart" release, we can see that a large batch of changes was merged in the early part of the cycle, hoping to give them better and longer exposure to the testing public; on the other hand, we did not do too many fixes to the maintenance track.

The numbers for the current cycle leading to v2.4 (February 2015 to April 2015) are not finalized yet, but we can clearly see from this graph that this cycle is more about fixing old bugs than introducing shiny new toys.

Note that we should not be alarmed by the sharp rise at the end of the graph. We just entered the pre-release freeze period and the jump shows the final batch of topics graduating to the 'master' branch. We will have a few more weeks until the final, and during that period the graph will hopefully stay reasonably flat (any rise from this point on would mean we would be doing last-minute "oops" fixes).

Thursday, March 26, 2015

Git 2.4 will hopefully be a "product quality" release

Earlier in the day, an early preview release for the next release of Git, 2.4-rc0, was tagged. Unlike many major releases in the past, this development cycle turned out to be relatively calm, fixing many usability warts and bugs, while introducing only a few new shiny toys.

In fact, the ratio of changes that are fixes and clean-ups in this release is unusually high compared to recent releases. We keep a series of patches around each topic, whether it is a bugfix, a clean-up, or a new shiny toy, on its own topic branch, and each branch is merged to the 'master' branch after reviewing and testing, and then fixes and trivial clean-ups are also merged to the 'maint' branch. Because of this project structure, it is relatively easy to sift fixes and enhancements apart. Among new commits in release X since release (X-1), the ones that appear also in the last maintenance track for release (X-1) are fixes and clean-ups, while the remainder are enhancements.

Among the changes that went into v1.9.0 since v1.8.5, 23% of them were fixes that got merged to v1.8.5.6, for example, and this number has been more or less stable throughout the last year. Among the changes in v2.3.0 since v2.2.0, 18% of them were also in v2.2.2. Today's preview v2.4.0-rc0, however, has 333 changes since v2.3.0, among which 110 are in v2.3.4, which means that 33% of the changes are fixes and clean-ups.
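The percentage quoted above is plain integer arithmetic, and can be sketched as a small shell helper. The function name fix_ratio is made up for this illustration. The git rev-list invocations in the trailing comment are only one plausible way to obtain the two inputs in a clone of git.git; they assume that everything on the maintenance track has also been merged to 'master', and I have not checked that they reproduce the exact numbers quoted here.

```shell
# fix_ratio FIXES TOTAL: print FIXES as an integer percentage of TOTAL.
# (Hypothetical helper; the result is truncated, not rounded.)
fix_ratio() {
    ratio_fixes=$1
    ratio_total=$2
    if [ "$ratio_total" -le 0 ]; then
        echo "total must be positive" >&2
        return 1
    fi
    echo $(( ratio_fixes * 100 / ratio_total ))
}

# The numbers quoted in the text for v2.4.0-rc0: 110 fixes out of 333 changes.
fix_ratio 110 333

# In a clone of git.git, the inputs might be obtained with something like:
#   total=$(git rev-list --no-merges --count v2.3.0..v2.4.0-rc0)
#   fixes=$(git rev-list --no-merges --count v2.3.0..v2.3.4)
```

With the quoted numbers this prints 33.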

These fixes came from 33 contributors in total, but changes from only a few usual suspects dominate and most other contributors have only one or two changes on the maintenance track. It is illuminating to compare the output between

$ git shortlog --no-merges -n -s ^maint v2.3.0..master
$ git shortlog --no-merges -n -s v2.3.0..maint

to see who prefers to work on new shiny toys and who works on product quality by fixing other people's bugs. The first command sorts the contributors by the number of commits since v2.3.0 that are only in the 'master', i.e. new shiny toys, and the second command sorts the contributors by the number of commits since v2.3.0 that are in the 'maint', i.e. fixes and clean-ups.

The output matches my perception (as the project maintainer, I at least look at, if not read carefully, all the changes) of each contributor's strength and weakness fairly well. Some are always looking for new and exciting things while being bad at tying loose ends, while others are more careful perfectionists.

Wednesday, March 25, 2015

Git Rev News

Christian Couder (who is known for his work enhancing the "git bisect" command several years ago) and Thomas Ferris Nicolaisen (who hosts the popular podcast GitMinutes) started producing a newsletter for the Git development community and named it Git Rev News.

Here is what the newsletter is about in their words:

Our goal is to aggregate and communicate some of the activities on the Git mailing list in a format that the wider tech community can follow and understand. In addition, we'll link to some of the interesting Git-related articles, tools and projects we come across.

This edition covers what happened during the month of March 2015.

As one of the people who still remembers "Git Traffic", which was meant to be an ongoing summary of the Git mailing list traffic but disappeared after publishing its first and only issue, I find this a very welcome development. Because our mailing list is a fairly high-volume one, it is almost impossible to keep up with everything that happens there, unless you are actively involved in the development process.

I hope their effort will continue and benefit the wider Git ecosystem. You can help them out in various ways if you are interested.

  • They are not entirely happy with how the newsletter is formatted. If you are handy with HTML, CSS or some blog publishing platforms, they would appreciate help in this area.
  • They are not paid full-time editors but doing this as volunteers. They would appreciate editorial help as well.
  • You can contribute by writing your own articles that summarize the discussions you found interesting on the mailing list.

Friday, February 27, 2015

Nexus 4 still live and kicking

My everyday phone for the past few years has been a Nexus 4. I also have a Nexus 5, which is slightly larger and has a much better screen, but I never felt a need to switch (I did try to have "Let's use N5 this week" every once in a while, though). Last year's Nexus 6 simply felt too large for me. Besides, it is too expensive for me to buy.

I noticed that my N4 recently stopped picking up NFC and charging via wireless. Later I learned that this is a typical sign that its battery needs replacement. Not because the battery got too weak to hold a charge, but because the battery started bloating, pushing against the back cover, onto which the necessary antennas are built. By bulging out and slightly raising the back cover, the bloated battery breaks the connection from the motherboard to these antennas, which is made only by contact. And that is how NFC and wireless charging are broken.

At least, that is the story I read.

After learning how to, and getting a replacement battery and a few small screw/torx drivers, I opened the phone (which took me some time) and saw this bloated battery.

No wonder the back cover looked warped. After placing the new battery and closing the back, NFC started picking up very reliably and it charges properly on a wireless charger.

Happy ;-)

Sunday, February 8, 2015

Fun with "git diff -B -M"

Git lets you view a change that renames an original file A to a new location B while doing some minor edits to its contents as a "rename" patch (i.e. "rename A to B, with the following content differences"). You can even view a change that renames two original files, A and B, by swapping their contents and optionally doing some minor edits to them as a patchset that contains two "rename" patches ("rename A to B" and "rename B to A"). These were invented by me early in Git's life, when Linus was still running the project, back in mid 2005. More recently, other tools (including GNU patch) started understanding patches that use these features.

However, I recently noticed a few corner cases in which git diff and friends produce a wrong patchset, or git apply fails to apply correctly constructed patches, and I have been thinking about the right fixes to these issues. This article will illustrate these tricky cases and describe my current thinking.

In this write-up, I'll use these terms:

patchset
Output from a single git diff invocation, which may contain one or more patches.

patch
A part of a patchset from a header line that begins with "diff --git" up to (but excluding) the next such header line.

tree
git diff compares two collections of files; each collection is a tree. Left and right sides of the comparison are called old tree and new tree, respectively. The tree to which we attempt to apply a patchset is the target tree. The tree we would get after a successful application of a patchset is the resulting tree. A tree does not have to be a tree object; we may be comparing the index and the files in the working tree, for example.

preimage and postimage
The preimage consists of lines in a patch that are prefixed by "-" (minus) or " " (space) but not "+" (plus) that denote what the patched file ought to have for the patch to apply. The postimage consists of lines that are prefixed by "+" (plus) or " " (space) that denote what the patched result ought to look like.

1. Basics

First the basics. Let's think about a patchset with a single patch. What does this patchset tell us?

    diff --git a/major-08.txt b/major-08.txt
    index 680c5f6..5de90cb 100644
    --- a/major-08.txt
    +++ b/major-08.txt
    @@ -1,3 +1,3 @@
    -8. Fortitude.
    +8. Strength.

     This is one of the cardinal virtues, of which I shall speak later.

It obviously tells us that the new tree changed "Fortitude", that used to be in the old tree, to "Strength", but it actually tells us a bit more about the old tree. For this patchset to apply, the target tree must have a file "major-08.txt" that begins with lines we see as the preimage in the patch.

2. Renaming a file

Now let's get a bit fancier and study a patchset with a rename patch. What does this patchset tell us?

    diff --git rws/major-08.txt marseille/major-11.txt
    similarity index 97%
    rename from major-08.txt
    rename to major-11.txt
    index 680c5f6..2ab22a0 100644
    --- rws/major-08.txt
    +++ marseille/major-11.txt
    @@ -1,3 +1,3 @@
    -8. Fortitude.
    +11. Fortitude.

     This is one of the cardinal virtues, of which I shall speak later.

We can see that this patchset starts from the same old tree as the previous one, and renames major-08 to major-11 with a slight modification.

It tells us more about the trees, compared to the previous example. For this patchset to apply, the target tree must satisfy the same precondition as the previous one about major-08, and in addition it must lack major-11; otherwise we wouldn't be renaming a file to it.

So far, things are straightforward.

In summary:

Rule 1.
A patch from file A to file A requires that file A exists in the target tree with contents that match the preimage of the patch.

Rule 2.
A patch renaming file A to file B requires that file A exists in the target tree with contents that match the preimage of the patch. It also requires that file B must not exist in the target tree.

Rule 3.
A patch that creates file A requires that file A does not exist in the target tree.

Rule 4.
A patch that deletes file A requires that file A exists in the target tree with contents that match the preimage of the patch.

The latter two I didn't illustrate with examples, but they should be obvious. Also, we can think of Rule 2 (rename) as a natural extension of Rule 3 (creation) and part of Rule 4 (deletion). When you rename file A to file B, optionally with some content changes, you are:

  • creating file B, so the target tree must not have file B already.
  • deleting file A, so the target tree must have file A with the content that matches (part of) it.

Similarly, with a patch that creates file B by copying file A, optionally with some content changes, you are creating file B, so the target tree must not have file B already. Also, the target tree must have file A with the contents that match the preimage of the patch.
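
The existence half of these rules can be sketched as a shell function that treats the current directory as the target tree. This is only an illustration: the function name check_patch and the kind names (modify, rename, copy, create, delete) are made up here, and comparing file contents against the preimage is left out entirely.

```shell
# check_patch KIND A [B]: succeed if the existence rules allow the patch
# to apply to the tree in the current directory.
# Hypothetical sketch; contents are NOT compared against any preimage.
check_patch() {
    case "$1" in
    modify|delete)  [ -e "$2" ] ;;                    # Rules 1 and 4
    create)         [ ! -e "$2" ] ;;                  # Rule 3
    rename|copy)    [ -e "$2" ] && [ ! -e "$3" ] ;;   # Rule 2, and copying
    *)              echo "unknown kind: $1" >&2; return 2 ;;
    esac
}

# Try it in a scratch directory that stands in for the target tree.
demo=$(mktemp -d)
(
    cd "$demo" || exit 1
    : >major-08.txt
    check_patch modify major-08.txt && echo "modify: ok"
    check_patch rename major-08.txt major-11.txt && echo "rename: ok"
    : >major-11.txt
    check_patch rename major-08.txt major-11.txt ||
        echo "rename: refused, major-11.txt already exists"
)
rm -rf "$demo"
```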

3. First twist: cross renaming

Now, here is the first twist. What does this patchset mean?

    diff --git rws/major-11.txt marseille/major-08.txt
    similarity index 99%
    rename from major-11.txt
    rename to major-08.txt
    index 517d9f8..44e8d3a 100644
    --- rws/major-11.txt
    +++ marseille/major-08.txt
    @@ -1,3 +1,3 @@
    -11. Justice.
    +8. La Justice

     That the Tarot, though it is of all reasonable antiquity, is not of
    diff --git rws/major-08.txt marseille/major-11.txt
    similarity index 97%
    rename from major-08.txt
    rename to major-11.txt
    index 5de90cb..a101d5f 100644
    --- rws/major-08.txt
    +++ marseille/major-11.txt
    @@ -1,3 +1,3 @@
    -8. Strength.
    +11. La Force

     This is one of the cardinal virtues, of which I shall speak later.

This is a "swap" patchset that swaps major-08 and major-11 with small edits. You would have done something like this to prepare such a change, starting from an old tree with two files with substantially different contents, both of which are of meaningful sizes:

    $ mv major-11.txt tmp
    $ mv major-08.txt major-11.txt
    $ mv tmp major-08.txt
    $ edit major-08.txt major-11.txt ;# just a bit
    $ git commit -m swap major-08.txt major-11.txt
    $ git diff -B -M HEAD^

A patch renaming major-11 to major-08 (i.e. the first one in this two-patch patchset) still requires that major-11 must exist in the target tree for the patchset to apply, which is the first half of Rule 2.

But the other half of Rule 2 is not satisfied. The target of the rename, major-08, has to exist in the target tree; otherwise we cannot rename it to major-11 in the second patch in the patchset. The rule needs a bit of revising, perhaps like this:

Rule 2.
A patch renaming file A to file B requires that file A exists with contents that match its preimage. And file B must not exist in the target tree, unless another patch in the patchset renames file B to some other file (possibly but not necessarily file A).

Of course, for such an "other patch" to be able to rename file B to somewhere else, the target tree is required to have file B.

It is important to have that "unless" part in the revised Rule 2. We need to make sure that we do not allow the sample patchset in "2. Renaming a file" to overwrite an existing file major-11 in the target tree blindly.
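As a rough sketch of the "unless" part, the following shell function reads a patchset and lists the rename destinations that no other patch in the same patchset renames away, i.e. the paths that the revised Rule 2 still requires to be absent from the target tree. The function name is invented for this example, and it only looks at the "rename from" and "rename to" header lines as they appear in the samples above; paths that git diff would quote (e.g. ones with unusual characters) are not handled.

```shell
# rename_targets_needing_absence: read a patchset on stdin and print every
# rename destination that is not itself the source of another rename.
# Hypothetical helper; it parses only the extended header lines.
rename_targets_needing_absence() {
    awk '
        /^rename from / { src[substr($0, 13)] = 1 }
        /^rename to /   { dst[substr($0, 11)] = 1 }
        END {
            for (path in dst)
                if (!(path in src))
                    print path
        }
    ' | sort
}

# A plain rename: the destination must not exist in the target tree.
rename_targets_needing_absence <<'EOF'
diff --git rws/major-08.txt marseille/major-11.txt
similarity index 97%
rename from major-08.txt
rename to major-11.txt
EOF

# A swap: both destinations are renamed away, so nothing is listed.
rename_targets_needing_absence <<'EOF'
rename from major-11.txt
rename to major-08.txt
rename from major-08.txt
rename to major-11.txt
EOF
```

The first invocation prints major-11.txt and the second prints nothing.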

4. Second twist: rewriting and copying

The previous one showed how git diff -B -M can be used to detect cross renaming files and apply the resulting patchset (you can circularly rename more than two, i.e. A -> tmp, B -> A, ..., Z -> Y, tmp -> Z). It can also detect when you did this:

    $ cp major-08.txt major-11.txt
    $ edit major-08.txt ;# extensively
    $ git add major-08.txt major-11.txt
    $ git commit -m 'create 11 out of 08, rewrite 08'
    $ git diff -B -M HEAD^

And you would see:

    diff --git a/major-08.txt b/major-08.txt
    dissimilarity index 99%
    index 5de90cb..44e8d3a 100644
    --- a/major-08.txt
    +++ b/major-08.txt
    @@ -1,10 +1,31 @@
    -8. Strength.
    -This is one of the cardinal virtues, of which I shall speak later.
    -the principle of all force.
    +8. La Justice
    +That the Tarot, though it is of all reasonable antiquity, is not of
    +via prudentiæ.
    diff --git a/major-08.txt b/major-11.txt
    similarity index 97%
    copy from major-08.txt
    copy to major-11.txt
    index 5de90cb..a101d5f 100644
    --- a/major-08.txt
    +++ b/major-11.txt
    @@ -1,3 +1,3 @@
    -8. Strength.
    +11. La Force

     This is one of the cardinal virtues, of which I shall speak later.

The first patch in the patchset is "a patch from file A to file A", even though it is an extensive rewrite. The target tree is required to have major-08 whose contents match the preimage of the patch (Rule 1). The second patch copies from major-08 to create a new file major-11. The target tree is required to lack major-11 (Rule 3; copying into A is creation of A). It also must have major-08 that begins with the preimage of the patch.

Another thing to note is that an application of a patchset in Git is not incremental. Even though the first patch in the patchset talks about extensively rewriting major-08, and the second patch talks about creating major-11 by copying major-08 and then making a minor edit to it, the latter patch is the difference between the major-08 in the old tree and the major-11 in the new tree. It is not the difference between these two files in the new tree, i.e. it is not "modify major-08 and then copy the result to major-11 and then edit". If you think about it, this is also consistent with the previous "cross renaming" section. The first patch in the patchset renames major-11 to major-08, and the second patch that renames major-08 to major-11 is not about renaming the file that originally was major-11 that the first patch renamed back to its original position. The two patches are not applied incrementally (or sequentially).
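
To make the "not incremental" point concrete, here is a minimal shell sketch that performs a set of renames as if they happened all at once: every source is first read out of the old tree into a staging area, and only then are the destinations written. It is not how git apply is implemented; the function name is made up, content edits are ignored, and file names are assumed to contain no whitespace.

```shell
# apply_renames_atomically: read "FROM TO" pairs on stdin and perform all
# the renames in the current directory in two phases, so that a swap
# (A -> B together with B -> A) works.  Hypothetical sketch.
apply_renames_atomically() {
    stage=$(mktemp -d) || return 1
    pairs="$stage/.pairs"
    cat >"$pairs"

    # Phase 1: take every source out of the old tree.
    n=0
    while read -r from to; do
        n=$((n + 1))
        mv -- "$from" "$stage/$n" || return 1
    done <"$pairs"

    # Phase 2: put each staged file at its destination in the new tree.
    n=0
    while read -r from to; do
        n=$((n + 1))
        mv -- "$stage/$n" "$to" || return 1
    done <"$pairs"

    rm -rf "$stage"
}

demo=$(mktemp -d)
(
    cd "$demo" || exit 1
    echo "8. Strength." >major-08.txt
    echo "11. Justice." >major-11.txt
    printf '%s\n' 'major-11.txt major-08.txt' 'major-08.txt major-11.txt' |
        apply_renames_atomically
    cat major-08.txt    # holds what used to be in major-11.txt
)
rm -rf "$demo"
```

Running the two mv commands one after the other instead would make the first rename clobber major-08.txt before the second had a chance to read it.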

So far, all the examples shown above will work correctly with today's Git (some reimplementations of Git may lack support, but at least the one I maintain does work correctly). When you use the old tree as the target tree, git apply accepts the patchset and recreates the new tree correctly.

But if you use the new tree of this example as the target tree and try to use git apply -R to apply the patchset in reverse, it does not work correctly. It is a bug.

Currently git apply -R does something nonsensical for a copying patch. To reverse any patch, it just swaps the preimage and the postimage, and then swaps the names of the files in the old tree and in the new tree.

But the reverse of "create major-11 by copying major-08 into it and then change Strength to La Force" (which is the second patch in the patchset in this section) is not "create major-08 by copying major-11 into it and then change La Force to Strength", which you would get by simply swapping the preimage and the postimage and swapping the names of the files in the second patch.

What should we do to "reverse" a patchset that has copies?

Reverse of "create major-11 by copying major-08" should at least be "remove major-11", and preferably accompanied by "while making sure that major-11 matches the postimage of the patch".

The "preferably" part is a moderately strong preference. When the copying was done without any modification, we would not have any preimage or postimage to enable us to check that the target tree of the reverse application is similar enough to the new tree the patchset was taken from. Instead, we would end up just checking "major-11 exists" and then removing it happily, even if the contents of the file major-11 is vastly different from that of the new tree the patchset was taken from, which feels somewhat unsafe.

Admittedly, the same "it feels unsafe" factor exists when applying a bog-standard pure rename patch (imagine that the example in "2. Renaming a file" was done without editing the first line and kept the original "8. Fortitude." without renumbering it; we would not have any preimage we can use to make sure we are patching the correct file).

But as long as we have patch text that we can use for sanity checking, we should use it, I would think.
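The behaviour I would prefer for the reverse of a copy can be sketched in shell as "verify, then remove". This is an illustration of the idea only, not what git apply -R does today. The function name and the EXPECTED argument are made up: a real patch carries only hunks, so a real implementation would check the postimage lines rather than compare against a whole file.

```shell
# reverse_copy DEST EXPECTED: undo "create DEST by copying some file".
# EXPECTED is a file holding what DEST looked like in the new tree
# (a stand-in for checking the postimage of the patch).
reverse_copy() {
    rc_dest=$1
    rc_expected=$2
    if [ ! -e "$rc_dest" ]; then
        echo "error: $rc_dest does not exist" >&2
        return 1
    fi
    if ! cmp -s "$rc_dest" "$rc_expected"; then
        echo "error: $rc_dest does not match the postimage; not removing" >&2
        return 1
    fi
    rm -- "$rc_dest"
}

demo=$(mktemp -d)
(
    cd "$demo" || exit 1
    echo "11. La Force" >major-11.txt
    echo "11. La Force" >postimage
    reverse_copy major-11.txt postimage && echo "removed major-11.txt"
)
rm -rf "$demo"
```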

5. Third twist: rewriting by copying

If you started from two vastly different files, both of which have contents of meaningful size, and did this:

    $ cp major-08.txt major-11.txt
    $ edit major-11.txt
    $ rm major-08.txt
    $ git commit -m 'rewrite 11 by copying 08' major-08.txt major-11.txt
    $ git diff -B -M HEAD^

You would see this patchset:

    diff --git a/major-08.txt b/major-11.txt
    similarity index 97%
    rename from major-08.txt
    rename to major-11.txt
    index 5de90cb..a101d5f 100644
    --- a/major-08.txt
    +++ b/major-11.txt
    @@ -1,3 +1,3 @@
    -8. Strength.
    +11. La Force

     This is one of the cardinal virtues, of which I shall speak later.

This is another bug. I sent out a warning to both the Git and the Linux kernel mailing lists, not to use the "-B -M" options together for this reason.

The revised Rule 2. from "3. First twist" tells us that major-08 must exist in the target tree, which is OK, but also major-11, the target of the rename, must not exist. This makes the resulting patchset inapplicable to the old tree the patchset was taken from, which simply does not make sense.

If you take a diff between states X and Y, you should be able to apply that diff to the state X and the resulting state should be identical to the state Y, and you should be able to apply that diff in reverse to state Y to go back to the state X.

Worse, the reverse of this patchset would apply to the new tree without an error, but does not reproduce the old tree correctly, which is a more serious bug. It instead applies the patch in reverse and recreates the original major-08, but the other file, major-11, is lost.

The patchset does not have enough information for us to recreate the original contents of major-11 we had in the old tree. The patchset says that the contents of major-11 in the new tree came from the contents of major-08 in the old tree, and the major-11 in the new tree does not have any resemblance to major-11 in the old tree. That is not incorrect per se, but that means that we cannot apply this patchset in reverse.

One possible way to fix this is to include another patch in the same patchset that shows the deletion of major-11. Rule 2. would be further revised to something like:

Rule 2 (revised again).
A patch renaming file A to file B requires that file A exists in the target tree with contents that match the preimage. It also requires that file B does not exist in the target tree, unless another patch in the patchset renames file B to some other file (possibly but not necessarily file A) or removes file B.

Again, that "other patch" in the patchset either renames or removes file B, so that requires the target tree to have file B with contents that match the preimage of that patch.

More generally, the revised Rule 2. can be split into two parts; the former becomes an extension to Rule 4, and the latter becomes an extension to Rule 3.

  • A patch that causes a file A to disappear (i.e. removing file A, or renaming file A to file B) requires the target tree to have file A, with contents that match the preimage of the patch.
  • A patch that causes a file B to appear (i.e. creating file B, or renaming/copying file A to file B) requires the target tree to lack file B, unless another patch in the patchset makes file B disappear (i.e. removing file B or renaming file B to something else).
In any case, a fixed patchset would look like this:

    diff --git a/major-08.txt b/major-11.txt
    similarity index 97%
    rename from major-08.txt
    rename to major-11.txt
    index 5de90cb..a101d5f 100644
    --- a/major-08.txt
    +++ b/major-11.txt
    @@ -1,3 +1,3 @@
    -8. Strength.
    +11. La Force

     This is one of the cardinal virtues, of which I shall speak later.
    diff --git a/major-11.txt b/major-11.txt
    deleted file mode 100644
    index 517d9f8..0000000
    --- a/major-11.txt
    +++ /dev/null
    @@ -1,31 +0,0 @@
    -11. Justice.
    -That the Tarot, though it is of all reasonable antiquity, is not of
    -via prudentiæ.

And these patches, under the re-revised rules, would apply cleanly to the old tree.
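
The "appear" and "disappear" formulation of the rules can be sketched as a check over a whole patchset. In this illustration the patchset is given as one operation per line ("modify A", "delete A", "create B", "rename A B" or "copy A B"), which is a made-up notation and not something git apply reads, and the current directory stands in for the target tree. As before, contents are not compared against the preimage.

```shell
# check_patchset: read operations on stdin and check the existence rules
# against the current directory.  Prints each violation and succeeds only
# if there is none.  Hypothetical sketch of the re-revised rules.
check_patchset() {
    ops=$(cat)
    # Paths that some patch in the patchset makes disappear.
    gone=$(printf '%s\n' "$ops" |
        awk '$1 == "delete" || $1 == "rename" { print $2 }')
    status=0
    while read -r kind a b; do
        case "$kind" in
        modify|delete)  src=$a; dst= ;;
        create)         src=;   dst=$a ;;
        rename|copy)    src=$a; dst=$b ;;
        *)              continue ;;
        esac
        # Whatever a patch reads or removes must exist in the target tree.
        if [ -n "$src" ] && [ ! -e "$src" ]; then
            echo "$kind: $src is missing"
            status=1
        fi
        # Whatever a patch makes appear must be absent, unless another
        # patch in the same patchset makes it disappear.
        if [ -n "$dst" ] && [ -e "$dst" ] &&
           ! printf '%s\n' "$gone" | grep -qxF -- "$dst"; then
            echo "$kind: $dst already exists"
            status=1
        fi
    done <<EOF
$ops
EOF
    return $status
}

demo=$(mktemp -d)
(
    cd "$demo" || exit 1
    : >major-08.txt
    : >major-11.txt
    # The patchset as produced today: refused, major-11.txt is in the way.
    echo 'rename major-08.txt major-11.txt' | check_patchset
    # The fixed patchset with the extra deletion: accepted.
    printf '%s\n' 'rename major-08.txt major-11.txt' 'delete major-11.txt' |
        check_patchset && echo "fixed patchset: ok"
)
rm -rf "$demo"
```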

What about the reverse application? It would be a patchset that creates major-11 from nothingness (which is the reversal of a "deletion" patch), and creates major-08 by renaming major-11 and editing. Is the Rule 2. re-revised above sufficient?

The new tree (which is the target of the reverse application) only has major-11 and not major-08, so this rename should go through. The reverse of the deletion of major-11 is a creation of it with the contents fully given as the preimage of the (original) patch before reversing it, so that should also be OK with Rule 3 that is revised in a similar way with that "unless" thing. That is, creating major-11 requires that the target tree does not have major-11, but if another patch in the same patchset renames major-11 away or deletes it, then it is OK for a patch to create major-11. And the reversal of the first patch does rename major-11 to major-08, so all is well.

One disturbing thing about the above plan is that we have this comment at the end of diffcore-rename.c:

         * We would output this delete record if:
         * (1) this is a broken delete and the counterpart
         *     broken create remains in the output; or
         * (2) this is not a broken delete, and rename_dst
         *     does not have a rename/copy to move p->one->path
         *     out of existence.
         * Otherwise, the counterpart broken create
         * has been turned into a rename-edit; or
         * delete did not have a matching create to
         * begin with.

That is, we have explicit logic to omit the "delete major-11" patch from the patchset. This comes from the very first commit that introduced "diff -B" (f345b0a0 (Add -B flag to diff-* brothers., 2005-05-30)); it is plausible that the above comment came from lack of thinking in the original and not something we did to fix some bugs (if it were the latter, showing the deletion in the case under discussion to "fix" the patchset in this example would end up breaking the original "fix").

So I would think that the right way to fix this is to stop filtering out the deletion half of the broken pair, even when the other creation-half of the pair no longer is in the output.

Thursday, February 5, 2015

Git 2.3

The latest feature release of Git version control system, version 2.3, is now available at the usual places.

This one ended up being a release with lots of small corrections and improvements without big uncomfortably exciting features. It is a much smaller release than other recent feature releases, consisting of 255 non-merge commits (versions 2.0, 2.1 and 2.2 had 475, 698 and 556 commits, respectively) by 61 contributors (among which 19 are new people; welcome!).

The recent security fix that went to 2.2.1 and older maintenance tracks is also contained in this update.

One of my favorite small changes in this release is that the "Conflicts:" section that is prepared in the buffer to write your commit log message during a merge is now commented out, just like all the other hints to help you prepare the log message (e.g. the list of files with changes you might want to mention in the log, and the list of untracked files you might have forgotten to "git add"). For the full text of the release notes, please visit the list archive.


Friday, January 2, 2015

Having fun with Crouton

Chromebooks run ChromeOS, which is based on Linux but is made to appear to run only the browser. Even though we can do so many things with just the browser these days, I still have a few reasons why I need to keep a notebook that is not a Chromebook around me: Gimp (very occasionally when I take photos and need to touch them up), Calibre (to manage and populate a Nook Glowlight with eBooks) and GnuCash (to balance my checkbook).

Since I replaced it with a Toshiba Chromebook 2, my old Samsung ARM Chromebook was looking for a good alternative use, and I thought I might be able to use it to run GnuCash under Crouton. Crouton is a tool to let us run more traditional Linux distros in a chroot environment on ChromeOS devices. I learned that it recently got better by allowing its virtual desktops to be shown in separate windows, side by side with native Chrome browser windows. One downside of Crouton is that it can only run under developer mode, side-stepping ChromeOS's security model.

Even though I cannot turn my primary Chromebook to developer mode (because it has to be enrolled for enterprise access to access the workstations at work), I can sacrifice the ARM Chromebook that has now become redundant.

So, following instructions from the primary site of crouton, here is what I did:
  • Turn Chromebook into developer mode (this wipes the device)
    • Turn off the machine
    • Hold ESC + Refresh and turn the machine on to go into Recovery
    • Ctrl-d to reboot into the developer mode
  • The usual Chromebook activation
  • Download crouton by visiting
  • Install crouton extension by visiting the webstore
  • Type Ctrl-Alt-t to open a terminal-looking window, type shell and then type
    cd ~/Downloads; sudo bash to get a useful interactive shell running as root
  • Type sh ./crouton -r trusty -t xfce and let it run (takes some time)
  • Type sh ./crouton -r trusty -u -t extension and let it run (takes some more time)
  • Type sh ./crouton -r trusty -u -t xiwi and let it run (takes some more time)
  • Then type startxfce4 which will open a XFCE desktop environment, Ubuntu trusty distribution.
  • Open a terminal in that Ubuntu environment, install gnucash as I normally would (e.g.
    apt-get update
    apt-get install gnucash

    just like any Debian-derived distribution).
A few tips I had to figure out by trial and error that I didn't find on the Web (I am not saying these tips do not exist elsewhere; I am saying that I didn't find them ;-) are:
  • Even though crouton -t xfce,extension,xiwi is supposed to be the syntax to install multiple targets, I couldn't get it to work well. Adding xiwi as an update (notice the -u option in the above) after everything else seemed to be a way to make it work.
  • After reading about crouton but before trying it out myself, I wondered how to make the two environments talk with each other (especially how to transfer "gnucash" data file across as running it is the primary reason why I am interested in this whole exercise), but it turns out that it was surprisingly easy and straightforward. In the Ubuntu environment that runs under crouton, ~/Downloads is the same Downloads local file shown in the Files application on the ChromeOS side.
  • Every time I turn the Chromebook in developer mode on, it goes into Recovery and needs Ctrl-d to continue booting. The Recovery screen looks scary but this seems normal.
  • Running the crouton environment is done by
    • Type Ctrl-Alt-t for a terminal-looking window
    • Type shell and then
    • Type sudo startxfce4 -b
  • Even though the Samsung ARM Chromebook is not a speed demon and has merely 2GB, it is more than adequate to fill my needs. I've seen people say xiwi (which lets the X session be seen in its own window, instead of occupying the full screen and having to be switched with Ctrl-Alt-Back/Forth keys) is too slow to be usable, but I am not running graphical games. I have a suspicion that I will be cursing it when I start using Gimp, but until then ... ;-)
(Left side runs Crouton in its own window, right side is just a normal Chrome browser)

Monday, December 22, 2014

On CVE-2014-9390 and Git 2.2.1

Now that the security-fix releases are behind us, let's briefly talk about the ramifications.

The recent Git/Hg vulnerability on case-insensitive or normalizing filesystems is serious for people who fetch and integrate (either pull or pull --rebase) from untrusted sources.

When you grab a tree that records a malicious path, say, ".Git/hooks/post-checkout" using an older version of Git on such a filesystem (e.g. Windows NTFS or Mac OS X HFS+), Git will tell the filesystem to check it out at ".Git/hooks/post-checkout", but the filesystem overwrites a file different from what Git asked it to write, namely ".git/hooks/post-checkout", which is a path reserved for you to store an executable hook that is run after running "git checkout".

For an attacker to victimize you through this vector, the attacker has to have write access to a repository you pull from. As long as you do not interact with untrustworthy strangers (e.g. you only pull from the projects' official history), you will not be affected. That is often true in a corporate setting, where access to the central repository everybody in the product group uses is tightly controlled, and if an untrustworthy stranger has write access there, you already have a bigger problem.

But open source is all about collaboration, and we need to meet and interact with new people every day while doing so. The prudent thing to do is to (1) update to the version of Git recently released to work around this issue, and then (2) respond to a pull request from a stranger, in this order. Don't do it the other way around!


Thursday, December 18, 2014

Git 1.9.5, 2.0.5, 2.1.4 and 2.2.1 and thanking friends in Mercurial land

We have a set of urgent maintenance releases. Please update your Git if you are on Windows or Mac OS X.

Git maintains various meta-information for its repository in files in the .git/ directory located at the root of the working tree. The system does not allow a file in that directory (e.g. .git/config) to be committed in the history of the project, or checked out to the working tree from the project. Otherwise, an unsuspecting user could run git pull from an innocuous-looking-but-malicious repository and have the meta-information in her repository overwritten, or executable hooks installed, by the owner of the repository she pulled from (i.e. an attacker).

Unfortunately, this protection has been found to be inadequate on certain file systems:
  • You can commit and check out to .Git/<anything> (or any case permutation .[gG][iI][tT], except .git all in lowercase). But this will overwrite the corresponding .git/<anything> on case-insensitive file systems (e.g. Windows and Mac OS X).
  • In addition, because the HFS+ file system (Mac OS X) considers certain Unicode codepoints ignorable, committing e.g. .g\u200cit/config, where U+200C is such an ignorable codepoint, and checking it out on HFS+ would overwrite .git/config.
The issue is shared with other version control systems and has serious impact on affected systems (CVE-2014-9390).
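
The two aliasing cases above can be sketched as a small check. This is only an illustration of the idea, not the code that went into any of the fixed releases: the function names are made up for this example, and the IGNORABLE set is an assumed, partial list of codepoints rather than the full HFS+ rule.

```python
# Codepoints treated as "ignorable" in this sketch.
# Assumption: an illustrative subset, not the complete HFS+ list.
IGNORABLE = {
    "\u200c",  # ZERO WIDTH NON-JOINER (the example used above)
    "\u200d",  # ZERO WIDTH JOINER
    "\u200e",  # LEFT-TO-RIGHT MARK
    "\u200f",  # RIGHT-TO-LEFT MARK
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE
}


def collides_with_dot_git(component: str) -> bool:
    """Return True if one path component could map onto ".git"."""
    stripped = "".join(ch for ch in component if ch not in IGNORABLE)
    return stripped.lower() == ".git"


def is_suspicious_path(path: str) -> bool:
    """Flag a tree path whose component aliases ".git" without being it.

    A literal ".git" component is not flagged here because the
    pre-existing protection already refuses it.
    """
    return any(
        part != ".git" and collides_with_dot_git(part)
        for part in path.split("/")
    )


if __name__ == "__main__":
    for candidate in (".Git/hooks/post-checkout", ".g\u200cit/config", "src/.gitignore"):
        print(repr(candidate), is_suspicious_path(candidate))
```

The check strips the ignorable codepoints first and folds case second, so a name that combines both tricks is still caught.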

Credit for discovering this issue goes to our friends in the Mercurial land (most notably, the inventor of Hg, Matt Mackall himself). The fixes to this issue for various implementations of Git (including mine, libgit2, JGit), ports using these implementations (including Git for Windows, Visual Studio) and also Mercurial have been coordinated for simultaneous releases. GitHub is running an updated version of their software that rejects trees with these confusing and problematic paths, in order to protect its users who use existing versions of Git (also see their blog post).

A huge thanks to all those who were involved.

New releases of Git for Windows, Git OSx Installer, JGit and libgit2 have been prepared to fix this issue. Microsoft (which uses libgit2 in their Visual Studio products) and Apple (which distributes a port of Git in their Xcode) both have fixes, as well.

For people building from the source, fixed versions of Git have been released as versions v1.8.5.6, v1.9.5, v2.0.5, v2.1.4, and v2.2.1 for various maintenance tracks.
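
If you want to check a version number against the list above, a rough sketch could look like the following. The helper names are invented for this example, it only understands plain dotted numbers (no "-rc" suffixes or vendor annotations), and it assumes that any track newer than 2.2 carries the fix while tracks older than those listed do not.

```python
# First fixed release on each maintenance track, as listed above.
FIXED = {
    (1, 8): (1, 8, 5, 6),
    (1, 9): (1, 9, 5),
    (2, 0): (2, 0, 5),
    (2, 1): (2, 1, 4),
    (2, 2): (2, 2, 1),
}


def parse_version(text):
    """Turn "v2.1.4" or "2.1.4" into the tuple (2, 1, 4)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def has_fix(version_text):
    """Return True if the given Git version is at or past the fixed release."""
    version = parse_version(version_text)
    track = version[:2]
    if track in FIXED:
        return version >= FIXED[track]
    # Assumption: tracks newer than 2.2 are fixed, older ones are not.
    return track > (2, 2)


if __name__ == "__main__":
    for v in ("1.8.5.5", "1.9.5", "2.2.0", "2.2.1"):
        print(v, "fixed" if has_fix(v) else "please update")
```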


Tuesday, September 30, 2014

Fun (?) with GnuPG

We use GnuPG as part of the infrastructure to certify authenticity of development history in Git in various places:
  • Signed tags created by git tag -s are a way to say "This tag was created by me, the holder of the private GnuPG key that signed this object". Because the object name of any Git object is computed as a cryptographic hash over what the object records, and because a signed tag object records the object name of a tagged object (typically a commit) and the human readable name (typically a release number or name) the tagger wants to give the tagged object, an attacker cannot forge a phony tag that points at a different commit and is signed with a private key the attacker does not have. Because of this, you can safely say "You can verify that it is true that I wanted to make that commit release X". Also, because the commit object records all the objects and their location in a project tree, and the parent commit objects, such a signed tag also ensures that all the development history behind such a tagged commit cannot be tampered with.
  • When you merge a signed tag (either done by git merge or git pull), the content of the tag with its GnuPG signature is copied to the resulting commit object. This lets you ensure that the history behind the side branch that was merged to the history cannot be tampered with and the signature certifies that it came from the signer (typically a subsystem lieutenant).
  • Signed commits created by git commit -S are a way to say "This commit was created by me"; they ensure that the history behind the commit cannot be tampered with and certify that the change it introduces came from the signer.
  • Still under development is git push --signed, a way to certify that you wanted to put a particular commit at the tip of a particular branch.
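
The tamper-evidence argument in the first bullet rests on how object names are computed. Here is a minimal sketch, using SHA-1 (the hash Git uses by default) over a "<type> <size>" header, a NUL byte, and the object body. The tag_body helper is a simplification invented for this example: a real tag object also records a tagger line, a message and the signature itself.

```python
import hashlib


def git_object_name(kind: str, body: bytes) -> str:
    """Compute an object name: SHA-1 over "<kind> <size>\\0" plus the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tag_body(target: str, target_kind: str, name: str) -> bytes:
    """Simplified tag payload (hypothetical helper for this sketch)."""
    return f"object {target}\ntype {target_kind}\ntag {name}\n".encode()


if __name__ == "__main__":
    genuine = git_object_name("blob", b"genuine release contents\n")
    tampered = git_object_name("blob", b"tampered release contents\n")

    # The payload that a signature would cover names the tagged object,
    # so pointing the tag elsewhere changes the signed bytes.
    print(tag_body(genuine, "blob", "v1.0") == tag_body(tampered, "blob", "v1.0"))  # False
```

Since the signature is made over bytes that include the tagged object's name, and that name changes whenever the tagged content changes, a signature made for one payload cannot be reused for another.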
GnuPG is also used as a mechanism to ensure the integrity and authenticity of tarballs that are sent to the servers, which are a common distribution point for open source projects like the Linux kernel and Git itself. A maintainer prepares a tarball and a detached signature, uploads them, and the receiving end verifies that the signature is good.

It is a common practice to specify the expiration date when creating a signing key. For example, the key I have been using to sign Git release tags was originally set up to expire 3 years after the key was created. But the thing is, a project may outlive that expiry date. An interesting question is what happens to the existing tags when the key expires.

Unluckily, the right thing happens. If the holder of the key does not do anything, the key expires, and the signatures in the signed tags stop validating. Luckily, the validity of a key can be extended by the holder of the key, and once that is done, the signatures made before the key's original expiration date will continue to validate fine.

At least, that is the theory ;-)

As my key was originally set to expire early next month, a few days ago I extended the lifespan of the key 96AFE6CB I have been using and uploaded the updated key to PGP keyservers, so existing signed tags (e.g. v2.0.0) should continue to be valid.
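
The expiry behaviour described above can be modelled with a toy rule: what matters is the key's expiration date as known at the time of checking, not the one that was in force when the signature was made. This is a deliberately simplified model and not how GnuPG is implemented; the Key class and all dates below are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Key:
    created: date
    expires: date


def signature_validates(key: Key, signed_on: date, checked_on: date) -> bool:
    """Toy rule: signed after key creation, and key not expired when checked."""
    return key.created <= signed_on and checked_on <= key.expires


if __name__ == "__main__":
    # Hypothetical dates, for illustration only.
    key = Key(created=date(2011, 10, 1), expires=date(2014, 10, 1))
    signed_on = date(2014, 5, 28)
    later = date(2015, 1, 1)

    print(signature_validates(key, signed_on, later))  # False: key has expired

    # The holder extends the key; the old signature validates again.
    key.expires = date(2017, 10, 1)
    print(signature_validates(key, signed_on, later))  # True
```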

A few tips:
  • Although this page is written specifically for Debian contributors, it was very helpful when I had to figure out how to futz with GnuPG subkeys. It does not talk about how to update the expiration date for a subkey, though (you use "gpg --edit-key" and then use the "expire" command).
  • In order to force a specific subkey to be used when signing for Git, you would need to use the ! suffix to the GnuPG key-id, e.g. in my ~/.gitconfig file:
      [user]
          signingkey = 96AFE6CB!
    Without the ! suffix, GnuPG tries to use the newest subkey you have associated with the same primary key, which may not be the subkey you would want to use.
I signed a new v2.1.2 maintenance release with the same key today. Hopefully it will validate OK for you (otherwise, you may have to fetch the public key from the keyserver).