Unix as IDE: Revisions

This entry is part 7 of 7 in the series Unix as IDE.

Version control is now seen as an indispensable part of professional software development, and GUI IDEs like Eclipse and Visual Studio have embraced it and included support for industry standard version control systems in their products. Modern version control systems trace their lineage back to Unix concepts from programs such as diff and patch however, and there are plenty of people who will insist that the best way to use a version control system is still at a shell prompt.

In this last article in the Unix as an IDE series, I’ll follow the evolution of common open-source version control systems from the basic concepts of diff and patch, among the very first version control tools.

diff, patch, and RCS

A central concept for version control systems has been that of the unified diff, a file expressing in human and computer readable terms a set of changes made to a file or files. The diff command was first released by Douglas McIlroy in 1974 for the 5th Edition of Unix, so it’s one of the oldest commands still in regular use on modern systems.

A unified diff, the most common and interoperable format, can be generated by comparing two versions of a file with the following syntax:

$ diff -u example.{1,2}.c
--- example.1.c    2012-02-15 20:15:37.000000000 +1300
+++ example.2.c    2012-02-15 20:15:57.000000000 +1300
@@ -1,8 +1,9 @@
 #include <stdio.h>
+#include <stdlib.h> 

 int main (int argc, char* argv[]) { printf("Hello, world!\n");
-    return 0;
+    return EXIT_SUCCESS; }

In this example, the second file has a header file added, and the call to return changed to use the standard EXIT_SUCCESS rather than a literal 0 as the return value for main(). Note that the output for diff also includes metadata such as the filename that was changed and the last modification time of each of the files.

A primitive form of version control for larger code bases was thus for developers to trade diff output, called patches in this context, so that they could be applied to one another’s code bases with the patch tool. We could save the output from diff above as a patch like so:

$ diff -u example.{1,2}.c > example.patch

We could then send this patch to a developer who still had the old version of the file, and they could automatically apply it with:

$ patch example.1.c < example.patch

A patch can include diff output from more than one file, including within subdirectories, so this provides a very workable way to apply changes to a source tree.

The operations involved in using diff output to track changes were sufficiently regular that for keeping in-place history of a file, the Source Code Control System and the Revision Control System that has pretty much replaced it were developed. RCS enabled “locking” files so that they could not be edited by anyone else while “checked out” of the system, paving the way for other concepts in more developed version control systems.

RCS retains the advantage of being very simple to use. To place an existing file under version control, one need only type ci <filename> and provide an appropriate description for the file:

$ ci example.c
example.c,v  <--  example.c
enter description, terminated with single '.' or end of file:
NOTE: This is NOT the log message!
>> example file
>> .
initial revision: 1.1
done

This creates a file in the same directory, example.c,v, that will track the changes. To make changes to the file, you check it out, make the changes, then check it back in:

$ co -l example.c
example.c,v  -->  example.c
revision 1.1 (locked)
done
$ vim example.c
$ ci -u example.c
example.c,v  <--  example.c
new revision: 1.2; previous revision: 1.1
enter log message, terminated with single '.' or end of file:
>> added a line
>> .
done

You can then view the history of a project with rlog:

$ rlog example.c

RCS file: example.c,v
Working file: example.c
head: 1.2
branch:
locks: strict
access list:
symbolic names:
keyword substitution: kv
total revisions: 2; selected revisions: 2
description:
example file
----------------------------
revision 1.2
date: 2012/02/15 07:39:16;  author: tom;  state: Exp;  lines: +1 -0
added a line
----------------------------
revision 1.1
date: 2012/02/15 07:36:23;  author: tom;  state: Exp;
Initial revision
=============================================================================

And get a patch in unified diff format between two revisions with rcsdiff -u:

$ rcsdiff -u -r1.1 -r1.2 ./example.c 
===================================================================
RCS file: ./example.c,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- ./example.c 2012/02/15 07:36:23 1.1
+++ ./example.c 2012/02/15 07:39:16 1.2
@@ -4,6 +4,7 @@
 int main (int argc, char* argv[])
 {
     printf("Hello, world!\n");
+    printf("Extra line!\n");
     return EXIT_SUCCESS;
 }

It would be misleading to imply that simple patches were now in disuse as a method of version control; they are still very commonly used in the forms above, and also figure prominently in both centralised and decentralised version control systems.

CVS and Subversion

To handle the problem of resolving changes made to a code base by multiple developers, centralized version systems were developed, with the Concurrent Versions System (CVS) developed first and the slightly more advanced Subversion later on. The central feature of these systems are using a central server that contains the repository, from which authoritative versions of the codebase at any particular time or revision can be retrieved. These are termed working copies of the code.

For these systems, the basic unit of the systems remained the changeset, and the most common way to represent these to the user was in the archetypal diff format used in earlier systems. Both systems work by keeping records of these changesets, rather than the actual files themselves from state to state.

Other concepts introduced by this generation of systems were of branching projects so that separate instances of the same project could be worked on concurrently, and then merged into the mainline, or trunk with appropriate testing and review. Similarly, the concept of tagging was introduced to flag certain revisions as representing the state of a codebase at the time of a release of the software. The concept of the merge was also introduced; reconciling conflicting changes made to a file manually.

Git and Mercurial

The next generation of version control systems are distributed or decentralized systems, in which working copies of the code themselves contain a complete history of the project, and are hence not reliant on a central server to contribute to the project. In the open source, Unix-friendly environment, the standout systems are Git and Mercurial, with their client programs git and hg.

For both of these systems, the concept of communicating changesets is done with the operations push, pull and merge; changes from one repository are accepted by another. This decentralized system allows for a very complex but tightly controlled ecosystem of development; Git was originally developed by Linus Torvalds to provide an open-source DVCS capable of managing development for the Linux kernel.

Both Git and Mercurial differ from CVS and Subversion in that the basic unit for their operations is not changesets, but complete files (blobs) saved using compression. This makes finding the log history of a single file or the differences between two revisions of a file slightly more expensive, but the output of git log --patch still retains the familiar unified diff output for each revision, some forty years after diff was first being used:

commit c1e5559ddb09f8d02b989596b0f4100ad1aab422
Author: Tom Ryder <tom@sanctum.geek.nz>
Date:   Thu Feb 2 01:14:21 2012

Changed my mind about this one.

diff --git a/vim/vimrc b/vim/vimrc index cfbe8e0..65a3143 100644
--- a/vim/vimrc
+++ b/vim/vimrc
@@ -47,10 +47,6 @@
 set shiftwidth=4
 set softtabstop=4
 set tabstop=4

-" Heresy
-inoremap <C-a> <Home>
-inoremap <C-e> <End>
-
 " History
 set history=1000

The two systems have considerable overlap in functionality and even in command set, and the question of which to use provokes considerable debate. The best introductions I’ve seen to each are Pro Git by Scott Chacon, and Hg Init by Joel Spolsky.

Conclusion

This is the last post in the Unix as IDE series; I’ve tried to offer a rapid survey of the basic tools available just within a shell on GNU/Linux for all of the basic functionality afforded by professional IDEs. At points I’ve had to be not quite as thorough as I’d like in explaining certain features, but to those unfamiliar to development on GNU/Linux machines this will all have hopefully given some idea of how comprehensive a development environment the humble shell can be, and all with free, highly mature, and standard software tools.

Subversion via Git

Subversion is a lot better than no version control system at all, but for those accustomed to distributed version control systems like Git or Mercurial, it can be pretty painful to lose features like cheap and flexible branching, intelligent merging and rebasing, and the snappy operations of Git when the need arises to work on code in a Subversion repository, perhaps for a legacy project or for an organisation with established repositories.

Fortunately, there’s an excellent compromise available in using the git-svn wrapper, which allows you to treat a Git repository as a working copy of a Subversion repository. This works transparently, meaning that others using the regular svn client on your team won’t have any difficulty working with your commits or branches, but your private Git workflow can meanwhile be as simple or complex as you like.

Installing

If you’re on Debian or Ubuntu, you can install the git-svn wrapper with:

# apt-get install git-svn

If you’re installing Git from source, it’s included in the default installation.

Cloning (Checkout)

To check out a copy of the trunk of the repository, you can clone it directly:

$ git svn clone svn://server.network/project/trunk project

This will provide you with a working copy of the repository’s trunk in the form of a Git repository, complete with a git log history imported from the Subversion commits. If you want to include tracking for the repository’s branches and tags as well, you can specify the paths to them on the command line:

$ git svn clone svn://server.network/project project \
    --trunk trunk --branches branches --tags tags

Typing git branch and git tag in the resulting repository will then show these as available branches and tags, as if they’d been created in Git. If the Subversion repository has a standard layout of folders, with branches, tags, and trunk, you can pass the --stdlayout or -s as a shortcut for the above:

$ git svn clone svn://server.network/project project --stdlayout

Fetching (Updating)

To pull the most recent changes from the Subversion repository into your Git “working copy”, use:

$ git svn fetch

If you have local changes in your repository that have not yet been committed, you may be prompted to temporarily cache them while you run the fetch operation. Git’s stash function works well for this:

$ git stash
$ git svn fetch
$ git stash apply

Committing

You can commit to your Git repository as normal with git commit. When you’re ready to send these commits to the Subversion repository, you can do so with:

$ git svn dcommit

This will also note information about the Subversion commit in the output of git log.

Branching

If you want to create a new branch in Git that tracks a similarly created branch in the Subversion repository, you can do this:

$ git svn branch experimental

It’s useful to note that if you don’t have any need to add the branch you’re creating to the Subversion repository, you can just use the usual git branch to keep a branch restricted to your Git repository. You should only need the git svn branch facility if others might need to use that branch before you merge it.

Merging

While it may be possible to conduct merges within the Subversion repository from the git-svn client, I think this particular task is probably best done using the usual svn merge tool. If you’re up to reading how this works in man git-svn, you should go for it, but I don’t think that getting a handle on the complexity of the rules for how these merges are run is really worth the effort in most cases, and given how brittle Subversion can be it’s likely not worth the risk of breaking things.

Properties

Aside from the above recommendation about using the native svn merge to conduct merges of Subversion branches, another limitation of the git-svn client is it ignores Subversion properties like the very useful svn:ignore, and furthermore doesn’t provide any way to set them. As a result, after the clone Git won’t know which file patterns a traditional Subversion working copy would be set up to ignore. You can emulate this by writing the properties file to a .gitignore or the .git/info/exclude files:

$ git svn show-ignore >> .gitignore
$ git svn show-ignore >> .git/info/exclude

Empty directories

Finally, Subversion and Git differ in how they treat empty directories. As a result, you may create an empty directory in a Git repository with intent to commit it into the Subversion repository, and find yourself unable to do so. The workaround here is to place some sort of file into the directory and commit that; a README file explaining the directory’s purpose seems to be a sensible choice.

These limitations, among others, show that the mapping from Subversion’s functionality to Git’s isn’t perfect, but for those who find working with Subversion a bit painful or imprecise, the functionality available in git-svn can certainly help, until such time as you’re able to convince your repository’s host to migrate everything to a more modern and capable revision control system.