The unified diff between revisions [f632db4f..] and [41f53cfd..] is displayed below.
#
#
# patch "distributed-version-control-systems.texinfo"
# from [17bd90946c90a932145cda397d3cc10b6120d27f]
# to [c0c43937f06a6dbe286975ecbf8d34fda98315c1]
#
============================================================
--- distributed-version-control-systems.texinfo 17bd90946c90a932145cda397d3cc10b6120d27f
+++ distributed-version-control-systems.texinfo c0c43937f06a6dbe286975ecbf8d34fda98315c1
@@ -5,7 +5,7 @@
@afourpaper
@c %**end of header
-@set EDITION First Edition (2006-04-21)
+@set EDITION Second Edition (2006-11-16)
@ifinfo
@dircategory Programming
@@ -54,6 +54,7 @@ USA.
@ifnottex
@node Top
@top Distributed version control systems, a comparison
+@value{EDITION}
@insertcopying
@end ifnottex
@@ -65,7 +66,9 @@ USA.
* Bazaar-NG::
* Mercurial::
* Monotone::
+* Cogito::
* Summary::
+* Document history::
@end menu
@node Methodology
@@ -173,7 +176,7 @@ and 4.1), one revision and one tag for e
The repository will therefore contain four branches (trunk, 3.4, 4.0
and 4.1), one revision and one tag for each official release of GCC
-since 3.4, plus one tag for each branch point. The first rebision
+since 3.4, plus one tag for each branch point. The first revision
will correspond to revision 76006 in the upstream Subversion
repository; this is the branch point for the 3.4 branch.
@@ -185,6 +188,10 @@ Subversion.
branches exploded from 3.5 Gb to 8.5 Gb when GCC switched from CVS to
Subversion.
+If you would like to compare the repository size in your favourite
+version control systems, please contact me; I may be able to provide the
+repository in one of the already-tested version control systems.
+
@node CVS
@chapter CVS
@@ -383,42 +390,48 @@ Time to perform a checkout is 15 minutes
Time to perform a checkout is 15 minutes.
@node Bazaar-NG
-@chapter Bazaar-NG 0.7
+@chapter Bazaar-NG
+This chapter covers Bazaar-NG 0.11.
+
@section Architecture
-One working copy = one branch = one repository; this is bad. There is
-a proposal to implement ``Repositories''.
+Starting with version 0.8, we can set up a ``Shared Repository'', say
+@code{/var/lib/bzr}, to contain history information for all branches.
+Then we can perform ``Lightweight checkouts'' into other directories,
+say @code{~/src/upstream} and @code{~/src/private}, much like with
+other version control systems. Thus, the concept of ``one working
+copy = one branch = one repository'' no longer applies.
-No concept of modules.
+No tags. This is bad. However, the revision selector syntax makes it
+possible to select the common ancestor revision, so the need for tags
+is greatly reduced.
-No tags. Not possible to review past changes to the upstream branch,
-only the changes since the previous pull.
+The storage format has changed since 0.7, and will probably change
+again in the future.
-The storage format changes quite often. This is bad. The current
-format, ``weaves'' (already version 6 of the storage format), is not
-particularly efficient, but better than the first versions which
-stored a complete copy of every revision of every file. The future
-``knit'' format (version 7?) is more efficient but not released yet.
-
Serves repositories over HTTP, FTP, NFS, rsync, etc. This is good.
-Quite slow.
+Quite slow, despite the huge progress made since 0.7.
Excellent built-in documentation.
-The scenario requires two local branches, each with repository and
-working copy: upstream and private.
+The scenario requires two local branches in the repository, upstream
+and private, and two lightweight checkouts.
Upstream cannot easily review my patches if I just push them to their
-branch; they need a special incoming branch, with repository and
-working copy, where I can push.
+branch; they need a special incoming branch where I can push.
@section Scenario
@example
-$ bzr branch http://bazaar.upstream.org/upstream
-$ bzr branch upstream private
+(create the repository:)
+$ mkdir /var/lib/bzr
+$ bzr init-repository /var/lib/bzr
+$ bzr branch http://bazaar.upstream.org/upstream /var/lib/bzr/upstream
+$ bzr checkout --lightweight /var/lib/bzr/upstream
+$ bzr branch /var/lib/bzr/upstream /var/lib/bzr/private
+$ bzr checkout --lightweight /var/lib/bzr/private
$ cd private
(edit)
@@ -453,8 +466,15 @@ $ bzr commit -m "private 4: merged from
(edit)
$ bzr commit -m "private 4: merged from upstream, and more edits."
+
@end example
+@section Storage efficiency
+
+The repository eats inodes like crazy; at least two files in the
+repository per file under version control, plus some constant
+overhead.
+
@node Mercurial
@chapter Mercurial 0.8.1
@@ -468,6 +488,9 @@ default is global.
i.e. replicated to other repositories by pulling or pushing. The
default is global.
+Cannot select the common ancestor of two revisions of a file on two
+branches.
+
Identifies changesets by their SHA1; records the ancestor of each
changeset.
@@ -575,6 +598,10 @@ servers, nor on SourceForge, Tigris, Ali
currently possible to host repositories on general-purpose HTTP
servers, nor on SourceForge, Tigris, Alioth, Berlios or the like.
+There is a contributed tool, ``usher'', that allows serving several
+databases on one port number. Usher comes bundled with the Monotone
+source tarballs, and as source in the monotone package for Debian.
+
The repository is a single file, containing an SQLite database. This
is good for backups, and SQLite has a command-line tool that allows
massaging the database in case of emergencies (I know SQL, so this
@@ -592,8 +619,13 @@ changes and commit to the upstream branc
Branching can only happen after making local changes; this is
counter-intuitive to me and error-prone; it is all too easy to make
changes and commit to the upstream branch, forgetting to create the
-new branch.
+new branch. To alleviate this, you can just edit @file{_MTN/options}
+in the working copy and change the branch before you make any changes.
+Cannot select the common ancestor of two revisions of a file on two
+branches (see @url{https://savannah.nongnu.org/bugs/?18302,Bug #18302
+on Savannah}), so manual tagging is sometimes required.
+
Requires all tags and commits to be signed by a RSA key; thus all
changes are authenticated. The key is special to Monotone; this is
good because it allows one to have many keys if desired, and does not
@@ -601,21 +633,35 @@ Requires an external editor that support
names are email addresses, but this can change.
Requires an external editor that supports 3-way merge, such as emacs,
-to resolve conflicts. When that editor exits, commits to the database
+to resolve conflicts. This is good because you cannot forget a
+conflict. When that editor exits, commits to the database
immediately; there is no way to do more changes before the commit.
This is bad. I would prefer it if, at least optionally, Monotone
-would just leave the results of diff3 and allow me to resolve
-conflicts and commit manually when I'm done.
+would allow additional edits before committing.
+Merging is a database-only operation; it does not require a working
+copy and does not use the working copy you're in, if any. As a
+consequence, you always commit before merging. If you come from other
+version control systems, this model may be counter-intuitive, but it
+is actually very good as it always preserves your unmerged sources in
+the database. To illustrate why this is so good, here is an excerpt
+from a post by Brian May on comp.lang.ada:
+
+@quotation
+Occasionally at my previous job (they used subversion) it was almost
+like a race to commit my changes first so I wouldn't have to deal with
+the conflicts. Didn't always work as planned though, as often the
+second person would accidently revert my changes by doing an update
+operation with the file still open in the editor. "Why did you revert
+my changes? The bug I fixed came back again. Did I break something?"
+"I didn't revert your change!" "Yes you did, in revision XYZ!"
+@end quotation
+
Average built-in documentation, on par with CVS but not better.
Compensated for by the excellent info manual.
Has built-in import commands from RCS and CVS that preserve history.
-Debian maintainer is MIA; latest release in Debian is 0.24, while
-latest upstream release is 0.26. Upstream however has good support for
-Debian and provides .deb files.
-
Has a built-in list of common file extensions to ignore, and ignores
them by default. However, the GCC sources contain a number of .a
files which are not archives; they are Ada source files that are part
@@ -771,25 +817,165 @@ compare them).
@section Storage efficiency for Monotone 0.26
+The storage format changed in 0.26.
+
The database takes 166 megabytes, or 65% less than Meta-CVS. This is
very good but comes at an unacceptable performance price. It takes 19
minutes to check out from the database, instead of 3.5 minutes with
version 0.24. During that time, Monotone eats 190 Mb of RAM and uses
all available CPU.
+UPDATE: this was due to a memory leak in Botan, a library that
+Monotone uses. The leak was introduced in 0.25 but became serious
+only in 0.26. It has been fixed in 0.30. As of 0.30, Monotone is
+even faster at checking out than 0.24. See
+@url{https://savannah.nongnu.org/bugs/?func=detailitem&item_id=16601,Bug
+#16601 on Savannah}.
+
+@node Cogito
+@chapter Cogito
+
+Cogito, by Petr Baudis and others, is a distributed version control system built on
+top of GIT, Linus Torvalds' tree history storage system. My thanks to
+Petr Baudis for this chapter; Petr sent me a long and detailed email
+with the full scenario below, and I was intrigued enough to fill in
+the missing parts. What follows applies to cogito 0.18.1 and git
+1.4.3.3.
+
+@section Architecture
+
+A working copy contains a repository, like in Mercurial or Bazaar-NG.
+This is bad; I like my repositories to be separate. In particular,
+this means that there can be only one working copy per repository (the
+working copy is the repository's parent directory). It is possible,
+like in Bazaar-NG, to create a repository without a working copy, but
+then, unlike in Bazaar-NG, every working copy will also contain a full
+repository. Cloning a working copy+repository on a single filesystem
+uses hard links where possible, so clones are ``cheap'' with the same
+caveat as in Mercurial: any changes to a file in the repository would
+break the link and destroy the benefits of shared storage. I think
+that this is only relevant when using pack files.
+
+A repository can contain many branches; some branches are flagged as
+being mirrors of remote branches. Each branch remembers which remote
+branch it is a mirror of. A repository can contain mirrored branches
+from many other repositories. This is good. Of course, it is
+possible to ``switch'' the working copy from one branch to another.
+
+Cogito can push and pull over many protocols (local files, HTTP,
+rsync, SSH, etc.). This is good.
+
+Supports tags, branches, and intelligent merge. This is good. By
+default, tags are local, which means they are not replicated when
+pushing or pulling repositories.
+
+Can select the common ancestor of two revisions on different branches.
+This is very good, as it reduces the need for tagging in the first
+place.
+
+When merging, if Cogito detects conflicts, it refrains from committing
+and instead places conflict markers in the files, just like CVS. This
+is bad, because you may forget to resolve a conflict and commit.
+
+Git, the underlying storage engine, is a ``content-addressed
+filesystem'' where each revision (or delta) of each file, as well as
+each revision of the tree structure, is in a file whose name is the
+SHA-1 sum of the contents. This is similar to all the other systems,
+and guarantees integrity. Git also allows crypto-signing tags,
+thereby providing authentication. The authorisation mechanism relies
+on Unix file permissions in the repository.
+
+@section Scenario
+
+Thanks to Petr Baudis for this section. I have edited Petr's scenario
+to remove the use of tagging, which is unnecessary with a system that
+keeps track of branches and merges.
+
+Note that sometimes you might find something is omitted; e.g.
+@code{commit} is missing @code{-m} or there's no @code{| less}; that's
+intentional because @code{less} will be called automagically or the
+commit message will be already prefilled (you'll still get to edit it
+if you wish).
+
+@example
+$ cg clone http://git.upstream.org/upstream
+$ cd upstream
+@end example
+
+This creates a working copy and a new repository containing two
+branches:
+
+@itemize @bullet
+@item
+@code{origin}, which is equivalent to your @code{upstream} and always
+mirrors exactly what was in the upstream repository when you last
+fetched ("pulled" in hg/mtn language) from it.
+
+@item
+@code{master}, which is equivalent to your @code{private} branch; it is
+the branch that you have checked out and commit to.
+@end itemize
+
+@example
+(edit)
+$ cg commit -m "private 1"
+
+(fetch from upstream:)
+$ cg fetch
+
+(review changes in origin since its last merge to master:)
+$ cg diff -m
+
+(merge into the master branch:)
+$ cg merge
+(resolve conflicts)
+@end example
+
+If there are no conflicts, @code{cg merge} automatically does a
+commit. If there are no local changes in your @code{master} branch
+relative to the @code{origin} branch, there is actually no commit
+at all.
+
+@example
+$ cg commit
+
+(edit)
+$ cg commit -m "private 3"
+
+(review my changes:)
+$ cg diff -r origin
+
+(push my changes upstream; Cogito remembers that 'origin' is
+associated with http://git.upstream.org/upstream:)
+$ cg push
+
+(get upstream 3 and merge it to my private branch at the same
+time; this is basically equiv. to cg-fetch && cg-merge:)
+$ cg update
+@end example
+
+@section Storage efficiency
+
+The repository, not counting the working copy, takes 409 megabytes
+after initial creation. After running @code{git repack -a -d} as
+recommended by Petr, which takes about half an hour, the repository
+takes 81 megabytes.
+
+If not packed, the repository eats inodes like crazy: one per commit
+per file, plus one per commit to the directory structure (add, remove,
+rename files or directories), plus one per branch, plus one per tag.
+
+Git compresses all files using zlib, like Mercurial and Monotone.
+
@node Summary
@chapter Summary
-Out of the three distributed version control systems, only one fully
-supports my current workflow: this is Monotone. It also fully
-supports the efficient storage afforded by multiple branches in a
-repository. In addition, it has quite advanced security mechanisms
-built-in. The only drawbacks are the need for an external editor for
-3-way merges, and the lack of local tags. Also, I will stick with
-version 0.24 until the checkout performance problem is resolved.
+Out of the four distributed version control systems, two fully support
+my current workflow: Monotone and Cogito. They also fully support the
+efficient storage afforded by multiple branches in a repository.
Bazaar-NG lacks tags, so it does not allow me to review past upstream
-changes. Also, it does not allow several branches per repository.
+changes between arbitrary points.
Mercurial lacks the ability to diff between branches. Also, it does
not allow several branches per repository. The hard-linking trick
@@ -797,4 +983,59 @@ Mercurial's storage in this case is less
almost all files change, but share a long common history on the trunk;
Mercurial's storage in this case is less efficient.
+Git lacks the ability to have multiple working copies from the same
+repository (each copy would get a copy of the repo). Copies are cheap
+(using hard links) only if on the same filesystem. However, Git has
+the best storage efficiency around, if you use packing.
+
+Since July 2006 I have switched from Meta-CVS to Monotone 0.24, then
+0.28 for my personal projects. I'd be reluctant to change to a system
+that doesn't use the commit-before-merge model; currently Monotone is
+the only system that uses that model.
+
+@multitable {@strong{Version Control System}} {@strong{Size}} {@strong{tags}----------} {@strong{branches}} {@strong{repo}} {@strong{integ}} {@strong{sign}} {@strong{auth}}
+@item @strong{Version Control System} @tab @strong{Size} @tab @strong{tags} @tab @strong{branches} @tab @strong{repo} @tab @strong{integ} @tab @strong{sign} @tab @strong{auth}
+@item CVS @tab 474 @tab expensive @tab yes @tab yes @tab no @tab no @tab no
+@item Meta-CVS @tab 474 @tab expensive @tab yes @tab yes @tab no @tab no @tab no
+@item Bazaar-NG 0.11 @tab ? @tab no @tab yes @tab yes @tab no @tab yes @tab no
+@item Mercurial 0.9 @tab 686 @tab cheap, local @tab no @tab no @tab yes @tab no @tab no
+@item Monotone 0.24-0.25 @tab 183 @tab cheap @tab yes @tab yes @tab yes @tab yes @tab yes
+@item Monotone 0.26-0.29 @tab 165 @tab cheap @tab yes @tab yes @tab yes @tab yes @tab yes
+@item Monotone 0.30-0.31 @tab 160 @tab cheap @tab yes @tab yes @tab yes @tab yes @tab yes
+@item Git 1.4.3.3 (unpacked) @tab 409 @tab cheap, local @tab yes @tab no @tab yes @tab yes @tab no
+@item Git 1.4.3.3 (packed) @tab 81 @tab cheap, local @tab yes @tab no @tab yes @tab yes @tab no
+@end multitable
+
+@table @strong
+@item Size
+The size of the repository containing the GCC sources, in megabytes
+(less is better).
+@item branches
+Supports multiple branches in a single repository thereby sharing
+storage.
+@item repo
+Supports multiple working copies per repository, thereby sharing
+storage.
+@item integ
+Guarantees data integrity by means of strong hashes, e.g. SHA-1.
+@item sign
+Allows for crypto signatures for authentication.
+@item auth
+Built-in authorisation framework (if not included, must use external
+tools like SSH or file permissions).
+@end table
+
+@node Document history
+@appendix Document history
+
+@table @dfn
+@item First Edition, 2006-04-21.
+Covers CVS, Meta-CVS, Bazaar-NG 0.7, Mercurial 0.8, Monotone 0.24 and
+0.26.
+
+@item Second Edition, 2006-11-16.
+Update to Bazaar-NG 0.11, Monotone 0.30; add Cogito 0.18.1+Git 1.4.3.3.
+Add summary.
+@end table
+
@bye