In the 1990s and early 2000s, the Concurrent Versions System (CVS) was one of the most popular systems to manage source code repositories. However, the software was replaced by Subversion (SVN) in the middle of the 2000s. The development of the original CVS implementation was ended in 2008. Since around 2010, git is one of the most popular systems in this field. This article deals with the challenge of migrating a legacy CVS repository to git in the year 2022.

An easy solution for the migration might be to checkout the latest version of the repository, take the files and create a new git repository from them. This might work, but the entire history (e.g., old file revisions, commit messages, branches, tags) are lost. In this howto, the history of the repository is also converted.

This howto consists of two parts. In the first part, the existing CVS repository is converted into git in one bulk operation. In the second part, changes committed in CVS after the bulk migration are incrementally merged into the git repository. This might be useful when you migrate the repository and the actual development still uses the CVS repository and you need to get the most recent changes into git. I have used this for several weeks. The git repository was used as a read-only repository, while the actual development still uses CVS. After the git repository works and the build pipeline works as expected, the CVS server was disabled and the development uses git directly.

Prerequisite

To perform the migration, direct file system access to the CVS repository is needed. The direct access is needed to access all old revisions of the contained files. I have used a Debian 10 system for performing the migration. To perform the bulk migration, I used the tool cvs2git, which is part of the cvs2svn package. To perform the incremental updates, I used the git cvs command, which is part of the git-cvs package. Both packages can be installed by using the following command:

apt install cvs2svn git-cvs

Bulk Migration of the Existing CVS Repository

In the first step, the current version of the raw CVS repository needs to be copied to the local system. It is assumed that the CVS repository is located in the directory mysoftware-cvsrepo. Before the migration can be performed, a configuration for cvs2git has to be created. A template for the migration can be found in the directory /usr/share/doc/cvs2svn/examples/. This template can be used and adjusted.

cp /usr/share/doc/cvs2svn/examples/cvs2git-example.options.gz .
gunzip cvs2git-example.options.gz

In CVS commits belong to the username of the CVS-Server. In git, commits are associated to an e-mail address. To assign the old commits to the proper git user, a mapping can be created in the file (see the author_transforms setting). However, this is an optional step. Without the mapping, the repository can still be converted. At least the location of the CVS repository has to be adjusted in run_options.set_project(. The first value of the call has to point to the directory mysoftware-cvsrepo. Afterward, the migration can be performed as follows:

cvs2git --options=cvs2git-example.options --fallback-encoding utf8

cvs2svn Statistics:
------------------
Total CVS Files:             18490
Total CVS Revisions:         72288
Total CVS Branches:           9276
Total CVS Tags:             282034
Total Unique Tags:             134
Total Unique Branches:           9
CVS Repos Size in KB:      1605937
Total SVN Commits:           25233
First Revision Date:    Wed May  1 21:26:44 2002
Last Revision Date:     Fri Jan  7 23:37:53 2022
------------------
Timings (seconds):
------------------
  35   pass1    CollectRevsPass
   1   pass2    CleanMetadataPass
   0   pass3    CollateSymbolsPass
1066   pass4    FilterSymbolsPass
   1   pass5    SortRevisionsPass
   1   pass6    SortSymbolsPass
  19   pass7    InitializeChangesetsPass
  11   pass8    BreakRevisionChangesetCyclesPass
  11   pass9    RevisionTopologicalSortPass
  13   pass10   BreakSymbolChangesetCyclesPass
  21   pass11   BreakAllChangesetCyclesPass
  21   pass12   TopologicalSortPass
  20   pass13   CreateRevsPass
   2   pass14   SortSymbolOpeningsClosingsPass
   2   pass15   IndexSymbolsPass
  26   pass16   OutputPass
1251   total

The command might take a while and prints a few statistics about the migration. In this example, a 20-year-old repository was migrated and 25233 commits were created. The result of the operation are two files git-blob.dat and git-dump.dat in the directory cvs2git-tmp. The files contain all the needed data to populate a new git repository with all files and the entire project history. This can be done with the following commands:

mkdir gitrepo
cd gitrepo
git init .
cat ../cvs2git-tmp/git-{blob,dump}.dat | git fast-import

/usr/lib/git-core/git-fast-import statistics:
---------------------------------------------------------------------
Allocd objects:     155000
Total objects:       151974 (      7423 duplicates                  )
      blobs  :        60702 (      6222 duplicates      55451 deltas of      59993 attempts)
      trees  :        71109 (      1201 duplicates      65918 deltas of      67925 attempts)
      commits:        20163 (         0 duplicates          0 deltas of          0 attempts)
      tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:         148 (        15 loads     )
      marks:     1073741824 (     87087 unique    )
      atoms:          13092
Memory total:         11376 KiB
       pools:          4110 KiB
     objects:          7265 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 35184372088832
pack_report: pack_used_ctr            =      21471
pack_report: pack_mmap_calls          =        221
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =  489804382 /  489804382
---------------------------------------------------------------------

Also the git import command will show some staticstics. Afterward, the git repository is ready. The current directory should contain all the files of the latest CVS checkout. In addition, commands git log can be used to view the commit history. The local git repository can now be pushed to services like GitHub. For example, as follows:

git remote add origin git@github.com:jnidzwetzki/mysoftware.git
git branch -M main
git push -u origin main
git push origin --tags

Import Incremental Updates into Git

After the CVS repository was bulk converted to git, some commits might be performed in the existing CVS repository. To import the last changes from SVN into git without repeating the whole migration, the command git cvsimport can be used. This command requires two things: (1) an optional mapping between CVS accounts and e-mail addresses and (2) access to the SVN server.

As in the bulk migration, the mapping is optional. A file has to be created which has lines in the format cvs-user=email. In my example, the file looks as follows:

cat commiter-mapping

nidzwetzki=Jan Nidzwetzki <jnidzwetzki@gmx.de>
[...]

Afterward, the incremental migration can be performed as follows:

cd gitrepo
git cvsimport -v -a -A ../commiter-mapping -d :pserver:username@cvsserver:2401/cvs mysoftware -o main
git push