In the 1990s and early 2000s, the Concurrent Versions System (CVS) was one of the most popular systems for managing source code repositories. However, it was largely replaced by Subversion (SVN) in the mid-2000s, and development of the original CVS implementation ended in 2008. Since around 2010, git has been one of the most popular systems in this field. This article deals with the challenge of migrating a legacy CVS repository to git in the year 2022.
An easy solution for the migration might be to check out the latest version of the repository, take the files, and create a new git repository from them. This works, but the entire history (e.g., old file revisions, commit messages, branches, tags) is lost. In this howto, the history of the repository is converted as well.
This howto consists of two parts. In the first part, the existing CVS repository is converted into git in one bulk operation. In the second part, changes committed in CVS after the bulk migration are incrementally imported into the git repository. This is useful when development continues in the CVS repository after the migration and the most recent changes have to be brought into git. I used this setup for several weeks: the git repository served as a read-only mirror while the actual development still took place in CVS. Once the git repository and the build pipeline worked as expected, the CVS server was disabled and development moved to git directly.
Prerequisite
To perform the migration, direct file system access to the CVS repository is needed, because the conversion reads all old revisions of the contained files. I used a Debian 10 system for the migration. For the bulk migration, I used the tool cvs2git, which is part of the cvs2svn package. For the incremental updates, I used the git cvsimport command, which is part of the git-cvs package. Both packages can be installed with the following command:
apt install cvs2svn git-cvs
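After the installation, a quick check that the tools are actually available can save time later. The following is a minimal sketch, not part of the original setup; the helper names missing_tools and has_git_cvsimport are hypothetical, and the check for git cvsimport assumes that git installs its subcommands as git-<name> files in the directory reported by git --exec-path.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed command
# that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Hypothetical helper: succeed if the cvsimport subcommand is installed
# in git's exec path (assumption: subcommands live there as git-<name>).
has_git_cvsimport() {
    [ -x "$(git --exec-path 2>/dev/null)/git-cvsimport" ]
}

missing=$(missing_tools cvs2git git)
if [ -n "$missing" ]; then
    echo "Missing tools: $missing" >&2
elif ! has_git_cvsimport; then
    echo "git is installed, but git cvsimport was not found" >&2
else
    echo "All required tools found."
fi
```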
Bulk Migration of the Existing CVS Repository
In the first step, the current version of the raw CVS repository needs to be copied to the local system. It is assumed that the CVS repository is located in the directory mysoftware-cvsrepo. Before the migration can be performed, a configuration for cvs2git has to be created. A template can be found in the directory /usr/share/doc/cvs2svn/examples/. This template can be copied and adjusted:
cp /usr/share/doc/cvs2svn/examples/cvs2git-example.options.gz .
gunzip cvs2git-example.options.gz
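The template is long, so it can help to locate the two settings discussed in the next paragraph before opening the file in an editor. This is only a small sketch; the helper name show_settings is hypothetical, and it assumes that the setting names author_transforms and run_options.set_project appear literally in the template.

```shell
#!/bin/sh
# Hypothetical helper: list (with line numbers) the lines of an options
# file that mention the two settings that usually need adjusting.
show_settings() {
    grep -n -E 'author_transforms|run_options\.set_project' "$1"
}

# Only run when the template from the previous step is present.
if [ -f cvs2git-example.options ]; then
    show_settings cvs2git-example.options || echo "settings not found" >&2
fi
```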
In CVS, commits are attributed to the username on the CVS server. In git, commits are associated with a name and an e-mail address. To assign the old commits to the proper git users, a mapping can be created in the options file (see the author_transforms setting). This step is optional; the repository can be converted without the mapping. At a minimum, the location of the CVS repository has to be adjusted in the run_options.set_project( call. The first argument of the call has to point to the directory mysoftware-cvsrepo. Afterward, the migration can be performed as follows:
cvs2git --options=cvs2git-example.options --fallback-encoding utf8
cvs2svn Statistics:
------------------
Total CVS Files: 18490
Total CVS Revisions: 72288
Total CVS Branches: 9276
Total CVS Tags: 282034
Total Unique Tags: 134
Total Unique Branches: 9
CVS Repos Size in KB: 1605937
Total SVN Commits: 25233
First Revision Date: Wed May 1 21:26:44 2002
Last Revision Date: Fri Jan 7 23:37:53 2022
------------------
Timings (seconds):
------------------
35 pass1 CollectRevsPass
1 pass2 CleanMetadataPass
0 pass3 CollateSymbolsPass
1066 pass4 FilterSymbolsPass
1 pass5 SortRevisionsPass
1 pass6 SortSymbolsPass
19 pass7 InitializeChangesetsPass
11 pass8 BreakRevisionChangesetCyclesPass
11 pass9 RevisionTopologicalSortPass
13 pass10 BreakSymbolChangesetCyclesPass
21 pass11 BreakAllChangesetCyclesPass
21 pass12 TopologicalSortPass
20 pass13 CreateRevsPass
2 pass14 SortSymbolOpeningsClosingsPass
2 pass15 IndexSymbolsPass
26 pass16 OutputPass
1251 total
The command might take a while and prints some statistics about the migration. In this example, a 20-year-old repository was migrated and 25233 commits were created. The result of the operation is two files, git-blob.dat and git-dump.dat, in the directory cvs2git-tmp. They contain all the data needed to populate a new git repository with all files and the entire project history. This can be done with the following commands:
mkdir gitrepo
cd gitrepo
git init .
cat ../cvs2git-tmp/git-{blob,dump}.dat | git fast-import
/usr/lib/git-core/git-fast-import statistics:
---------------------------------------------------------------------
Allocd objects: 155000
Total objects: 151974 ( 7423 duplicates )
blobs : 60702 ( 6222 duplicates 55451 deltas of 59993 attempts)
trees : 71109 ( 1201 duplicates 65918 deltas of 67925 attempts)
commits: 20163 ( 0 duplicates 0 deltas of 0 attempts)
tags : 0 ( 0 duplicates 0 deltas of 0 attempts)
Total branches: 148 ( 15 loads )
marks: 1073741824 ( 87087 unique )
atoms: 13092
Memory total: 11376 KiB
pools: 4110 KiB
objects: 7265 KiB
---------------------------------------------------------------------
pack_report: getpagesize() = 4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit = 35184372088832
pack_report: pack_used_ctr = 21471
pack_report: pack_mmap_calls = 221
pack_report: pack_open_windows = 1 / 1
pack_report: pack_mapped = 489804382 / 489804382
---------------------------------------------------------------------
The git fast-import command also prints some statistics. Afterward, the git repository is ready. Note that git fast-import does not update the working tree; after checking out the files (for example, with git reset --hard), the current directory should contain all the files of the latest CVS checkout. In addition, commands such as git log can be used to view the commit history. The local git repository can now be pushed to a service like GitHub, for example as follows:
git remote add origin git@github.com:jnidzwetzki/mysoftware.git
git branch -M main
git push -u origin main
git push origin --tags
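Before relying on the converted repository, it may be worth comparing a few numbers against the statistics printed during the conversion. The sketch below is an illustration rather than part of the original procedure; the helper name repo_summary is hypothetical. The counts are not expected to match the cvs2svn statistics exactly, but they should be of a similar order of magnitude.

```shell
#!/bin/sh
# Hypothetical helper: print commit, branch, and tag counts
# of the git repository in the given directory.
repo_summary() {
    repo=$1
    printf 'commits: %s\n' "$(git -C "$repo" rev-list --all --count)"
    printf 'branches: %s\n' "$(git -C "$repo" for-each-ref refs/heads | wc -l | tr -d ' ')"
    printf 'tags: %s\n' "$(git -C "$repo" for-each-ref refs/tags | wc -l | tr -d ' ')"
}

# Run from the directory that contains gitrepo; skipped if it does not exist.
if [ -d gitrepo/.git ]; then
    repo_summary gitrepo
fi
```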
Import Incremental Updates into Git
After the CVS repository has been bulk converted to git, further commits might be made in the existing CVS repository. To import these changes from CVS into git without repeating the whole migration, the command git cvsimport can be used. This command takes two inputs: (1) an optional mapping between CVS accounts and e-mail addresses and (2) access to the CVS server.
As in the bulk migration, the mapping is optional. If it is used, a file has to be created that contains one line per CVS account in the format cvs-user=Full Name <email>. In my example, the file looks as follows:
cat commiter-mapping
nidzwetzki=Jan Nidzwetzki <jnidzwetzki@gmx.de>
[...]
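A malformed line in the mapping file is easy to overlook, so a simple format check may be useful before the import. The following sketch assumes the line format shown above (account, equals sign, full name, e-mail address in angle brackets); the helper name invalid_mapping_lines is hypothetical, and the pattern is deliberately loose rather than a full e-mail validation.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) every line that does
# not look like:  cvs-user=Full Name <user@host>
invalid_mapping_lines() {
    grep -n -v -E '^[^=[:space:]]+=[^<>]+ <[^<>[:space:]]+@[^<>[:space:]]+>$' "$1"
}

# Only run when the mapping file from the example is present.
if [ -f commiter-mapping ]; then
    # grep -v succeeds when at least one non-matching line was printed.
    if invalid_mapping_lines commiter-mapping; then
        echo "The lines above do not match the expected format." >&2
    else
        echo "Mapping file looks well-formed."
    fi
fi
```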
Afterward, the incremental migration can be performed as follows:
cd gitrepo
git cvsimport -v -a -A ../commiter-mapping -d :pserver:username@cvsserver:2401/cvs mysoftware -o main
git push
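Since the incremental import is repeated many times while CVS is still in use, it can be wrapped in a small script. The sketch below is one possible way to do this, not the script I actually used; the names run and sync_once and the DRY_RUN variable are hypothetical, and the server, module, and file names are the placeholders from the example above. By default it only prints the commands, so it can be inspected safely; note that it passes the -o option before the module name, as in the synopsis of the git cvsimport manual.

```shell
#!/bin/sh
# Placeholder values taken from the example above; adjust as needed.
CVSROOT_URL=':pserver:username@cvsserver:2401/cvs'
CVS_MODULE='mysoftware'
MAPPING='../commiter-mapping'
TARGET_BRANCH='main'

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# One incremental import followed by a push.
# Intended to be called from inside the git repository (gitrepo).
sync_once() {
    run git cvsimport -v -a -A "$MAPPING" -d "$CVSROOT_URL" \
        -o "$TARGET_BRANCH" "$CVS_MODULE" &&
    run git push
}

# Dry run unless DRY_RUN=0 is set explicitly in the environment.
DRY_RUN=${DRY_RUN:-1}
sync_once
```

Calling the script with DRY_RUN=0 from inside gitrepo performs the actual import and push; a cron job or a manual call after each CVS commit would both work.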