Speeding up the initial git-svn fetch

0 votes
asked Oct 13, 2010 by mrevil

I have a big repository, 100,000+ revisions with a very high branching factor. The initial fetch of the full SVN repository using git-svn has been running for around 2 months and it's only up to revision 60,000. Is there any way to speed this thing up?

I'm already regularly killing and restarting the fetch due to git-svn leaking memory like a sieve. The transfer is occurring over the local LAN, so link speed shouldn't be an issue. The repository is on a dedicated machine backed by dedicated fiber channel arrays, so the server should have plenty of oomph. The only other thing that I can think of is to do the clone from a local copy of the SVN repository.

What have other people done in similar circumstances?

8 Answers

0 votes
answered Jan 13, 2010 by kevpie

I think you are on the right track.

Local file access could give you a speedup of one to two orders of magnitude.

I'm not sure whether running git svn against a BDB or a file-based (FSFS) SVN backend would be faster.

0 votes
answered Jan 13, 2010 by daniel-stutzbach

I've downloaded a close-to-100,000-revision SVN repository using git-svn before. It took around 48 hours and was not over a local LAN. Admittedly, you did say that your repository has a high branching factor, while the repository I downloaded did not (although it did have several dozen branches).

I'd suggest working on figuring out where the bottleneck lies. Are git-svn and its subprocesses using 100% CPU? Are the disk lights on the client or the SVN server constantly lit? How much bandwidth is being used? Once you know what the limiting factor is, you can work on figuring out how to fix it.
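As a starting point for the CPU and memory part of that check, here is a minimal sketch that filters `ps` output down to the git-svn processes. The function name `filter_git_svn` is made up for illustration, and the `ps` column keywords are assumed to be available on your system (they are on typical Linux installs). It says nothing about disk or network load, which need separate tools.

```shell
#!/bin/sh
# Keep the ps header line plus any line mentioning "git-svn" or "git svn".
# The pattern git[- ]svn does not match its own literal text, so the awk
# process running this filter does not show up in the result.
filter_git_svn() {
  awk 'NR == 1 || /git[- ]svn/'
}

# Usage on the client (columns: PID, %CPU, resident memory in KiB, command):
#   ps -eo pid,pcpu,rss,args | filter_git_svn
```

Running it a few times during the fetch shows whether the process is pinned near 100% CPU and how quickly its resident memory grows.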

0 votes
answered Oct 20, 2010 by ben-jackson

At work I use git-svn against a ~170000 revision SVN repo. What I did was use git-svn init + git-svn fetch -r... to limit my initial fetch to a reasonable number of revisions. You must be careful to choose a revision that is actually in the branch you want. Everything is fully functional even with truncated history except git-blame, which obviously attributes all the lines older than your starting rev to the first rev.

You can further speed this up with ignore-paths to prune out subtrees that you don't want.
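A rough sketch of what the truncated fetch plus path pruning might look like. The URL, starting revision and regex are all hypothetical placeholders, and `truncated_fetch_cmds` is just an illustrative helper that prints the commands rather than running them. As far as I know, the git-svn documentation also says the ignore-paths value should be the same on every later fetch, so check that before relying on it.

```shell
#!/bin/sh
# Hypothetical values: substitute your own URL, starting revision and regex.
SVN_URL="https://svn.example.com/repo"
START_REV=150000
IGNORE_REGEX='^trunk/(docs|vendor)/'

# Print the two commands instead of running them, so they can be reviewed first.
truncated_fetch_cmds() {
  printf 'git svn init --stdlayout %s\n' "$1"
  printf "git svn fetch -r%s:HEAD --ignore-paths='%s'\n" "$2" "$3"
}

truncated_fetch_cmds "$SVN_URL" "$START_REV" "$IGNORE_REGEX"
```

Once the printed commands look right, run them inside an empty directory.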

You can add more revisions later, but it will be painful. You will have to reset the rev-map (sadly I even wrote git-svn reset and I can't say offhand if it will remove all revisions, so it may be by hand). Then git-svn fetch more revisions and git-filter-branch to reparent your old root to the new tree. That will rewrite every commit but it won't affect the source blobs themselves. You have to do similar surgery when people undertake big reorgs of the svn repo.

If you actually need all of the revisions (for example for a migration) then you should be looking at some flavor of svn-fast-export + git-fast-import. There may be one that adds rev tags to match git-svn, in which case you could fast-import and then just graft in the svn remote. Even if the existing svn-fast-export options don't have that feature, you can probably add it before your original clone completes!

0 votes
answered Oct 22, 2011 by mrevil

Apparently there is no good answer. Some work is being done on git-fast-import but it isn't ready for prime time yet. They are still trying to figure out how to detect and represent 'svn cp' actions. The one bright spot is that someone on the list came up with an optimization for git-svn that seems to have made a big impact.

http://permalink.gmane.org/gmane.comp.version-control.git/168718

0 votes
answered Jan 20, 2015 by tobias-tobiasen

In a repository with 20k commits I had similar problems. In my case it turned out that there were a few strange tags in Subversion that caused problems: tags that copied / instead of /trunk. That caused git svn fetch to go into an infinite loop. I fixed it by converting in chunks.

git svn fetch -r0:1000
git svn fetch -r0:2000
git svn fetch -r0:3000
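For a larger repository, the same sequence can be generated instead of typed by hand. This is only a sketch: `chunked_fetch_cmds` is a hypothetical helper that takes the last revision and a chunk size, and it prints the commands so you can run them one at a time and watch each one.

```shell
#!/bin/sh
# chunked_fetch_cmds LAST_REV STEP
# Print one "git svn fetch" command per chunk, ending exactly at LAST_REV.
chunked_fetch_cmds() {
  last=$1
  step=$2
  upper=$step
  while [ "$upper" -lt "$last" ]; do
    echo "git svn fetch -r0:$upper"
    upper=$((upper + step))
  done
  echo "git svn fetch -r0:$last"
}

chunked_fetch_cmds 3000 1000
```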

Watch the output, and if you don't see a new r... line once in a while, something is wrong. Use git log --all to see how far the conversion got. Let's say you got to 1565. Then continue the fetch like this.

git svn fetch -r1567:2000

It was very tedious but it got the job done.
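To avoid reading through the log by eye, something like the following could pull out the highest imported revision. It is a sketch that assumes the clone keeps the default git-svn-id lines in commit messages (so it will not work with --no-metadata); `last_svn_rev` is a made-up name.

```shell
#!/bin/sh
# Read commit messages on stdin and print the highest SVN revision found
# in git-svn-id lines, which look like "git-svn-id: <url>@<rev> <uuid>".
last_svn_rev() {
  sed -n 's/^ *git-svn-id: .*@\([0-9][0-9]*\) .*$/\1/p' | sort -n | tail -n 1
}

# Usage inside the git-svn clone:
#   git log --all --format=%B | last_svn_rev
```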

0 votes
answered Jan 19, 2016 by bengineerd

If you can find a server with enough RAM, do the whole clone operation on a ramdisk. On Linux systems you can use /dev/shm, which is backed by RAM.

> svnadmin hotcopy /path/to/svn/repo /dev/shm/svn-repo

> git svn clone file:///dev/shm/svn-repo /dev/shm/git-repo

Once that's done, you can point the git repo back to your real svn repo instead as described here: https://git.wiki.kernel.org/index.php/GitSvnSwitch

  • Edit the svn-remote url in .git/config to point to the new domain name
  • Run git svn fetch - This needs to fetch at least one new revision from svn!
  • Change svn-remote url back to the original url
  • Run git svn rebase -l to do a local rebase (with the changes that came in with the last fetch operation)
  • Change svn-remote url back to the new url
  • Run git svn rebase; it should now work again!

This will only work if the git svn fetch step actually fetches something! (Took me a while to discover that... I had to put in a dummy revision to our svn repository to make it happen!)

I just did this and was able to clone a 4.7G 12000 revision svn repo to git in about 3 hours.
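The URL switch in the list above can be sketched as a small shell function. This assumes the git-svn remote has the default name svn, so the setting is svn-remote.svn.url; the function name and the example URLs are hypothetical. Here the "original" URL is the ramdisk copy and the "new" URL is the real server.

```shell
#!/bin/sh
# switch_svn_url OLD_URL NEW_URL
# Run from inside the git repo. Stops at the first failing step.
switch_svn_url() {
  old_url=$1
  new_url=$2
  git config svn-remote.svn.url "$new_url" &&   # point at the new URL
  git svn fetch &&                              # must fetch at least one new revision
  git config svn-remote.svn.url "$old_url" &&   # temporarily switch back
  git svn rebase -l &&                          # local rebase, no network access
  git config svn-remote.svn.url "$new_url" &&   # final switch to the new URL
  git svn rebase
}

# Usage (hypothetical URLs):
#   switch_svn_url file:///dev/shm/svn-repo https://svn.example.com/repo
```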

0 votes
answered Sep 15, 2017 by wollow

I have a repo with 8k+ revisions and around 240 tags. I tried a run and estimated that my initial git svn clone on Windows would have taken months, simply doing

git svn clone --stdlayout --no-metadata --authors-file=users.txt https://link.to.repo

The clone was taking 5 seconds to import 1 revision on average. Please notice that whenever a tag is encountered, the clone restarts from rev 1, so potentially there are 8k * 240 = 1.92 million operations, which at 5 seconds each is about 111 days.
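The estimate works out as follows, using the worst-case assumption that every tag replays the whole history. The helper name `estimate_days` is only for illustration.

```shell
#!/bin/sh
# estimate_days REVISIONS TAGS SECONDS_PER_REVISION
# Worst case from the text: every tag replays the full history once.
estimate_days() {
  echo $(( $1 * $2 * $3 / 86400 ))   # 86400 seconds per day, integer division
}

estimate_days 8000 240 5   # prints 111
```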

Summary of all the steps I took to speed up the process:

  1. The Linux and OS X implementations are much faster than Cygwin on Windows. I used a Linux virtual machine. Please check https://stackoverflow.com/a/21599759/1448276

  2. I copied the entire svn repo to my machine with svnrdump

    svnrdump dump https://link.to.repo > repos.dump

  3. I created a local svn repo

    svnadmin create svnrepo

    svnadmin load svnrepo < repos.dump

as in https://stackoverflow.com/a/10407464/1448276

  4. I created and mounted a RAM-based disk

    svnadmin hotcopy svnrepo/ /dev/shm/svnrepo

as above, https://stackoverflow.com/a/39030862/1448276

  5. And finally ran the clone

    git svn clone --stdlayout --no-metadata --prefix=origin/ --authors-file=users.txt file:///dev/shm/svnrepo

Here the clone is processing on average 12.5 revisions per second, so I expect it will take less than 2 days. I'll post an update once the clone is complete.
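For anyone who wants to repeat this, the dump, load, hotcopy and clone steps can be strung together roughly like this. It is a sketch, not a tested migration script: `mirror_and_clone` and its arguments are hypothetical, the RAM directory must be an absolute path such as /dev/shm, and there is no check that the repository actually fits in RAM.

```shell
#!/bin/sh
# mirror_and_clone REMOTE_URL WORKDIR RAMDIR AUTHORS_FILE
# Stops at the first failing step.
mirror_and_clone() {
  remote_url=$1   # remote SVN repository URL
  workdir=$2      # scratch directory on ordinary disk
  ramdir=$3       # absolute path of a RAM-backed directory, e.g. /dev/shm
  authors=$4      # authors file passed to git svn

  svnrdump dump "$remote_url" > "$workdir/repos.dump" &&
  svnadmin create "$workdir/svnrepo" &&
  svnadmin load "$workdir/svnrepo" < "$workdir/repos.dump" &&
  svnadmin hotcopy "$workdir/svnrepo" "$ramdir/svnrepo" &&
  git svn clone --stdlayout --no-metadata --prefix=origin/ \
      --authors-file="$authors" "file://$ramdir/svnrepo"
}

# Usage (hypothetical paths):
#   mirror_and_clone https://link.to.repo "$HOME/svn-work" /dev/shm users.txt
```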

0 votes
answered Sep 15, 2017 by timb33

2017 calling in. I'm migrating a 45k revision repo and I'm finding git-svn on Linux working about 10x faster than git-svn on my Windows box. The VM is on the same Hyper-V host as my svn repo, so it could be that.
