subversion-dev mailing list archives

From: Julian Foad
Subject: Re: #4667, Merge uses large amount of memory
Date: Wed, 04 Jan 2017 15:02:09 GMT

Stefan Fuhrmann wrote:
> Julian Foad wrote:
>> The branches involved have subtree mergeinfo on over 3500 files, each referring
>> to about 350 branches on average, and just over 1 revision range on average per
>> mergeinfo line. Average path length is under 100 bytes.
> What is the result of 'svn pg "svn:mergeinfo" -R | wc -c'?

120 MB.
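
That figure is roughly what the averages quoted above predict. A back-of-envelope check, assuming about 100 bytes per mergeinfo line (the path plus one short revision range; the per-line size is an assumption, not a measurement):

```python
# Approximate figures quoted earlier in this thread.
files_with_mergeinfo = 3500   # "over 3500 files"
branches_per_file = 350       # "about 350 branches on average"
bytes_per_line = 100          # assumption: path "under 100 bytes" plus ~1 range

estimated_bytes = files_with_mergeinfo * branches_per_file * bytes_per_line
print(f"{estimated_bytes / 1e6:.1f} MB")  # prints "122.5 MB"
```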

>> [...]
>> tools "svn-mergeinfo-normalizer" and "" both also fail to
>> execute in the available RAM.
> You may run svn-mergeinfo-normalizer on arbitrary sub-trees.

Yes, and I may explore this further. Note that we're already
dealing with a subtree: the attempted merges and the mergeinfo reported
above all refer to a subtree of the entire branch, because a
whole-branch merge became impossible some time ago.

> A lot of memory will be used to hold that part of the repository
> history that is relevant to the branches mentioned in the m/i.
> This may easily grow to several GB if there have been tens of
> millions of changes.

The number of revisions in the repository is about 1 million.

> If the tool manages to read the mergeinfo, it will print m/i
> stats before fetching the log.  Does it get to this stage?

I'll see if I can find out.

>> I would like to try a different approach. We read, parse and store all the
>> mergeinfo, whereas I believe our merge algorithm is only interested in the
>> mergeinfo that refers to one of exactly two branches ('source' and 'target') in
>> a typical merge. The algorithm never searches the 'graph' of merge ancestry
>> beyond those two branches. We should be able to read, parse and store only the
>> mergeinfo we need.
> That seems to be the path to take.  I would have assumed that we only
> need the m/i for the source branch as the target m/i is implied as
> being all of the target history.
>> Another possible approach could be to store subtree mergeinfo in a "delta" form
>> relative to a parent path's mergeinfo.
> I can see two problems here.  First, you can only use the new scheme
> after all "relevant", i.e. merging, clients have been upgraded.

No, I meant just convert it to delta form when reading it into memory. I 
wasn't proposing a format change of the stored svn:mergeinfo property.
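
To illustrate the idea only, here is a minimal sketch of an in-memory delta form. It is not Subversion code: the names (`inherited`, `to_delta`, `from_delta`) are hypothetical, revision ranges are treated as opaque strings rather than parsed range lists, and non-inheritable ranges are ignored. It assumes a child inherits its parent's mergeinfo with the child's relative path appended to each source path.

```python
from typing import Dict, List, Tuple

# Hypothetical model: source path -> revision-range string, e.g. "10-20,35".
Mergeinfo = Dict[str, str]
Delta = Tuple[Mergeinfo, List[str]]  # (changed or added entries, removed sources)


def inherited(parent: Mergeinfo, rel_path: str) -> Mergeinfo:
    """Mergeinfo a child at rel_path would inherit from its parent."""
    return {f"{src}/{rel_path}": revs for src, revs in parent.items()}


def to_delta(parent: Mergeinfo, rel_path: str, child: Mergeinfo) -> Delta:
    """Keep only the entries where the child differs from what it would inherit."""
    base = inherited(parent, rel_path)
    changed = {src: revs for src, revs in child.items() if base.get(src) != revs}
    removed = [src for src in base if src not in child]
    return changed, removed


def from_delta(parent: Mergeinfo, rel_path: str, delta: Delta) -> Mergeinfo:
    """Rebuild the child's full mergeinfo from the parent plus the delta."""
    changed, removed = delta
    full = inherited(parent, rel_path)
    for src in removed:
        del full[src]
    full.update(changed)
    return full


parent = {"/branches/a": "1-100", "/branches/b": "50-80"}
child = {"/branches/a/foo.c": "1-100", "/branches/b/foo.c": "50-90"}

delta = to_delta(parent, "foo.c", child)
# Only the entry that differs from the inherited value is stored.
assert delta == ({"/branches/b/foo.c": "50-90"}, [])
assert from_delta(parent, "foo.c", delta) == child
```

When most subtree mergeinfo lines match what the parent implies, the delta holds only a few entries per file instead of hundreds.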

> More importantly, the in-memory data model would need to be something
> delta-like.  That sounds like a lot of code-churn.

Sure, not trivial!
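
The simpler first approach, reading only the mergeinfo that refers to the source and target branches, could look roughly like the sketch below. This is an illustration under assumptions, not the existing parser: `parse_filtered` is a hypothetical name, the input is the raw svn:mergeinfo property text with one `path:ranges` entry per line, and the ranges are left unparsed.

```python
from typing import Dict, Iterable


def parse_filtered(prop_value: str, wanted_roots: Iterable[str]) -> Dict[str, str]:
    """Keep only mergeinfo lines whose source path is at or below a wanted branch root."""
    roots = tuple(wanted_roots)
    kept: Dict[str, str] = {}
    for line in prop_value.splitlines():
        if not line:
            continue
        # Split on the last colon: a path may contain ':', the range list does not.
        path, _, ranges = line.rpartition(":")
        if any(path == root or path.startswith(root + "/") for root in roots):
            kept[path] = ranges
    return kept


prop = "\n".join([
    "/branches/a/foo.c:1-100",
    "/branches/b/foo.c:50-90",
    "/branches/bb/foo.c:7-9",
    "/trunk/foo.c:5-40",
])

kept = parse_filtered(prop, ["/trunk", "/branches/b"])
assert kept == {"/branches/b/foo.c": "50-90", "/trunk/foo.c": "5-40"}
```

Lines for unrelated branches are discarded before any range parsing or storage, so memory use would scale with the two branches involved rather than with all 350 or so.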

Thanks for the interest.

- Julian
