subversion-dev mailing list archives

From Stefan Fuhrmann <stefan.fuhrm...@wandisco.com>
Subject Re: Bulk copying revprops
Date Wed, 05 Aug 2015 12:25:11 GMT
On Wed, Aug 5, 2015 at 1:05 PM, Ivan Zhakov <ivan@visualsvn.com> wrote:

> On 24 July 2015 at 22:58, Philip Martin <philip.martin@wandisco.com>
> wrote:
> > [Arising from some discussion on IRC today.]
> >
> > I've been considering the problem of a dump/load upgrade for a
> > repository with a large number of revisions.  To minimise downtime the
> > initial dump/load would be carried out while the original repository
> > remains live.  When the load finishes the new repository is already
> > out-of-date so an incremental dump/load is carried out.  When this
> > second load finishes the original repository is taken offline and we
> > want to bring the new repository online as quickly as possible.  A final
> > incremental dump/load is required but that only involves a small number
> > of revisions and so is fast.  The remaining problems are locks and
> > revprops.
> >
> > We do not have tools to handle locks so the options are: a) drop all the
> > locks, or b) copy/move the whole db/locks subdir.  I'm not really
> > interested in locks at present.
> >
> > Revprops are more of a problem.  Most revprops are up-to-date but a
> > small number may be out-of-date.  The problem is we do not know which
> > revprops are out-of-date.  Is there a reliable and efficient way to
> > bring the revprops up-to-date?  We could attempt to disable and/or track
> > revprop changes during the load but this is not reliable.  Post- hooks
> > are not 100% reliable and revprop changes can bypass the hooks.  We
> > could attempt to copy/move the whole revprops subdir, but that is not
> > always possible if the repository formats are different.
> >
> > One general solution is to use svnsync to bulk copy the revprops:
> >
> >   ln -sf /bin/true dst/hooks/pre-revprop-change
> >   svnsync initialize --allow-non-empty file:///src file:///dst
> >   svnsync copy-revprops file:///src file:///dst
> >
> > This isn't very fast, I get about 2,000 revisions a minute for
> > repositories on an SSD.  There are typically three revprops per
> > revision and the FS/RA APIs change one at a time.  Each change must run
> > the mandatory pre-revprop-change hook and fsync() the repository.
> > svnsync has a simple algorithm that writes every revprop for each
> > revision.
> >
> > For a repository with a million revisions svnsync would invoke three million
> > processes to run the hooks and three million fsync().  Typically, most
> > of this work is useless because most of the revprops already match.
> >
> > I wrote a script using the Python FS bindings (see below). This avoids
> > the hooks and also elides the writes when the values already match.
> > Typically this just has to read and so will process several hundred
> > thousand revisions a minute.  This will reliably update a million
> > revisions in minutes.
> >
> > I was thinking that perhaps we ought to provide a more accessible way to
> > do this.  First, modify the FS implementations to detect when a change
> > is a noop that doesn't modify a value and skip all the writing.  Second,
> > provide some new admin commands to dump/load revprops:
> >
> >   svnadmin dump-revprops repo | svnadmin load-revprops repo
> >
> Maybe use the existing 'load' subcommand with a '--revprops-only' switch
> to load revprops instead of a new subcommand? I.e.:
>   svnadmin dump --revprops-only | svnadmin load --revprops-only
>

Yeah, I had thought about this. For the dump side, it makes
some sense as it does not fundamentally change the semantics
of the dump command. My current implementation actually uses
a bumped version of the dump_fs API for it.

Load, OTOH, behaves very differently from load-revprops: It adds
revisions (works based upon transactions) instead of modifying
existing ones, sends different notifications etc. So, it seems cleaner
to have a separate sub-command. For symmetry, also having a
separate dump-revprops sub-command seems to be a better approach.

But that's all up for discussion and should be easy to change
in the code.

-- Stefan^2.
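
For readers following the thread, the "elide the write when the value
already matches" idea from Philip's mail can be sketched as below. This is
only an illustration under stated assumptions: the repositories are modeled
as plain in-memory dicts, and sync_revprops and the RevProps type are
hypothetical names, not part of Subversion or its Python bindings. In a real
tool the reads and writes would go through the FS layer instead.

```python
from typing import Dict, Tuple

# Hypothetical in-memory model (an assumption for illustration only):
# revision number -> {revprop name: value}.
RevProps = Dict[int, Dict[str, bytes]]


def sync_revprops(src: RevProps, dst: RevProps) -> Tuple[int, int]:
    """Bring dst revprops in line with src, writing only real changes.

    Returns (revisions_examined, properties_written).
    """
    examined = 0
    written = 0
    for rev in sorted(src):
        if rev not in dst:
            # Revision is not present in dst; copying revprops cannot
            # create it, so it is skipped here.
            continue
        examined += 1
        src_props = src[rev]
        dst_props = dst[rev]
        # Set or update only the values that actually differ.
        for name, value in src_props.items():
            if dst_props.get(name) != value:
                dst_props[name] = value
                written += 1
        # Remove properties that no longer exist in src.
        for name in [n for n in dst_props if n not in src_props]:
            del dst_props[name]
            written += 1
    return examined, written
```

When most revprops already match, almost every revision costs only a read
and a comparison, which is the reason given in the thread for the script
being so much faster than rewriting every revprop.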
