Date: Wed, 5 Aug 2015 13:25:11 +0100
Subject: Re: Bulk copying revprops
From: Stefan Fuhrmann
To: Ivan Zhakov
Cc: Philip Martin, Subversion Development
On Wed, Aug 5, 2015 at 1:05 PM, Ivan Zhakov wrote:
> On 24 July 2015 at 22:58, Philip Martin wrote:
> > [Arising from some discussion on IRC today.]
> >
> > I've been considering the problem of a dump/load upgrade for a
> > repository with a large number of revisions. To minimise downtime the
> > initial dump/load would be carried out while the original repository
> > remains live. When the load finishes the new repository is already
> > out-of-date, so an incremental dump/load is carried out. When this
> > second load finishes the original repository is taken offline and we
> > want to bring the new repository online as quickly as possible. A
> > final incremental dump/load is required, but that only involves a
> > small number of revisions and so is fast. The remaining problems are
> > locks and revprops.
> >
> > We do not have tools to handle locks, so the options are: a) drop all
> > the locks, or b) copy/move the whole db/locks subdir. I'm not really
> > interested in locks at present.
> >
> > Revprops are more of a problem. Most revprops are up-to-date, but a
> > small number may be out-of-date. The problem is we do not know which
> > revprops are out-of-date. Is there a reliable and efficient way to
> > bring the revprops up-to-date? We could attempt to disable and/or
> > track revprop changes during the load, but this is not reliable. Post
> > hooks are not 100% reliable and revprop changes can bypass the hooks.
> > We could attempt to copy/move the whole revprops subdir, but that is
> > not always possible if the repository formats are different.
> >
> > One general solution is to use svnsync to bulk copy the revprops:
> >
> >    ln -sf /bin/true dst/hooks/pre-revprop-change
> >    svnsync initialize --allow-non-empty file:///src file:///dst
> >    svnsync copy-revprops file:///src file:///dst
> >
> > This isn't very fast; I get about 2,000 revisions a minute for
> > repositories on an SSD. There are typically three revprops per
> > revision and the FS/RA APIs change them one at a time. Each change
> > must run the mandatory pre-revprop-change hook and fsync() the
> > repository. svnsync has a simple algorithm that writes every revprop
> > for each revision.
> >
> > For a repository with a million revisions, svnsync would invoke three
> > million processes to run the hooks and perform three million fsync()
> > calls. Typically, most of this work is useless because most of the
> > revprops already match.
> >
> > I wrote a script using the Python FS bindings (see below). This
> > avoids the hooks and also elides the writes when the values already
> > match. Typically it only has to read, and so it will process several
> > hundred thousand revisions a minute. This will reliably update a
> > million revisions in minutes.
> >
> > I was thinking that perhaps we ought to provide a more accessible way
> > to do this. First, modify the FS implementations to detect when a
> > change is a no-op that doesn't modify a value and skip all the
> > writing. Second, provide some new admin commands to dump/load
> > revprops:
> >
> >    svnadmin dump-revprops repo | svnadmin load-revprops repo
> >
> Maybe use the existing 'load' subcommand with a '--revprops-only'
> switch to load revprops, instead of a new subcommand? I.e.:
>   svnadmin dump --revprops-only | svnadmin load --revprops-only
Yeah, I had thought about this.

For the dump side, it makes some sense as it does not fundamentally
change the semantics of the dump command. My current implementation
actually uses a bumped version of the dump_fs API for it.

Load, OTOH, behaves very differently from load-revprops: it adds
revisions (works based upon transactions) instead of modifying existing
ones, sends different notifications, etc. So it seems cleaner to have a
separate sub-command. For symmetry, also having a separate dump-revprops
sub-command seems to be a better approach.

But that's all up for discussion and should be easy to change in the
code.

-- Stefan^2.
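For reference, a rough sketch of the kind of Python FS-bindings script
described above. This is an illustration of the read-compare-write
approach, not the actual script from the thread; it assumes the SWIG
bindings expose the unprefixed names repos.open, repos.fs,
fs.youngest_rev, fs.revision_proplist and fs.change_rev_prop, and that
both repositories are local:

#!/usr/bin/env python
# Rough illustration only: copy revprops from one local repository to
# another via the Subversion Python FS bindings, skipping the write
# whenever the destination value already matches the source.  Going
# through the FS layer bypasses the pre-revprop-change hook entirely.

import sys
from svn import core, fs, repos

def sync_revprops(src_path, dst_path):
    src_fs = repos.fs(repos.open(core.svn_path_canonicalize(src_path)))
    dst_fs = repos.fs(repos.open(core.svn_path_canonicalize(dst_path)))

    # The destination was just loaded, so it has at most as many
    # revisions as the source; iterate over what the destination has.
    youngest = fs.youngest_rev(dst_fs)
    for rev in range(0, youngest + 1):
        src_props = fs.revision_proplist(src_fs, rev)
        dst_props = fs.revision_proplist(dst_fs, rev)
        if src_props == dst_props:
            continue    # nothing to do: no write, no fsync, no hook

        # Set or update revprops that differ.
        for name, value in src_props.items():
            if dst_props.get(name) != value:
                fs.change_rev_prop(dst_fs, rev, name, value)

        # Remove revprops that exist only in the destination.
        for name in dst_props:
            if name not in src_props:
                fs.change_rev_prop(dst_fs, rev, name, None)

if __name__ == '__main__':
    sync_revprops(sys.argv[1], sys.argv[2])

Invoked with the source and destination repository paths as arguments,
the up-front comparison keeps the common case read-only, which is where
the speedup over svnsync comes from.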