Return-Path: X-Original-To: apmail-subversion-dev-archive@minotaur.apache.org Delivered-To: apmail-subversion-dev-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3718E17899 for ; Tue, 6 Oct 2015 09:38:18 +0000 (UTC) Received: (qmail 34131 invoked by uid 500); 6 Oct 2015 09:38:14 -0000 Delivered-To: apmail-subversion-dev-archive@subversion.apache.org Received: (qmail 34083 invoked by uid 500); 6 Oct 2015 09:38:14 -0000 Mailing-List: contact dev-help@subversion.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list dev@subversion.apache.org Received: (qmail 34073 invoked by uid 99); 6 Oct 2015 09:38:14 -0000 Received: from Unknown (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 06 Oct 2015 09:38:14 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 1AF7F1A2054 for ; Tue, 6 Oct 2015 09:38:14 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 3.151 X-Spam-Level: *** X-Spam-Status: No, score=3.151 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=3, KAM_LOTSOFHASH=0.25, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (1024-bit key) header.d=wandisco.com Received: from mx1-us-east.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id T-J2RY6kS1fF for ; Tue, 6 Oct 2015 09:38:01 +0000 (UTC) Received: from mail-io0-f176.google.com (mail-io0-f176.google.com [209.85.223.176]) by mx1-us-east.apache.org (ASF Mail Server at mx1-us-east.apache.org) with ESMTPS id D50D0439D3 for ; Tue, 6 Oct 2015 09:38:00 +0000 (UTC) Received: by iofh134 with SMTP id h134so215587274iof.0 for ; Tue, 06 Oct 2015 02:38:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=wandisco.com; s=gapps; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=Eqozpy9SBPisyrfSCRFXA+twC78BX9KRMOW0kE3wp2o=; b=ofzIHkg/Nw2F5RQzmMqQrimsQUhk7TueWymklX8SLsrDMOwr3yBzm1P8+dgNengEHZ oN0iEcIPSP4u733hnDP6UBWGVM6R4SY7TN/lcbjRjIdKzWWuKVHF3p5DzSx3XltLvMZB 9sGzEJ26FPDifmHzOtK5kZG1bk7wXt4KVjTW4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=Eqozpy9SBPisyrfSCRFXA+twC78BX9KRMOW0kE3wp2o=; b=Ty35cqzFkN9bMtqPpIjK+gkdWhvcuQLHwj+U6dIJzMF7wbCYbyELX07SWciXz/1BcF 4vWmd13pMP3XvmaxQ4uQ/mRftUawnbp3rwzL/lgZiYtG67IXRXRILz5gOfxCNJESlM88 h5X0oWC48P4NO2Ehqcs6LmV5s9opCsvZNYs+au1weJ1CIZZb4ubWfZe+WXEy2Z4uZRnh FEoZrVm2vd6YwxpWfnrjkT2n1wOZnLh2w0fDxHmDJSSxmpWawyD1a2nhOiYkn8UxTj4A lPHdf5JS9FOH15s8gFICpHFyYmFh0lkKt7dBtxZC/XkQCKQcs78MzoFHpVxNk2cwcbq/ +V3Q== X-Gm-Message-State: ALoCoQk42ld5NfgB8ayi4subFzE98b7PigoDu2i5IfaIL9TTfUwYmXR7Hk2wWetgd/uVkkicbGLz MIME-Version: 1.0 X-Received: by 10.107.10.224 with SMTP id 93mr33358424iok.25.1444124280339; Tue, 06 Oct 2015 02:38:00 -0700 (PDT) Received: by 10.50.225.71 with HTTP; Tue, 6 Oct 2015 02:38:00 -0700 (PDT) In-Reply-To: References: <20150921210118.GF1955@tarsus.local2> <5602746B.9050800@apache.org> <56027A52.1090806@apache.org> Date: Tue, 6 Oct 2015 11:38:00 +0200 Message-ID: Subject: Re: No-op changes no longer dumped by 'svnadmin dump' in 1.9 From: Stefan Fuhrmann To: Evgeny Kotkov Cc: Julian Foad , =?UTF-8?Q?Branko_=C4=8Cibej?= , Johan Corveleyn , dev Content-Type: multipart/alternative; boundary=001a113f8f54be0b5905216c629e --001a113f8f54be0b5905216c629e Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On Sun, Oct 4, 2015 at 4:28 PM, Evgeny Kotkov wrote: > Stefan Fuhrmann writes: > > > Right now, we are losing information. In rare cases and probably nothin= g > too > > important but still. This aspect makes me consider the current behaviou= r > a > > bug. Whether creating that situation in the first place was o.k. nor no= t > is > > a separate issue. > > To my mind, the current behavior is a regression, because: > > - svnadmin dump produces different dumps for the same repository, dependi= ng > on the version (1.8 or 1.9) > That is not a problem because we don't guarantee identical dump files between releases. And only since 1.8 (?) do we even create reproducible dump files for a given release. > - svnadmin dump and svnrdump dump in 1.9 produce different dumps for the > same repository > Not sure whether that is new to 1.9 but when it is, it needs to be fixed. > - After a dump-load sequence with svnadmin from 1.9, the resulting > repository > behaves differently from the original in operations like 'svn log' > Different "behaviour" is expected: I will most likely see different transaction IDs, different rep sharing, different pack status etc. Ideally, none if these would change the output of client- side operations, but I would settle for "no information lost". > I committed a failing test in r1706428, and I also think that we have qui= te > a problem to solve =E2=80=94 see below. > Thanks for the test case. My patch should fix that. I agree that the broader scope you are asking for new feature that will require some work. > [...] > > > I suggest that we apply the patch below. It is more efficient that the > > original behaviour but ensures that "no-op" changes will be retained. > > [...] > > > V2 of the patch also covers no-op prop changes to directories. > > Although the V2 patch seems to mitigate the problem in one particular cas= e, > it does not solve it generally. For instance, here is another scenario > where > 1.8 and 1.9 binaries produce different dumps, even with the patch applied= : > > svnmucc cp 1 bar baz put empty-file baz > > Node-copyfrom-path: bar > Text-copy-source-md5: d41d8cd98f00b204e9800998ecf8427e > Text-copy-source-sha1: da39a3ee5e6b4b0d3255bfef95601890afd80709 > -Text-content-length: 0 > -Text-content-md5: d41d8cd98f00b204e9800998ecf8427e > -Text-content-sha1: da39a3ee5e6b4b0d3255bfef95601890afd80709 > -Content-length: 0 > What information is being lost here? That change should show up as added-with-history. It simply no longer claims to also change the file contents. You are basically asking here for a new "T" ("touched") state alongside the traditional "M". Subversion does not have that concept formally, yet, and any attempt to mimic it relies heavily on implementation details of your respective repository backend. Sadly, some of our logic in the higher layers either tried to exploit knowledge of the repo's internal workings or simply did not care to specify whether "touched" is a subset of "modified" or not. As a result, underspecified APIs leak unspecified behaviour up to the client. Introducing "T" consistently will require some work. Most of it should be moderately easy. Two areas are problematic, though: * Does a date change in the working copy constitute a "touch"? That would break many people's work flows. So, 'svn touch' would probably be an explicit sub-command. * Under what circumstances would a merge produce "T"? That leads directly to the question of what the semantics of touch would be plus a problem similar to "T" in working copies. > I examined the history of fixing problems caused by r1572363 + r1573111, > and > this is the second time when we find ourselves whack-a-moling the calling > sites > of svn_fs_props_different() and svn_fs_contents_different(). Previously, > we > had issues with misbehaving 'svn blame -g' [1]. This ended up with placi= ng > a hack in subversion/libsvn_repos/rev_hunt.c, and with rather questionabl= e > compatibility promises for svn_ra_get_file_revs2(): > > [[[ > else if (sb->include_merged_revisions > && strcmp(sb->last_path, path_rev->path)) > { > /* ### This is a HACK!!! > * Blame -g, in older clients anyways, relies on getting a > notification > * whenever the path changes - even if there was no content change. > * > * TODO: A future release should take an extra parameter and > depending > * on that either always send a text delta or only send it if there > * is a difference. */ > contents_changed =3D TRUE; > } > > ... > > * @note Prior to Subversion 1.9, this function may request delta handler= s > * from @a handler even for empty text deltas. Starting with 1.9, the > * delta handler / baton return arguments passed to @a handler will be > * #NULL unless there is an actual difference in the file contents betwee= n > * the current and the previous call. > * > * @since New in 1.5. > */ > svn_error_t * > svn_ra_get_file_revs2(svn_ra_session_t *session, > const char *path, > ... > > ]]] > > The r1572363 + r1573111 patchset implicitly changes behavior of around si= x > different callers, and two of them have already backfired with problems. > Moreover, apart from switching everything to the new API, these revisions > change the behavior of *existing* API like svn_fs_contents_changed(). We > have an erratum describing the change [2], but I doubt that API users can > properly adjust their code, if they notice this change =E2=80=94 because = even we > failed > to do so. > 'svn blame -g' is a prime example of exploiting unspecified FS API behaviour. 1.9 does fixes that aspect, while leaving 'svn blame -g' in it's "approximate - at best" state. IOW, we fixed a layering violation and had to add workarounds for old clients that relied on it. > Finally, I cannot understand the practical benefits from doing this. If > these > changes are aimed at making the checks more strict, but break existing > code, > I don't think we should be doing this. If this is an optimization, I'd > favor > correctness over it. > This is not about performance, it is about API guarantees and functional stability. So, yes that is an optimization in the sense of "making it better than before". And no, we should not go back to worse unless there is no other practical way. > I propose that, instead of patching the callers in response to problems, = we > restore every relevant part to its 1.8.x state, where they're known to be > stable > and work. It is the *callers* who are the problem and the explicit API of 1.9 simply exposes that fact. Therefore, fixing the callers is the right thing to do. To get fully behaviour, we need to decide upon "M" vs. "T", though. IMO, this is beyond the scope of this thread. > This would allow us to remove the hacks like the one associated with > fixing svn blame -g, would restore the proper behavior of svnadmin dump, > and > would also allow us not to deal with similar problems in other places, if > they > exist. > > How does this sound? > Well-intended but, in this case, terrible. -- Stefan^2. --001a113f8f54be0b5905216c629e Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
On S= un, Oct 4, 2015 at 4:28 PM, Evgeny Kotkov <evgeny.kotkov@visuals= vn.com> wrote:
Stefan Fuhrmann <stefan.fuhrmann@wandisco.com> writes:

> Right now, we are losing information. In rare cases and probably nothi= ng too
> important but still. This aspect makes me consider the current behavio= ur a
> bug. Whether creating that situation in the first place was o.k. nor n= ot is
> a separate issue.

To my mind, the current behavior is a regression, because:

- svnadmin dump produces different dumps for the same repository, depending=
=C2=A0 on the version (1.8 or 1.9)

That= is not a problem because we don't guarantee identical
du= mp files between releases. And only since 1.8 (?) do we
even = create reproducible dump files for a given release.
=C2=A0
- svnadmin dump and svnrdump dump in 1.9 produce different dumps for the =C2=A0 same repository

Not sure whether= that is new to 1.9 but when it is, it needs
to be fixed.
=
=C2=A0
- After a dump-load sequence with svnadmin from 1.9, the resulting reposito= ry
=C2=A0 behaves differently from the original in operations like 'svn lo= g'

Different "behaviour" = is expected: I will most likely see
different transaction IDs= , different rep sharing, different
pack status etc.

Ideally, none if these would change the output of client-
side operations, but I would settle for "no information
lost".
=C2=A0
I committed a failing test in r1706428, and I also think that we have quite=
a problem to solve =E2=80=94 see below.

Thanks for the test case. My patch should fix that. I
agree that the br= oader scope you are asking for new
feature that will require = some work.
=C2=A0
[...]

> I suggest that we apply the patch below. It is more efficient that the=
> original behaviour but ensures that "no-op" changes will be = retained.

[...]

> V2 of the patch also covers no-op prop changes to directories.

Although the V2 patch seems to mitigate the problem in one particula= r case,
it does not solve it generally.=C2=A0 For instance, here is another scenari= o where
1.8 and 1.9 binaries produce different dumps, even with the patch applied:<= br>
=C2=A0 svnmucc cp 1 bar baz put empty-file baz

=C2=A0 Node-copyfrom-path: bar
=C2=A0 Text-copy-source-md5: d41d8cd98f00b204e9800998ecf8427e
=C2=A0 Text-copy-source-sha1: da39a3ee5e6b4b0d3255bfef95601890afd80709
=C2=A0-Text-content-length: 0
=C2=A0-Text-content-md5: d41d8cd98f00b204e9800998ecf8427e
=C2=A0-Text-content-sha1: da39a3ee5e6b4b0d3255bfef95601890afd80709
=C2=A0-Content-length: 0

What informati= on is being lost here? That change should show
up as added-with-history.= It simply no longer claims to also
change the file contents.=

You are basically asking here for a new "T" (&= quot;touched") state
alongside the traditional "M". Subve= rsion does not have that
concept formally, yet, and any attem= pt to mimic it relies heavily
on implementation details of yo= ur respective repository backend.

Sadly, some of our logi= c in the higher layers either tried to exploit
knowledge of t= he repo's internal workings or simply did not care
to spe= cify whether "touched" is a subset of "modified" or not= .
As a result, underspecified APIs leak unspecified behaviour= up
to the client.

Introducing &= quot;T" consistently will require some work. Most of it
= should be moderately easy. Two areas are problematic, though:

* Does a date change in the working copy constitute a "touch"= ;?
=C2=A0 That would break many people's work flows. So, = 'svn touch'
=C2=A0 would probably be an explicit sub-= command.

* Under what circumstances would a me= rge produce "T"?
=C2=A0 That leads directly to the = question of what the semantics
=C2=A0 of touch would be plus = a problem similar to "T" in working
=C2=A0 copies.
= =C2=A0
I examined the history of fixing problems caused by r1572363 + r1573111, an= d
this is the second time when we find ourselves whack-a-moling the calling s= ites
of svn_fs_props_different() and svn_fs_contents_different().=C2=A0 Previous= ly, we
had issues with misbehaving 'svn blame -g' [1].=C2=A0 This ended up= with placing
a hack in subversion/libsvn_repos/rev_hunt.c, and with rather questionable<= br> compatibility promises for svn_ra_get_file_revs2():

[[[
=C2=A0 else if (sb->include_merged_revisions
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&& strcmp(sb->last_path= , path_rev->path))
=C2=A0 =C2=A0 {
=C2=A0 =C2=A0 =C2=A0 /* ### This is a HACK!!!
=C2=A0 =C2=A0 =C2=A0 =C2=A0* Blame -g, in older clients anyways, relies on = getting a notification
=C2=A0 =C2=A0 =C2=A0 =C2=A0* whenever the path changes - even if there was = no content change.
=C2=A0 =C2=A0 =C2=A0 =C2=A0*
=C2=A0 =C2=A0 =C2=A0 =C2=A0* TODO: A future release should take an extra pa= rameter and depending
=C2=A0 =C2=A0 =C2=A0 =C2=A0* on that either always send a text delta or onl= y send it if there
=C2=A0 =C2=A0 =C2=A0 =C2=A0* is a difference. */
=C2=A0 =C2=A0 =C2=A0 contents_changed =3D TRUE;
=C2=A0 =C2=A0 }

=C2=A0...

=C2=A0* @note Prior to Subversion 1.9, this function may request delta hand= lers
=C2=A0* from @a handler even for empty text deltas.=C2=A0 Starting with 1.9= , the
=C2=A0* delta handler / baton return arguments passed to @a handler will be=
=C2=A0* #NULL unless there is an actual difference in the file contents bet= ween
=C2=A0* the current and the previous call.
=C2=A0*
=C2=A0* @since New in 1.5.
=C2=A0*/
svn_error_t *
svn_ra_get_file_revs2(svn_ra_session_t *session,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 const char *path,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 ...

]]]

The r1572363 + r1573111 patchset implicitly changes behavior of around six<= br> different callers, and two of them have already backfired with problems. Moreover, apart from switching everything to the new API, these revisions change the behavior of *existing* API like svn_fs_contents_changed().=C2=A0= We
have an erratum describing the change [2], but I doubt that API users can properly adjust their code, if they notice this change =E2=80=94 because ev= en we failed
to do so.

'svn blame -g' is a p= rime example of exploiting unspecified
FS API behaviour. 1.9 does fixes = that aspect, while leaving
'svn blame -g' in it's= "approximate - at best" state. IOW, we
fixed a lay= ering violation and had to add workarounds for
old clients th= at relied on it.
=C2=A0
Finally, I cannot understand the practical benefits from doing this.=C2=A0 = If these
changes are aimed at making the checks more strict, but break existing code= ,
I don't think we should be doing this.=C2=A0 If this is an optimization= , I'd favor
correctness over it.

This is not about = performance, it is about API guarantees
and functional stabil= ity. So, yes that is an optimization in
the sense of "ma= king it better than before". And no, we
should not go ba= ck to worse unless there is no other
practical way.
=
=C2=A0
I propose that, instead of patching the callers in response to problems, we=
restore every relevant part to its 1.8.x state, where they're known to = be stable
and work.

It is the *callers* who are the p= roblem and the explicit API
of 1.9 simply exposes that fact. Therefore, = fixing the callers
is the right thing to do. To get fully beh= aviour, we need to
decide upon "M" vs. "T"= ;, though. IMO, this is beyond the
scope of this thread.
<= /div>
=C2=A0
Thi= s would allow us to remove the hacks like the one associated with
fixing svn blame -g, would restore the proper behavior of svnadmin dump, an= d
would also allow us not to deal with similar problems in other places, if t= hey
exist.

How does this sound?

Well-intended but,= in this case, terrible.

-- Stefan^2.

=
--001a113f8f54be0b5905216c629e--