subversion-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Shahaf <...@daniel.shahaf.name>
Subject Re: svn commit: r1846237 - /subversion/trunk/subversion/tests/cmdline/pegrev_parse_tests.py
Date Sun, 11 Nov 2018 06:29:30 GMT
Branko Čibej wrote on Sat, Nov 10, 2018 at 15:12:48 +0100:
> On 10.11.2018 01:31, Daniel Shahaf wrote:
> > Branko Čibej wrote on Fri, 09 Nov 2018 12:56 +0100:
> >> On 09.11.2018 12:54, brane@apache.org wrote:
> >>> Author: brane
> >>> Date: Fri Nov  9 11:54:21 2018
> >>> New Revision: 1846237
> >>>
> >>> URL: http://svn.apache.org/viewvc?rev=1846237&view=rev
> >>> Log:
> >>> More tests for peg revision parsing.
> >> [...]
> >>> +@Wimp("Removes E instead of reporting ENOENT or E125001 for E/@")
> >>> +def remove_subdir_7a_no_escape_peg(sbox):
> >>> +  "remove missing 'E/@' without pegrev escape"
> >>> +  do_remove(sbox, 'E/@', 'E/@') #, "svn: E125001: 'E/@'")
> >> Yes indeed ... our target parsing is so incredibly reliable that
> >> attempting to delete a non-existent 'E/@' will happily delete 'E' with
> >> all its contents. I think this may be too much of a good thing.
> > The documented escaping rule for CLI consumers is to append an "@" after
> > every filename.  By this rule, a CLI consumer that _wants_ to remove E/
> > would run "svn rm E/@".  That command should parse as { target = 'E/',
> > peg = '' } regardless of whether the directory "E" contains a child
> > named "@".
> >
> > I don't understand what change you're proposing.
> 
> Consider this:
> 
> $ svn rm E@
> D         E
> $ svn rm @
> svn: E125001: '@' is just a peg revision. Maybe try '@@' instead?
> $ svn rm E/@
> D         E
> 
> 
> would it not be more consistent that trying to remove 'E/@' would raise
> the same error as trying to remove '@'?

The argument parsing is documented to work as follows: remove the last
"@" and everything after it — that's the peg rev.  What remains is the
target dirent or URL.  Thus, ultimately the reason for the different
errors is that "E/" is a valid OS-level path specification but "" is not
(in the sense that «stat("E/")» works and «stat("")» errors).

In other words, your examples are simply usage errors, but it so happens
that one of them is a syntax error and one of them is not.  (It would
have been good if both usage errors had been syntax errors, but we're
12 years too late to get that right to begin with.  C'est la vie.)

The correct syntaxes for removing "./@" and "E/@", of course, are
«svn rm E/@@» and «svn rm @@» respectively.  

I think that «svn rm E/@» should _not_ remove "E/@", for two reasons:

1. Because that would be backwards incompatible.

2. In order to preserve the property that «svn ${subcommand} -- ${target}@»
is parsed in exactly the same way regardless of the values of
${subcommand} and ${target} and regardless of the tree contents.

IIRC «svn up @» used to be the same as «svn up ./», so I think we
shouldn't make it mean "update the file './@'" because that would be
a silent incompatibility for some users.  I don't remember what the last
minor line with the non-error meaning was, though.

> Because otherwise the behaviour of the client depends on whether you're
> removing things from the current directory or not.

To be more precise, the question is whether one spells the command
«svn rm @» or «svn rm ./@».

> This becomes especially error-prone in the case when
> E/@ exists:
> 
> $ svn rm E/@
> D         E
> D         E/@
> 
> 
> I'm pretty sure we should not remove a parent directory if the child
> might have been the intended target.
> 

I see that there's an argument that we should require some option here,
à la --force-log, but frankly, I don't think files literally called "@"
or "@HEAD" or "@{2018-01-01}" and so on are common enough to worry
about.

Whatever we do, we must have a documented syntax that means "recursively
delete E/ and everything thereunder" and works regardless of what
filenames E/ contains.

> These usages were not ambiguous and possibly error-prone until we
> invented the peg-revision syntax. At the time we failed to strictly
> define the semantics , so maybe it's time we did that now. For example,
> given the path
> 
>     foo@bar/baz
> 
> the parser will treat 'bar/baz' as the peg-revision specifier, even
> though it's obvious that's not what the user had in mind.
> If we changed the parser to only look for peg revisions on the basename of
> the path, then at least the only restriction we'll have is that directory
> separators can't be part of the peg revision specifier — which should be an
> acceptable restriction, since currently we only support known keywords,
> revision numbers or dates, none of which contain forward- or backslashes.
> 
> Obviously that would only be a parser change, peg revisions would still
> apply to whole paths for command semantics.
> 

So you're proposing changing the parsing rule from "the last @" to
"the last @ that isn't followed by any os.path.sep"?  That's forward
compatible if we constrain ourselves never to use slash or backslash in
a pegrev specifier, and backwards compatible in that every non-erroneous
usage will continue to work, so it's something we can consider.

How would that affect scheme://user@hostname/ URLs?  What about typos
such as «foo@HEA/D»?

> I'm aware that doing this would change the meaning of some forms of
> commands, but for now I'm fairly sure we'd only produce new errors, not
> silently behave differently. From the current set of tests, it seems to
> follow that this inconsistency only arises with targets whose basename
> starts with '@' or '.@' (and probably '..@'), the latter because '.'
> (and '..') are removed by canonicalization.

I don't follow what you mean by "this"; do you refer to forbidding
slashes in pegrevs or to something else?  (The former wouldn't seem to
address all the concerns you described.)

Cheers,

Daniel

P.S.  Other languages also have examples of usage errors that aren't
syntax errors.  For example, in C the tokens «+», «-», and «*» can be
either unary or binary operators, so «double hypotenuse(double a, double b)
{ return sqrt(*a + b*b); }» is a syntax error but «return sqrt( + b*b)»
is not.

Mime
View raw message