stdcxx-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Martin Sebor <se...@roguewave.com>
Subject Re: svn commit: r628839 - /stdcxx/trunk/tests/self/0.braceexp.cpp
Date Tue, 19 Feb 2008 20:00:45 GMT
Travis Vitek wrote:
> 
> Martin,
> 
> Thank you for the additional testcases. They point out a few issues that I
> didn't interpret from the description in the Bash Reference Manual
> [http://www.gnu.org/software/bash/manual/bashref.html#Brace-Expansion]. Note
> that below I refer to paragraphs from this documentation.

In case it wasn't clear from my comments, most of the new test cases
(all those in run_bash_tests()), including the expected output, came
from the Bash test suite.

> 
> I do have a few issues with the expectations you've laid out. Comments
> follow...
> 
> 
> sebor-2 wrote:
>> +
>> +    TEST ("foo {1,2} bar", "foo 1 2 bar");
>> +
>>
> 
> This isn't a brace expansion. It is a literal string, followed by a brace
> expansion, followed by a literal string. When you run 'echo foo {1,2} bar'
> in the shell, each of the args are brace expanded individually, so the only
> thing that is brace expanded is the '{1,2}' and everything else is written
> literally. I believe this testcase is invalid.

So rw_brace_expand() only handles a single brace expression? I.e.,
a pattern like this:

  pattern  ::= [ <preamble> ] '{' <list> | <seq-exp> '}' [ <postfix>
]
  list     ::= string [ , <list> ] | string
  seq-expr ::= <char> .. <char> | <number> .. <number>

If that's so, what's the definition of <preamble> and <postfix>?

FWIW, the grammar I suggested here http://tinyurl.com/2rs3he allows
multiple brace expressions:

  string     ::= <brace-expr> | [ <chars> ]
  brace-expr ::= <string> '{' <brace-list> '}' <string> | <string>
  brace-list ::= <string> ',' <brace-list> | <string>
  chars      ::= <pcs-char> <string> | <pcs-char>
  pcs-char   ::= character in the Portable Character Set

We don't need to follow it but if we choose not to we should document
what grammar we do follow.

> 
> 
> sebor-2 wrote:
>> +    // we don't have eval
>> +    // TEST ("`zecho foo {1,2} bar`",  "foo 1 2 bar");
>> +    // TEST ("$(zecho foo {1,2} bar)", "foo 1 2 bar");
>>
> 
> Same problem here.

I left these in place just for completeness but I don't expect us to
ever implement eval. Otherwise, I agree they're the same as the test
case above.

> 
> 
> sebor-2 wrote:
>> +#if 0   // not implemented yet
>> +
>> +    // set the three variables
>> +    rw_putenv ("var=baz:varx=vx:vary=vy");
>> +
>> +    TEST ("foo{bar,${var}.}", "foobar foobaz.");
>> +    TEST ("foo{bar,${var}}",  "foobar foobaz");
>> +
>> +    TEST ("${var}\"{x,y}",    "bazx bazy");
>> +    TEST ("$var{x,y}",        "vx vy");
>> +    TEST ("${var}{x,y}",      "bazx bazy");
>> +
>> +    // unset all three variables
>> +    rw_putenv ("var=:varx=:vary=");
>> +
>> +#endif   // 0
>>
> 
> I don't expect this functionality to ever be implemented inside
> rw_brace_expand(). As mentioned in paragraph 4, the brace expansion itself
> is done before other expansions, and it does not interpret the text between
> the braces.
> 
> Given this, I feel that the environment variable expansion must done at some
> later stage, by some other function, and the above test block is
> inappropriate for this test.

Okay. Again, I included them for completeness (I didn't want to just
completely remove some test cases), but if it makes sense to do this
expansion at a later stage in some other function that calls
rw_brace_expand() that's fine with me

> 
> 
> sebor-2 wrote:
>> +
>> +    TEST ("{1..10}", "1 2 3 4 5 6 7 8 9 10");
>> +
>>
> 
> This is a case that I should be handling. I need to go back and add complete
> support for integer ranges, specifically ranges that include multidigit
> numbers and sign.
> 
> 
> sebor-2 wrote:
>> +    // this doesn't work in Bash 3.2
>> +    // TEST ("{0..10,braces}", "0 1 2 3 4 5 6 7 8 9 10 braces");
>> +
>>
> 
> I don't know how anyone could expect this to work. The first subexpression
> of the brace expansion list is '0..10', which itself is not a brace
> expansion, so it should not be expanded. It should be left as a literal.
> This happens to be the behavior I see with Bash 3.0.

Yes, but from the comment above the test case in the braces.test file
it looks like they plan to make it work. The comments says: # this
doesn't work yet. I don't think it's an important use case and I have
no problem with not implementing it (for now).

> 
> 
> sebor-2 wrote:
>> +    // but this does
>> +    TEST ("{{0..10},braces}", "0 1 2 3 4 5 6 7 8 9 10 braces");
>> +    TEST ("x{{0..10},braces}y",
>> +          "x0y x1y x2y x3y x4y x5y x6y x7y x8y x9y x10y xbracesy");
>> +
>>
> 
> Obviously, both of these are valid versions of the previous test expression.
> 
> 
> sebor-2 wrote:
>> +    TEST ("{a..A}",
>> +          "a ` _ ^ ]  [ Z Y X W V U T S R Q P O N M L K J I H G F E D C B
>> A");
>> +    TEST ("{A..a}",
>> +          "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [  ] ^ _ `
>> a");
>> +
>>
> 
> Interesting. I didn't think it would make sense to allow mixing of lower and
> uppercase characters in the sequence expression because of the characters
> between 'Z' and 'a'. Obviously I was wrong. BTW, any idea what happened to
> ASCII 92? It is the backslash character that should appear between '[' and
> ']'.

No idea. To me this is one of the most dubious of the features since
it assumes ASCII. I wouldn't be at all upset if we didn't implement
it (at least not until we actually need it for something ;-)

> 
> 
> sebor-2 wrote:
>> +
>> +    TEST ("0{1..9} {10..20}",
>> +          "01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20");
>>
> 
> This has the same problem as the first issue I brought up. This is actually
> two seperate brace expansions, the first is '0{1..9}' and the second is
> '{10..20}'. This is how the shell handles them, and this is how I handle
> them.
> 
> If they were treated as one brace expansion by the shell, I would expect the
> postscript '{10..20}' expanded for each prefix/body expansion, much like you
> would see if you escaped the space.

I realize those are two brace expressions but I don't understand why
the function shouldn't be able to handle them as such. That's what
the expected output assumes, isn't it? I expect this to come up in
our uses of the feature so if rw_brace_expand() doesn't implement
it we'll have to implement it somewhere else. Do you see a problem
with implementing it in rw_brace_expand()?

> 
> 
> sebor-2 wrote:
>> +    // weirdly-formed brace expansions -- fixed in post-bash-3.1
>> +    TEST ("a-{b{d,e}}-c",    "a-{bd}-c a-{be}-c");
>>
> 
> I don't understand how this could be interpreted as valid brace expansion at
> all. The body of the expansion is '{b{d,e}}'. Paragraph 5 [and paragraph 1
> for that matter] require a correctly-formed brace expansion have unquoted
> [unescaped?] opening and closing braces, and at least one unquoted comma or
> a valid sequence expression. The body does not meet either of these
> requirements, so it must be invalid.
> 
> To get the result shown, the obvious thing to do is to escape the outer
> braces. This would give us the valid expression 'a-\{b{d,e}\}-c', that
> happens to also work with previous versions of bash also.
> 
> 
> sebor-2 wrote:
>> +    TEST ("a-{bdef-{g,i}-c", "a-{bdef-g-c a-{bdef-i-c");
>>
> 
> Again, this does not seem correct according to the requirements of paragraph
> 5 [and 1].

Obviously, these last two are corner cases (as the comment above
them indicates). I don't think we'll ever try to do anything as
bizarre as this. What is interesting about them is the fact that
the Bash maintainers cared enough about them to include them in
their (otherwise pretty small) test suite.

> 
> If the body is supposed to be between a pair of braces, shouldn't the first
> unescaped opening brace match the first unescaped close brace at the same
> brace depth? If it is, then the outer brace expansion isn't valid because it
> doesn't have a terminating close brace. Even if one was added, the resulting
> expression has the same problem as the previous example. The nested
> expression 'bdef-{g,i}-c' isn't a series comma-seperated strings or a
> sequence expression. 
> 
> If you wanted the first brace to be ignored, as it is in the test, then it
> should be escaped. Then we would have 'a-\{bdef-{g,i}-c'. That expression
> follows the requirements outlined in the manual, and works with old versions
> of bash, and a human can pretty easily figure out what the expected result
> would be.
> 
> Now I suppose that since invalid brace expansions are to be left unchanged,
> you could say that the first brace expansion is copied literally because it
> is invalid, but the second is valid and should be expanded. This almost
> explains how bash 3.2 gets these results, but it still seems wrong. If a
> subexpression is invalid it seems that the whole expression is invalid.
> 
> 
> sebor-2 wrote:
>> +    TEST ("{",     "{");
>> +    TEST ("}",     "}");
>> +    TEST ("{}",    "{}");
>> +    TEST ("{ }",   "{ }");
>> +    TEST ("{  }",  "{  }");   // is this right?
>>
> 
> I sure think it is. Again, the requirements say that these are not valid
> brace expansions, so they should be left unchanged. I'm wondering if the
> shell is doing some sort of whitespace collapse. Everything seems to work
> fine if you escape the spaces, so I'm thinking that is why you see the
> behavior that you do.

Yes, I believe that's exactly what it does, and I wondered if
rw_brace_expand() should do the same thing. I didn't spend too much
time contemplating whether it makes sense at this level or if it's
just a consequence of the shell tokenization.

> 
> So, with all that said, I've got a few thoughts.
> 
> 1. I don't really like the idea of trying to emulate all behavior of the
> shell in rw_brace_expand. If we want that, then we should have made a bug
> entitled 'provide a complete implementation of bash'.
> 2. I don't feel comfortable trying to maintain compatibility with version
> 3.2 of bash. It doesn't seem to follow the documented requirements, and I
> believe that the odd behavior may be difficult to implement. The bash 3.0
> implementation seems much more sane and that is what I tried to emulate when
> writing this code.
> 3. If you, er, we want to do brace expansion exactly like you see within
> bash, then we should write another function that tokenizes a string on
> whitespace and does brace expansion on each token. I was expecting the
> caller of rw_brace_expand() to expect the function to do brace expansion,
> not complete shell emulation.

Well, I thought since we were implementing a Bash feature the Bash
test cases would be useful. If we make a conscious decision to either
deviate from the Bash behavior or to not implement the features we
don't expect to use that's okay with me as long as we document what
our behavior is. One (IMO easy) way to do it is in in the test suite
in the form of test cases, with an explanation for any differences
with Bash. I tend to use the test cases for the test driver when I
need to know how something works.

Martin

Mime
View raw message