httpd-dev mailing list archives

From Stefan Fritsch ...@sfritsch.de>
Subject Re: Expression Parser: search and replace with s/PATTERN/REPLACEMENT/FLAGS
Date Mon, 05 Oct 2015 22:17:47 GMT
On Sunday 04 October 2015 12:51:13, Graham Leggett wrote:
> On 04 Oct 2015, at 12:46 PM, Rainer Jung <rainer.jung@kippdata.de> wrote:
> > Yes, I agree. When starting to think closer, I noticed that the
> > string mode currently only supports a syntax that is pretty
> > different from the boolean mode and is much more limited. In that
> > mode everything is a string except it is marked via %{XXX}, in
> > which case XXX is a variable name, except XXX is AAA:BBB in which
> > case it is AAA("BBB").
> > 
> > So AFAIK we don't support functions with more than one argument in
> > string mode and my naive idea of using "STRING =~
> > s/PATTERN/REPLACEMENT/FLAGS" runs into the problem, that we
> > currently don't support operators like "=~" etc. in string mode.

This is correct.
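
To make the string-mode rules quoted above concrete, here is a minimal sketch in Python (not httpd's actual parser). The VARIABLES and FUNCTIONS tables and the name expand_string_mode are assumptions made up for illustration; only the %{XXX} and %{AAA:BBB} forms come from the description above.

```python
import re

# Hypothetical lookup tables for illustration; the real sets live in httpd.
VARIABLES = {"HTTP_FOO": "bar"}
FUNCTIONS = {"toupper": str.upper}

# %{NAME} or %{FUNC:ARG}; anything outside such a token is verbatim text.
_TOKEN = re.compile(r"%\{([^}:]+)(?::([^}]*))?\}")

def expand_string_mode(text):
    """Expand %{NAME} as a variable and %{FUNC:ARG} as FUNC("ARG")."""
    def repl(match):
        name, arg = match.group(1), match.group(2)
        if arg is None:
            # Plain variable reference; unknown names expand to "" here.
            return VARIABLES.get(name, "")
        # Exactly one string argument, which is the limitation discussed.
        return FUNCTIONS[name](arg)
    return _TOKEN.sub(repl, text)
```

With these tables, expand_string_mode("Foo=%{HTTP_FOO}") gives "Foo=bar". There is no place in this grammar for a second argument or for an operator like =~.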


> > So I wonder whether it would be useful to allow for a more general
> > mode which would depending on operators or functions handle the
> > argument and result as strings or booleans using auto conversion
> > between them where needed. Of course in that mode verbatim
> > strings would need proper quoting (unlike pure string mode in
> > which everything by default is a verbatim string). We could then
> > even support
> > 
> >    BOOLEXPR ? STRINGEXPR1 : STRINGEXPR2
> > 
> > For compatibility that generalized mode would probably need a mode
> > differentiator syntax for compatibility reasons in 2.4 but could
> > be the default mode in trunk. Something like your "%!" prefix.

This is definitely a possible approach. I am not 100% sure that we 
would want that mode to become the default, though, because it would 
always require double quoting for simple string expressions, like

LogMessage "'Foo=%{HTTP_FOO}'"

Somehow I also think this approach would be quite a bit of work, 
especially to deal with all corner cases and ambiguities introduced by 
auto conversion.


Another possible approach would be to implement functions with 
multiple arguments in string mode first and worry about an easier 
syntax second. If I remember correctly, I once planned to have
 %{FUNCTION:'arg1','arg2'} as syntax for this. But I did not get 
around to implementing it.

Now that I think of it, maybe

    %{FUNCTION: X/arg1/arg2/arg3 }

would be another good syntax for it, where X is an (optional?) letter 
and the / separator could be chosen from a list of separators, just 
like is already possible with the m/foo/i regex syntax. Or make it

    %{FUNCTION/arg1/arg2/arg3}

If we add optional whitespace at the beginning and end, and give our 
rexec function an alias of 's', we would get something like

    %{ s/TEXT/PATTERN/REPLACEMENT/FLAGS } or
    %{ s/PATTERN/REPLACEMENT/FLAGS/TEXT/ }

which is not perfect but maybe acceptable from a readability point of 
view.
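
A rough sketch of how the body of such a %{ ... } token could be split, written in Python for brevity and not taken from any existing implementation. The SEPARATORS list, parse_call and apply_s are hypothetical names, there is no escaping of the separator inside arguments, and Python's re syntax stands in for PCRE.

```python
import re

# Assumed separator list; the proposal only says "a list of separators".
SEPARATORS = "/#|,!"

def parse_call(body):
    """Parse the inside of %{ ... } into (function_name, [args])."""
    body = body.strip()  # optional whitespace at the beginning and end
    match = re.match(r"[A-Za-z_][A-Za-z0-9_]*", body)
    if not match or match.end() == len(body):
        raise ValueError("expected NAME followed by a separator")
    name, sep = match.group(0), body[match.end()]
    if sep not in SEPARATORS:
        raise ValueError("unsupported separator %r" % sep)
    # Everything after the first separator is split on that same character.
    return name, body[match.end() + 1:].split(sep)

def apply_s(text, pattern, replacement, flags=""):
    """Illustrative 's' function using the TEXT/PATTERN/REPLACEMENT/FLAGS order."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    count = 0 if "g" in flags else 1  # count=0 means replace all matches
    return re.sub(pattern, replacement, text, count=count, flags=re_flags)
```

For example, parse_call(" s#a/b#/#-#g ") yields the name "s" and the arguments "a/b", "/", "-" and "g", so choosing # as the separator lets the pattern contain a slash.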

> How about a regex function?
> 
> The single argument could be “s/PATTERN/REPLACEMENT/FLAGS”.

I think this would be easy to implement. It would require the regex to 
be parsed on every execution, though, which has the disadvantages that 
it is slower and that one would get error messages only during first 
execution and not during server startup. Also, it would possibly allow 
the admin to configure expressions where the regex pattern can contain 
untrusted data, which would turn a lot of libpcre problems from local 
into remote vulnerabilities.
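
The difference in when errors show up can be sketched as follows, again in Python and only as an illustration. ConfigError, load_rule and run_dynamic are made-up names, and the sketch does not try to show the speed difference.

```python
import re

class ConfigError(Exception):
    """Hypothetical error raised while the configuration is being read."""

def load_rule(pattern, replacement):
    """Compile once at configuration time; a bad pattern fails right here."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        raise ConfigError("bad regex %r: %s" % (pattern, exc)) from exc
    def run(text):
        return compiled.sub(replacement, text, count=1)
    return run

def run_dynamic(spec, text):
    """Parse 's/PATTERN/REPLACEMENT/FLAGS' on every call (no escaping)."""
    _, pattern, replacement, flags = spec.split("/")
    count = 0 if "g" in flags else 1
    # A bad pattern is only noticed here, when a request triggers the call.
    return re.sub(pattern, replacement, text, count=count)
```

Here load_rule("(", "x") fails immediately, whereas run_dynamic("s/(/x/", ...) fails only when it is first executed. If spec were built from request data, the pattern itself would come from an untrusted source.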

If everything else fails or goes nowhere, we can do this. But I would 
like to try implementing a better solution first.
