httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ruediger Pluem <rpl...@apache.org>
Subject Re: Really big regex results from ap_pregsub
Date Sun, 16 Oct 2011 15:46:59 GMT


On 10/14/2011 08:34 PM, Stefan Fritsch wrote:
> On Friday 14 October 2011, Eric Covener wrote:
>> On Fri, Oct 14, 2011 at 1:35 PM, William A. Rowe Jr.
>>
>> <wrowe@rowe-clan.net> wrote:
>>> On 10/14/2011 7:46 AM, Jim Jagielski wrote:
>>>> On Oct 13, 2011, at 4:30 PM, William A. Rowe Jr. wrote:
>>>>> The largest string value applicable to header values, to URI's
>>>>> and any presentation string (to errorlog or access log etc) is
>>>>> MAX_STRING_LEN.  The longest config line is MAX_STRING_LEN.
>>>>> I don't see a lot of reasons supporting something longer.
>>>> Pre-2.4 that is true, but not on trunk…
>>> Trunk might be even simpler... an ap_pnregsub taking a max-string
>>> len arg?
> 
> Yes, just add an alternative API in trunk that does the right thing 
> and returns apr_status_t.
> 
>>>>> This was always unambiguous, NULL on error.  The doxygen has
>>>>> *nothing* to say about the result value.
>>>>>
>>>>> So... I'd suggest we fix cases that did not expect NULL and
>>>>> return NULL on any substitution failure.  I don't even see the
>>>>> need for an MMN bump.
>>>> For trunk? Yes. For pre-2.4? Not so sure (due to external
>>>> modules)… but I'll go along with it.
>>> I'd love to see some additional eyes on the use cases and
>>> proposed solutions so we can put this to bed.
>> In pre-2.4, it seems we could be more tolerant than 10 subs or 8K
>> if we're going to be returning a NULL that's never been returned
>> in practice.
> 
> Introducing an arbitrary length limit seems pretty invasive for 2.2.x.
> 
> 
> Btw, isn't the nmatch the number of () pairs in the regex? If so, then 
> enforcing the AP_MAX_REG_MATCH limit could introduce behaviour change 
> in existing configs: Previously, ap_pregsub on with a regex with more 
> than 10 capturing () pairs would replace $1 ... $9 with the first nine 
> matches. But now, it would just return the original string. An example 
> where this could happen is if some of the capturing parentheses should 
> actually be non-capturing "(?:...)":
> 
> ((word1a|word1b)(word2a|word2b)...)
> 
> Also, just returning the original string does not allow to detect 
> errors. Currently ap_pregsub in 2.2.x always succeeds in that it 
> replaces $1 to $9. I am against changing this in a way that may return 
> the unchanged string. Maybe it would be more appropriate to enforce 
> AP_MAX_REG_MATCH at compile time in ap_regcomp()? Then the errors 
> would be more obvious to the user.
> 
> 

+1. We should avoid non obvious changes in behaviour. It would be already bad
enough if we produce unexpected error messages or crashes, but returning no error
but just the original string looks like a bad choice especially if admins
unintentionally used () instead of (?:).

Regards

Rüdiger

Mime
View raw message