httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stefan Fritsch ...@sfritsch.de>
Subject Re: Really big regex results from ap_pregsub
Date Fri, 14 Oct 2011 18:34:45 GMT
On Friday 14 October 2011, Eric Covener wrote:
> On Fri, Oct 14, 2011 at 1:35 PM, William A. Rowe Jr.
> 
> <wrowe@rowe-clan.net> wrote:
> > On 10/14/2011 7:46 AM, Jim Jagielski wrote:
> >> On Oct 13, 2011, at 4:30 PM, William A. Rowe Jr. wrote:
> >>> The largest string value applicable to header values, to URI's
> >>> and any presentation string (to errorlog or access log etc) is
> >>> MAX_STRING_LEN.  The longest config line is MAX_STRING_LEN.
> >>> I don't see a lot of reasons supporting something longer.
> >> 
> >> Pre-2.4 that is true, but not on trunk…
> > 
> > Trunk might be even simpler... an ap_pnregsub taking a max-string
> > len arg?

Yes, just add an alternative API in trunk that does the right thing 
and returns apr_status_t.

> >>> This was always unambiguous, NULL on error.  The doxygen has
> >>> *nothing* to say about the result value.
> >>> 
> >>> So... I'd suggest we fix cases that did not expect NULL and
> >>> return NULL on any substitution failure.  I don't even see the
> >>> need for an MMN bump.
> >> 
> >> For trunk? Yes. For pre-2.4? Not so sure (due to external
> >> modules)… but I'll go along with it.
> > 
> > I'd love to see some additional eyes on the use cases and
> > proposed solutions so we can put this to bed.
> 
> In pre-2.4, it seems we could be more tolerant than 10 subs or 8K
> if we're going to be returning a NULL that's never been returned
> in practice.

Introducing an arbitrary length limit seems pretty invasive for 2.2.x.


Btw, isn't the nmatch the number of () pairs in the regex? If so, then 
enforcing the AP_MAX_REG_MATCH limit could introduce behaviour change 
in existing configs: Previously, ap_pregsub on with a regex with more 
than 10 capturing () pairs would replace $1 ... $9 with the first nine 
matches. But now, it would just return the original string. An example 
where this could happen is if some of the capturing parentheses should 
actually be non-capturing "(?:...)":

((word1a|word1b)(word2a|word2b)...)

Also, just returning the original string does not allow to detect 
errors. Currently ap_pregsub in 2.2.x always succeeds in that it 
replaces $1 to $9. I am against changing this in a way that may return 
the unchanged string. Maybe it would be more appropriate to enforce 
AP_MAX_REG_MATCH at compile time in ap_regcomp()? Then the errors 
would be more obvious to the user.

Mime
View raw message