From: Stefan Fritsch
To: dev@httpd.apache.org
Subject: Re: Really big regex results from ap_pregsub
Date: Fri, 14 Oct 2011 20:34:45 +0200
References: <4E204F13.7000805@halfdog.net> <4E9872F0.5070506@rowe-clan.net>
Message-Id: <201110142034.45785.sf@sfritsch.de>

On Friday 14 October 2011, Eric Covener wrote:
> On Fri, Oct 14, 2011 at 1:35 PM, William A. Rowe Jr.
>
> wrote:
> > On 10/14/2011 7:46 AM, Jim Jagielski wrote:
> >> On Oct 13, 2011, at 4:30 PM, William A. Rowe Jr. wrote:
> >>> The largest string value applicable to header values, to URI's
> >>> and any presentation string (to errorlog or access log etc) is
> >>> MAX_STRING_LEN. The longest config line is MAX_STRING_LEN.
> >>> I don't see a lot of reasons supporting something longer.
> >>
> >> Pre-2.4 that is true, but not on trunk…
> >
> > Trunk might be even simpler... an ap_pnregsub taking a max-string
> > len arg?

Yes, just add an alternative API in trunk that does the right thing
and returns apr_status_t.

> >>> This was always unambiguous, NULL on error. The doxygen has
> >>> *nothing* to say about the result value.
> >>>
> >>> So... I'd suggest we fix cases that did not expect NULL and
> >>> return NULL on any substitution failure. I don't even see the
> >>> need for an MMN bump.
> >>
> >> For trunk? Yes. For pre-2.4? Not so sure (due to external
> >> modules)… but I'll go along with it.
> >
> > I'd love to see some additional eyes on the use cases and
> > proposed solutions so we can put this to bed.
>
> In pre-2.4, it seems we could be more tolerant than 10 subs or 8K
> if we're going to be returning a NULL that's never been returned
> in practice.

Introducing an arbitrary length limit seems pretty invasive for 2.2.x.

Btw, isn't the nmatch the number of () pairs in the regex? If so, then
enforcing the AP_MAX_REG_MATCH limit could introduce a behaviour change
in existing configs: previously, ap_pregsub with a regex with more
than 10 capturing () pairs would replace $1 ... $9 with the first nine
matches. But now, it would just return the original string. An example
where this could happen is if some of the capturing parentheses should
actually be non-capturing "(?:...)":

((word1a|word1b)(word2a|word2b)...)

Also, just returning the original string gives the caller no way to
detect errors.
Currently ap_pregsub in 2.2.x always succeeds in that it
replaces $1 to $9. I am against changing this in a way that may return
the unchanged string. Maybe it would be more appropriate to enforce
AP_MAX_REG_MATCH at compile time in ap_regcomp()? Then the errors
would be more obvious to the user.
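
A minimal sketch of the compile-time check suggested above, not httpd code. It assumes POSIX <regex.h> as a stand-in for ap_regcomp(), so "(?:...)" is not available here. The names sketch_regcomp_checked, sketch_status and SKETCH_MAX_REG_MATCH are hypothetical. The threshold is also an assumption: slot 0 holds the whole match, so a limit of 10 leaves room for nine capturing groups.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Assumed to mirror AP_MAX_REG_MATCH: slot 0 is the whole match,
 * slots 1..9 back the substitutions $1..$9. */
#define SKETCH_MAX_REG_MATCH 10

/* Hypothetical status codes; real code would more likely use apr_status_t. */
enum sketch_status {
    SKETCH_OK = 0,
    SKETCH_BAD_PATTERN,      /* regcomp() itself rejected the pattern */
    SKETCH_TOO_MANY_GROUPS   /* more capturing groups than $1..$9 can address */
};

/*
 * Compile `pattern` and refuse it at compile time if it has more capturing
 * groups than fit in SKETCH_MAX_REG_MATCH slots. On SKETCH_OK the caller
 * owns `preg` and must regfree() it; on any error `preg` is not left
 * holding a compiled pattern.
 */
static enum sketch_status sketch_regcomp_checked(regex_t *preg,
                                                 const char *pattern)
{
    assert(preg != NULL && pattern != NULL);

    if (regcomp(preg, pattern, REG_EXTENDED) != 0) {
        return SKETCH_BAD_PATTERN;
    }

    /* re_nsub is the number of parenthesized subexpressions (POSIX). */
    if (preg->re_nsub >= SKETCH_MAX_REG_MATCH) {
        regfree(preg);
        return SKETCH_TOO_MANY_GROUPS;
    }

    return SKETCH_OK;
}
```

With a check like this, a pattern with ten or more capturing groups fails when the configuration is read, instead of silently yielding an unsubstituted string at request time.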