Return-Path: Delivered-To: apmail-new-httpd-archive@apache.org Received: (qmail 9632 invoked by uid 500); 11 Apr 2001 03:45:01 -0000 Mailing-List: contact new-httpd-help@apache.org; run by ezmlm Precedence: bulk Reply-To: new-httpd@apache.org list-help: list-unsubscribe: list-post: Delivered-To: mailing list new-httpd@apache.org Received: (qmail 9621 invoked from network); 11 Apr 2001 03:45:01 -0000 Sender: dave@dcc.vu Message-ID: <3AD3D4D2.D4373278@convio.com> Date: Tue, 10 Apr 2001 22:51:46 -0500 From: David Crooke Reply-To: dave@convio.com Organization: Convio X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.5-15 i686) X-Accept-Language: en MIME-Version: 1.0 To: new-httpd@apache.org Subject: [PATCH] mod_alias RedirectMatch glitch Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Rating: h31.sny.collab.net 1.6.2 0/1000/N I came across a problem with Apache the other day, a very tiny thing but an issue which seems to have been lurking for a couple of years - there are a number of open GNATS entries I found for it, with no feedback against them. I have done a bit of digging and convinced myself it's a bug, or at least it would work better the way I expected it to rather than the way it does now, and found the (trivial) code patch to change it. The details........... Here's a sample of open GNATS issues - the latter I submitted myself and included the patch below. http://bugs.apache.org/index.cgi/full/3333 http://bugs.apache.org/index.cgi/full/4379 http://bugs.apache.org/index.cgi/full/5948 http://bugs.apache.org/index.cgi/full/7503 If the target URI of a RedirectMatch contains a querystring or an anchor it will not work as intended by the user, because mod_alias url-encodes the URI before passing it back to the browser, e.g. RedirectMatch 301 /foo.html http://www.wibble.org/cgi-bin/bar.cgi?page=foo will return the following URL to the browser as the redirect target: http://www.wibble.org/cgi-bin/bar.cgi%3fpage=foo which of course results in a 404 on the subsequent request. I suspect this may be by design, and that the reason for doing it this way is because the target of the RedirectMatch is intended to replace only the base part of the URI without the querystring segment; if the original URI contains a querystring this is appended (correctly, without being re-encoded) to the result of encoding the 3rd argument to RedirectMatch. However, with Redirect instead of RedirectMatch, the above statement behaves as expected and intended by a typical user. My observations (IMHO only): 1. The behaviour of Redirect and RedirectMatch is definitely inconsistent; both replace only the base part of the URI, but the latter forcibly escapes the replacement string and the former does not. 2. My feeling is that he use case of wanting to redirect from a "base" URI with no querystring (especially the case of redirecting "/" but not the whole URI namesapce, which requires RedirectMatch) to a URI including a querystring is more common than wanting to redirect from one base URI to another while preserving the incoming querystring, and much more common than the decidedly questionable idea of wanting to have a target base URI which contains escaped special characters. 3. If RedirectMatch was changed to work like Redirect, and anyone was to want the latter option in 2 above, they could easily write the urlencoded format in httpd.conf to start with, whereas the current behaviour totally blocks the more common case 4. In short, I cannot see a reason to urlencode, but I may be missing something in the architecture of the module - I have not looked into how it works with declarations in .htaccess files, for instance. 5. For these reasons, I believe that on balance taking the urlencode call out makes more sense than leaving it in. 6. The correct and complete fix would be to change RedirectMatch so that the target is not automagically url-encoded (as per the code change below), and to also add an optional fourth argument of flags (I would suggest syntax in the style of the "[PT]" etc. options to mod_rewrite) with one which would mean "take the third argument as the entire URL and discard the inbound querystring, if any, which came in with the previous URL" 6. Arguably, for downward compatibility there should be a second flag for "don't urlencode", while leaving the current behaviour as the default even if it is considered erroneous. I have not investigated how to implement the flag option, but if the Apache project people agree in principle with the idea, I will sign up to produce a scrupulously documented patch which implements it in a form ready to apply to the main source trees (1.3 and 2.0). The change to turn off the urlencoding only, is as follows against the 1.3.19 version of src/modules/standard/mod_alias.c (also works with versions back as far as 1.3.12) *** mod_alias.c Mon Jan 15 11:05:35 2001 -- mod_alias.c.new Sat Mar 31 17:25:46 2001 *************** *** 308,320 **** --- 308,324 ---- if (p->regexp) { if (!ap_regexec(p->regexp, r->uri, p->regexp->re_nsub + 1, regm, 0)) { if (p->real) { found = ap_pregsub(r->pool, p->real, r->uri, p->regexp->re_nsub + 1, regm); + /* + Removed to make RedirectMatch work like Redirect - DCC + if (found && doesc) { found = ap_escape_uri(r->pool, found); } + */ } else { /* need something non-null */ found = ap_pstrdup(r->pool, ""); } Look forward to hearing your thoughts Cheers Dave -- David Crooke, Chief Technology Officer Convio Inc. - the online partner for nonprofits 4801 Plaza on the Lake, Suite 1500, Austin TX 78746 Tel: (512) 652 2600 - Fax: (512) 652 2699