From: Aleksander Budzynowski
To: users@httpd.apache.org
Date: Wed, 23 Apr 2008 19:23:22 +1000
Subject: Re: [users@httpd] mod_rewrite: PATH_INFO gets injected with each Rule
 
---------- Forwarded message ----------
From: Rich Bowen <rbowen@rcbowen.com>
To: users@httpd.apache.org
Date: Tue, 22 Apr 2008 10:02:04 -0400
Subject: Re: [users@httpd] mod_rewrite: PATH_INFO gets injected with each Rule

 
On Apr 21, 2008, at 08:54, Aleksander Budzynowski wrote:


Hi,
 
The behaviour I'm seeing resembles the bug described here: http://archive.apache.org/gnats/7879

Reportedly it was fixed in 2.0.30. However, testing under both 2.2.3 and 2.0.61, I get the same sort of problem.
 
Essentially, PATH_INFO is appended to the end of the URI before each RewriteRule is processed. If more than one RewriteRule matches, you can end up with redundant garbage at the end of the URI.
 
Let's consider a rule designed to turn all underscores into hyphens (done in a per-directory context, i.e. .htaccess file):
 
RewriteEngine On
#Convert _ to - (N flag ensures that all underscores get converted)
RewriteRule ^(.*)_(.*) $1-$2 [N]
 
It seems innocent enough. But issue a request for
 
/_f_o_o_/bar
 
(where _f_o_o_ does not exist, placing '/bar' in PATH_INFO), and this gets rewritten to /-f-o-o-/bar/bar/bar/bar!
 
If you request /foo/_bar (assuming foo does not exist), then each new _bar will feed an extra underscore back into the mix, creating an infinite loop - even worse.
 
 
In the RewriteLog, one sees something like this before the application of each RewriteRule:
 
add path-info postfix: /rewritebase/_f_o_o_ -> /rewritebase/_f_o_o_/bar
 
although each time it accumulates an extra '/bar'.
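
The accumulation described above can be reproduced in a small toy model. This is only a sketch, not mod_rewrite code: the function names `toy_apply_rule` and `simulate_buggy` are made up for illustration, and the rule `^(.*)_(.*)` -> `$1-$2` is modelled as "replace the last underscore", which is what the greedy first group amounts to. It assumes, as the RewriteLog suggests, that PATH_INFO is appended before every pass and that [N] restarts with the rewritten result.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one application of
 *     RewriteRule ^(.*)_(.*) $1-$2
 * The first (.*) is greedy, so the "_" that matches is the LAST one.
 * Returns 1 if the rule matched (and rewrote uri in place), 0 otherwise. */
int toy_apply_rule(char *uri)
{
    char *last = strrchr(uri, '_');
    if (last == NULL)
        return 0;
    *last = '-';
    return 1;
}

/* Toy model of the observed behaviour: path_info is appended before
 * EVERY pass, and on a match the [N] flag restarts with the result.
 * Writes the final path to out. Returns the number of passes that
 * matched, or -1 if max_passes was reached (i.e. it looks like a loop). */
int simulate_buggy(const char *filename, const char *path_info,
                   char *out, size_t outsz, int max_passes)
{
    char uri[512];
    int pass;

    snprintf(out, outsz, "%s", filename);
    for (pass = 0; pass < max_passes; pass++) {
        /* "add path info postfix" happens again on each pass */
        snprintf(uri, sizeof uri, "%s%s", out, path_info);
        if (!toy_apply_rule(uri))
            return pass;            /* no match: keep previous result */
        snprintf(out, outsz, "%s", uri);
    }
    return -1;
}
```

Under these assumptions, calling `simulate_buggy("/_f_o_o_", "/bar", out, sizeof out, 10)` leaves `/-f-o-o-/bar/bar/bar/bar` in `out` after four matching passes, while `simulate_buggy("/foo", "/_bar", out, sizeof out, 10)` never stops matching and hits the pass limit.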
 
 
This doesn't look right to me. Is it a bug? Or have I missed something obvious?
 

This does look pretty nasty. Can you try 1) testing with the latest versions, and 2) posting your RewriteLog so that we can see what process it's going through to do this? Given that that's an example from the documentation, one kind of hopes that it'll work correctly.




Also, I'm trying this out myself. Is it only on PATH_INFO, or is it also on existing file names?


--Rich

It's only PATH_INFO, and only within .htaccess. Looking at the 2.2.8 source (mod_rewrite.c:3694), this seems to be the culprit:

        if (r->path_info && *r->path_info) {
            rewritelog((r, 3, ctx->perdir, "add path info postfix: %s -> %s%s",
                        ctx->uri, ctx->uri, r->path_info));
            ctx->uri = apr_pstrcat(r->pool, ctx->uri, r->path_info, NULL);
        }
 
It looks like nowhere in the rewriting process is r->path_info modified, meaning that this happens for EVERY RewriteRule. And this becomes a problem if more than one RewriteRule matches.
 
Back at line 3680, we have this:
    ctx->uri = r->filename;
 
Before any of the RewriteRules match, this will be the URI minus PATH_INFO. But once a rule matches, the path is changed. PATH_INFO basically becomes invalid!

Is PATH_INFO recalculated after a URI is run through mod_rewrite? (If so then it would make perfect sense to empty r->path_info whenever a RewriteRule matches.) If not, should it be? Maybe only in conjunction with the [PT] flag?
 
If we can't, for whatever reason, disturb path_info, then we could add a "matched" member to rewrite_ctx, to indicate that a substitution has already been made, and not append PATH_INFO if this has occurred.
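
To make the "matched" idea concrete, here is a rough sketch in the same toy-model style. It is not a patch against mod_rewrite: `struct toy_ctx`, `toy_rule` and `simulate_with_flag` are hypothetical names, the struct only imitates the relevant part of rewrite_ctx, and the rule is again modelled as "replace the last underscore". The only point is that PATH_INFO gets appended once, before the first substitution, and never again afterwards.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, cut-down imitation of rewrite_ctx with the proposed
 * extra member. The real structure has more fields. */
struct toy_ctx {
    char uri[512];
    int matched;    /* set once any rule has made a substitution */
};

/* Hypothetical stand-in for  RewriteRule ^(.*)_(.*) $1-$2 :
 * replaces the last underscore. Returns 1 on match, 0 otherwise. */
int toy_rule(char *uri)
{
    char *last = strrchr(uri, '_');
    if (last == NULL)
        return 0;
    *last = '-';
    return 1;
}

/* Same loop as the [N] behaviour, except that path_info is only
 * appended while no substitution has been made yet.
 * Returns the number of matching passes, or -1 if max_passes is hit. */
int simulate_with_flag(const char *filename, const char *path_info,
                       char *out, size_t outsz, int max_passes)
{
    struct toy_ctx ctx;
    int pass;

    ctx.matched = 0;
    snprintf(out, outsz, "%s", filename);
    for (pass = 0; pass < max_passes; pass++) {
        /* skip the postfix once a rule has already rewritten the path */
        snprintf(ctx.uri, sizeof ctx.uri, "%s%s",
                 out, ctx.matched ? "" : path_info);
        if (!toy_rule(ctx.uri))
            return pass;
        ctx.matched = 1;
        snprintf(out, outsz, "%s", ctx.uri);
    }
    return -1;
}
```

In this model, `/_f_o_o_` with PATH_INFO `/bar` ends up as `/-f-o-o-/bar`, and `/foo` with PATH_INFO `/_bar` ends up as `/foo/-bar` after a single matching pass instead of looping. Whether the real code can track this as simply is a separate question.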
 
I have a feeling that this is a bug which went unnoticed because people simply blamed it on the quirks of mod_rewrite.
 
-Aleks
 