httpd-bugs mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 38642] mod_rewrite adds path info postfix after a substitution occured
Date Sun, 23 Nov 2008 02:06:07 GMT
https://issues.apache.org/bugzilla/show_bug.cgi?id=38642





--- Comment #10 from Aleksander Budzynowski <budzynowski@gmail.com>  2008-11-22 18:06:05 PST ---
While I agree entirely with Seth that this bug should have been fixed long
ago, we cannot simply rush a patch through. This needs to be done properly,
because if there are any problems with the patch, who knows how long *they*
will take to be addressed?

For the purpose of helping anyone who might fix this bug, I will supply some of
the details that I provided when I reported the bug.


The problem:

If multiple RewriteRules within a .htaccess file match, unwanted copies of
PATH_INFO may accumulate at the end of the URI.

In more depth:

When you make a request for a file in a directory that doesn't exist, Apache
divides the URI into the "real" part (r->filename) and the "virtual" part
(r->path_info). This has already happened by the time per-directory
(.htaccess) RewriteRules are processed, which means mod_rewrite has to put
the two parts back together to reconstruct the original URI.

This is what happens. r->path_info is appended to ctx->uri prior to *each*
RewriteRule. If a RewriteRule (including its RewriteConds) does not match,
ctx->uri is discarded and nothing bad happens. If a RewriteRule does match,
however, the entire substitution is incorporated into r->filename.

But r->path_info is not changed! This means that every subsequent rule gets an
extra copy of PATH_INFO appended, and each further rule that matches bakes in
one more copy.

Note that PATH_INFO *should* be appended before the first matching RewriteRule,
because it forms part of the URI. However, afterwards it should not be appended
again.


Example:

This comes from a .htaccess file placed in DocumentRoot. It is supposed to
replace all underscores in a URI with hyphens.

RewriteEngine On
RewriteBase /
RewriteRule ^(.*)_(.*)$ $1-$2 [N]

Make a request for "/_f_o_o_" and it will be correctly rewritten to "/-f-o-o-".
(That's because PATH_INFO is empty.)

Make a request for "/_f_o_o_/bar" and it will be rewritten to
"/-f-o-o-/bar/bar/bar/bar". (That is, unless you happen to have a _f_o_o_
directory, in which case PATH_INFO will be empty and the rewriting will work as
desired.)

Note that there are four underscores but only three erroneous copies of 
PATH_INFO - this is because the first time the rule matches, appending
PATH_INFO is correct behaviour.

Make a request for "/foo/b_ar" and an infinite loop will ensue: PATH_INFO
itself contains an underscore, so every time an underscore is replaced, a
fresh one is appended along with PATH_INFO prior to the next rule.

(See my bug report at https://issues.apache.org/bugzilla/show_bug.cgi?id=44922
for a RewriteLog.)
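
The three requests above can be reproduced with a toy Python model of the
loop. This is only a sketch, not mod_rewrite code: it assumes a single rule
equivalent to the one above with [N] semantics, treats `filename` as the part
relative to the .htaccess directory, and the names `rewrite`, `fixed` and
`max_rounds` are made up for illustration. The `fixed` switch models the
behaviour argued for above, where PATH_INFO is appended only until the first
substitution has been made.

```python
import re

# Stand-in for: RewriteRule ^(.*)_(.*)$ $1-$2 [N]
RULE = re.compile(r"^(.*)_(.*)$")


def rewrite(filename, path_info, fixed=False, max_rounds=20):
    """Toy model of per-directory rewriting with a single [N] rule.

    Returns the final filename, or None if the rule is still matching
    after max_rounds (an arbitrary cap standing in for an endless loop).
    """
    substituted = False
    for _ in range(max_rounds):
        # Rebuild the candidate URI the way the rule would see it.
        uri = filename
        if path_info and not (fixed and substituted):
            uri += path_info
        match = RULE.match(uri)
        if match is None:
            # No match: the candidate URI is discarded, filename is kept.
            return filename
        # Match: the whole substitution, appended PATH_INFO included,
        # becomes the new filename. path_info itself is left untouched.
        filename = match.group(1) + "-" + match.group(2)
        substituted = True
    return None
```

In this model, rewrite("_f_o_o_", "/bar") gives "-f-o-o-/bar/bar/bar/bar" and
rewrite("foo", "/b_ar") runs into the round limit, while passing fixed=True
gives "-f-o-o-/bar" and "foo/b-ar" respectively.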


The current patch:

Looks pretty good to me. It basically uses a flag to indicate whether a
substitution has been made yet, and clears this flag when it ought to get
"reset".

I would, however, check perdir *before* making each call to apr_table_xxx, as
these calls are relatively slow.

The flag is stored in r->notes - I can't see where better to put it. It can't
go in the ctx struct because the scope of that struct is a single rewrite
list. That scope is too narrow, because multiple .htaccess files can match a
single request. For performance, though, a "cache" of the flag could be kept
in the ctx struct and saved to r->notes in between rewrite lists.
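
The caching idea can be sketched in Python as follows. This is not the patch
itself, only an illustration under assumptions: a plain dict stands in for
r->notes, the key name "path-info-appended" and the names `Ctx` and
`run_rewrite_list` are hypothetical, and when the flag should be reset is
deliberately left out. The notes table is touched at most twice per rewrite
list, and not at all outside per-directory context.

```python
import re

NOTE_KEY = "path-info-appended"  # hypothetical key name, not from the patch


class Ctx:
    """Per-rewrite-list state; lives only for one list of rules."""

    def __init__(self, perdir):
        self.perdir = perdir
        self.appended = False  # cached copy of the per-request flag


def run_rewrite_list(notes, filename, path_info, rules, perdir=True):
    """Apply one rewrite list; `notes` is a dict standing in for r->notes."""
    ctx = Ctx(perdir)
    if ctx.perdir:
        # One table lookup per list instead of one per rule.
        ctx.appended = NOTE_KEY in notes
    for pattern, template in rules:
        uri = filename
        if ctx.perdir and path_info and not ctx.appended:
            uri += path_info
        match = pattern.match(uri)
        if match is not None:
            filename = match.expand(template)
            ctx.appended = True  # PATH_INFO is now part of filename
    if ctx.perdir and ctx.appended:
        # Save the cached flag so the next rewrite list can see it.
        notes[NOTE_KEY] = "1"
    return filename


# Two rewrite lists (think: two .htaccess files) sharing one request.
notes = {}
first = run_rewrite_list(notes, "a_b", "/bar",
                         [(re.compile(r"^(.*)_(.*)$"), r"\1-\2")])
second = run_rewrite_list(notes, first, "/bar",
                          [(re.compile(r"^(.*)bar$"), r"\1baz")])
```

Because the flag survives in `notes`, the second list sees "a-b/bar" rather
than "a-b/bar/bar". Calling the second list with a fresh, empty dict shows
what would happen if the flag lived only in the ctx struct.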

