httpd-dev mailing list archives

From "Zvi Har'El"
Subject Re: PR19317: mod_proxy and URL encoding
Date Sun, 27 Apr 2003 09:13:12 GMT
On Fri, 25 Apr 2003 15:30:38 +0200, Graham Leggett wrote about "Re: PR19317: mod_proxy and
URL encoding":
> Tikka, Sami wrote:
> >I have posted a bug report the cause of which is the fact that mod_proxy
> >rewrites the URL by encoding various characters in it. I can try to fix it,
> >but I'd like to hear your opinions first if it makes any sense to try it.
> >Is there a deep underlying reason for the proxy to touch the URL passed by
> >the browser? Why not just forward it as-is?
> As far as I am aware, it is supposed to send it as is. I am not sure 
> whether proxy is doing the rewriting, or whether something in the core 
> is rewriting the url, if you have time and can look at this it would be 
> great.
> Regards,
> Graham
> -- 
> -----------------------------------------
>		"There's a moon
> 					over Bourbon Street
> 						tonight..."

I haven't seen the posted bug report. However, I am familiar with what the
proxy is doing. I think it is important to rewrite the URL for caching
purposes: for example, you don't want URLs like http://host/~user/ and
http://host/%7euser/ to be cached separately. Some proxies, like squid, do
not change the URL, and I think that is a clear disadvantage.

However, there is a bug in the way the proxy does this rewrite in the case of
a reverse proxy. The bug is reproducible in both apache 1.3.27 and 2.0.45: a
URL like http://host/xxx%25yyy returns a 400 BAD REQUEST error, and a URL like
http://host/xxx%2525yyy is handled as if it were the escaped form of
http://host/xxx%yyy. The problem is that the URL is unescaped twice: the first
pass turns xxx%25yyy into xxx%yyy, and the second pass then trips over the
invalid escape %yy.

The first unescaping, in apache 1.3.27, occurs in the call to ap_unescape_url
in the static function process_request_internal, at http_request.c:1185. Just
before the call, the three-valued flag r->proxyreq is checked, with the
intention that the unescaping will not be executed for proxy requests. At this
stage, however, only standard proxy requests have been identified (in which
case the flag has been set to STD_PROXY); reverse proxy requests are yet to be
identified (they are identified later on by matching the URL against the
ProxyPass directive, and in case of a match, the flag is set to PROXY_PASS).
Thus all requests which do not belong to a forward proxy (as enabled by the
"ProxyRequests on" directive) are unescaped.

Later on, the second unescaping occurs, in the function ap_proxy_canonenc in
proxy_util.c:139. In line 181, we see the code

/* decode it if not already done */
        if (isenc != NOT_PROXY && ch == '%') {

in which the comment is correct, but the check is wrong! If isenc is
PROXY_PASS, the unescaping has already been done! The correct code should be

/* decode it if not already done */
        if (isenc == STD_PROXY && ch == '%') {

since the only case where the first unescaping was not done is the forward
proxy. I tested this fix and it works. I believe that a similar fix should
also be applied to httpd-2.0.45, where in line 206 of proxy_util.c the code

	if (isenc && ch == '%') {

should be replaced by 

	if (isenc == PROXYREQ_PROXY && ch == '%') {

However, I am not familiar with the new proxy code, and the fact that
r->proxyreq now has four values calls for some caution. In any case, as I
said before, the bug shows up in the new proxy in exactly the same way as in
the old one.

Dr. Zvi Har'El     Department of Mathematics
tel:+972-54-227607 icq:179294841     Technion - Israel Institute of Technology
fax:+972-4-8293388     Haifa 32000, ISRAEL
"If you can't say somethin' nice, don't say nothin' at all." -- Thumper (1942)
                                 Sunday, 25 Nisan 5763, 27 April 2003, 11:26AM
