perl-modperl mailing list archives

From André Warnier
Subject Re: utf8 urls
Date Wed, 19 Mar 2008 13:32:58 GMT

I think that these things can get very confused and confusing very 
quickly, unless one goes through them one step at a time.
Let me try a first iteration:

1) URIs, as sent to the HTTP server, should contain only US-ASCII 
characters (and no spaces).  Any other characters should be 
percent-encoded (URI-encoded), as specified by RFC 3986.
2) Whether Firefox is smart enough to automatically encode a URI 
properly, when it notices that it contains non-US-ASCII characters, is a 
nice aspect of Firefox if it does, but should not confuse the main issue.
In other words, if you send a non-ASCII URI to a server (via curl or 
lwp-request, e.g.), then you should arrange to URI-encode it yourself.
3) According to a previous response, at the receiving side, when Apache 
gets a properly-encoded request URI containing non-ASCII characters, it 
leaves it encoded and passes it "as is" (or "as bytes") to the 
processing layer, which in this case is mod_perl.
4) mod_perl parses the URI and makes it accessible in several ways to 
the modules running under it (in this case a request handler or a script).
Question: does mod_perl decode the URI string prior to passing it in 
bits and pieces to the handler/script, or not?
(From another response, it would seem that it doesn't)
5) the handler/script obtains the URI parts from mod_perl, possibly 
through the RequestRec or Request object.
If such URI parts contained non-ASCII characters, do these modules 
perform any translation, or does the handler/script still receive them 
URI-encoded?
(From another response, it would seem that the modules don't translate 
anything, and the handler/script does receive them still encoded)
6) Now the handler/script has the value of the (for instance) query 
parameter "id" (and assume it contains non-ASCII characters), and it 
wants to output it back to the browser.
To do that, it must arrange to send to the browser an HTTP header that 
tells the browser in which character set this response is encoded, 
since by default the HTTP protocol says it is iso-8859-1.
And it seems that in order to do that, it should use, at a minimum:

# $apr is assumed to be an Apache2::Request object, $r the request record
my $param = $apr->param('id');
$r->content_type('text/plain; charset="UTF-8"');
$r->print($param);
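
For point 2 above (encoding the URI yourself when the client does not do it for you), here is a minimal sketch using only core Perl modules. The function name uri_encode_utf8 is made up for this example, and it assumes the server expects the UTF-8 bytes of the value to be percent-encoded, which is what Firefox did in the test quoted below.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-encode a Perl character string for use as a
# query-string value. Assumption: the receiving side expects UTF-8 bytes.
sub uri_encode_utf8 {
    my ($text) = @_;
    # Turn characters into UTF-8 bytes first, then escape every byte that is
    # not in the RFC 3986 "unreserved" set (letters, digits, - . _ ~).
    my $bytes = encode('UTF-8', $text);
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

# The Hebrew word from the original test, written with code point escapes
# so that the example does not depend on the encoding of the source file.
my $word = "\x{5D7}\x{5D5}\x{5D6}\x{5E8}\x{5EA}";
print "http://localhost/test?x=", uri_encode_utf8($word), "\n";
# prints http://localhost/test?x=%D7%97%D7%95%D7%96%D7%A8%D7%AA
```

The resulting URL is pure US-ASCII and can be passed as is to curl or lwp-request.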

There are a couple of aspects not mentioned above, such as
- how does the handler/script "know" which decoding it should apply to 
the URI elements? Is it certain that it is UTF-8?
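
As far as I know, nothing in the request itself says which character set was used, so the handler can only guess. One possible approach, sketched below with core modules only, is to undo the percent-encoding, try a strict UTF-8 decode, and fall back to ISO-8859-1 if the bytes are not valid UTF-8. The function names and the choice of ISO-8859-1 as fallback are assumptions made for this example, not something mod_perl or libapreq2 does for you.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: undo the percent-encoding, giving back raw bytes.
sub uri_decode_bytes {
    my ($s) = @_;
    $s =~ tr/+/ /;                               # "+" means space in query strings
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX becomes one byte
    return $s;
}

# Hypothetical helper: returns (character string, name of the charset used).
sub decode_param {
    my ($raw) = @_;
    my $bytes = uri_decode_bytes($raw);
    # Strict decode: FB_CROAK makes decode() die on malformed UTF-8,
    # LEAVE_SRC keeps $bytes untouched so that it can be reused below.
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return ($text, 'UTF-8') if defined $text;
    # Assumed fallback: every byte sequence is valid ISO-8859-1.
    return (decode('ISO-8859-1', $bytes), 'ISO-8859-1');
}

my ($text, $charset) = decode_param('%D7%97%D7%95%D7%96%D7%A8%D7%AA');
printf "%d characters, decoded as %s\n", length($text), $charset;
# prints "5 characters, decoded as UTF-8"
```

This is only a heuristic: a short ISO-8859-1 string can happen to be valid UTF-8 as well, so it does not replace an agreement between client and server on the encoding.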

Another go, anyone ?


Torsten Foertsch wrote:
> On Wed 19 Mar 2008, Eli Shemer wrote:
>> For some reason the following test doesn’t print anything out to the screen
>> Do I need to change something in the apache configuration, or mod_perl’s ?
>> /חוזרת
> This is probably a bug in libapreq2. I have tried this handler:
> sub {
>   my $r=$_[0];
>   $r->content_type('text/html; charset=UTF-8');
>   my $x=Apache2::Request->new($r);
>   $r->print("<html><body>\nargs=".$r->args."\nparam(x)=".      
>             $x->param('x')."\n</body></html>\n");
>   return Apache2::Const::OK;
> }
> http://localhost/test?x=חוזרת entered in FF changes on the fly into
> http://localhost/test?x=%D7%97%D7%95%D7%96%D7%A8%D7%AA and it works.
> But on the command line with curl it doesn't:
> $ curl 'http://localhost/test?x=חוזרת' -v
> * About to connect() to localhost port 80 (#0)
> *   Trying connected
> * Connected to localhost ( port 80 (#0)
>> GET /test?x=חוזרת HTTP/1.1
>> User-Agent: curl/7.16.4 (i686-suse-linux-gnu) libcurl/7.16.4 OpenSSL/0.9.8e 
> zlib/1.2.3 libidn/1.0
>> Host: localhost
>> Accept: */*
> < HTTP/1.1 200 OK
> < Date: Wed, 19 Mar 2008 12:45:29 GMT
> < Server: Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.8e DAV/2 SVN/1.4.5 
> mod_apreq2-20051231/2.6.0 mod_perl/2.0.4-dev Perl/v5.8.8
> < Transfer-Encoding: chunked
> < Content-Type: text/html; charset=UTF-8
> <
> <html><body>
> args=x=חוזרת
> param(x)=
> </body></html>
> * Connection #0 to host localhost left intact
> * Closing connection #0
> Torsten
