httpd-apreq-dev mailing list archives

From Boris Zentner <...@2bz.de>
Subject Re: Apache::Request, APR::Table and UTF8
Date Wed, 06 Oct 2004 21:14:00 GMT
Hi,

On 06.10.2004 at 19:37, Geoffrey Young wrote:

>
>> I think that Boris wants it to not be the user's responsibility, and I'm
>> inclined to agree with him.
>
> I'm still seeing it as the responsibility of the user here.  the user best
> knows the kind of data they expect in their application.  pushing off the

That is not true. I'm sure you haven't felt the need because your data is
only correct by luck: UTF-8 and US-ASCII are identical for the first 128
characters. An application cannot know whether the data in the table of an
inherited object is UTF-8 or not, and the two mean different things. A user
should not even need to know whether the data store is inherited from
APR::Table. It may work as glue to Apache, but nothing used by a user
module can rely on this.

Perl decides whether to convert a string based on its UTF-8 flag, so losing
the flag forces Perl to do the conversion again.
Perl upgrades strings to UTF-8 automatically, so every operation that
combines UTF-8 data from the table with real UTF-8 data gives wrong results.

use strict;
use warnings;
use Encode;

$\ = $/;                # print a newline after every print
my $x = chr(0x2624);    # character string with 1 char; UTF-8 flag on
print "\$x length: ", length $x;
my $y = $x;             # copy $x
Encode::_utf8_off($y);  # simulate a store/fetch through APR::Table (flag lost)
print "\$y length: ", length $y;
my $z = $x . $y;        # Perl upgrades $y byte by byte before concatenating
print "\$z length: ", length $z;
Encode::_utf8_off($z);  # simulate another store/fetch through APR::Table
print "\$z length: ", length $z;
__END__
Output:
$x length: 1
$y length: 3
$z length: 4
$z length: 9

The correct result is:
$x length: 1
$y length: 1
$z length: 2
$z length: 2

But that's not all: nothing works correctly. substr, index, chop, s///, m//,
rindex, length, ...
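The same mismatch shows up in position-based operations. A short sketch, again simulating the table round trip with Encode::_utf8_off as in the example above; the sample string is made up for illustration and no real APR::Table is involved:

```perl
use strict;
use warnings;
use Encode ();

# Character string: U+2624 followed by "ab" (3 characters, UTF-8 flag on).
my $chars = "\x{2624}ab";

# Simulate a value that lost its UTF-8 flag on the way through the table:
# the same 5 bytes (E2 98 A4 61 62), now treated as 5 separate characters.
my $bytes = $chars;
Encode::_utf8_off($bytes);

my $pos_chars = index($chars, 'a');    # position counted in characters
my $pos_bytes = index($bytes, 'a');    # position counted in bytes
my $sub_chars = substr($chars, 0, 1);  # the whole U+2624 character
my $sub_bytes = substr($bytes, 0, 1);  # only the first byte, "\xE2"

printf "index:  %d vs %d\n", $pos_chars, $pos_bytes;
printf "substr: U+%04X vs U+%04X\n", ord $sub_chars, ord $sub_bytes;
```

On the flagless copy, index reports 3 instead of 1, and substr returns the first byte of U+2624 instead of the character itself.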

> responsibility of interpreting application data (charsets, 
> taintedness, etc)

That's not my intention. Charsets are a separate matter from taintedness
and the UTF-8 flag, so let's keep charsets apart from the flags. My interest
is only in the UTF-8 flag, but the taint flag has the same issue. I do _not_
want any interpretation of any flag; I just want to get back exactly what I
put in, with *no* data change at all.

The flags belong to the values. So whenever the values are moved or copied,
they must keep their flags.

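# $t is an APR::Table object; dosomething() is any user function.
my @values = values %$t;             # the values arrive without their UTF-8 flags
my $result = dosomething(\@values);  # so dosomething() cannot tell text from bytes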
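In plain Perl the flag does travel with the value: copying values out of an ordinary hash keeps it. A minimal sketch contrasting that with a store that keeps only bytes. %byte_store is a hypothetical stand-in for APR::Table, built with Encode::encode_utf8 to mimic the flag loss described in this thread; it is not the real table API:

```perl
use strict;
use warnings;
use Encode ();

# Ordinary Perl hash: the UTF-8 flag is part of the stored value.
my %perl_hash = (sign => "\x{2624}");
my ($copied) = values %perl_hash;     # copy keeps the flag

# Hypothetical byte-only store: same text, kept as UTF-8 octets.
my %byte_store = (sign => Encode::encode_utf8($perl_hash{sign}));
my ($fetched) = values %byte_store;   # comes back without the flag

printf "copied:  length %d, flag %s\n", length $copied,  utf8::is_utf8($copied)  ? 'on' : 'off';
printf "fetched: length %d, flag %s\n", length $fetched, utf8::is_utf8($fetched) ? 'on' : 'off';
```

The copied value still reports 1 character with the flag on; the fetched one reports 3 "characters" with the flag off, although both hold the same text.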

> to APR::Table or libapreq just seems like a huge mistake to me when you look
> at these libraries as bridges to an interface that is clearly non-perl.
>

It is up to you to decide about the flags. For me it is clear: currently I
need to copy the data a second time to work around the loss of the flags.
This is bad, and slower than using a CGI object from the beginning. In the
long term this may force me to drop my dependency on libapreq2 and mod_perl
in favor of CGI and mod_cgi, or another persistent solution.
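The second copy amounts to decoding every value again after fetching it. A minimal sketch, assuming the table's values are UTF-8 bytes; %table is a plain hash standing in for an APR::Table object, and the sample value is made up for illustration:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for values fetched from APR::Table: UTF-8 bytes
# whose flag was lost on the way through the table.
my %table = (name => Encode::encode_utf8("Z\x{fc}rich \x{2624}"));

# The workaround: a second copy, decoding each value back into a
# character string. This assumes every value really is UTF-8, which is
# exactly the information the table failed to keep.
my %decoded;
$decoded{$_} = Encode::decode_utf8($table{$_}) for keys %table;

printf "before: %d, after: %d\n", length $table{name}, length $decoded{name};
```

The byte copy reports 11 "characters" and the decoded copy reports the real 8. The cost is one extra pass and one extra copy of every value, and the decode is only right if the values were UTF-8 to begin with.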

> anyway, I don't mean to rant on, and I'll admit that I'm far from an expert
> in dealing with unicode nuances.  but I see some kind of scope creep going
> on here that has me a bit, well, concerned for the future of what used to be
> a nice, simple interface.
>
--
Boris

