httpd-apreq-dev mailing list archives

From Joe Schaefer <joe+gm...@sunstarsys.com>
Subject Re: Apache::Request, APR::Table and UTF8
Date Tue, 05 Oct 2004 16:20:46 GMT
Boris Zentner <bzm@2bz.de> writes:

[...]

> In my case I get data via POST requests, where the data is _sometimes_
> partly in utf8. So my initial idea was to scan for that data and set the
> utf8 flag where needed.
> 
> my $t = $apr->params;
> for( values %$t ) {
>    if ( $something ) {
>      Encode::_utf8_on($_);
>    } elsif ( $something_else ) {
>      $_ = Encode::decode( 'iso-8859-1', $_ );
>    }
>    # else data is already in good shape.
> }
> 
> It is important that the flag is now correctly delivered, since the
> modules that get the data are out of my control. Any other solution is
> highly error-prone, and wrong, since length, ord, chr, substr, index,
> chop, regexes and more work differently now.
> 
> Perl versions after 5.6.0 do the utf8 conversion automatically, so there
> is no other option than supporting this flag, so that $t->get/set and
> $t->do( ) work.
> 
> It is just one bit that needs to be reserved for any value in the table.

Any opinions out there?  As I mentioned to Boris on dev@perl, we used to 
have a charset attribute for apreq_value_t.  I think it might be
worthwhile to resurrect it in some limited form.  That way we have room 
for improvement within apreq2, instead of delaying charset support until 
apreq3.

We could use a three-bit field for marking the charset:

  0 - unknown
  1 - ASCII
  2 - UTF-8
  3 - UTF-16
  [ room for 4 more iso? charsets ]


and the current code could just set the charset to 0 - unknown.
That would leave a little room (at least 5 bits) for future expansion 
within the apreq_value_t.  We could use that room for other features,
like marking SvTAINT on a per-value level instead of per-table.
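
To make the layout concrete, here is a rough C sketch of how such a field
might be packed. Everything in it is an assumption for illustration:
sketch_value_t is a hypothetical stand-in and not the actual apreq_value_t
definition, the 8-bit width of the flags member is a guess that matches
"3 bits used, 5 left over", and putting the taint mark in bit 3 is just one
possible arrangement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical charset codes, using the numbering proposed above. */
enum sketch_charset {
    SKETCH_CHARSET_UNKNOWN = 0,
    SKETCH_CHARSET_ASCII   = 1,
    SKETCH_CHARSET_UTF8    = 2,
    SKETCH_CHARSET_UTF16   = 3
    /* values 4..7 stay free for further charsets */
};

#define SKETCH_CHARSET_MASK 0x07u /* low three bits hold the charset */
#define SKETCH_TAINT_BIT    0x08u /* assumed: one spare bit marks taint */

/* Hypothetical stand-in for apreq_value_t; NOT the real struct layout. */
typedef struct {
    const char   *name;
    size_t        size;
    unsigned char flags; /* assumed 8-bit flag field */
    const char   *data;
} sketch_value_t;

/* Store a charset code without disturbing the other flag bits. */
void sketch_set_charset(sketch_value_t *v, enum sketch_charset cs)
{
    assert((unsigned)cs <= SKETCH_CHARSET_MASK);
    v->flags = (unsigned char)((v->flags & ~SKETCH_CHARSET_MASK) | (unsigned)cs);
}

/* Read back the charset code from the low three bits. */
enum sketch_charset sketch_get_charset(const sketch_value_t *v)
{
    return (enum sketch_charset)(v->flags & SKETCH_CHARSET_MASK);
}

/* Set or clear the per-value taint mark. */
void sketch_set_taint(sketch_value_t *v, int tainted)
{
    if (tainted)
        v->flags = (unsigned char)(v->flags | SKETCH_TAINT_BIT);
    else
        v->flags = (unsigned char)(v->flags & ~SKETCH_TAINT_BIT);
}

/* Returns 1 if the value is marked tainted, 0 otherwise. */
int sketch_is_tainted(const sketch_value_t *v)
{
    return (v->flags & SKETCH_TAINT_BIT) != 0;
}
```

With a layout like this, a zero-initialized flags member already means
"charset unknown, not tainted", so existing code would not need to change
until it actually knows the charset.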

Thoughts?
-- 
Joe Schaefer

