perl-modperl mailing list archives

From André Warnier ...@ice-sa.com>
Subject Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl
Date Wed, 19 Mar 2008 10:01:48 GMT
Hi.

Perl's handling of Unicode (and of character sets in general) is 
extremely clever and powerful.
But it can sometimes be a bit counter-intuitive.

In any case, it seems to me that the evaluation of the PERL_UNICODE
environment variable is a "Perl thing" rather than a "mod_perl thing",
and that mod_perl per se should not interfere with it. But maybe
mod_perl does some magic on filehandles in general that interferes; who
knows?
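Rob's output below shows ${^UNICODE} = 63 both standalone and inside the TransHandler. As a quick sanity check, that number can be decoded back into the -C letters. This is a minimal sketch: the bit values are taken from the -C section of perlrun, and the helper name unicode_flags is hypothetical.

```perl
use strict;
use warnings;

# Bit values behind ${^UNICODE}, as listed for the -C switch in perlrun.
# The letters S (= I + O + E) and D (= i + o) are shorthands for groups.
my %FLAG = (
    I => 1,     # STDIN is assumed to be in UTF-8
    O => 2,     # STDOUT will be in UTF-8
    E => 4,     # STDERR will be in UTF-8
    i => 8,     # UTF-8 is the default PerlIO layer for input streams
    o => 16,    # UTF-8 is the default PerlIO layer for output streams
    A => 32,    # @ARGV elements are expected to be UTF-8 encoded
    L => 64,    # make the flags above conditional on the locale variables
);

# Hypothetical helper: return the flag letters set in a ${^UNICODE}
# value, ordered by bit value.
sub unicode_flags {
    my ($bits) = @_;
    return grep { $bits & $FLAG{$_} }
           sort { $FLAG{$a} <=> $FLAG{$b} } keys %FLAG;
}

printf "\${^UNICODE} = %d (%s)\n",
    ${^UNICODE}, join('', unicode_flags(${^UNICODE}));
```

For PERL_UNICODE=SDA the value is 7 + 24 + 32 = 63, which decodes to I, O, E, i, o, A. If the handler reports 63, the interpreter did record the i and o (default layer) bits, which would suggest looking at how those bits reach open() inside the handler rather than at the environment variable itself.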

Maybe the first thing to do is to ascertain whether the problem is really
due to a mishandling of the PERL_UNICODE environment variable or to
something else. I propose a simple test: instead of relying on the
PERL_UNICODE variable, what happens when you change the open() statement
as follows:

 > open(FH, '<:utf8',"/tmp/utf8.txt");

thus explicitly setting a UTF-8 decoding layer on the stream FH
instead of relying on PERL_UNICODE.
Does your follow-up test then indicate that the utf8 flag for $var is set?
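A self-contained version of this test might look like the sketch below. It is an assumption-laden stand-in, not Rob's actual setup: instead of /tmp/utf8.txt it writes U+00E6 to a temporary file via File::Temp, and the helper slurp() is hypothetical. It reads the same file once with no explicit layer and once with '<:utf8', then reports length and flag for each.

```perl
use strict;
use warnings;
use Encode ();
use File::Temp qw(tempfile);

# Stand-in for /tmp/utf8.txt (assumption: any file holding U+00E6 as
# UTF-8 will do). The two bytes 0xC3 0xA6 are written in raw mode.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "\xC3\xA6";
close($out) or die "Cannot close $path: $!";

# Hypothetical helper: slurp a file with the given open() mode. local $/
# restores the record separator afterwards, which matters inside a
# persistent mod_perl interpreter.
sub slurp {
    my ($file, $mode) = @_;
    open(my $fh, $mode, $file) or die "Cannot open $file: $!";
    my $data = do { local $/; <$fh> };
    close($fh);
    return $data;
}

# No explicit layer: whatever default layers are in effect apply.
my $default = slurp($path, '<');

# Explicit UTF-8 decoding layer, as in the suggested test.
my $explicit = slurp($path, '<:utf8');

for my $case ([default => $default], [explicit => $explicit]) {
    my ($name, $data) = @$case;
    printf "%-8s length=%d utf8 flag=%s\n",
        $name, length($data), (Encode::is_utf8($data) ? 'on' : 'off');
}
```

Run standalone without PERL_UNICODE, the default read should report length 2 with the flag off, and the explicit read length 1 with the flag on. ':encoding(UTF-8)' could be used instead of ':utf8' if stricter handling of malformed input is wanted.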

Note: even with the decoding layer set, not all data you read will
necessarily end up with the utf8 flag set; it depends on the data. But
in your case, if you really are using the same file data in both tests
you show below, then it seems a valid test.
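Because the utf8 flag is an internal detail, the follow-up test may be more conclusive if it also compares the decoded characters themselves. This is a sketch of that idea using Encode::decode, roughly equivalent to the Encode::decode_utf8 workaround Rob mentions; the hard-coded byte string is an assumption standing in for the file contents.

```perl
use strict;
use warnings;
use Encode ();

# The raw bytes of U+00E6 as stored in a UTF-8 file.
my $bytes = "\xC3\xA6";

# Manual decoding: FB_CROAK dies on malformed input; LEAVE_SRC keeps
# $bytes unmodified (without it, decode() with a CHECK value may
# consume its input in place).
my $chars = Encode::decode('UTF-8', $bytes,
    Encode::FB_CROAK() | Encode::LEAVE_SRC());

# Compare characters and lengths, not only the internal flag.
printf "bytes: %d, characters: %d, U+00E6: %s\n",
    length($bytes), length($chars), ($chars eq "\x{e6}" ? 'yes' : 'no');
```

This prints "bytes: 2, characters: 1, U+00E6: yes". A length of 1 for the data read inside the handler would confirm that decoding happened, whatever the flag alone seems to say.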

André


Rob French wrote:
> I have recently started converting one of our webapps to make it fully
> UTF-8 compliant. All input/output from the webapp will be encoded as
> UTF-8. As such, I am trying to use the PERL_UNICODE env variable to
> enable UTF-8 flagging on all input/output streams. This works with
> standalone Perl scripts like the one below (the /tmp/utf8.txt file
> contains a single character, U+00E6 LATIN SMALL LETTER AE):
> 
> #!/usr/bin/perl -w
> 
> use strict;
> use Encode;
> 
> print "PERL_UNICODE Value: ${^UNICODE}\n";
> open(FH, "</tmp/utf8.txt") or die "Cannot open /tmp/utf8.txt: $!";
> my $var = do { local $/; <FH> };    # slurp without clobbering $/ globally
> close(FH);
> 
> print "Flagged as UTF8? " . Encode::is_utf8($var) . "\n";
> exit;
> 
> The resulting output after setting my PERL_UNICODE env var to SDA is:
> 
> PERL_UNICODE Value: 63
> Flagged as UTF8? 1
> 
> That is correct: Perl processed the input stream (open) as UTF-8 and
> flagged it accordingly.
> 
> Unfortunately, if I put the exact same open call in my mod_perl
> TransHandler, $var is not flagged as UTF-8. The resulting output when
> run in the TransHandler is:
> 
> PERL_UNICODE Value: 63
> Flagged as UTF8?
> 
> The input stream is not processed as UTF-8 and not flagged internally
> as UTF-8. If I explicitly add an Encode::decode_utf8($var) in mod_perl
> then everything works as expected. It appears as if mod_perl is
> ignoring the PERL_UNICODE env variable and not processing my input
> streams as UTF-8.
> 
> Thanks in advance.
> 
> Cheers
> 
> Environment details below:
> 
> Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
>   Platform:
>     osname=linux, osvers=2.6.9-22.18.bz155725.elsmp,
> archname=i386-linux-thread-multi
>     uname='linux hs20-bc1-4.build.redhat.com
> 2.6.9-22.18.bz155725.elsmp #1 smp thu nov 17 15:34:08 est 2005 i686
> i686 i386 gnulinux '
>     config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386
> -mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost
> -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc.
> -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux
> -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads
> -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db
> -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio
> -Dinstallusrbinperl -Ubincompat5005 -Uversiononly
> -Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1
> 5.8.0'
>     hint=recommended, useposix=true, d_sigaction=define
>     usethreads=define use5005threads=undef useithreads=define
> usemultiplicity=define
>     useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
>     use64bitint=undef use64bitall=undef uselongdouble=undef
>     usemymalloc=n, bincompat5005=undef
>   Compiler:
>     cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
> -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
> -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
>     optimize='-O2 -g -pipe -m32 -march=i386 -mtune=pentium4',
>     cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
> -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
>     ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-2)', gccosandvers=''
>     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
>     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
>     ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
> lseeksize=8
>     alignbytes=4, prototype=define
>   Linker and Libraries:
>     ld='gcc', ldflags =' -L/usr/local/lib'
>     libpth=/usr/local/lib /lib /usr/lib
>     libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
>     perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
>     libc=/lib/libc-2.3.4.so, so=so, useshrplib=true, libperl=libperl.so
>     gnulibc_version='2.3.4'
>   Dynamic Linking:
>     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E
> -Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE'
>     cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
> 
> 
> Characteristics of this binary (from libperl):
>   Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS
> USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
>   Built under linux
>   Compiled at Jul 24 2006 18:28:10
>   @INC:
>     /usr/lib/perl5/5.8.5/i386-linux-thread-multi
>     /usr/lib/perl5/5.8.5
>     /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
>     /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
>     /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
>     /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
>     /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
>     /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
>     /usr/lib/perl5/site_perl/5.8.5
>     /usr/lib/perl5/site_perl/5.8.4
>     /usr/lib/perl5/site_perl/5.8.3
>     /usr/lib/perl5/site_perl/5.8.2
>     /usr/lib/perl5/site_perl/5.8.1
>     /usr/lib/perl5/site_perl/5.8.0
>     /usr/lib/perl5/site_perl
>     /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
>     /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
>     /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
>     /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
>     /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
>     /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
>     /usr/lib/perl5/vendor_perl/5.8.5
>     /usr/lib/perl5/vendor_perl/5.8.4
>     /usr/lib/perl5/vendor_perl/5.8.3
>     /usr/lib/perl5/vendor_perl/5.8.2
>     /usr/lib/perl5/vendor_perl/5.8.1
>     /usr/lib/perl5/vendor_perl/5.8.0
>     /usr/lib/perl5/vendor_perl
>     .
> mod_perl version: 1.30
> 
