perl-modperl mailing list archives

From "Rob French" <drfre...@gmail.com>
Subject Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl
Date Wed, 19 Mar 2008 18:41:37 GMT
I have tried setting it via the Apache SetEnv directive, as well as in my
environment as root when starting Apache. In both cases the variable
is correctly set under mod_perl; it is just ignored.

As another test I tried the same code as a plain ol' CGI script and it
works in that case. So the issue is definitely with mod_perl and its
interaction with the PERL_UNICODE env variable.

Thanks for your help investigating. I was worried that it might be a
mod_perl 1.x thing or a Perl version thing. Good to know it isn't just
my setup :)

Rgrds,
Rob

On Wed, Mar 19, 2008 at 11:35 AM, André Warnier <aw@ice-sa.com> wrote:
> Hi.
>
>  I cannot really think of a reason why Perl itself would behave
>  differently in either case. And your tests verified that
>  PERL_UNICODE itself is still set correctly under mod_perl. So it must be
>  that mod_perl somehow overrides the basic Perl setting. Maybe mod_perl
>  needs to do something with the filehandles, because some of them might be
>  connected to Apache?
>
>  Anyhow, I am out of my depth now, so let's call on a real mod_perl guru,
>  if any of them is around?
>
>  By the way:
>  in the meantime, I have tried the same thing under Apache 2.x/mod_perl
>  2.x, and I seem to have the same problem.
>
>  I have one more question: where exactly do you set PERL_UNICODE?
>
>
>
>
>
>  Rob French wrote:
>  > Hi André,
>  >
>  > Yes, I tried that as well and it worked as expected (the UTF-8 flag is
>  > set). Explicit PerlIO layer decoding works in both the non-mod_perl
>  > and mod_perl tests. It seems only the default PERL_UNICODE setting is
>  > ignored under mod_perl, even though it is set.
>  >
>  > Rgrds,
>  > Rob
>  >
>  > On Wed, Mar 19, 2008 at 3:01 AM, André Warnier <aw@ice-sa.com> wrote:
>  >> Hi.
>  >>
>  >>  Perl's handling of Unicode (and of character sets in general) is
>  >>  extremely clever and powerful.
>  >>  But it can sometimes be a bit counter-intuitive.
>  >>
>  >>  In any case, it seems to me that the evaluation of the PERL_UNICODE
>  >>  environment variable is a "Perl thing" rather than a "mod_perl thing",
>  >>  and that mod_perl per se should not interfere with it. But maybe
>  >>  mod_perl does some magic on filehandles in general which interferes. Who
>  >>  knows?
>  >>
>  >>  Maybe the first thing to do is to ascertain whether the problem is really
>  >>  due to a mishandling of the PERL_UNICODE environment variable, or to
>  >>  something else. I propose a simple test:
>  >>  instead of relying on the PERL_UNICODE variable, what happens when you
>  >>  change the open() statement as follows?
>  >>
>  >>    open(FH, '<:utf8', "/tmp/utf8.txt") or die "Cannot open /tmp/utf8.txt: $!";
>  >>
>  >>  This explicitly sets a UTF-8 decoding layer for the stream FH,
>  >>  instead of relying on PERL_UNICODE.
>  >>  Does your follow-up test then indicate that the utf8 flag for $var is set?
>  >>
>  >>  Note: even with the decoding layer set, that does not necessarily mean
>  >>  that all data you read will end up with the utf8 flag set. It depends
>  >>  on the data. But in your case, if you are really using the same file
>  >>  data in both tests you show below, then it seems a valid test.
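A self-contained sketch of the explicit-layer test proposed above, assuming a temporary file in place of /tmp/utf8.txt and a lexical filehandle instead of FH. It uses ':encoding(UTF-8)'; the ':utf8' layer suggested above also sets the flag, but ':encoding(UTF-8)' is the stricter choice because it validates the bytes it decodes. The same bytes read through ':raw' are shown for contrast. The slurp_with_layer name is illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file holding the UTF-8 encoding of U+00E6 (bytes C3 A6).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} "\xC3\xA6";
close $out or die "close: $!";

# Slurp a file through an explicitly named PerlIO layer. Because the
# layer is given in the open() mode, PERL_UNICODE and any lexical
# 'use open' defaults do not come into play.
sub slurp_with_layer {
    my ($file, $layer) = @_;
    open my $fh, "<$layer", $file or die "open $file: $!";
    local $/;                      # slurp mode, scoped to this sub
    my $data = <$fh>;
    close $fh;
    return $data;
}

my $decoded = slurp_with_layer($path, ':encoding(UTF-8)');
my $raw     = slurp_with_layer($path, ':raw');

printf "decoded: length %d, utf8 flag %s\n",
    length $decoded, utf8::is_utf8($decoded) ? 'on' : 'off';
printf "raw:     length %d, utf8 flag %s\n",
    length $raw, utf8::is_utf8($raw) ? 'on' : 'off';
```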
>  >>
>  >>  André
>  >>
>  >>
>  >>
>  >>
>  >>  Rob French wrote:
>  >>  > I have recently started converting one of our webapps to make it fully
>  >>  > UTF-8 compliant. All input/output from the webapp will be encoded as
>  >>  > UTF-8. As such, I am trying to use the PERL_UNICODE env variable to
>  >>  > enable UTF-8 flagging on all input/output streams. This works with
>  >>  > standalone Perl scripts like the one below (the /tmp/utf8.txt file
>  >>  > contains a single character, U+00E6 LATIN SMALL LETTER AE):
>  >>  >
>  >>  > #!/usr/bin/perl -w
>  >>  >
>  >>  > use strict;
>  >>  > use Encode;
>  >>  >
>  >>  > print "PERL_UNICODE Value: ${^UNICODE}\n";
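>  >>  > open(FH, "</tmp/utf8.txt") or die "Cannot open /tmp/utf8.txt: $!";
>  >>  > undef $/;    # slurp mode: read the whole file in one go
>  >>  > my $var = <FH>;
>  >>  > close(FH);
>  >>  >
>  >>  > # is_utf8 returns true (1) if Perl's internal UTF-8 flag is on, '' otherwise
>  >>  > print "Flagged as UTF8? " . Encode::is_utf8($var) . "\n";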
>  >>  > exit;
>  >>  >
>  >>  > The resulting output after setting my PERL_UNICODE env var to SDA is:
>  >>  >
>  >>  > PERL_UNICODE Value: 63
>  >>  > Flagged as UTF8? 1
>  >>  >
>  >>  > This is correct: Perl processed the input stream (open) as UTF-8 and
>  >>  > flagged it accordingly.
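The 63 printed above is a bit mask rather than the letters themselves. Using the bit values listed in perlrun for the -C switch (S is I+O+E = 7, D is i+o = 8+16 = 24, A is 32, so SDA = 63), a small helper can expand ${^UNICODE} back into its letters and be dropped into both the standalone script and the handler to confirm that the D bits are set in each. The helper names below are hypothetical.

```perl
use strict;
use warnings;

# Bit values of the -C / PERL_UNICODE letters, as listed in perlrun.
# S is shorthand for IOE (7) and D for io (24).
my @UNICODE_BITS = (
    [ I => 1  ],  # STDIN is assumed to be UTF-8
    [ O => 2  ],  # STDOUT will be UTF-8
    [ E => 4  ],  # STDERR will be UTF-8
    [ i => 8  ],  # UTF-8 is the default PerlIO layer for input streams
    [ o => 16 ],  # UTF-8 is the default PerlIO layer for output streams
    [ A => 32 ],  # @ARGV elements are expected to be UTF-8
    [ L => 64 ],  # make IOEioA conditional on the locale variables
);

# Expand a numeric ${^UNICODE} value into its -C letters, e.g. 63 -> "IOEioA".
sub unicode_letters {
    my ($value) = @_;
    return join '', map { $_->[0] } grep { $value & $_->[1] } @UNICODE_BITS;
}

# True when both halves of D (i and o) are enabled.
sub has_default_io_layer {
    my ($value) = @_;
    return ($value & 24) == 24;
}

my $u = ${^UNICODE};
printf "\${^UNICODE} = %d (%s), default UTF-8 I/O layer requested: %s\n",
    $u, unicode_letters($u), has_default_io_layer($u) ? 'yes' : 'no';
```

With PERL_UNICODE=SDA, unicode_letters(63) returns "IOEioA", which is S (IOE) plus D (io) plus A.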
>  >>  >
>  >>  > Unfortunately, if I put the exact same open call in my mod_perl
>  >>  > TransHandler, $var is not flagged as UTF-8. The resulting output when
>  >>  > run in the TransHandler is:
>  >>  >
>  >>  > PERL_UNICODE Value: 63
>  >>  > Flagged as UTF8?
>  >>  >
>  >>  > The input stream is not decoded as UTF-8 and not flagged internally
>  >>  > as UTF-8. If I explicitly add an Encode::decode_utf8($var) call in
>  >>  > mod_perl, then everything works as expected. It appears as if mod_perl
>  >>  > is ignoring the PERL_UNICODE env variable and not processing my input
>  >>  > streams as UTF-8.
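Until the cause is found, the decode_utf8 workaround can be made explicit and independent of PERL_UNICODE. A minimal sketch, assuming the handler reads the file itself: read raw bytes, then decode with Encode's strict 'UTF-8' encoding and FB_CROAK, so malformed input dies instead of being replaced with U+FFFD. (Encode::decode_utf8 is documented as equivalent to decoding with the lax 'utf8' variant.) The read_utf8_file name and the temporary file are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Read a file as raw bytes and decode it explicitly as strict UTF-8.
# Dies on malformed input rather than silently substituting characters.
sub read_utf8_file {
    my ($file) = @_;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    local $/;                      # slurp the whole file
    my $bytes = <$fh>;
    close $fh;
    return decode('UTF-8', $bytes, Encode::FB_CROAK);
}

# Usage: a scratch file holding the UTF-8 bytes of U+00E6.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} "\xC3\xA6";
close $tmp or die "close: $!";

my $text = read_utf8_file($path);
printf "length %d, utf8 flag %s\n",
    length $text, utf8::is_utf8($text) ? 'on' : 'off';
```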
>  >>  >
>  >>  > Thanks in advance.
>  >>  >
>  >>  > Cheers
>  >>  >
>  >>  >
>  >>  >
>  >>  >
>  >>  > Environment details below:
>  >>  >
>  >>  > Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
>  >>  >   Platform:
>  >>  >     osname=linux, osvers=2.6.9-22.18.bz155725.elsmp,
>  >>  > archname=i386-linux-thread-multi
>  >>  >     uname='linux hs20-bc1-4.build.redhat.com
>  >>  > 2.6.9-22.18.bz155725.elsmp #1 smp thu nov 17 15:34:08 est 2005 i686
>  >>  > i686 i386 gnulinux '
>  >>  >     config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386
>  >>  > -mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost
>  >>  > -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc.
>  >>  > -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux
>  >>  > -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads
>  >>  > -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db
>  >>  > -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio
>  >>  > -Dinstallusrbinperl -Ubincompat5005 -Uversiononly
>  >>  > -Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1
>  >>  > 5.8.0'
>  >>  >     hint=recommended, useposix=true, d_sigaction=define
>  >>  >     usethreads=define use5005threads=undef useithreads=define
>  >>  > usemultiplicity=define
>  >>  >     useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
>  >>  >     use64bitint=undef use64bitall=undef uselongdouble=undef
>  >>  >     usemymalloc=n, bincompat5005=undef
>  >>  >   Compiler:
>  >>  >     cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
>  >>  > -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
>  >>  > -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
>  >>  >     optimize='-O2 -g -pipe -m32 -march=i386 -mtune=pentium4',
>  >>  >     cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
>  >>  > -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
>  >>  >     ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-2)', gccosandvers=''
>  >>  >     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
>  >>  >     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
>  >>  >     ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
>  >>  > lseeksize=8
>  >>  >     alignbytes=4, prototype=define
>  >>  >   Linker and Libraries:
>  >>  >     ld='gcc', ldflags =' -L/usr/local/lib'
>  >>  >     libpth=/usr/local/lib /lib /usr/lib
>  >>  >     libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
>  >>  >     perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
>  >>  >     libc=/lib/libc-2.3.4.so, so=so, useshrplib=true, libperl=libperl.so
>  >>  >     gnulibc_version='2.3.4'
>  >>  >   Dynamic Linking:
>  >>  >     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E
>  >>  > -Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE'
>  >>  >     cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
>  >>  >
>  >>  >
>  >>  > Characteristics of this binary (from libperl):
>  >>  >   Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS
>  >>  > USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
>  >>  >   Built under linux
>  >>  >   Compiled at Jul 24 2006 18:28:10
>  >>  >   @INC:
>  >>  >     /usr/lib/perl5/5.8.5/i386-linux-thread-multi
>  >>  >     /usr/lib/perl5/5.8.5
>  >>  >     /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
>  >>  >     /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
>  >>  >     /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
>  >>  >     /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
>  >>  >     /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
>  >>  >     /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
>  >>  >     /usr/lib/perl5/site_perl/5.8.5
>  >>  >     /usr/lib/perl5/site_perl/5.8.4
>  >>  >     /usr/lib/perl5/site_perl/5.8.3
>  >>  >     /usr/lib/perl5/site_perl/5.8.2
>  >>  >     /usr/lib/perl5/site_perl/5.8.1
>  >>  >     /usr/lib/perl5/site_perl/5.8.0
>  >>  >     /usr/lib/perl5/site_perl
>  >>  >     /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
>  >>  >     /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
>  >>  >     /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
>  >>  >     /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
>  >>  >     /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
>  >>  >     /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
>  >>  >     /usr/lib/perl5/vendor_perl/5.8.5
>  >>  >     /usr/lib/perl5/vendor_perl/5.8.4
>  >>  >     /usr/lib/perl5/vendor_perl/5.8.3
>  >>  >     /usr/lib/perl5/vendor_perl/5.8.2
>  >>  >     /usr/lib/perl5/vendor_perl/5.8.1
>  >>  >     /usr/lib/perl5/vendor_perl/5.8.0
>  >>  >     /usr/lib/perl5/vendor_perl
>  >>  >     .
>  >>  > mod_perl version: 1.30
>  >>  >
>  >>
>  >
>
>
