perl-modperl mailing list archives

From Jeff Nokes <>
Subject Re: UTF-8 encoding problems under Apache 2 with mod_perl 2.
Date Thu, 05 Apr 2007 03:33:38 GMT
We keep our content, our presentation templates, and our source code completely separate.
 We use HTML::Mason mostly as a layer of abstraction over mod_perl's raw API, and then use HTML::Template
to munge our content with our templates, either in pre-release batch mode or dynamically.

We keep all of our strings in versioned content files, in XML format, something like the following:

    <str id="landHelp.004">
        <content>Here's how to do it:</content>
    </str>
    <str id="commChat.217">
        <content>How to participate</content>
    </str>

This is an example of a US English XML string file.  Each locale we support has its own
string file, with US English as the base, meaning we always translate
enUS -> other locale.  So the traditional Chinese XML file has the equivalent
strings for those example string IDs between the <content></content> tags, but
in Chinese; the same goes for Polish, etc.

Then, a template might look like:

            <option value="-1">------------------</option>
            <option value="-2"><!-- TMPL_VAR NAME=landHelp.004 --></option>

... etc.  This is just HTML::Template syntax inside of a standard HTML template.  After munging,
the output would be:

            <option value="-1">------------------</option>
            <option value="-2">Here's how to do it:</option>
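
The munging step can be sketched roughly as follows. This is only an illustration, not our actual code: the `%strings` table and the `render` helper are hypothetical names, and the string table is hard-coded here where the real values would come from the locale's XML string file.

```perl
use strict;
use warnings;
use HTML::Template;

# Hypothetical in-memory string table (assumption); in the setup described
# above these values would be read from the locale's XML string file.
my %strings = ( 'landHelp.004' => "Here's how to do it:" );

my $template_text = <<'HTML';
<option value="-1">------------------</option>
<option value="-2"><!-- TMPL_VAR NAME=landHelp.004 --></option>
HTML

# Hypothetical helper: fill one template from a table of string IDs.
sub render {
    my ($text, $strings) = @_;
    my $tmpl = HTML::Template->new(
        scalarref         => \$text,
        die_on_bad_params => 0,   # the table may hold IDs this template never uses
    );
    $tmpl->param($strings);       # a hash ref sets several parameters at once
    return $tmpl->output;
}

print render($template_text, \%strings);
```

Turning off `die_on_bad_params` is a choice made for this sketch so that one large string table can be passed to many small templates; leave it on if you would rather catch stray IDs.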

So, whenever we read in our string files for munging with templates, we tell Perl that the
file is UTF-8 encoded by opening the file handle with a UTF-8 layer, and that's really it; internally
Perl then treats that string content as character data (flagged as UTF-8) unless we explicitly state otherwise.
 We use the Encode module all the time to convert between UTF-8 and Big-5, or ISO-8859-2,
or whatever else is needed for email templates with the same content.  Of course, your web page templates
and your emails must declare their character encoding properly for this all
to work with the respective clients.  Apache really doesn't care; it just sees 8-bit data and
serves it to the client.
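
A minimal sketch of both steps, reading through a UTF-8 layer and converting with Encode, might look like this. The helper names `slurp_utf8` and `to_charset` and the temporary sample file are assumptions made for illustration; only `open` with an `:encoding(UTF-8)` layer and `Encode::encode` are the standard pieces.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file, letting the PerlIO layer decode
# its UTF-8 bytes into Perl character strings.
sub slurp_utf8 {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path) or die "Cannot open $path: $!";
    local $/;                      # slurp mode: read everything at once
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Hypothetical helper: turn a character string into bytes in another charset.
# FB_CROAK dies on characters the target charset cannot represent;
# LEAVE_SRC keeps encode() from modifying the input string.
sub to_charset {
    my ($text, $charset) = @_;
    return encode($charset, $text, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Demo with a throwaway file holding a Polish word written as UTF-8.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':encoding(UTF-8)');
print {$out} "\x{17C}\x{F3}\x{142}w\n";
close($out);

my $text   = slurp_utf8($path);
my $latin2 = to_charset($text, 'iso-8859-2');
printf "%d characters, %d bytes as ISO-8859-2\n", length($text), length($latin2);
```

Dying on unmappable characters is a choice for this sketch; for email in a legacy charset you may prefer a substitution fallback instead.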

I also have the following set in the environment of the user running Apache.  I've tried
commenting it out completely and saw no difference in behavior, but I keep it in there for posterity, I
guess ... :-)

    # Perl Unicode Support
    # This ENV will force the entire Perl interpreter in Apache to have the
    # following IO layers/streams forced to use UTF-8 as the desired charset.
    # See `perldoc perlrun` and `perldoc perluniintro` for more details.
    # I     1    STDIN is assumed to be in UTF-8
    # O     2    STDOUT will be in UTF-8
    # E     4    STDERR will be in UTF-8
    # S     7    I + O + E
    # i     8    UTF-8 is the default PerlIO layer for input streams
    # o    16    UTF-8 is the default PerlIO layer for output streams
    # D    24    i + o
    # A    32    the @ARGV elements are expected to be strings encoded in UTF-8
    # L    64    normally the "IOEioA" are unconditional,
    #            the L makes them conditional on the locale environment
    #            variables (the LC_ALL, LC_CTYPE, and LANG, in the order
    #            of decreasing precedence) -- if the variables indicate
    #            UTF-8, then the selected "IOEioA" are in effect
      export PERL_UNICODE
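
If you want to check whether the setting actually reached the interpreter, something like the following could help. It is a sketch only: `unicode_flag_names` is a hypothetical helper, and the bit values are the ones from the table above. `${^UNICODE}` is the read-only Perl variable that reflects the `-C` / `PERL_UNICODE` switches the interpreter started with.

```perl
use strict;
use warnings;

# Bit values for the -C / PERL_UNICODE switches, as listed in perlrun.
my @FLAGS = (
    [ I => 1 ],  [ O => 2 ],  [ E => 4 ],
    [ i => 8 ],  [ o => 16 ], [ A => 32 ], [ L => 64 ],
);

# Hypothetical helper: name the switches that are set in a numeric value.
sub unicode_flag_names {
    my ($value) = @_;
    return join '', map { $_->[0] } grep { $value & $_->[1] } @FLAGS;
}

# Report what the environment asked for and what the interpreter is using.
printf "PERL_UNICODE env: %s\n",
    defined $ENV{PERL_UNICODE} ? $ENV{PERL_UNICODE} : '(unset)';
printf "Active switches:  %s\n", unicode_flag_names(${^UNICODE}) || '(none)';
```

Because the switches are fixed at interpreter startup, the variable has to be in Apache's environment before the server starts for it to show up here.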

It's important to understand how Perl deals with character data internally, and how it uses
the UTF-8 flag it sets, etc.  You should probably read up on it if you haven't at the following

Hope this helps you out.
- Jeff

----- Original Message ----
From: Jeff Pang <>
Sent: Wednesday, April 4, 2007 7:20:01 PM
Subject: Re: UTF-8 encoding problems under Apache 2 with mod_perl 2.

>We also do everything (not source code, which is in ISO-8859-1, only content) in UTF-8
>where I work, and we support many different languages.

Jeff, how did you do it, using UTF-8 for everything? Can you give a rough description? Thanks.

