perl-asp mailing list archives

From Warren Young <war...@etr-usa.com>
Subject Re: Output character encoding
Date Tue, 05 Jun 2012 09:55:25 GMT
On 6/5/2012 3:02 AM, Arnon Weinberg wrote:
>
> How can I set the output character encoding of Apache::ASP output?

There are several places where you set this, not just one, and they all 
have to agree to guarantee correct output:

	DB -> back end -> Apache -> HTML -> Apache::ASP -> browser

If they do not all agree, you can get either mixed encodings or encoding 
ping-ponging, where the same text is converted back and forth between 
encodings as it passes along the chain.
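To make the two failure modes concrete, here is a minimal sketch in Python. Python is used only for illustration; the byte-level effect is the same whatever language produced the bytes. It assumes one link writes UTF-8 while the next reads Latin-1, which produces mixed-encoding garbage, and then pushes the result through the same mismatch again to show ping-ponging. `mismatched_hop` is a hypothetical helper, not part of any library.

```python
def mismatched_hop(text: str, written_as: str = "utf-8", read_as: str = "latin-1") -> str:
    """Hypothetical helper: one link encodes with `written_as`,
    the next link decodes the same bytes as `read_as`."""
    return text.encode(written_as).decode(read_as)


original = "café"

# Mixed encodings: UTF-8 bytes C3 A9 are read as two Latin-1 characters.
once = mismatched_hop(original)        # "cafÃ©"

# Ping-ponging: the already-garbled text crosses the same mismatch again,
# and each pass makes it longer and worse.
twice = mismatched_hop(once)           # "caf" + "\u00c3\u0083\u00c2\u00a9"

# The opposite mismatch: Latin-1 byte E9 is not valid UTF-8 by itself,
# so a lenient UTF-8 decoder substitutes U+FFFD REPLACEMENT CHARACTER.
reversed_mismatch = original.encode("latin-1").decode("utf-8", errors="replace")
```

When both sides agree (for example `mismatched_hop(original, "utf-8", "utf-8")`), the text comes through unchanged, which is the whole point of making every link in the chain use the same encoding.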

Ping-ponging is less common these days, now that the world is settling on 
UTF-8.  Back in the Perl 5.6/Apache 1.3/pre-Firefox days, I once chased 
data through a system that stored it in the DB as Latin-1.  The back-end 
daemon translated it to UTF-8 and sent it on to Apache and mod_perl, one 
of which smashed it back to Latin-1 (I never did nail down which), before 
it went out to a browser that decoded it as UTF-8 because Apache was 
configured to declare that by default!
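That story can be replayed hop by hop. This is a hypothetical Python model, not a real API: each `Hop` decodes with the encoding it assumes for its input and re-encodes with the encoding it emits, and the Apache/mod_perl hop is assumed to have converted cleanly back to Latin-1 (the real cause was never pinned down). The browser then decodes as UTF-8, as the Apache default told it to.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hop:
    """One link in the chain (hypothetical model, not a real API)."""
    name: str
    reads: str   # encoding this hop assumes its input uses
    writes: str  # encoding this hop emits


def run_chain(stored: bytes, hops: list[Hop], browser_reads: str) -> tuple[str, list[str]]:
    """Push stored bytes through each hop and decode them as the browser would.

    Returns the text the browser displays plus a note for every boundary
    where one hop writes an encoding the next reader does not expect.
    """
    mismatches: list[str] = []
    data = stored
    next_readers = [h.reads for h in hops[1:]] + [browser_reads]
    next_names = [h.name for h in hops[1:]] + ["browser"]
    for hop, next_reads, next_name in zip(hops, next_readers, next_names):
        # Decode with this hop's assumption, then re-encode with its output charset.
        text = data.decode(hop.reads, errors="replace")
        data = text.encode(hop.writes, errors="replace")
        if hop.writes != next_reads:
            mismatches.append(f"{hop.name} writes {hop.writes}, {next_name} reads {next_reads}")
    return data.decode(browser_reads, errors="replace"), mismatches


# The chain from the story: Latin-1 in the DB, UTF-8 from the daemon,
# back to Latin-1 somewhere in Apache/mod_perl, UTF-8 declared to the browser.
story = [
    Hop("back-end daemon", reads="latin-1", writes="utf-8"),
    Hop("Apache/mod_perl", reads="utf-8", writes="latin-1"),
]
shown, problems = run_chain("café".encode("latin-1"), story, browser_reads="utf-8")
```

Here `shown` ends in U+FFFD instead of "é", and `problems` names the single boundary that disagrees. Changing that hop to write UTF-8 makes the whole chain agree and the text arrives intact.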

So, you have to check all the links in that chain:

- Your DB and any back-end daemon are up to you, since they're out of 
scope on this list.

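- Apache has directives that play into this, such as "AddDefaultCharset", 
which adds a charset to the Content-Type header of text/plain and 
text/html responses that don't already declare one.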

- For the Perl aspects, I recommend just reading the Perl manual chapter 
on it: perldoc perlunicode.  Perl's Unicode support is deep, broad, and 
continually evolving[*].  You really must read your particular version's 
docs to know exactly how it's going to behave.  There have been several 
breaking changes over the past decade or so.

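- There are at least three ways to declare the character encoding of an 
HTML document: the charset parameter of the HTTP Content-Type header, a 
<meta> element in the document head, and a byte order mark at the start 
of the file.  RTFEE: https://en.wikipedia.org/wiki/Character_encodings_in_HTML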

- And finally, it's possible to set a browser to ignore whatever it's 
told by the HTTP server and the document, and force it to interpret the 
data using some other character set.
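How the server default, the document's own declarations and a browser override interact can be sketched as follows. This is a simplified Python model, assuming the precedence of the WHATWG HTML encoding sniffing algorithm (BOM, then user override, then the HTTP charset, then a `<meta>` declaration in the first 1024 bytes, then a default). `add_default_charset` is a rough, hypothetical model of Apache's AddDefaultCharset, and the `windows-1252` fallback is an assumption, since real browsers pick it based on locale.

```python
from __future__ import annotations

import re

# Byte order marks take precedence over everything else.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
]

# Simplified: matches <meta charset=...> and the charset= inside http-equiv content.
META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)
HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([A-Za-z0-9._-]+)', re.IGNORECASE)


def add_default_charset(content_type: str, default: str) -> str:
    """Rough model: only text/plain and text/html responses that
    lack a charset parameter get one appended."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in ("text/plain", "text/html") and "charset=" not in content_type.lower():
        return f"{content_type}; charset={default}"
    return content_type


def effective_encoding(body: bytes, content_type: str,
                       user_override: str | None = None,
                       fallback: str = "windows-1252") -> str:
    """Pick the encoding a browser would use, in simplified precedence order."""
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    if user_override:
        return user_override
    header = HEADER_CHARSET.search(content_type)
    if header:
        return header.group(1).lower()
    meta = META_CHARSET.search(body[:1024])
    if meta:
        return meta.group(1).decode("ascii").lower()
    return fallback


ct = add_default_charset("text/html", "UTF-8")
page = b'<html><head><meta charset="iso-8859-1"></head><body>caf\xe9</body></html>'
chosen = effective_encoding(page, ct)
```

With the server default in place, the HTTP header wins over the page's own `<meta>` tag, so a Latin-1 page ends up decoded as UTF-8: exactly the kind of disagreement the list above warns about.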


[*] Literally continuously.  I happened to read through the Perl release 
notes from 5.8 onward last week, and I saw Unicode-related changes in 
*every* major release, including the just-released 5.16!

> Regular perl/CGI output defaults to ISO-8859-1 encoding,

Really?  I'd expect it to take Perl's overall default, which is UTF-8 on 
most Unix-type systems running Perl 5.6 or newer on an OS from the same 
era.  I'd have expected that you'd have to go out of your way to force a 
return to Latin-1.

Now, if you're on a system whose native character set is still Latin-1, 
I'd understand, but then you'd be running a 10-year-old box, wouldn't 
you? :)

> How can I get the same results as the CGI script above?

It's 2012.  Please, please, please abandon Latin-1.  Everything speaks 
UTF-8 these days, at the borders at least, even systems like Windows and 
JavaScript where it isn't the native character set.  It is safe to 
consider UTF-8 the standard Unicode encoding online.
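If you do have legacy Latin-1 data to migrate, one common approach is to convert it once, at the border, and make the conversion safe to rerun. The Python sketch below assumes every input is either already valid UTF-8 or legacy Latin-1; `to_utf8` is a hypothetical helper, not a library function.

```python
def to_utf8(data: bytes) -> bytes:
    """Return data as UTF-8, converting from Latin-1 only when needed.

    Assumption: input is either valid UTF-8 already or legacy Latin-1.
    Checking for valid UTF-8 first makes the conversion idempotent, so
    running it twice cannot double-encode (no ping-ponging).
    """
    try:
        data.decode("utf-8")
        return data  # already UTF-8 (pure ASCII also lands here)
    except UnicodeDecodeError:
        # Latin-1 maps every byte to the code point of the same value.
        return data.decode("latin-1").encode("utf-8")


legacy = "café".encode("latin-1")
converted = to_utf8(legacy)
```

This is a heuristic: Latin-1 text that happens to form valid UTF-8 byte sequences would be left untouched. For real migrations it is safer to know each source's encoding than to guess.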


