perl-asp mailing list archives

From Arnon Weinberg <ar...@back2front.ca>
Subject Re: Output character encoding
Date Tue, 05 Jun 2012 17:21:23 GMT

On 2012-06-05 05:55, Warren Young wrote:
> There are several places where you set this, not just one, and they 
> all have to agree to guarantee correct output:
>
>     DB -> back end -> Apache -> HTML -> Apache::ASP -> browser
>
> If they do not all agree, you can either get mixed encodings or 
> encoding ping-ponging.
>
> So, you have to check all the links in that chain:

With my test cases (provided) I have carefully narrowed down the 
inconsistency to Apache::ASP, since everything else is either not 
applicable or the same.

> - Apache has things like the "AddDefaultCharset" directive which play 
> into this.

No, it doesn't, since I'm not testing the browser.  For the record 
though, when I use GET -e, I see the correct header in both tests: 
Content-Type: text/html; charset=ISO-8859-1

> - For the Perl aspects, I recommend just reading the Perl manual 
> chapter on it: perldoc perlunicode.  Perl's Unicode support is deep, 
> broad, and continually evolving[*].  You really must read your 
> particular version's docs to know exactly how it's going to behave.  
> There have been several breaking changes over the past decade or so.

Perl is behaving as documented.  Apache::ASP is giving me trouble.

> - There are at least three ways to set the character encoding in your 
> HTML.  RTFEE: https://en.wikipedia.org/wiki/Character_encodings_in_HTML
>
> - And finally, it's possible to set a browser to ignore whatever it's 
> told by the HTTP server and the document, and force it to interpret 
> the data using some other character set.

That's all true, but none of it matters: when the output mixes encodings, 
there is no character set I can select in the browser that will decode 
all of it correctly.
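
To illustrate what I mean by mixed encoding, here is a minimal sketch in 
plain Perl, outside Apache::ASP.  The variable names are made up for the 
illustration, and it is not one of my actual test cases:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "café" as a Perl character string (U+00E9 for the last letter)
my $text = "caf\x{e9}";

# One fragment goes out as Latin-1 bytes, the other as UTF-8 bytes,
# and both end up in the same response body.
my $mixed = encode('ISO-8859-1', $text) . encode('UTF-8', $text);

# Try to read the whole body back under each charset in turn.
my $as_latin1 = decode('ISO-8859-1', $mixed);
my $as_utf8   = decode('UTF-8',      $mixed);

my $expected = $text . $text;
print "Latin-1 decode correct: ", ($as_latin1 eq $expected ? "yes" : "no"), "\n";
print "UTF-8 decode correct:   ", ($as_utf8   eq $expected ? "yes" : "no"), "\n";
```

Read as Latin-1, the UTF-8 fragment turns into two junk characters; read 
as UTF-8, the lone Latin-1 byte is malformed.  Either way one half of the 
page is wrong.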

>
>> Regular perl/CGI output defaults to ISO-8859-1 encoding,
>
> Really?  I'd expect it to take the overall Perl default, which is 
> UTF-8 on most Unix type systems with Perl 5.6 onward on OSes 
> contemporary with that version of Perl.  I would have expected that 
> you'd have to go out of your way to force a return to Latin-1.

Yes, this is right out of the manual (open):
"... the default layer for the operating system (:raw on Unix, :crlf on 
Windows) is used."
The :utf8 output layer encoding must be explicitly set, as it is not the 
default.  However, I have not figured out how to do this successfully 
within Apache::ASP.
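
For comparison, this is roughly what I mean by setting the layer 
explicitly in plain Perl.  It is only a sketch: in-memory filehandles 
stand in for STDOUT so the bytes can be inspected, and the variable names 
are made up.  In a regular CGI script the equivalent would be 
binmode(STDOUT, ':encoding(UTF-8)').  Nothing here is Apache::ASP-specific:

```perl
use strict;
use warnings;

# "café" as a Perl character string (U+00E9 for the last letter)
my $text = "caf\x{e9}";

# No layer given: the OS default applies, and U+00E9 is written
# as the single byte 0xE9 (i.e. Latin-1).
my $default_bytes = '';
open(my $fh_default, '>', \$default_bytes) or die "open failed: $!";
print {$fh_default} $text;
close($fh_default);

# Explicit encoding layer: the same character is written as the
# two-byte UTF-8 sequence 0xC3 0xA9.
my $utf8_bytes = '';
open(my $fh_utf8, '>:encoding(UTF-8)', \$utf8_bytes) or die "open failed: $!";
print {$fh_utf8} $text;
close($fh_utf8);

printf "default layer: %s\n", unpack('H*', $default_bytes);
printf "UTF-8 layer:   %s\n", unpack('H*', $utf8_bytes);
```

The same string produces different bytes depending only on the layer of 
the handle it is printed to.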

> It's 2012.  Please, please, please abandon Latin-1.  Everything speaks 
> UTF-8 these days, at the borders at least, even systems like Windows 
> and JavaScript where it isn't the native character set.  It is safe to 
> consider UTF-8 the standard Unicode encoding online.

This is part of an exercise to do just that.  At the moment, we have 
many lines of legacy code still using Latin-1, and are converting them 
step-wise to use UTF-8.  As the test cases show, however, the two 
encodings do not play well together on Apache::ASP (though they are fine everywhere 
else).  If anyone has any suggestions on how this can be resolved so 
that we can continue the conversion, that would be much appreciated.


-- 
-------------------------------------------------------------------------------
Arnon Weinberg
www.back2front.ca

