httpd-dev mailing list archives

From "Zvi Har'El" ...@math.technion.ac.il>
Subject AddCharset filename extensions (again)
Date Fri, 19 Mar 2004 16:59:19 GMT
Dear Apache developers,

I sent the following three months ago. Since I got no response, and 2.0.49 has
now been rolled without the patch, I am resubmitting it for your attention:


The default httpd.conf includes the lines

AddCharset ISO-8859-1  .iso8859-1  .latin1
AddCharset ISO-8859-2  .iso8859-2  .latin2 .cen
AddCharset ISO-8859-3  .iso8859-3  .latin3
AddCharset ISO-8859-4  .iso8859-4  .latin4
AddCharset ISO-8859-5  .iso8859-5  .latin5 .cyr .iso-ru
AddCharset ISO-8859-6  .iso8859-6  .latin6 .arb
AddCharset ISO-8859-7  .iso8859-7  .latin7 .grk
AddCharset ISO-8859-8  .iso8859-8  .latin8 .heb
AddCharset ISO-8859-9  .iso8859-9  .latin9 .trk

However, a quick look at http://www.iana.org/assignments/character-sets shows
that referring to the non-Latin charsets ISO-8859-N by the name latinN is wrong.
For example, latin8 is ISO-8859-14, or iso-celtic, and certainly not
ISO-8859-8, which is just Hebrew! Similarly, latin6 is ISO-8859-10, and not
ISO-8859-6, which is Arabic! Finally, latin5 is ISO-8859-9, Turkish, and not
ISO-8859-5, which is Cyrillic. latin1-4 are OK, and I didn't find latin7 in
this reference at all. I suggest that httpd.conf be fixed accordingly.
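
As a rough illustration of the mismatch described above, here is a minimal
Python sketch that scans AddCharset lines and flags any .latinN extension
attached to the wrong ISO-8859 part. The function name find_mismatches and the
SAMPLE text are hypothetical, and the latinN table is simply the mapping
proposed in this message (latin7 and latin9 follow the accompanying patch, not
the IANA list), so treat it as an assumption rather than an authoritative
registry.

```python
import re

# Assumed mapping from "latinN" to ISO-8859 part number, as proposed in this
# message: latin1-4 match parts 1-4; latin5..latin9 map to 9, 10, 13, 14, 15.
LATIN_TO_PART = {1: 1, 2: 2, 3: 3, 4: 4, 5: 9, 6: 10, 7: 13, 8: 14, 9: 15}

# Hypothetical excerpt in the style of the default httpd.conf.
SAMPLE = """\
AddCharset ISO-8859-4  .iso8859-4  .latin4
AddCharset ISO-8859-5  .iso8859-5  .latin5 .cyr .iso-ru
AddCharset ISO-8859-8  .iso8859-8  .latin8 .heb
AddCharset ISO-8859-9  .iso8859-9  .latin9 .trk
"""


def find_mismatches(conf_text):
    """Return (charset, extension, expected_charset) for each misused latinN."""
    mismatches = []
    for line in conf_text.splitlines():
        fields = line.split()
        # Only look at "AddCharset <charset> <ext> [<ext> ...]" lines.
        if len(fields) < 3 or fields[0].lower() != "addcharset":
            continue
        charset, extensions = fields[1], fields[2:]
        for ext in extensions:
            match = re.fullmatch(r"\.?latin(\d+)", ext, re.IGNORECASE)
            if match is None:
                continue
            number = int(match.group(1))
            if number not in LATIN_TO_PART:
                continue  # alias not covered by the assumed table
            expected = f"ISO-8859-{LATIN_TO_PART[number]}"
            if charset.upper() != expected:
                mismatches.append((charset, ext, expected))
    return mismatches


if __name__ == "__main__":
    for charset, ext, expected in find_mismatches(SAMPLE):
        print(f"{ext} is attached to {charset} but should map to {expected}")
```

Run against the sample, the sketch leaves the .latin4 line alone and reports
.latin5, .latin8 and .latin9 as attached to the wrong charset.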

To make my point clearer, here is the patch:


--- httpd-2.0.48/docs/conf/httpd-std.conf.in.~20031011014743~	2003-10-11 03:47:43.000000000 +0200
+++ httpd-2.0.48/docs/conf/httpd-std.conf.in	2003-12-15 18:47:07.000000000 +0200
@@ -797,11 +797,15 @@
 AddCharset ISO-8859-2  .iso8859-2  .latin2 .cen
 AddCharset ISO-8859-3  .iso8859-3  .latin3
 AddCharset ISO-8859-4  .iso8859-4  .latin4
-AddCharset ISO-8859-5  .iso8859-5  .latin5 .cyr .iso-ru
-AddCharset ISO-8859-6  .iso8859-6  .latin6 .arb
-AddCharset ISO-8859-7  .iso8859-7  .latin7 .grk
-AddCharset ISO-8859-8  .iso8859-8  .latin8 .heb
-AddCharset ISO-8859-9  .iso8859-9  .latin9 .trk
+AddCharset ISO-8859-5  .iso8859-5  .cyr .iso-ru
+AddCharset ISO-8859-6  .iso8859-6  .arb
+AddCharset ISO-8859-7  .iso8859-7  .grk
+AddCharset ISO-8859-8  .iso8859-8  .heb
+AddCharset ISO-8859-9  .iso8859-9  .latin5 .trk
+AddCharset ISO-8859-10  .iso8859-10  .latin6 
+AddCharset ISO-8859-13  .iso8859-13  .latin7 
+AddCharset ISO-8859-14  .iso8859-14  .latin8 
+AddCharset ISO-8859-15  .iso8859-15  .latin9 
 AddCharset ISO-2022-JP .iso2022-jp .jis
 AddCharset ISO-2022-KR .iso2022-kr .kis
 AddCharset ISO-2022-CN .iso2022-cn .cis




I have also included latin7 and latin9, which for some reason are absent from
IANA but appear as standard in the FSF's "free recode". BTW, instead of
inventing new charset abbreviations like .cyr, .arb, .grk, .heb, I would
personally prefer using the IANA (RFC 1345) aliases: .cyrillic, .arabic,
.greek, .hebrew, in the same way we use .latin1, .latin2, etc., but this is a
matter of opinion, not bug-fix patching.

Best,

Zvi.

-- 
Dr. Zvi Har'El     mailto:rl@math.technion.ac.il     Department of Mathematics
tel:+972-54-227607 icq:179294841     Technion - Israel Institute of Technology
fax:+972-4-8293388 http://www.math.technion.ac.il/~rl/     Haifa 32000, ISRAEL
"If you can't say somethin' nice, don't say nothin' at all." -- Thumper (1942)
                                  Friday, 27 Adar 5764, 19 March 2004,  6:53PM
